How to Implement Grid Search Using GridSearchCV in Python
In machine learning, model performance depends on the choice of hyperparameters, which are set before training and guide the learning process. Grid Search is a popular method for finding the best hyperparameter combination.
In this tutorial, we'll learn how to use GridSearchCV to determine the optimal parameters for the AdaBoostRegressor model using the California housing dataset in Python. This tutorial will cover the following steps:
Introduction to Grid Search
Preparing data, base estimator, and parameters
Extracting the best hyperparameters
Source code listing
Let's get started.
Introduction to Grid Search
Grid Search is a method used to exhaustively search for the best combination of hyperparameter values in a provided grid for a given estimator (model). It is particularly useful when we want to identify the optimal hyperparameters for a model based on a specific dataset.
The GridSearchCV class in scikit-learn implements this search for hyperparameter tuning. It evaluates every combination in the specified grid using cross-validation: the training data is split into k folds, and each combination is trained on k-1 folds and scored on the remaining fold, rotating until every fold has served as the validation set. The combination with the best mean validation score is selected. This automates the tuning process and gives a more reliable estimate of performance on unseen data than a single train/validation split.
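To make the fold rotation concrete, here is a minimal sketch of 5-fold splitting on a toy array of 10 samples. The array `X_demo` is hypothetical and only serves the illustration. It uses KFold, which is the splitter scikit-learn applies when an integer cv is passed for a regressor.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 samples with a single feature
X_demo = np.arange(10).reshape(-1, 1)

# 5 folds: each sample lands in the validation set exactly once
kf = KFold(n_splits=5)
folds = list(kf.split(X_demo))

for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"Fold {i}: train={train_idx.tolist()} validation={val_idx.tolist()}")
```

Each hyperparameter combination is fitted once per fold, and its score is the average over the five validation sets.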
Preparing data, base estimator, and parameters
We'll start by loading the necessary libraries for this tutorial.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
In this tutorial, we use the California Housing dataset as the regression data. After loading the dataset, we extract the features (X) and the target values (y), then split the data into training and testing sets. Here, we’ll reserve 15 percent of the dataset as the test data.
# Load the California housing dataset
california = fetch_california_housing()
X, y = california.data, california.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
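We apply standard scaling to the feature data using StandardScaler(). The scaler is fitted on the training set only, and the same transformation is then applied to the test set.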
# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
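As a small illustration of what this scaling step does, the sketch below applies StandardScaler to a hypothetical toy array (the `train` and `test` values are made up for the example). The training columns end up with zero mean and unit standard deviation, while the test row is transformed with the statistics learned from the training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with two features on very different scales
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[4.0, 400.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the train statistics

print("Train means after scaling:", train_scaled.mean(axis=0))
print("Train stds after scaling:", train_scaled.std(axis=0))
print("Scaled test row:", test_scaled)
```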
Next, we define a base estimator. In this case, we use the AdaBoostRegressor class from scikit-learn. The candidate values for its hyperparameters (number of estimators, learning rate, and loss function) are defined in the parameter grid below.
# Define the AdaBoost Regressor
abreg = AdaBoostRegressor(random_state=42)
# Define the parameter grid for hyperparameter tuning
params = {
'n_estimators': [50, 100],
'learning_rate': [0.01, 0.1],
'loss': ['linear', 'square']
}
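To see how large the search is, the grid can be expanded by hand. The following sketch uses only the standard library to enumerate every combination; the 5-fold count is taken from the cross-validation setting used later in this tutorial, and the variable names are chosen for the illustration.

```python
from itertools import product

params = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1],
    'loss': ['linear', 'square'],
}

# Cartesian product of all candidate values, one dict per combination
names = list(params)
combinations = [dict(zip(names, values)) for values in product(*params.values())]

for combo in combinations:
    print(combo)

n_folds = 5  # cross-validation folds per combination
total_fits = len(combinations) * n_folds
print(len(combinations), "combinations,", total_fits, "cross-validation fits")
```

With two candidate values for each of the three hyperparameters, the grid holds 2 x 2 x 2 = 8 combinations, so 5-fold cross-validation trains 40 models before the final refit. The cost grows multiplicatively with each value added to the grid.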
Extracting the best hyperparameters
Next, we initialize GridSearchCV with the estimator and parameter grid defined earlier. We set the cross-validation parameter to 5 folds. To run the search on the training data, we use the fit() method.
# Initialize GridSearchCV with the model, grid, and cross-validation strategy
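The step described above might look like the following sketch. It is not the original listing: to stay self-contained and runnable offline, it swaps in a small synthetic dataset from make_regression (an assumption) in place of the scaled California housing data, and it keeps the default scoring, which for a regressor is the R^2 returned by the estimator's score() method.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic stand-in for the scaled California housing data
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

abreg = AdaBoostRegressor(random_state=42)
params = {
    'n_estimators': [50, 100],
    'learning_rate': [0.01, 0.1],
    'loss': ['linear', 'square'],
}

# Initialize GridSearchCV with the model, grid, and 5-fold cross-validation
grid = GridSearchCV(estimator=abreg, param_grid=params, cv=5)

# Fit every combination on the training data
grid.fit(X_train, y_train)

# Inspect the winning combination and its mean cross-validated score
print("Best parameters:", grid.best_params_)
print("Best CV score (R^2):", grid.best_score_)

# best_estimator_ is refitted on the whole training set (refit=True by default)
best_model = grid.best_estimator_
print("Test R^2:", best_model.score(X_test, y_test))
```

The fitted object also exposes cv_results_, a dictionary with the scores of every combination, which is useful for checking how close the runner-up settings were.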