In this article, we'll learn how to use scikit-learn's GridSearchCV class to find the best hyperparameters for an AdaBoostRegressor model on the Boston housing-price dataset in Python. The tutorial covers:
- Preparing data, base estimator, and parameters
- Fitting the model and getting the best estimator
- Prediction and accuracy check
- Source code listing
from sklearn.datasets import load_boston
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error, make_scorer, r2_score
import matplotlib.pyplot as plt
Preparing data, base estimator, and parameters
We use the Boston house-price dataset as the regression data in this tutorial. (Note that load_boston was removed in scikit-learn 1.2, so on a recent version you'll need an older scikit-learn or a substitute dataset.) After loading the dataset, we'll separate it into the x features and y labels, then split it into train and test parts, holding out 15 percent of the data as the test set.
boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
As the estimator to tune, we'll use AdaBoostRegressor with its default settings.
abreg = AdaBoostRegressor()
The parameter grid for this estimator needs to be defined. The full parameter list of the AdaBoostRegressor class can be found in the scikit-learn API reference. We create a params dictionary holding the target parameters and the values to try for each.
params = {'n_estimators': [50, 100],
          'learning_rate': [0.01, 0.05, 0.1, 0.5],
          'loss': ['linear', 'square', 'exponential']}
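To see how many candidates this grid produces, we can expand the dictionary with scikit-learn's ParameterGrid. This is just a sketch of what GridSearchCV does internally when it enumerates parameter combinations:

```python
from sklearn.model_selection import ParameterGrid

# the same grid as above: 2 * 4 * 3 = 24 candidate combinations
params = {'n_estimators': [50, 100],
          'learning_rate': [0.01, 0.05, 0.1, 0.5],
          'loss': ['linear', 'square', 'exponential']}

candidates = list(ParameterGrid(params))
print(len(candidates))   # 24
print(candidates[0])     # one sample combination
```

With 5-fold cross-validation, each of the 24 candidates is fit 5 times, so the search trains 120 models in total.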
We can also pass a custom scoring parameter to the GridSearchCV model, as follows. By default, GridSearchCV uses the estimator's own score method, which for regressors is the R-squared metric. Note that GridSearchCV always maximizes the score, so an error metric like MSE must be negated with greater_is_better=False; otherwise the search would pick the worst parameters.

score = make_scorer(mean_squared_error, greater_is_better=False)
Fitting the model and getting the best estimator
Next, we'll define the GridSearchCV model with the above estimator and parameter grid. We'll set the cross-validation fold parameter cv to 5 and fit it on the training data, keeping the test set aside for the final evaluation.

gridsearch = GridSearchCV(abreg, params, cv=5, return_train_score=True)
gridsearch.fit(xtrain, ytrain)
GridSearchCV(cv=5, error_score='raise',
       estimator=AdaBoostRegressor(base_estimator=None, learning_rate=1.0,
                                   loss='linear', n_estimators=50,
                                   random_state=None),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'n_estimators': [50, 100],
                   'learning_rate': [0.01, 0.05, 0.1, 0.5],
                   'loss': ['linear', 'square', 'exponential']},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)
If you want to change the scoring method, you can pass the scoring parameter defined above.

gridsearch = GridSearchCV(abreg, params, scoring=score, cv=5, return_train_score=True)
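Alternatively, scikit-learn ships built-in scorer names; the string 'neg_mean_squared_error' already handles the sign flip for you. Here is a minimal, self-contained sketch on synthetic data (using make_regression instead of the Boston data so it runs on current scikit-learn versions; the sizes and parameters are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

# synthetic regression data stands in for the Boston dataset here
X, y = make_regression(n_samples=150, n_features=4, noise=5.0, random_state=0)

gs = GridSearchCV(AdaBoostRegressor(random_state=0),
                  {'n_estimators': [10, 20]},
                  scoring='neg_mean_squared_error', cv=3)
gs.fit(X, y)
print(gs.best_params_)
print(gs.best_score_)  # negated MSE, so values closer to 0 are better
```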
After fitting the model, we can get the best parameters and the best cross-validation score.
print(gridsearch.best_params_)
{'learning_rate': 0.5, 'loss': 'exponential', 'n_estimators': 50}
print(gridsearch.best_score_)
0.5913769411856192
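Beyond best_params_ and best_score_, the fitted search exposes cv_results_, a dictionary of per-candidate results that is convenient to inspect as a pandas DataFrame. A small self-contained sketch on synthetic data (a reduced, illustrative grid rather than the full one from this tutorial):

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

gs = GridSearchCV(AdaBoostRegressor(random_state=1),
                  {'n_estimators': [10, 20], 'learning_rate': [0.1, 0.5]},
                  cv=3, return_train_score=True)
gs.fit(X, y)

# one row per parameter combination (2 * 2 = 4 here)
results = pd.DataFrame(gs.cv_results_)
print(results[['param_n_estimators', 'param_learning_rate',
               'mean_test_score', 'rank_test_score']])
```

Sorting this frame by rank_test_score is a quick way to see how close the runner-up parameter combinations came to the winner.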
Now, we can get the best estimator from the grid search result and assign it to best_estim for further use.
best_estim=gridsearch.best_estimator_
print(best_estim)
AdaBoostRegressor(base_estimator=None, learning_rate=0.5, loss='exponential',
n_estimators=50, random_state=None)
Prediction and accuracy check
We've extracted the best estimator, and now we can use it as a predictive model. We'll refit it on the train data and check the accuracy metrics. Note that both mean_squared_error and r2_score expect the true values first, then the predictions.

best_estim.fit(xtrain, ytrain)
ytr_pred = best_estim.predict(xtrain)
mse = mean_squared_error(ytrain, ytr_pred)
r2 = r2_score(ytrain, ytr_pred)
print("MSE: %.2f" % mse)
MSE: 7.54
print("R2: %.2f" % r2)
R2: 0.89
Next, we'll predict on the test data and check the accuracy metrics.
ypred = best_estim.predict(xtest)
mse = mean_squared_error(ytest, ypred)
r2 = r2_score(ytest, ypred)
print("MSE: %.2f" % mse)
MSE: 11.51
print("R2: %.2f" % r2)
R2: 0.85
Finally, we'll visualize the results in a plot.
x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()
In this article, we've briefly learned the grid search method with the GridSearchCV class and applied it to regression data in Python. The full source code is listed below. Thank you for reading!
Source code listing
from sklearn.datasets import load_boston
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error, make_scorer, r2_score
import matplotlib.pyplot as plt

boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

abreg = AdaBoostRegressor()
params = {'n_estimators': [50, 100],
          'learning_rate': [0.01, 0.05, 0.1, 0.5],
          'loss': ['linear', 'square', 'exponential']}

# GridSearchCV maximizes the score, so negate the MSE
score = make_scorer(mean_squared_error, greater_is_better=False)

gridsearch = GridSearchCV(abreg, params, cv=5, return_train_score=True)
gridsearch.fit(xtrain, ytrain)
print(gridsearch.best_params_)

best_estim = gridsearch.best_estimator_
print(best_estim)

best_estim.fit(xtrain, ytrain)
ytr_pred = best_estim.predict(xtrain)
mse = mean_squared_error(ytrain, ytr_pred)
r2 = r2_score(ytrain, ytr_pred)
print("MSE: %.2f" % mse)
print("R2: %.2f" % r2)

ypred = best_estim.predict(xtest)
mse = mean_squared_error(ytest, ypred)
r2 = r2_score(ytest, ypred)
print("MSE: %.2f" % mse)
print("R2: %.2f" % r2)

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()