In this tutorial, we'll learn how to predict regression data with the GradientBoostingRegressor class (from the sklearn.ensemble module) in Python. The post covers:

- Preparing data
- Defining the model
- Predicting test data and visualizing the result

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
```

**Preparing data**

We use the Boston house-price dataset as the regression dataset in this tutorial. After loading the dataset, we'll first separate the data into x (features) and y (target) parts.

```python
boston = load_boston()
x, y = boston.data, boston.target
```
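Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. If you are on a newer version, a self-contained stand-in such as a synthetic dataset with the same shape (506 samples, 13 features) keeps the rest of the tutorial runnable; the exact MSE values below will of course differ.

```python
from sklearn.datasets import make_regression

# synthetic stand-in with the Boston dataset's shape: 506 samples, 13 features
x, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=12)
print(x.shape, y.shape)  # (506, 13) (506,)
```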

Then we'll split it into train and test parts, holding out 15 percent of the data as the test set.

```python
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=12, test_size=0.15)
```

**Defining the model**

We can define the model with its default parameters or set new parameter values.

```python
# with custom parameters
gbr = GradientBoostingRegressor(n_estimators=600, max_depth=5,
                                learning_rate=0.01, min_samples_split=3)

# with default parameters (note: this overwrites the model defined above,
# so the rest of the tutorial uses the defaults)
gbr = GradientBoostingRegressor()
```

Printing the model shows its default parameters:

```python
print(gbr)
```

```
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          presort='auto', random_state=None, subsample=1.0,
                          verbose=0, warm_start=False)
```

Next, we'll fit the model with train data.

`gbr.fit(xtrain, ytrain)`
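Once fitted, the model also exposes per-feature importance scores through its feature_importances_ attribute, which can help identify which input columns drive the predictions. A minimal sketch on a synthetic dataset (the attribute works the same way on the Boston features above):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# small synthetic regression problem: 200 samples, 5 features
x, y = make_regression(n_samples=200, n_features=5, random_state=12)
gbr = GradientBoostingRegressor(random_state=12).fit(x, y)

# one non-negative score per input column; the scores sum to 1.0
print(gbr.feature_importances_)
```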

**Predicting test data and visualizing the result**

We can predict the test data and check the error as follows.

```python
ypred = gbr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)
```

```
MSE: 10.41
```
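Since MSE is in squared units, it can be easier to interpret as RMSE, which is back in the target's own units (here, thousands of dollars of median house value):

```python
import numpy as np

mse = 10.41  # the value printed above
rmse = np.sqrt(mse)
print("RMSE: %.2f" % rmse)  # RMSE: 3.23
```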

Finally, we'll visualize the original and predicted values in a plot.

```python
x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()
```

In this post, we've briefly learned how to use the Gradient Boosting Regressor to predict regression data in Python. Thank you for reading!

The full source code is listed below.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

boston = load_boston()
x, y = boston.data, boston.target

xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=12, test_size=0.15)

# with custom parameters
gbr = GradientBoostingRegressor(n_estimators=600, max_depth=5,
                                learning_rate=0.01, min_samples_split=3)

# with default parameters (overwrites the model above)
gbr = GradientBoostingRegressor()

gbr.fit(xtrain, ytrain)

ypred = gbr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()
```
