Gradient Boosting Regression Example in Python

   The idea of gradient boosting is to combine many weak learners into a single strong prediction model, with decision trees as the usual choice of base learner. At each step, the current ensemble's predictions are compared with the actual values, and a new learner is fitted to the negative gradient of the loss function (for squared error, this is simply the residuals). Adding this learner, scaled by a learning rate, reduces the error at the next iteration.
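   To make the idea concrete, below is a minimal hand-rolled sketch of boosting with squared-error loss, where the negative gradient is simply the residual. The toy data, stump depth, and learning rate are illustrative choices, not part of this tutorial's model.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy data: y = x^2 plus noise
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

learning_rate = 0.1
pred = np.full_like(y, y.mean())    # start from the mean prediction
for _ in range(100):
    residual = y - pred             # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)   # small step along the fit

print("train MSE: %.3f" % np.mean((y - pred) ** 2))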
   In this tutorial, we'll learn how to predict regression data with the GradientBoostingRegressor class (from the sklearn.ensemble module) in Python. The post covers:
  1. Preparing data
  2. Defining the model
  3. Predicting test data and visualizing the result
We'll start by loading the required libraries.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt


Preparing data

   We use the Boston house-price dataset as the regression dataset in this tutorial. After loading the dataset, we'll first separate the data into x (features) and y (target) parts.

boston = load_boston()
x, y = boston.data, boston.target
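
Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the import above fails on recent versions. In that case, you can substitute another built-in regression dataset and run the rest of the tutorial unchanged (the reported numbers will differ), for example:

# alternative for scikit-learn >= 1.2, where load_boston is unavailable
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
x, y = housing.data, housing.target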

Then we'll split it into train and test parts, holding out 15 percent of the data as the test set.

xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=12,
                                                test_size=0.15)
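
As a quick sanity check, we can print the shapes of the split arrays; with the 506-row Boston data, 15 percent gives 76 test samples.

print(xtrain.shape, xtest.shape)   # (430, 13) (76, 13)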


Defining the model

We can define the model with its default parameters or set new parameter values.

# with custom parameters
gbr = GradientBoostingRegressor(n_estimators=600,
                                max_depth=5,
                                learning_rate=0.01,
                                min_samples_split=3)

# with default parameters (this overwrites the model above;
# the rest of the tutorial uses the default model)
gbr = GradientBoostingRegressor()
 
print(gbr)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=100, presort='auto', random_state=None,
             subsample=1.0, verbose=0, warm_start=False)
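
Rather than picking values by hand, we could also search a small grid of parameters with cross-validation. The sketch below uses GridSearchCV with an illustrative grid; it is optional and not part of the model fitted in this tutorial.

from sklearn.model_selection import GridSearchCV

params = {"n_estimators": [100, 300, 600],
          "learning_rate": [0.01, 0.05, 0.1],
          "max_depth": [3, 5]}
search = GridSearchCV(GradientBoostingRegressor(random_state=12),
                      params, cv=5, scoring="neg_mean_squared_error")
search.fit(xtrain, ytrain)
print(search.best_params_)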

Next, we'll fit the model to the training data.

gbr.fit(xtrain, ytrain)


Predicting test data and visualizing the result

We can predict the test data and check the error as follows.

ypred = gbr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
 
print("MSE: %.2f" % mse)
MSE: 10.41
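
Since the ensemble is built one tree at a time, we can also track how the test MSE evolves as stages are added, using the model's staged_predict method (an optional check, reusing xtest and ytest from above).

import numpy as np

# test MSE after each boosting stage
stage_mse = [mean_squared_error(ytest, yp)
             for yp in gbr.staged_predict(xtest)]
print("best stage: %d, MSE: %.2f" % (np.argmin(stage_mse) + 1, min(stage_mse)))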

Finally, we'll visualize the original and predicted values in a plot.

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()
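
Beyond the error metric, the fitted model also exposes impurity-based feature importances, which indicate which inputs drive the predictions. Here we rank them with the dataset's feature names:

# rank features by their impurity-based importance
for name, score in sorted(zip(boston.feature_names, gbr.feature_importances_),
                          key=lambda t: t[1], reverse=True):
    print("%-8s %.3f" % (name, score))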



   In this post, we've briefly learned how to use the Gradient Boosting Regressor to predict regression data in Python. Thank you for reading!

The full source code is listed below.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=12,
                                                test_size=0.15)
# with custom parameters
gbr = GradientBoostingRegressor(n_estimators=600,
                                max_depth=5,
                                learning_rate=0.01,
                                min_samples_split=3)
# with default parameters (overwrites the model above)
gbr = GradientBoostingRegressor()

gbr.fit(xtrain, ytrain)

ypred = gbr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()

