
### Regression Example with RandomForestRegressor in Python

Random forest is an ensemble learning algorithm built on decision tree learners. The estimator fits multiple decision trees on randomly drawn subsets of the dataset and averages their predictions.
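
To make the averaging step concrete, here is a minimal sketch that compares the forest's prediction with the mean of its individual trees' predictions. The dataset size, `n_estimators=10`, and `random_state=0` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic dataset; sizes and random_state are illustrative assumptions.
x, y = make_regression(n_samples=200, n_features=5, random_state=0)

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(x, y)

# Each fitted tree is stored in estimators_; collect their predictions
# for the first five samples.
tree_preds = np.array([tree.predict(x[:5]) for tree in forest.estimators_])

# Averaging across the trees should reproduce the forest's own prediction.
manual_avg = tree_preds.mean(axis=0)
forest_pred = forest.predict(x[:5])
print(np.allclose(manual_avg, forest_pred))
```

Each row of `tree_preds` is one tree's output, so the mean over axis 0 is the ensemble average.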

The scikit-learn API provides the RandomForestRegressor class, included in the ensemble module, to implement random forests for regression problems.

In this tutorial, we'll briefly learn how to fit and predict regression data by using the RandomForestRegressor class in Python. The tutorial covers:

1. Preparing the data
2. Training the model
3. Predicting and accuracy check
4. Boston dataset prediction
5. Source code listing

```
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt
from sklearn import set_config
```

Preparing the data

First, we'll generate random regression data with the make_regression() function. The dataset contains 5000 samples with 10 features each.

```
x, y = make_regression(n_samples=5000, n_features=10)
print(x[0:2])
print(y[0:2])
```

```
[[ 1.773  2.534  0.693 -1.11   1.492  0.631 -0.577  0.085 -1.308  1.024]
 [ 1.953 -1.362  1.294  1.025  0.463 -0.485 -1.849  1.858  0.483 -0.52 ]]
[120.105 262.69 ]
```
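
Because make_regression() draws random data, the printed values differ on every run. As a small sketch, passing a fixed `random_state` (the value 42 here is an arbitrary assumption) makes the generated data reproducible:

```python
import numpy as np
from sklearn.datasets import make_regression

# Two calls with the same random_state (42 is an arbitrary choice)
x1, y1 = make_regression(n_samples=5000, n_features=10, random_state=42)
x2, y2 = make_regression(n_samples=5000, n_features=10, random_state=42)

# Shapes: one row per sample, one column per feature
print(x1.shape, y1.shape)

# The two draws are identical because the seed is fixed
print(np.array_equal(x1, x2) and np.array_equal(y1, y2))
```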

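Next, we'll scale both x and y and then split them into train and test parts. Random forests are largely insensitive to feature scaling, so this step is optional for the model itself; scaling y puts the MSE reported later on a standardized scale. Here, we'll hold out 10 percent of the samples as test data.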

```
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10)
```
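
The sketch below shows what scale() does to each column, and one possible alternative that splits first and fits a StandardScaler on the training part only, so that test statistics do not influence the scaling. The smaller dataset and the `random_state` values are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, scale

# Smaller dataset than in the tutorial; sizes and seeds are illustrative.
x, y = make_regression(n_samples=500, n_features=10, random_state=0)

# scale() standardizes each column to zero mean and unit variance.
x_scaled = scale(x)
print(np.allclose(x_scaled.mean(axis=0), 0.0))
print(np.allclose(x_scaled.std(axis=0), 1.0))

# Alternative: split first, then fit the scaler on the training part only.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.10, random_state=0)
scaler = StandardScaler().fit(xtrain)
xtrain_s = scaler.transform(xtrain)
xtest_s = scaler.transform(xtest)
print(xtrain_s.shape, xtest_s.shape)
```

With 500 samples and `test_size=0.10`, the split leaves 450 rows for training and 50 for testing.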

Training the model

Next, we'll define the regressor model by using the RandomForestRegressor class with its default parameters. The default values are shown below; the exact list depends on the installed scikit-learn version.

```
set_config(print_changed_only=False)

rfr = RandomForestRegressor()
print(rfr)
```

```
RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=None, verbose=0, warm_start=False)
```
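
The defaults can also be read programmatically with get_params(), which returns them as a dictionary. This is a minimal sketch; the names and values it prints depend on the installed scikit-learn version.

```python
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()

# get_params() returns the constructor parameters as a dict
params = rfr.get_params()
for name in sorted(params):
    print(name, "=", params[name])
```

A dictionary is handy when a single value is needed, for example `params["n_estimators"]`.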

Then, we'll fit the model on the training data and check its R-squared score on the same data.

```
rfr.fit(xtrain, ytrain)

score = rfr.score(xtrain, ytrain)
print("R-squared:", score)
```

```
R-squared: 0.9796146270086489
```
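
A score computed on the training data tends to be optimistic, so it can help to look at the held-out data as well. The sketch below, on a smaller dataset with assumed `noise` and `random_state` values, prints both scores and checks that score() matches r2_score() applied to the model's predictions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Illustrative data; n_samples, noise and the seeds are assumptions.
x, y = make_regression(n_samples=1000, n_features=10, noise=10.0,
                       random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.10, random_state=0)

rfr = RandomForestRegressor(n_estimators=50, random_state=0)
rfr.fit(xtrain, ytrain)

# R-squared on the data the model was fitted on, and on unseen data
train_r2 = rfr.score(xtrain, ytrain)
test_r2 = rfr.score(xtest, ytest)
print("Train R-squared:", train_r2)
print("Test R-squared:", test_r2)

# score() is the same quantity as r2_score on the predictions
same = abs(test_r2 - r2_score(ytest, rfr.predict(xtest))) < 1e-12
print(same)
```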

Predicting and accuracy check

Now we can predict the test data with the trained model and measure the prediction error with the MSE and RMSE metrics.

```
ypred = rfr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of MSE
print("RMSE: ", mse**(1/2.0))
```

```
MSE:  0.130713987032462
RMSE:  0.36154389
```
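
RMSE is the square root of MSE, which brings the error back to the units of the target. Here is a small worked sketch with made-up values; the two arrays are illustrative and not taken from the tutorial's data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up targets and predictions for illustration
ytrue = np.array([3.0, -0.5, 2.0, 7.0])
ypred = np.array([2.5, 0.0, 2.0, 8.0])

# Squared errors are 0.25, 0.25, 0.0 and 1.0, so the mean is 0.375
mse = mean_squared_error(ytrue, ypred)

# RMSE is the square root of the MSE
rmse = np.sqrt(mse)

print("MSE: ", mse)
print("RMSE: ", rmse)
```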

Finally, we'll visualize the original and predicted data in a plot.

```
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, linewidth=1, label="original")
plt.plot(x_ax, ypred, linewidth=1.1, label="predicted")
plt.title("y-test and y-predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()
```

Boston housing dataset prediction

We'll apply the same method to the Boston housing price regression dataset. We'll load it with the load_boston() function, then scale it and split it into train and test parts. Note that load_boston() was removed in scikit-learn 1.2, so this section requires an older release. Then we'll define the model (here with default parameters), check the training accuracy, and predict the test data.

```
# load_boston() exists only in scikit-learn releases before 1.2
from sklearn.datasets import load_boston

print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

rfr = RandomForestRegressor()
rfr.fit(xtrain, ytrain)

score = rfr.score(xtrain, ytrain)
print("R-squared:", score)

ypred = rfr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of MSE
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()
```

```
Boston housing dataset prediction.
R-squared: 0.9834125970221356
MSE:  0.2157465095558568
RMSE:  0.46448521
```

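
load_boston() was removed in scikit-learn 1.2, so the Boston example does not run on recent releases. As a sketch under that assumption, the same workflow can be tried on load_diabetes(), a different dataset that ships with scikit-learn. The dataset swap and the `random_state` values are choices made here, not part of the original tutorial, and the scores will differ from the Boston results.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# Swapped-in dataset (assumption): diabetes instead of Boston housing
x, y = load_diabetes(return_X_y=True)

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0)

rfr = RandomForestRegressor(random_state=0)
rfr.fit(xtrain, ytrain)
print("R-squared:", rfr.score(xtrain, ytrain))

ypred = rfr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse ** 0.5)
```
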
In this tutorial, we've briefly learned how to fit and predict regression data by using scikit-learn's RandomForestRegressor class in Python. The full source code is listed below.

Source code listing

```
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt
from sklearn import set_config

# Generate synthetic regression data
x, y = make_regression(n_samples=5000, n_features=10)
print(x[0:2])
print(y[0:2])

# Scale the data and hold out 10 percent for testing
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.10)

# Define the model and print all of its parameters
set_config(print_changed_only=False)
rfr = RandomForestRegressor()
print(rfr)

rfr.fit(xtrain, ytrain)

# R-squared on the training data
score = rfr.score(xtrain, ytrain)
print("R-squared:", score)

ypred = rfr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of MSE
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, linewidth=1, label="original")
plt.plot(x_ax, ypred, linewidth=1.1, label="predicted")
plt.title("y-test and y-predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()

# Boston housing data; load_boston() exists only in scikit-learn releases before 1.2
from sklearn.datasets import load_boston

print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

rfr = RandomForestRegressor()
rfr.fit(xtrain, ytrain)

score = rfr.score(xtrain, ytrain)
print("R-squared:", score)

ypred = rfr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()
```