- Preparing data
- Using the gbm method
- Using the gbm with a caret

library(gbm) library(caret)

**Preparing data**

In this tutorial, we'll use Boston housing dataset as regression data.

We'll load the dataset it is in MASS packages.

`boston = MASS::Boston`

You can check the data content.

str(boston) 'data.frame': 506 obs. of 14 variables: $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ... $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ... $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ... $ chas : int 0 0 0 0 0 0 0 0 0 0 ... $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ... $ rm : num 6.58 6.42 7.18 7 7.15 ... $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ... $ dis : num 4.09 4.97 4.97 6.06 6.06 ... $ rad : int 1 2 2 3 3 3 5 5 5 5 ... $ tax : num 296 242 242 222 222 222 311 311 311 311 ... $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ... $ black : num 397 397 393 395 397 ... $ lstat : num 4.98 9.14 4.03 2.94 5.33 ... $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

Here, 'medv' is a target - y output and the others are the features - x input data.

Next, we'll split the dataset into the train and test parts.

indexes = createDataPartition(boston$medv, p = .90, list = F) train = boston[indexes, ] test = boston[-indexes, ]

**Using the gbm method**

We'll define the gbm model with gaussiann distribution and I set the other parameters as below. And, we'll include train data to fit the model.

model_gbm = gbm(train$medv ~., data = train, distribution = "gaussian", cv.folds = 10, shrinkage = .01, n.minobsinnode = 10, n.trees = 500)

print(model_gbm) gbm(formula = train$medv ~ ., distribution = "gaussian", data = train, n.trees = 500, n.minobsinnode = 10, shrinkage = 0.01, cv.folds = 10) A gradient boosted model with gaussian loss function. 500 iterations were performed. The best cross-validation iteration was 500. There were 13 predictors of which 9 had non-zero influence.

Next, we'll predict test data and visialize the result in a plot.

test_x = test[, -14] test_y = test[, 14] pred_y = predict.gbm(model_gbm, test_x) x_ax = 1:length(pred_y) plot(x_ax, test_y, col="blue", pch=20, cex=.9) lines(x_ax, pred_y, col="red", pch=20, cex=.9)

**Using the gbm with caret**

We can also use gbm with caret train method.

tc = trainControl(method = "cv", number=10) model = train(medv ~., data=train, method="gbm", trControl=tc)

We'll predict test data and visualize in a plot.

pred_y = predict(model, test_x) x_ax = 1:length(pred_y) plot(x_ax, test_y, col="blue", pch=20, cex=.9) lines(x_ax, pred_y, col="red", pch=20, cex=.9)

In this post, we've briefly learned how to use gradient boosting method with gbm package for regression data. Thank you for reading.

The full source code is listed below.

```
library(gbm)
library(caret)
boston = MASS::Boston
str(boston)
indexes = createDataPartition(boston$medv, p = .90, list = F)
train = boston[indexes, ]
test = boston[-indexes, ]
model_gbm = gbm(train$medv ~.,
data = train,
distribution = "gaussian",
cv.folds = 10,
shrinkage = .01,
n.minobsinnode = 10,
n.trees = 500)
print(model_gbm)
test_x = test[, -14]
test_y = test[, 14]
pred_y = predict.gbm(model_gbm, test_x)
caret::R2(test_y, pred_y)
RMSE(test_y, pred_y)
x_ax = 1:length(pred_y)
plot(x_ax, test_y, col="blue", pch=20, cex=.9)
lines(x_ax, pred_y, col="red", pch=20, cex=.9)
# with caret method
tc = trainControl(method = "cv", number=10)
model = train(medv ~., data=train, method="gbm", trControl=tc)
print(model)
pred_y = predict(model, test_x)
RMSE(test_y, pred_y)
x_ax = 1:length(pred_y)
plot(x_ax, test_y, col="blue", pch=20, cex=.9)
lines(x_ax, pred_y, col="red", pch=20, cex=.9)
```

## No comments:

## Post a Comment