
### Regression Example with XGBoost in R

XGBoost stands for "Extreme Gradient Boosting"; it is an implementation of the gradient boosted trees algorithm. It is a popular supervised machine learning method known for its computation speed, parallelization, and predictive performance. XGBoost is an open-source software library, and you can use it in R by installing the xgboost package.
In this tutorial, we'll briefly learn how to fit and predict regression data with the xgboost() function. The tutorial covers:
1. Preparing the data
2. Fitting the model and prediction
3. Accuracy checking
4. Source code listing

```r
library(xgboost)
library(caret)
```

#### Preparing the data

We use the Boston house-price dataset (MASS::Boston) as the regression dataset in this tutorial, with the median home value (medv) as the target. After loading the dataset, we'll split it into train and test parts and extract the x-input and y-label parts. Here, I'll hold out 15 percent of the dataset as test data. XGBoost works on its own matrix format, so we also need to convert the data into xgb.DMatrix objects.

```r
boston = MASS::Boston
str(boston)

set.seed(12)
indexes = createDataPartition(boston$medv, p = .85, list = F)
train = boston[indexes, ]
test = boston[-indexes, ]

train_x = data.matrix(train[, -14])   # all columns except medv
train_y = train[, 14]                 # medv (column 14) is the target
test_x = data.matrix(test[, -14])
test_y = test[, 14]

xgb_train = xgb.DMatrix(data = train_x, label = train_y)
xgb_test = xgb.DMatrix(data = test_x, label = test_y)
```

#### Fitting the model and prediction

We'll define the model by using the xgboost() function of the xgboost package. Here, we'll set the 'max_depth' and 'nrounds' parameters. The 'max_depth' parameter sets the maximum depth of each tree; the higher the value, the more complex the model. The 'nrounds' parameter is the maximum number of boosting iterations.
Calling the function is enough to train the model on the supplied data. You can check the summary of the model with the print() and str() functions.

```r
xgbc = xgboost(data = xgb_train, max.depth = 2, nrounds = 50)
print(xgbc)
```
```
##### xgb.Booster
raw: 22.2 Kb 
call:
  xgb.train(params = params, data = dtrain, nrounds = nrounds, 
    watchlist = watchlist, verbose = verbose, print_every_n = print_every_n, 
    early_stopping_rounds = early_stopping_rounds, maximize = maximize, 
    save_period = save_period, save_name = save_name, xgb_model = xgb_model, 
    callbacks = callbacks, max_depth = 2)
params (as set within xgb.train):
  max_depth = "2", validate_parameters = "1"
xgb.attributes:
  niter
callbacks:
  cb.print.evaluation(period = print_every_n)
  cb.evaluation.log()
# of features: 13 
niter: 50
nfeatures : 13 
evaluation_log:
    iter train_rmse
       1  10.288543
       2   7.710918
---                 
      49   2.007022
      50   1.997438
```
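The fixed nrounds = 50 above is simply a chosen value; if you want a data-driven choice, the xgboost package's xgb.cv() function can cross-validate the number of boosting rounds. Below is a minimal sketch on synthetic data (the synthetic dataset and variable names are illustrative, not part of the tutorial):

```r
library(xgboost)

set.seed(1)
# Illustrative synthetic regression data: 500 rows, 5 numeric features
x <- matrix(rnorm(500 * 5), ncol = 5)
y <- as.numeric(x %*% c(2, -1, 0.5, 0, 1) + rnorm(500, sd = 0.1))
dtrain <- xgb.DMatrix(data = x, label = y)

# 5-fold cross-validation over 50 boosting rounds, tracking RMSE
cv <- xgb.cv(data = dtrain, max_depth = 2, nrounds = 50, nfold = 5,
             metrics = "rmse", verbose = 0)

# Round with the lowest mean test RMSE across folds
best <- which.min(cv$evaluation_log$test_rmse_mean)
```

You could then refit the final model with nrounds = best instead of a hand-picked value.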

Next, we'll predict the test data with the xgbc model.

```r
pred_y = predict(xgbc, xgb_test)
```

#### Accuracy checking

Next, we'll check the prediction accuracy with MSE, MAE, and RMSE metrics.

```r
mse = mean((test_y - pred_y)^2)
mae = caret::MAE(test_y, pred_y)
rmse = caret::RMSE(test_y, pred_y)

cat("MSE: ", mse, "MAE: ", mae, " RMSE: ", rmse)
```
```
MSE:  11.99942 MAE:  2.503739  RMSE:  3.464018
```
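As a cross-check, the same three metrics can be computed in base R without caret, since MAE is the mean absolute error and RMSE is simply the square root of MSE. The vectors below are illustrative values, not the tutorial's test data:

```r
# Illustrative actual and predicted values (not the Boston test set)
actual    <- c(24.0, 21.6, 34.7, 33.4)
predicted <- c(25.1, 20.3, 33.0, 35.2)

mse  <- mean((actual - predicted)^2)   # mean squared error
mae  <- mean(abs(actual - predicted))  # mean absolute error
rmse <- sqrt(mse)                      # root mean squared error

cat("MSE: ", mse, "MAE: ", mae, " RMSE: ", rmse)
```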

Finally, we'll plot the original and predicted test values together.

```r
x = 1:length(test_y)
plot(x, test_y, col = "red", type = "l")
lines(x, pred_y, col = "blue", type = "l")
legend(x = 1, y = 38, legend = c("original test_y", "predicted test_y"),
       col = c("red", "blue"), box.lty = 1, cex = 0.8, lty = c(1, 1))
```

In this tutorial, we've learned how to fit and predict regression data with xgboost in R. The full source code is listed below.

#### Source code listing

```r
library(xgboost)
library(caret)

boston = MASS::Boston
str(boston)

set.seed(12)
indexes = createDataPartition(boston$medv, p = .85, list = F)
train = boston[indexes, ]
test = boston[-indexes, ]

train_x = data.matrix(train[, -14])   # all columns except medv
train_y = train[, 14]                 # medv (column 14) is the target
test_x = data.matrix(test[, -14])
test_y = test[, 14]

xgb_train = xgb.DMatrix(data = train_x, label = train_y)
xgb_test = xgb.DMatrix(data = test_x, label = test_y)

xgbc = xgboost(data = xgb_train, max.depth = 2, nrounds = 50)
print(xgbc)

pred_y = predict(xgbc, xgb_test)

mse = mean((test_y - pred_y)^2)
mae = caret::MAE(test_y, pred_y)
rmse = caret::RMSE(test_y, pred_y)
cat("MSE: ", mse, "MAE: ", mae, " RMSE: ", rmse)

x = 1:length(test_y)
plot(x, test_y, col = "red", type = "l")
lines(x, pred_y, col = "blue", type = "l")
legend(x = 1, y = 38, legend = c("original test_y", "predicted test_y"),
       col = c("red", "blue"), box.lty = 1, cex = 0.8, lty = c(1, 1))
```
