AdaBoost (Adaptive Boosting) is another widely used boosting algorithm in machine learning. The key idea of boosting is to improve weak learners and aggregate them into a combined model with higher accuracy. A weak learner is a classifier that performs only slightly better than random guessing. AdaBoost trains such classifiers iteratively, increasing the weights of misclassified observations at each step, and combines the learners' weighted votes into the final model.
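To make the reweighting idea concrete, here is a minimal sketch of a single AdaBoost.M1 iteration on toy labels; it illustrates the weight update only and is not the adabag implementation.

# One AdaBoost.M1 iteration on toy data (illustration only, not adabag's code)
y    = c("a", "a", "b", "b", "b")        # true labels
yhat = c("a", "b", "b", "b", "a")        # a weak learner's predictions
w    = rep(1 / length(y), length(y))     # start with uniform weights

miss  = yhat != y                        # misclassified observations
err   = sum(w[miss])                     # weighted error: 0.4
alpha = log((1 - err) / err)             # this learner's vote weight

w = w * exp(alpha * miss)                # up-weight the misclassified cases
w = w / sum(w)                           # renormalize to sum to 1
print(round(w, 3))                       # 0.167 0.250 0.167 0.167 0.250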
In this post, we'll learn how to use the adabag package's boosting function to classify data in R. The tutorial covers:
- Preparing data
- Classification with boosting
- Classification with boosting.cv
library(adabag)
library(caret)
Preparing data
In this tutorial, we'll use the iris dataset as the classification data. First, we'll split the dataset into train and test parts, keeping 10 percent of the data as the test set.
indexes=createDataPartition(iris$Species, p=.90, list = F)
train = iris[indexes, ]
test = iris[-indexes, ]
Classification with boosting
We'll define the model with the boosting function and train it with the train data. The boosting function applies the AdaBoost.M1 and SAMME algorithms using classification trees as weak learners. If 'boos' is TRUE, a bootstrap sample of the training set is drawn using the observation weights in each iteration; otherwise, every observation is used with its weight. 'mfinal' is the number of iterations, i.e., the number of trees to fit.
model = boosting(Species~., data=train, boos=TRUE, mfinal=50)
We can check the model's components:
print(names(model))
[1] "formula" "trees" "weights" "votes" "prob" "class"
[7] "importance" "terms" "call"
print(model$trees[1])
[[1]]
n= 135

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 135 88 versicolor (0.3185185 0.3481481 0.3333333)
  2) Petal.Length< 2.7 43  0 setosa (1.0000000 0.0000000 0.0000000) *
  3) Petal.Length>=2.7 92 45 versicolor (0.0000000 0.5108696 0.4891304)
    6) Petal.Width< 1.75 50  3 versicolor (0.0000000 0.9400000 0.0600000) *
    7) Petal.Width>=1.75 42  0 virginica (0.0000000 0.0000000 1.0000000) *
The model is ready, and we can predict the test data. The confusion matrix and the error rate are also included in the prediction output.
pred = predict(model, test)
print(pred$confusion)
               Observed Class
Predicted Class setosa versicolor virginica
     setosa         5          0         0
     versicolor     0          5         0
     virginica      0          0         5
print(pred$error)
[1] 0
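Since caret is already loaded, its confusionMatrix function provides a more detailed summary of the same predictions (overall accuracy plus per-class statistics); a possible cross-check:

# pred$class is a character vector, so align its levels with the observed classes
cm = confusionMatrix(factor(pred$class, levels = levels(test$Species)), test$Species)
print(cm)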
We can also print the probability of each class for the test data.
result = data.frame(test$Species, pred$prob, pred$class)
print(result)
   test.Species         X1         X2         X3 pred.class
1        setosa 0.92897958 0.07102042 0.00000000     setosa
2        setosa 0.90999935 0.07693250 0.01306815     setosa
3        setosa 0.88902756 0.09790429 0.01306815     setosa
4        setosa 0.92897958 0.07102042 0.00000000     setosa
5        setosa 0.88902756 0.09790429 0.01306815     setosa
6    versicolor 0.01288461 0.91943143 0.06768396 versicolor
7    versicolor 0.01288461 0.84235917 0.14475622 versicolor
8    versicolor 0.03205498 0.95093238 0.01701263 versicolor
9    versicolor 0.03205498 0.95093238 0.01701263 versicolor
10   versicolor 0.03205498 0.95093238 0.01701263 versicolor
11    virginica 0.00000000 0.04468596 0.95531404  virginica
12    virginica 0.00000000 0.01577596 0.98422404  virginica
13    virginica 0.00000000 0.05561801 0.94438199  virginica
14    virginica 0.00000000 0.05561801 0.94438199  virginica
15    virginica 0.00000000 0.33446425 0.66553575  virginica
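The probability columns come out unnamed (X1, X2, X3). Assuming pred$prob orders its columns by the factor levels of Species, which the output above is consistent with, we can attach readable names:

prob = pred$prob
colnames(prob) = levels(test$Species)   # assumption: columns follow the factor level order
result = data.frame(observed = test$Species, prob, predicted = pred$class)
print(head(result))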
Classification with boosting.cv
The boosting.cv function provides a cross-validated version of boosting. The data is divided into v non-overlapping subsets; boosting is applied to v-1 of them, and a prediction is made for the held-out subset, so every observation eventually receives a prediction. Because of this, we pass the entire iris dataset to the function. Here, 'v' is the number of cross-validation subsets.
cvmodel = boosting.cv(Species~., data=iris, boos=TRUE, mfinal=10, v=5)
We'll check the confusion matrix and the error rate.
print(cvmodel[-1])
$confusion
               Observed Class
Predicted Class setosa versicolor virginica
     setosa        50          0         0
     versicolor     0         45         3
     virginica      0          5        47

$error
[1] 0.05333333
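Since the error component is the misclassification rate over all held-out predictions, the cross-validated accuracy is simply its complement:

print(1 - cvmodel$error)   # 0.9466667 for the run above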
You can compare the original and predicted classes.
data.frame(iris$Species, cvmodel$class)
In this post, we've briefly learned how to classify data with the adabag boosting model in R. Thank you for reading!
The full source code is listed below.
library(adabag)
library(caret)
indexes=createDataPartition(iris$Species, p=.90, list = F)
train = iris[indexes, ]
test = iris[-indexes, ]
model = boosting(Species~., data=train, boos=TRUE, mfinal=50)
print(names(model))
print(model$trees[1])
pred = predict(model, test)
print(pred$confusion)
print(pred$error)
result = data.frame(test$Species, pred$prob, pred$class)
print(result)
# cross-validation method
cvmodel = boosting.cv(Species~., data=iris, boos=TRUE, mfinal=10, v=5)
print(cvmodel[-1])
print(data.frame(iris$Species, cvmodel$class))