Support Vector Regression Example with SVM in R

   Support Vector Machine (SVM) is a supervised learning method that can be used for both classification and regression problems. The 'e1071' package provides the 'svm' function for building support vector machine models in R. In this post, we'll briefly learn how to use the 'svm' function for a regression problem. The tutorial covers:
  1. Preparing the data
  2. Fitting the model and predicting test data
  3. Accuracy checking
  4. Source code listing
   We'll start by loading the required libraries for this tutorial. If they are not available on your machine, you can install them with install.packages(c("e1071", "caret")). The Boston dataset comes from the 'MASS' package, which is normally installed together with R.

library(e1071)
library(caret)

Preparing the data

   We'll use the Boston housing price dataset as the regression data in this tutorial, with 'medv' (the median home value) as the target. We'll prepare the data by splitting it into train and test parts, with 90 percent of the rows going to the train part.

boston = MASS::Boston
set.seed(123)
indexes = createDataPartition(boston$medv, p = .9, list = FALSE)
train = boston[indexes, ]
test = boston[-indexes, ]
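
Because createDataPartition draws the split at random, the rows that end up in train depend on the random number generator as well as on the seed. R 3.6.0 changed the default algorithm behind sample(), so set.seed(123) on a newer R can produce a different split, and therefore different results, than the ones printed in this post. Whether that explains any particular mismatch is an assumption. The base-R sketch below (it needs R 3.6.0 or later) only shows that the two samplers draw different row numbers from the same seed; the names draw_default and draw_rounding are illustrative.

```r
# Draw 5 row numbers out of 506 (the number of rows in Boston)
# with the current default sampler ("Rejection" in R >= 3.6.0).
set.seed(123)
draw_default = sample(506, 5)

# Same seed with the pre-3.6.0 sampler. R warns that this sampler
# is non-uniform, so the warning is suppressed here.
suppressWarnings(set.seed(123, sample.kind = "Rounding"))
draw_rounding = sample(506, 5)

# Restore the default sampler so later code is unaffected.
RNGkind(sample.kind = "default")

print(draw_default)
print(draw_rounding)
```

Calling set.seed(123, sample.kind = "Rounding") before createDataPartition is one way to try to match results produced on an older R version.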



Fitting the model and predicting test data

   The train and test data are ready. Now we can define the svm model with default parameters and fit it on the train data. The default kernel is 'radial'; the 'kernel' argument can be set to 'linear', 'polynomial', or 'sigmoid' instead.

model_reg = svm(medv~., data=train)
print(model_reg)

Call:
svm(formula = medv ~ ., data = train)


Parameters:
   SVM-Type:  eps-regression 
 SVM-Kernel:  radial 
       cost:  1 
      gamma:  0.07692308 
    epsilon:  0.1 


Number of Support Vectors:  306
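
The gamma value in the printout is not arbitrary. For matrix-like input, svm sets the default gamma to 1 divided by the number of predictor columns, and Boston has 13 predictors besides medv. A quick check of that arithmetic, using only the MASS package (the names n_predictors and default_gamma are illustrative):

```r
boston = MASS::Boston

# Number of predictors: every column except the target medv.
n_predictors = ncol(boston) - 1

# Default gamma for matrix-like input: 1 / (number of predictors).
default_gamma = 1 / n_predictors

print(n_predictors)   # 13
print(default_gamma)  # 0.07692308
```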

Next, we'll predict the test data and plot the results to compare visually.

pred = predict(model_reg, test)
 
x = 1:length(test$medv)
plot(x, test$medv, pch=18, col="red")
lines(x, pred, lwd=1, col="blue")
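
To see what changing the kernel does, the sketch below fits one model per kernel type and reports the test RMSE for each. It repeats the data split so that it runs on its own. The loop and the names kernels and rmse_by_kernel are illustrative and not part of the original post, and no particular ranking of the kernels is claimed.

```r
library(e1071)
library(caret)

boston = MASS::Boston
set.seed(123)
indexes = createDataPartition(boston$medv, p = .9, list = FALSE)
train = boston[indexes, ]
test = boston[-indexes, ]

kernels = c("linear", "polynomial", "radial", "sigmoid")

# Fit one eps-regression model per kernel and compute the test RMSE.
rmse_by_kernel = sapply(kernels, function(k) {
  fit = svm(medv ~ ., data = train, kernel = k)
  pred_k = predict(fit, test)
  sqrt(mean((test$medv - pred_k)^2))
})

# Named numeric vector: one RMSE per kernel.
print(rmse_by_kernel)
```

All other parameters are left at their defaults here, so the comparison says little about how each kernel would perform after tuning.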



Accuracy checking

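Finally, we'll check the prediction accuracy with the MSE, MAE, RMSE, and R-squared metrics. The caret functions MAE, RMSE, and R2 take the predicted values first and the observed values second. MSE is computed directly from its definition.

mse = mean((test$medv - pred)^2)
mae = MAE(pred, test$medv)
rmse = RMSE(pred, test$medv)
r2 = R2(pred, test$medv, form = "traditional")
 
cat(" MAE:", mae, "\n", "MSE:", mse, "\n", 
     "RMSE:", rmse, "\n", "R-squared:", r2)
 MAE: 1.877403 
 MSE: 6.028015 
 RMSE: 2.455202 
 R-squared: 0.914078

The output above is from the original run of this post. Your numbers may differ depending on the R version and the resulting train/test split, and the R-squared value in particular depends on which vector is passed as the observed values.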
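
For reference, the four metrics can be written out in base R. In this sketch obs and pred are made-up toy vectors, not results from the model above. The "traditional" R-squared is 1 minus the ratio of the model's squared error to the squared error of a model that always predicts the mean of the observed values, so it is not symmetric in its two arguments, while MSE, MAE, and RMSE are.

```r
# Toy values for illustration only.
obs = c(3, 5, 7, 9)
pred = c(2.5, 5.5, 6.5, 9.5)

err = obs - pred
mse = mean(err^2)
mae = mean(abs(err))
rmse = sqrt(mse)

# Traditional R-squared: compare against an intercept-only (mean) model.
r2 = 1 - sum(err^2) / sum((obs - mean(obs))^2)

# Swapping the roles of obs and pred changes the denominator.
r2_swapped = 1 - sum(err^2) / sum((pred - mean(pred))^2)

cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
    "RMSE:", rmse, "\n", "R-squared:", r2, "\n",
    "R-squared (swapped):", r2_swapped, "\n")
```

With these toy values the two R-squared figures come out as 0.95 and 0.96, which shows why the argument order matters for that metric only.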


  In this tutorial, we have briefly learned how to use the svm function from the 'e1071' package for a regression problem. Thank you for reading. The full source code is listed below.


Source code listing

library(e1071)
library(caret)
 
# Regression example
boston = MASS::Boston
set.seed(123)
indexes = createDataPartition(boston$medv, p = .9, list = FALSE)
train = boston[indexes, ]
test = boston[-indexes, ]
 
model_reg = svm(medv~., data=train)
print(model_reg)
 
pred = predict(model_reg, test)
 
x=1:length(test$medv)
plot(x, test$medv, pch=18, col="red")
lines(x, pred, lwd=1, col="blue")
 
# accuracy check 
# caret's MAE, RMSE, and R2 take predictions first, observed values second
mse = mean((test$medv - pred)^2)
mae = MAE(pred, test$medv)
rmse = RMSE(pred, test$medv)
r2 = R2(pred, test$medv, form = "traditional")
 
cat(" MAE:", mae, "\n", "MSE:", mse, "\n", 
    "RMSE:", rmse, "\n", "R-squared:", r2)
 


4 comments:

  1. http://www.analyticspath.com
    This information you have shared is really a lot helpful. Was searching for this info from a while. Looking forward for further such interesting postings from you

  2. I am copying this program and it is working, the seed is correct and I also tried changing it, but my image with the regression line is different, and the R-squared is <0.8. The number of Support vectors is 302. Gamma and Epsilon are the same (since I copied the Source code). Maybe the dataset is different in 2021? Doesn't look like it from the graph. There's a significant difference between Rsquared <0.8 and yours >0.9

  3. me too get R2= .79 why?

  4. I have been working with SVR, and I just learned (and figured I would share) the R-squared is not an appropriate performance metric for goodness-of-fit for nonlinear models like SVR.
    Here is a blurb explaining why: https://statisticsbyjim.com/regression/r-squared-invalid-nonlinear-regression/

    And here is the suggestion for using a pseudo R-squared that compares the SVR model to just an intercept model: https://towardsdatascience.com/the-complete-guide-to-r-squared-adjusted-r-squared-and-pseudo-r-squared-4136650fc06c

    I am still using RMSE, MAE, and MAPE for my other performance metrics, though. They appear to work fine for nonlinear models.
