Support Vector Regression Example in Python


   Support Vector Regression (SVR) is a regression algorithm that applies the technique of Support Vector Machines (SVM) to regression problems. Regression targets are continuous real values. To fit such data, the SVR model approximates the target function within a margin of tolerance called the ε-tube (epsilon-tube, where ε is the tube width), balancing model complexity against prediction error. In this post, we'll learn how to fit and predict regression data with SVR in Python.
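The ε-tube idea can be made concrete with a small sketch of the epsilon-insensitive loss that SVR minimizes. This helper function is our own illustration, not part of scikit-learn:

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon-tube cost nothing; outside it,
    the penalty grows linearly with the distance from the tube."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# A residual of 0.05 falls inside a tube of width 0.1, so it is not
# penalized; a residual of 0.3 is penalized by 0.3 - 0.1 = 0.2.
print(epsilon_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.3])))
```

This is why SVR ignores small deviations from the fitted curve: only points outside the tube become support vectors and contribute to the loss.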
   First, we import the required libraries.

import random
import math
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR 
from sklearn.metrics import mean_squared_error 

Next, we create sample data.

random.seed(123)
def getData(N):
    x, y = [], []
    for i in range(N):
        a = i / 10 + random.uniform(-1, 1)
        yy = math.sin(a) + 3 + random.uniform(-1, 1)
        x.append([a])
        y.append([yy])
    return np.array(x), np.array(y)

x,y = getData(200)

The training data is ready. To create the SVR model, we use the SVR() class with its default parameters, which suit this data well.

model = SVR()
print(model)
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto',
  kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False) 

Here, kernel, C, and epsilon are the most important parameters, and they can be tuned to the characteristics of the regression data. kernel selects the kernel function used by the algorithm: 'rbf' (the default), 'linear', 'poly', or 'sigmoid'. Note that the printout above comes from an older scikit-learn release; in recent versions gamma defaults to 'scale' and print(model) lists only non-default parameters.
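To see how these parameters are set explicitly, here is a hedged sketch: the values below are illustrative choices, not tuned for this dataset. A larger C lets the model bend more toward the data, while a smaller epsilon narrows the tube.

```python
from sklearn.svm import SVR

# Illustrative (untuned) parameter values for demonstration only.
tuned = SVR(kernel='rbf', C=10.0, epsilon=0.05, gamma='scale')
params = tuned.get_params()
print(params['kernel'], params['C'], params['epsilon'])
```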
We fit the model with the x, y data, then predict on the same x data.

model.fit(x, y.ravel())  # SVR expects a 1-D target array
pred_y = model.predict(x)

Finally, we check the results and visualize the outputs. The score method returns the R-squared (coefficient of determination) value.

for yo, yp in zip(y[1:15, :], pred_y[1:15]):
    print(yo, yp)
[2.12998819] 2.3688522493273485
[2.91907141] 3.285632204333334
[3.02825117] 2.953252316970487
[3.21241735] 3.3448365096752717
[2.84114287] 2.7413569211507602
[2.09354503] 2.629728633229279
[2.71700547] 3.092804036168382
[2.97862119] 3.2818706188759346
[3.40296856] 3.0690924469559113
[3.15686687] 3.8962639841272315
[3.95510045] 2.963577687955483
[4.06240409] 2.8227461040611517
[3.52296771] 3.623387735802008
[4.41282252] 3.8982877638029247 

x_ax = range(200)
plt.scatter(x_ax, y, s=5, color="blue", label="original")
plt.plot(x_ax, pred_y, lw=1.5, color="red", label="predicted")
plt.legend()
plt.show()
 

score = model.score(x, y)
print(score)
0.6066306757957185 
 
mse = mean_squared_error(y, pred_y)
print("Mean Squared Error:",mse)
Mean Squared Error: 0.30499845231798917 
 
rmse = math.sqrt(mse)
print("Root Mean Squared Error:", rmse)
Root Mean Squared Error: 0.5522666496521306 
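All of the metrics above are computed on the same data the model was trained on, so they describe fit quality rather than predictive power. As a hedged sketch (the split ratio and random_state below are illustrative choices, not from the original post), a held-out evaluation would look like:

```python
import math
import random
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

random.seed(123)
def getData(N):
    x, y = [], []
    for i in range(N):
        a = i / 10 + random.uniform(-1, 1)
        x.append([a])
        y.append(math.sin(a) + 3 + random.uniform(-1, 1))
    return np.array(x), np.array(y)

x, y = getData(200)
# Hold out 30% of the samples so the error reflects unseen data.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42)
model = SVR().fit(x_train, y_train)
rmse = math.sqrt(mean_squared_error(y_test, model.predict(x_test)))
print("Held-out RMSE:", rmse)
```

The held-out RMSE is typically somewhat higher than the training RMSE, which is the honest number to report.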

   In this post, we have briefly learned how to fit and predict regression data with SVR in Python. Thank you for reading! The full source code is below.

import random
import math
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR 
from sklearn.metrics import mean_squared_error 

random.seed(123)
def getData(N):
    x, y = [], []
    for i in range(N):
        a = i / 10 + random.uniform(-1, 1)
        yy = math.sin(a) + 3 + random.uniform(-1, 1)
        x.append([a])
        y.append([yy])
    return np.array(x), np.array(y)

x, y = getData(200)
model = SVR()
print(model)

model.fit(x, y.ravel())  # SVR expects a 1-D target array
pred_y = model.predict(x)
for yo, yp in zip(y[1:15, :], pred_y[1:15]):
    print(yo, yp)

x_ax = range(200)
plt.scatter(x_ax, y, s=5, color="blue", label="original")
plt.plot(x_ax, pred_y, lw=1.5, color="red", label="predicted")
plt.legend()
plt.show()

score = model.score(x, y)
print(score)

mse = mean_squared_error(y, pred_y)
print("Mean Squared Error:", mse)

rmse = math.sqrt(mse)
print("Root Mean Squared Error:", rmse)
