DataTechNotes: Regression Example with Linear SVR Method in Python

Linear SVR is a regression algorithm based on the support vector machine method. It uses a linear kernel and scales well to large datasets. The loss function of the model can be set to either the L1 loss (epsilon-insensitive) or the L2 loss (squared epsilon-insensitive).
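To illustrate the two loss options, here is a minimal sketch that fits LinearSVR once with each loss on synthetic data. The dataset size, the noise level, max_iter=10000, and the fixed random_state values are assumptions chosen only to keep the example small and repeatable.

```python
from sklearn.datasets import make_regression
from sklearn.preprocessing import scale
from sklearn.svm import LinearSVR

# Small synthetic dataset; the sizes and seeds are illustrative assumptions.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X, y = scale(X), scale(y)

# 'epsilon_insensitive' is the L1 loss, 'squared_epsilon_insensitive' the L2 loss.
scores = {}
for loss in ("epsilon_insensitive", "squared_epsilon_insensitive"):
    model = LinearSVR(loss=loss, dual=True, max_iter=10000, random_state=0)
    model.fit(X, y)
    scores[loss] = model.score(X, y)
    print(loss, "R-squared:", scores[loss])
```

On nearly linear data like this, both losses should give similar scores. The difference matters more when the data contains outliers, because the squared loss penalizes large residuals more heavily.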

In this tutorial, we'll briefly learn how to fit and predict regression data by using Scikit-learn's LinearSVR class in Python. The tutorial covers:

Preparing the data
Training the model
Predicting and accuracy check
Boston housing dataset prediction
Source code listing

We'll start by loading the required libraries.

from sklearn.svm import LinearSVR
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt

Preparing the data

First, we'll generate random regression data with the make_regression() function. The dataset contains 1000 samples with 10 features each.

x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

[[ 0.07940349 -0.62826076  1.35829589 -0.94757278  0.4330519   0.06052787
  -0.59091938  0.14826325 -0.76850621 -0.84848105]
 [-0.2728921  -0.63341441 -0.86528475  0.56128328 -0.34668921  1.30640379
  -0.18253121 -0.05468702  0.41798946  0.30962429]]
[-131.66928697  -38.6226293 ]
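Because make_regression() draws random data, the values printed above change on every run. As a small sketch, passing random_state makes the data repeatable, and noise adds Gaussian noise to the target. The seed 42 and the noise level 10.0 are arbitrary assumptions, not values used in this tutorial.

```python
import numpy as np
from sklearn.datasets import make_regression

# random_state fixes the generated data; noise sets the standard deviation
# of the Gaussian noise added to the target (the default is 0.0).
x, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
x_again, y_again = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)

print(x.shape, y.shape)
print("Same data on both calls:", np.array_equal(x, x_again) and np.array_equal(y, y_again))
```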

To improve the model accuracy, we'll scale both the x and y data and then split them into train and test parts. Here, we'll hold out 15 percent of the samples as test data.

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)
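Scaling the full dataset before splitting lets the test samples influence the scaling statistics. An alternative, sketched below under the assumption that we want to avoid this, is to split first and fit a StandardScaler on the training part only. The variable names with the _s suffix and the fixed seeds are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x, y = make_regression(n_samples=1000, n_features=10, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15, random_state=0)

# Fit the feature scaler on the training part only, then apply it to both parts.
x_scaler = StandardScaler().fit(xtrain)
xtrain_s = x_scaler.transform(xtrain)
xtest_s = x_scaler.transform(xtest)

# StandardScaler expects 2-D input, so the target is reshaped and flattened back.
y_scaler = StandardScaler().fit(ytrain.reshape(-1, 1))
ytrain_s = y_scaler.transform(ytrain.reshape(-1, 1)).ravel()
ytest_s = y_scaler.transform(ytest.reshape(-1, 1)).ravel()

print(xtrain_s.shape, xtest_s.shape)
```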

Training the model

Next, we'll define the regressor model by using the LinearSVR class. The default parameters of the class are sufficient here.

lsvr = LinearSVR(verbose=0, dual=True)
print(lsvr)

LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
          intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
          random_state=None, tol=0.0001, verbose=0)
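The printed parameters include loss='epsilon_insensitive' and epsilon=0.0. As a rough illustration of what this loss measures, the NumPy sketch below computes the per-sample penalty max(0, |y - prediction| - epsilon). The helper name epsilon_insensitive and the sample values are hypothetical, and the sketch leaves out the regularization term that the model also minimizes.

```python
import numpy as np

def epsilon_insensitive(y_true, y_pred, epsilon=0.0):
    """Per-sample L1 penalty: residuals smaller than epsilon cost nothing."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 2.5, 2.0])

# With epsilon=0.0 the loss is the plain absolute error.
loss_eps0 = epsilon_insensitive(y_true, y_pred, epsilon=0.0)
# With epsilon=0.2 the first residual (about 0.1) falls inside the tube.
loss_eps02 = epsilon_insensitive(y_true, y_pred, epsilon=0.2)

print(loss_eps0)
print(loss_eps02)
```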

Then, we'll fit the model on the training data and check its R-squared score on the same data.

lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

R-squared: 1.0
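Since the fitted model is linear, its predictions are a weighted sum of the features plus an intercept. The sketch below, which regenerates its own data with assumed seeds, reads the learned weights from the coef_ and intercept_ attributes and checks that they reproduce predict().

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import scale
from sklearn.svm import LinearSVR

x, y = make_regression(n_samples=1000, n_features=10, random_state=0)
x, y = scale(x), scale(y)

lsvr = LinearSVR(dual=True, max_iter=10000, random_state=0)
lsvr.fit(x, y)

print("Coefficients:", lsvr.coef_)
print("Intercept:", lsvr.intercept_)

# Recompute the predictions by hand from the learned weights.
manual_pred = x @ lsvr.coef_ + lsvr.intercept_
print("Matches predict():", np.allclose(manual_pred, lsvr.predict(x)))
```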

We can also apply cross-validation to the model and check the mean score across the folds.

cv_score = cross_val_score(lsvr, x, y, cv = 10)
print("CV mean score: ", cv_score.mean())

CV mean score:  1.0
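The mean hides how much the score varies between folds. As a sketch, we can pass an explicit KFold splitter with shuffling and look at the individual fold scores. The shuffling, the seeds, and scoring="r2" (the same metric that score() reports for regressors) are choices made for this illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import scale
from sklearn.svm import LinearSVR

x, y = make_regression(n_samples=1000, n_features=10, random_state=0)
x, y = scale(x), scale(y)

# Shuffle the samples before splitting them into 10 folds.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
model = LinearSVR(dual=True, max_iter=10000, random_state=0)
cv_scores = cross_val_score(model, x, y, cv=cv, scoring="r2")

print("Fold scores:", cv_scores)
print("Mean:", cv_scores.mean(), "Std:", cv_scores.std())
```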

Predicting and accuracy check

Now, we can predict the test data by using the trained model. We can check the prediction error by using the MSE and RMSE metrics.

ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

MSE:  0.01787051983592968
RMSE:  0.13368
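RMSE is the square root of MSE, so it is expressed in the same units as the target. The small sketch below uses made-up arrays (they are not the tutorial's data) to show how the two metrics relate and how MSE can be computed by hand.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted values, for illustration only.
ytest = np.array([0.5, -1.0, 2.0, 0.0])
ypred = np.array([0.4, -0.8, 1.5, 0.1])

mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)

# The same MSE computed directly as the mean of the squared residuals.
manual_mse = np.mean((ytest - ypred) ** 2)

print("MSE: ", mse)
print("RMSE: ", rmse)
print("Manual MSE: ", manual_mse)
```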

Finally, we'll visualize the original and predicted data in a plot.

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Test and predicted data")
plt.legend()
plt.show()

Boston housing dataset prediction

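We'll apply the same method to the Boston housing price regression dataset. We'll load it by using the load_boston() function, then scale the data and split it into train and test parts. Then, we'll define the model, check its accuracy, and predict the test data. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in version 1.2, so this section requires an older release.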

print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR(verbose=0)
lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()

Boston housing dataset prediction.
R-squared: 0.6938345064487695
CV mean score:  0.2838069239279085
MSE:  0.2388146523953546
RMSE:  0.48869
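On scikit-learn 1.2 and later, load_boston() is no longer available. As a hedged sketch of the same workflow, the example below swaps in the bundled diabetes dataset (load_diabetes), which is a different dataset and will give different scores. It also wraps the scaling into the model with make_pipeline and TransformedTargetRegressor, so that each cross-validation fold is scaled using its own training part only. The shuffled KFold and the seeds are assumptions. By default, cross_val_score splits regression data into consecutive folds without shuffling, which may partly explain a gap between training and cross-validation scores on ordered data.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Swapped-in dataset: diabetes progression, not Boston housing prices.
x, y = load_diabetes(return_X_y=True)

# Features are scaled inside the pipeline; the target is scaled by the wrapper
# and predictions are transformed back to the original units.
model = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(),
        LinearSVR(dual=True, max_iter=10000, random_state=0),
    ),
    transformer=StandardScaler(),
)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, x, y, cv=cv)
print("CV mean score: ", cv_scores.mean())
```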

In this tutorial, we've briefly learned how to fit and predict regression data by using Scikit-learn API's LinearSVR class in Python. The full source code is listed below.

Source code listing

from sklearn.svm import LinearSVR
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt

x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR()
print(lsvr)

lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, linewidth=1, label="original")
plt.plot(x_ax, ypred, linewidth=1.1, label="predicted")
plt.title("y-test and y-predicted data")
plt.legend()
plt.show()


print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR()
lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse**(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()

