Regression Example with Nu Support Vector Regression Method in Python

   Based on the support vector machines method, Nu Support Vector Regression (NuSVR) is an algorithm for solving regression problems. The NuSVR algorithm uses the nu parameter in place of the epsilon parameter of the SVR method. The Scikit-learn documentation explains that nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors¹.
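   As a quick illustration of this property, the minimal sketch below (using synthetic data and arbitrary nu values chosen only for demonstration) fits NuSVR with a few nu settings and prints the fraction of training samples that become support vectors; this fraction should not fall below nu.

# Minimal sketch: nu acts as a lower bound on the fraction of support vectors.
# The nu values below are arbitrary choices for demonstration.
from sklearn.svm import NuSVR
from sklearn.datasets import make_regression

xs, ys = make_regression(n_samples=500, n_features=10, random_state=1)

for nu in (0.1, 0.5, 0.9):
    model = NuSVR(nu=nu)
    model.fit(xs, ys)
    frac_sv = len(model.support_) / len(xs)
    print("nu:", nu, " support vector fraction:", round(frac_sv, 2))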

   In this tutorial, we'll briefly learn how to fit and predict regression data by using Scikit-learn's NuSVR class in Python. The tutorial covers:
  1. Preparing the data
  2. Training the model
  3. Predicting and accuracy check
  4. Boston housing dataset prediction
  5. Source code listing
   We'll start by loading the required libraries.

from sklearn.svm import NuSVR
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt


Preparing the data

   First, we'll generate random regression data with the make_regression() function. The dataset contains 1000 samples with 10 features.

x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

[[ 1.01646401 -0.41404149 -0.33426236 -2.31816799 -0.60889924  0.80205365
   0.50961324  2.21412708 -0.04765094 -1.29481218]
 [-0.01471556 -1.22287924 -0.4500027   0.8349292  -1.74252028 -0.71654997
   0.58212652  2.1221269  -1.71193889 -0.16591502]]
[-289.88769812 -373.7687416 ]

To improve model accuracy, we'll scale both the x and y data, then split them into train and test parts. Here, we'll hold out 15 percent of the samples as test data.

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
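As a side note, scale() above standardizes x and y using statistics from the whole dataset. A common alternative, sketched below only as an optional illustration, is to split first and fit a StandardScaler on the training part alone, so the test part does not influence the scaling.

# Alternative scaling sketch: split first, then fit the scaler on the
# training data only and apply the same transform to the test data.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x_raw, y_raw = make_regression(n_samples=1000, n_features=10)
xtr, xte, ytr, yte = train_test_split(x_raw, y_raw, test_size=0.15)

x_scaler = StandardScaler().fit(xtr)   # statistics come from training data only
xtr_scaled = x_scaler.transform(xtr)
xte_scaled = x_scaler.transform(xte)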


Training the model

   Next, we'll define the regressor by using the NuSVR class. Here, we can use the default parameters of the model.

nsvr = NuSVR()
print(nsvr)

NuSVR(C=1.0, cache_size=200, coef0=0.0, degree=3, gamma='scale', kernel='rbf',
      max_iter=-1, nu=0.5, shrinking=True, tol=0.001, verbose=False)
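If we don't want the defaults, the main hyperparameters can also be set explicitly. The sketch below only uses example values, not tuned ones.

# Example of setting NuSVR hyperparameters explicitly (values are illustrative).
nsvr_custom = NuSVR(nu=0.5, C=1.0, kernel="rbf", gamma="scale")
print(nsvr_custom)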

Then, we'll fit the model on the training data and check its accuracy score.

nsvr.fit(xtrain, ytrain)

score = nsvr.score(xtrain, ytrain)
print("R-squaered:", score)

R-squaered: 0.99581178159984

We can also apply cross-validation to the model and check the mean accuracy across the folds.

cv_score = cross_val_score(nsvr, x, y, cv = 10)
print("CV mean score: ", cv_score.mean())

CV mean score: 0.9743050797057672
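By default, cross_val_score uses the regressor's R-squared score. If an error-based metric is preferred, the scoring argument can be changed; the sketch below assumes negative MSE is the metric we want to inspect.

# Cross-validation with an explicit error metric. Scikit-learn returns
# negated MSE values, so we flip the sign when reporting the mean.
cv_mse = cross_val_score(nsvr, x, y, cv=10, scoring="neg_mean_squared_error")
print("CV mean MSE: ", -cv_mse.mean())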


Predicting and accuracy check

Now, we can predict the test data with the trained model and check the accuracy of the predictions using the MSE and RMSE metrics.

ypred = nsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse*(1/2.0))

MSE:  0.01787051983592968
RMSE:  0.1336806636576
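RMSE here is simply the square root of MSE. Other common regression metrics can be computed on the same predictions; the short sketch below, included only as an optional check, reports MAE and the test-set R-squared.

# Optional extra metrics on the test predictions (illustrative).
from sklearn.metrics import mean_absolute_error, r2_score

mae = mean_absolute_error(ytest, ypred)
r2 = r2_score(ytest, ypred)
print("MAE: ", mae)
print("R-squared (test): ", r2)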

Finally, we'll visualize the original and predicted data in a plot.

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Test and predicted data")
plt.legend()
plt.show()
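A scatter plot of predicted versus original values is another common way to inspect the fit; points close to the diagonal indicate accurate predictions. The sketch below is an optional alternative visualization, not part of the original flow.

# Optional: predicted vs. original scatter plot.
plt.scatter(ytest, ypred, s=10)
plt.plot([ytest.min(), ytest.max()], [ytest.min(), ytest.max()], color="red")
plt.xlabel("original")
plt.ylabel("predicted")
plt.title("Predicted vs. original test data")
plt.show()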


Boston housing dataset prediction

    We'll apply the same method we've learned above to the Boston housing price regression dataset. We'll load it with the load_boston() function, scale it, and split it into train and test parts. Then, we'll define the model, check the accuracy, and predict the test data.

print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

nsvr = NuSVR()
nsvr.fit(xtrain, ytrain)

score = nsvr.score(xtrain, ytrain)
print("R-squaered:", score)

cv_score = cross_val_score(nsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = nsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse*(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()

Boston housing dataset prediction.
R-squared: 0.8829677625633515
CV mean score:  0.5229267100173134
MSE:  0.101282412378955
RMSE:  0.3182489786
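The lower cross-validation score on the Boston data suggests the default parameters may not be ideal here. A hedged sketch of a hyperparameter search with GridSearchCV is shown below; the grid values are arbitrary examples, not recommended settings.

# Illustrative hyperparameter search for NuSVR on the Boston data.
# The grid values are arbitrary examples.
from sklearn.model_selection import GridSearchCV

param_grid = {"nu": [0.25, 0.5, 0.75], "C": [0.1, 1, 10]}
grid = GridSearchCV(NuSVR(), param_grid, cv=5)
grid.fit(xtrain, ytrain)
print("Best parameters: ", grid.best_params_)
print("Best CV score: ", grid.best_score_)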
 

   In this tutorial, we've briefly learned how to fit and predict regression data by using Scikit-learn's NuSVR class in Python. The full source code is listed below.


Source code listing

from sklearn.svm import NuSVR
from sklearn.datasets import load_boston
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt


x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

nsvr = NuSVR()
print(nsvr)

nsvr.fit(xtrain, ytrain)

score = nsvr.score(xtrain, ytrain)
print("R-squaered:", score)

cv_score = cross_val_score(nsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = nsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse*(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Test and predicted y data")
plt.legend()
plt.show()


print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

nsvr = NuSVR()
nsvr.fit(xtrain, ytrain)

score = nsvr.score(xtrain, ytrain)
print("R-squaered:", score)

cv_score = cross_val_score(nsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = nsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse*(1/2.0))

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()


References:

1. Scikit-learn documentation, sklearn.svm.NuSVR: https://scikit-learn.org/stable/modules/generated/sklearn.svm.NuSVR.html