### Regression Example with Linear SVR Method in Python

Linear SVR is a regression algorithm based on the support vector machine method. It uses a linear kernel and scales well to large datasets. The loss function can be set to either the L1 (epsilon-insensitive) or the L2 (squared epsilon-insensitive) loss.
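To make the choice of loss function concrete, here is a minimal sketch that fits one model with each loss on synthetic data. The dataset size, the noise level, `random_state`, and the raised `max_iter` are illustrative assumptions, not values taken from this tutorial.

```python
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

# Synthetic data; sizes, noise and random_state are illustrative choices.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# L1 loss (the default) and L2 loss are selected through the `loss` parameter.
l1_model = LinearSVR(loss="epsilon_insensitive", dual=True,
                     max_iter=10000, random_state=0)
l2_model = LinearSVR(loss="squared_epsilon_insensitive", dual=True,
                     max_iter=10000, random_state=0)

l1_model.fit(X, y)
l2_model.fit(X, y)

# Each model learns one coefficient per feature.
print("L1 loss coefficients:", l1_model.coef_)
print("L2 loss coefficients:", l2_model.coef_)
```

Both models learn one weight per feature. The L2 loss penalizes large residuals more heavily, so the two coefficient vectors usually differ slightly on noisy data.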

In this tutorial, we'll briefly learn how to fit and predict regression data by using Scikit-learn's LinearSVR class in Python. The tutorial covers:
1. Preparing the data
2. Training the model
3. Predicting and accuracy check
4. Boston dataset prediction
5. Source code listing

```
from sklearn.svm import LinearSVR
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt
```

Preparing the data

First, we'll generate random regression data with the make_regression() function. The dataset contains 1000 samples with 10 features each.

```
x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

[[ 0.07940349 -0.62826076  1.35829589 -0.94757278  0.4330519   0.06052787
  -0.59091938  0.14826325 -0.76850621 -0.84848105]
 [-0.2728921  -0.63341441 -0.86528475  0.56128328 -0.34668921  1.30640379
  -0.18253121 -0.05468702  0.41798946  0.30962429]]
[-131.66928697  -38.6226293 ]
```

To improve model accuracy, we'll scale both the x and y data and then split them into train and test parts. Here, we'll hold out 15 percent of the samples as test data.

```
x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
```
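As a quick check of what the scaling step does, the sketch below confirms that scale() standardizes every feature column to zero mean and unit variance. It also shows a common alternative, offered here only as an assumption about good practice and not as part of this tutorial's workflow: split first and fit a StandardScaler on the training rows alone, so that no test-set statistics influence the transform. The `random_state` values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, scale

x, y = make_regression(n_samples=1000, n_features=10, random_state=0)

# scale() standardizes each feature column to zero mean and unit variance.
x_scaled = scale(x)
means_ok = np.allclose(x_scaled.mean(axis=0), 0.0)
stds_ok = np.allclose(x_scaled.std(axis=0), 1.0)
print("zero mean:", means_ok, "unit variance:", stds_ok)

# Alternative: split first, then fit the scaler on the training rows only.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0)
scaler = StandardScaler().fit(xtrain)
xtrain_s = scaler.transform(xtrain)
xtest_s = scaler.transform(xtest)
print(xtrain_s.shape, xtest_s.shape)
```

With 1000 samples and `test_size=0.15`, the split leaves 850 rows for training and 150 for testing.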

Training the model

Next, we'll define the regressor model by using the LinearSVR class. Here, we keep the default parameter values.

```
lsvr = LinearSVR(verbose=0, dual=True)
print(lsvr)

LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
          intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
          random_state=None, tol=0.0001, verbose=0)
```
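Recent scikit-learn releases print only the parameters that differ from their defaults, so the output above may look shorter on your machine. A small sketch, assuming nothing beyond the estimator's standard get_params() method, lists the main settings explicitly:

```python
from sklearn.svm import LinearSVR

lsvr = LinearSVR(dual=True)

# get_params() returns every constructor parameter as a dictionary.
params = lsvr.get_params()
for name in ("C", "epsilon", "loss", "max_iter", "tol"):
    print(name, "=", params[name])
```

Here `C` controls the regularization strength, and `epsilon` sets the width of the margin inside which errors are not penalized.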

Then, we'll fit the model on the training data and check its R-squared score on the same data.

```
lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

R-squared: 1.0
```

We can also apply cross-validation to the model and check the mean score across the folds.

```
cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

CV mean score:  1.0
```
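By default, cross_val_score() uses the estimator's own score() method, which for a regressor is R-squared. The following sketch shows how a different metric could be requested through the `scoring` argument. The choice of `"neg_mean_squared_error"` and the `random_state` values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
from sklearn.svm import LinearSVR

x, y = make_regression(n_samples=1000, n_features=10, random_state=0)
x, y = scale(x), scale(y)
model = LinearSVR(dual=True, random_state=0)

# Default scoring for a regressor: R-squared, one value per fold.
r2_scores = cross_val_score(model, x, y, cv=10)

# scikit-learn reports errors as negative numbers so that higher is better.
neg_mse_scores = cross_val_score(
    model, x, y, cv=10, scoring="neg_mean_squared_error")

print("R-squared per fold:", r2_scores)
print("Mean CV MSE:", -neg_mse_scores.mean())
```

Looking at the per-fold values, and not only at their mean, helps reveal whether the model performs unevenly across different parts of the data.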

Predicting and accuracy check

Now, we can predict the test data by using the trained model. We can check the accuracy of predicted data by using MSE and RMSE metrics.

```
ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of the MSE
print("RMSE: ", mse ** 0.5)

MSE:  0.01787051983592968
RMSE:  0.13368066...
```
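RMSE is the square root of the MSE, which is not the same as half the MSE. The sketch below works through a tiny example that can be checked by hand. The four target and prediction values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values: squared errors are 0.25, 0.25, 0.0 and 1.0.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
rmse = np.sqrt(mse)                        # square root of the MSE
half_mse = mse * (1 / 2.0)                 # 0.1875, which is NOT the RMSE

print("MSE: ", mse)
print("RMSE:", rmse)
print("MSE / 2:", half_mse)
```

Because RMSE is expressed in the same units as the target, it is usually easier to interpret than MSE.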

Finally, we'll visualize the original and predicted data in a plot.

```
x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Test and predicted data")
plt.legend()
plt.show()
```

Boston housing dataset prediction

We'll apply the same method to the Boston housing price regression dataset. We'll load it with the load_boston() function, scale it, and split it into train and test parts. Then we'll define the model, check its accuracy, and predict the test data. Note that load_boston() was removed in scikit-learn 1.2, so this section requires an earlier release.

```
from sklearn.datasets import load_boston

print("Boston housing dataset prediction.")
boston = load_boston()
x, y = boston.data, boston.target

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR(verbose=0)
lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
# RMSE is the square root of the MSE
print("RMSE: ", mse ** 0.5)

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()

Boston housing dataset prediction.
R-squared: 0.6938345064487695
CV mean score:  0.2838069239279085
MSE:  0.2388146523953546
RMSE:  0.48868666...
```
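Since load_boston() is not available in scikit-learn 1.2 and later, the same workflow can be tried on another dataset that ships with the library. The sketch below swaps in load_diabetes(). This is a substitution made for illustration, not the dataset used in this tutorial, so its scores are not comparable to the Boston results above. The `random_state` values are also assumptions added for repeatability.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
from sklearn.svm import LinearSVR

# Substitute dataset bundled with scikit-learn (442 samples, 10 features).
x, y = load_diabetes(return_X_y=True)

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0)

lsvr = LinearSVR(dual=True, random_state=0)
lsvr.fit(xtrain, ytrain)
print("R-squared:", lsvr.score(xtrain, ytrain))

ypred = lsvr.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse ** 0.5)
```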

In this tutorial, we've briefly learned how to fit and predict regression data by using Scikit-learn API's LinearSVR class in Python. The full source code is listed below.

Source code listing

```
from sklearn.svm import LinearSVR
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import scale
import matplotlib.pyplot as plt

x, y = make_regression(n_samples=1000, n_features=10)
print(x[0:2])
print(y[0:2])

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR()
print(lsvr)

lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse ** 0.5)

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, linewidth=1, label="original")
plt.plot(x_ax, ypred, linewidth=1.1, label="predicted")
plt.title("y-test and y-predicted data")
plt.legend()
plt.show()

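print("Boston housing dataset prediction.")
# load_boston() is only available in scikit-learn releases before 1.2
from sklearn.datasets import load_boston
boston = load_boston()
x, y = boston.data, boston.target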

x = scale(x)
y = scale(y)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=.15)

lsvr = LinearSVR()
lsvr.fit(xtrain, ytrain)

score = lsvr.score(xtrain, ytrain)
print("R-squared:", score)

cv_score = cross_val_score(lsvr, x, y, cv=10)
print("CV mean score: ", cv_score.mean())

ypred = lsvr.predict(xtest)

mse = mean_squared_error(ytest, ypred)
print("MSE: ", mse)
print("RMSE: ", mse ** 0.5)

x_ax = range(len(ytest))
plt.plot(x_ax, ytest, label="original")
plt.plot(x_ax, ypred, label="predicted")
plt.title("Boston test and predicted data")
plt.legend()
plt.show()
```
