DataTechNotes: Lasso Regression Example in Python

LASSO (Least Absolute Shrinkage and Selection Operator) is a regularization method that reduces overfitting in a regression model. It adds an L1 penalty, the sum of the absolute values of the coefficients, to the loss function, which shrinks large coefficients and can drive some of them to exactly zero. In this post, we'll learn how to use the Lasso and LassoCV classes for regression analysis in Python. The post covers:

Preparing data
Regression with Lasso
Regression with LassoCV
Source code listing

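We'll start by loading the required libraries. Note that load_boston was removed in scikit-learn 1.2, so the code in this post requires an earlier version.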

from sklearn.datasets import load_boston
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

Preparing data

We use the Boston house-price dataset as regression data in this tutorial. After loading the dataset, we'll separate it into features (x) and labels (y), then split it into train and test parts. Here, we'll hold out 15 percent of the dataset as test data.

boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)
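
The split above is random, so each run produces different train and test parts. As a minimal sketch, assuming a small synthetic array in place of the Boston data, passing the random_state parameter of train_test_split makes the split repeatable. The array names and the seed value 42 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the Boston dataset):
# 100 samples with 3 features each.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# Same test_size as in the post, plus a fixed seed so the split is repeatable.
xtr, xte, ytr, yte = train_test_split(
    x_demo, y_demo, test_size=0.15, random_state=42)
xtr2, xte2, ytr2, yte2 = train_test_split(
    x_demo, y_demo, test_size=0.15, random_state=42)

print("train shape:", xtr.shape, "test shape:", xte.shape)
print("identical test parts:", np.array_equal(xte, xte2))
```

With a fixed seed, the metrics printed later in the post would stay the same between runs.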

Regression with Lasso

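Lasso regularization in a model can be described as

Loss = (wx + b - y)^2 + a|w|

Here w is the weight, b is the bias, x is the feature, y is the label (original value), and a is the alpha constant. The squared error is summed over the training samples, and |w| is summed over all weights. If a is set to 0, the penalty term vanishes and the model becomes an ordinary linear regression, so for Lasso alpha should be a > 0.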
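
To make the loss concrete, here is a small numeric sketch of a squared-error loss with an L1 penalty. The function name lasso_loss and the toy numbers are made up for illustration. scikit-learn's own objective additionally scales the squared-error term by 1/(2n), so these values are not directly comparable with what Lasso minimizes internally.

```python
import numpy as np

def lasso_loss(w, b, x, y, a):
    """Sum of squared errors plus an L1 penalty on the weights.

    The bias b is not penalized. This is an illustrative helper,
    not part of scikit-learn.
    """
    residual = x @ w + b - y
    return np.sum(residual ** 2) + a * np.sum(np.abs(w))

# Tiny made-up example: 3 samples, 2 features.
x_toy = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])
y_toy = np.array([5.0, 4.0, 8.0])
w_toy = np.array([2.0, 1.0])
b_toy = 0.5

loss_no_penalty = lasso_loss(w_toy, b_toy, x_toy, y_toy, a=0.0)
loss_with_penalty = lasso_loss(w_toy, b_toy, x_toy, y_toy, a=1.0)

print("a=0:", loss_no_penalty)    # plain squared error
print("a=1:", loss_with_penalty)  # squared error + (|2| + |1|)
```

With a = 0 only the squared error remains, and with a = 1 the loss grows by the sum of the absolute weights.
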
To define the model, we use the default parameters of the Lasso class (the default alpha is 1). Then we'll fit the model on the training data.

model = Lasso().fit(xtrain, ytrain)

print(model)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

Next, we'll check the score (R-squared), predict the test data, compute the prediction error, and print all the metrics. Because the train/test split is random, the exact numbers differ from run to run.

score = model.score(xtrain, ytrain)  # R-squared on the training data
ypred = model.predict(xtest)
mse = mean_squared_error(ytest, ypred)  # error on the held-out test data
print("Alpha:{0:.2f}, R2:{1:.2f}, MSE:{2:.2f}, RMSE:{3:.2f}"
    .format(model.alpha, score, mse, np.sqrt(mse)))

Alpha:1.00, R2:0.68, MSE:27.10, RMSE:5.21
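
For reference, the three metrics can also be computed by hand. The sketch below uses made-up labels and predictions, and the helper name regression_metrics is hypothetical. It follows the usual definitions: R-squared is one minus the ratio of the residual sum of squares to the total sum of squares, MSE is the mean squared error, and RMSE is its square root.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R-squared, MSE and RMSE for two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    mse = ss_res / len(y_true)
    return {"r2": 1.0 - ss_res / ss_tot, "mse": mse, "rmse": np.sqrt(mse)}

# Made-up values for illustration only.
metrics = regression_metrics([3, 5, 7, 9], [2.5, 5.5, 7, 8])
print("R2:{r2:.3f}, MSE:{mse:.3f}, RMSE:{rmse:.3f}".format(**metrics))
```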

Here, we can change the alpha value to improve model accuracy. To find a value that works well for the model, we'll use the LassoCV class.
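
To get a feel for what alpha does, the following sketch fits Lasso with several alpha values and counts the non-zero coefficients. It assumes synthetic data from make_regression rather than the Boston dataset, and the alpha values are arbitrary. The general pattern to look for is that a larger alpha leaves fewer non-zero coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): 10 features, only 3 of them informative.
x_syn, y_syn = make_regression(n_samples=200, n_features=10,
                               n_informative=3, noise=5.0, random_state=0)

alphas_demo = [0.01, 1.0, 10.0, 1000.0]
nonzero_counts = []
for a in alphas_demo:
    m = Lasso(alpha=a, max_iter=10000).fit(x_syn, y_syn)
    n_nonzero = int(np.sum(m.coef_ != 0))  # coefficients that survived
    nonzero_counts.append(n_nonzero)
    print("alpha={0:g}: {1} non-zero coefficients".format(a, n_nonzero))
```

A very large alpha removes every feature, which is why alpha has to be tuned rather than simply maximized.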

Regression with LassoCV

LassoCV applies cross-validation to find the best alpha. We'll pass several candidate alpha values and train the model.

alphas = [0.1, 0.3, 0.5, 0.8, 1]
lassocv = LassoCV(alphas=alphas, cv=5).fit(xtrain, ytrain)
print(lassocv)

LassoCV(alphas=[0.1, 0.3, 0.5, 0.8, 1], copy_X=True, cv=5, eps=0.001,
    fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1,
    normalize=False, positive=False, precompute='auto', random_state=None,
    selection='cyclic', tol=0.0001, verbose=False)

As before, we'll check the score (R-squared), predict the test data, compute the prediction error, and print all the metrics.

score = lassocv.score(xtrain, ytrain)  # R-squared on the training data
ypred = lassocv.predict(xtest)
mse = mean_squared_error(ytest, ypred)  # error on the held-out test data
print("Alpha:{0:.2f}, R2:{1:.3f}, MSE:{2:.2f}, RMSE:{3:.2f}"
    .format(lassocv.alpha_, score, mse, np.sqrt(mse)))

Alpha:0.30, R2:0.721, MSE:20.24, RMSE:4.50
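
To see how each candidate alpha performed during cross-validation, a fitted LassoCV object exposes the alphas_ and mse_path_ attributes. The sketch below assumes synthetic data from make_regression instead of the Boston dataset, so its numbers are unrelated to the output above. The variable names are chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic data (an assumption): 10 features, only 3 of them informative.
x_syn, y_syn = make_regression(n_samples=200, n_features=10,
                               n_informative=3, noise=5.0, random_state=0)

candidate_alphas = [0.1, 0.3, 0.5, 0.8, 1]
cv_model = LassoCV(alphas=candidate_alphas, cv=5).fit(x_syn, y_syn)

# mse_path_ has one row per alpha (in the order of alphas_)
# and one column per cross-validation fold.
mean_mse = cv_model.mse_path_.mean(axis=1)
for a, m in zip(cv_model.alphas_, mean_mse):
    print("alpha={0:.2f}, mean CV MSE={1:.2f}".format(a, m))

print("Selected alpha:", cv_model.alpha_)
```

The selected alpha_ is the candidate with the lowest mean squared error averaged over the folds.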

Finally, we can visualize the result in a plot.

x_ax = range(len(xtest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred,lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()

In this post, we've briefly learned how to use the Lasso and LassoCV classes for regression data analysis in Python. The full source code is listed below. Thank you for reading!

Source code listing

from sklearn.datasets import load_boston
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.15)

model = Lasso().fit(xtrain, ytrain)
print(model)
score = model.score(xtrain, ytrain)  # R-squared on the training data
ypred = model.predict(xtest)
mse = mean_squared_error(ytest, ypred)  # error on the held-out test data
print("Alpha:{0:.2f}, R2:{1:.2f}, MSE:{2:.2f}, RMSE:{3:.2f}"
    .format(model.alpha, score, mse, np.sqrt(mse)))

x_ax = range(len(ypred))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()

alphas = [0.1, 0.3, 0.5, 0.8, 1]
lassocv = LassoCV(alphas=alphas, cv=5).fit(xtrain, ytrain)
print(lassocv)
score = lassocv.score(xtrain, ytrain)  # R-squared on the training data
ypred = lassocv.predict(xtest)
mse = mean_squared_error(ytest, ypred)  # error on the held-out test data
print("Alpha:{0:.2f}, R2:{1:.3f}, MSE:{2:.2f}, RMSE:{3:.2f}"
    .format(lassocv.alpha_, score, mse, np.sqrt(mse)))

x_ax = range(len(xtest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()
