Least Angle Regression Example in Python

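   Least Angle Regression (LARS) is a regression algorithm for high-dimensional data that models the response as a linear combination of the predictor variables. It is related to forward stepwise regression. At each step, LARS finds the predictor most correlated with the current residual and moves its coefficient toward the least-squares value until another predictor becomes equally correlated with the residual. It then continues in a direction that is equiangular between the active predictors.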
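   The selection step described above can be sketched with NumPy alone. This is only an illustration of how the first predictor is chosen, not the full algorithm; the synthetic data, the seed, and the variable names are assumptions made for the example.

```python
import numpy as np

# Synthetic data (an assumption for illustration): only column 1 drives y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 1] + 0.1 * rng.normal(size=100)

# Standardize the predictors and center the response.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
residual = y - y.mean()  # all coefficients start at zero

# Correlation of each predictor with the current residual.
corr = Xs.T @ residual / len(y)

# LARS starts with the predictor that has the largest absolute correlation.
first = int(np.argmax(np.abs(corr)))
print("correlations:", np.round(corr, 3))
print("first predictor to enter:", first)
```

   After this step, the full algorithm would grow the chosen coefficient until a second predictor ties with it in correlation, and then move both together.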
   In this tutorial, we'll learn how to fit regression data with the LARS and Lasso LARS algorithms in Python. We'll use scikit-learn's Lars and LassoLars estimators and the Boston housing dataset. The post covers:
  1. Preparing the data
  2. How to use LARS
  3. How to use Lasso LARS
  4. Source code listing
 Let's get started by loading the required packages.

from sklearn import linear_model
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
from numpy import sqrt


Preparing the data

We'll load the Boston dataset and split it into training and test sets.

boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)
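Note that load_boston was removed in scikit-learn 1.2, so the lines above need an older release. As a rough sketch for newer versions, the same split can be done on another bundled dataset. The diabetes dataset and the fixed random_state below are assumptions used as a stand-in, not the data used in this post, so the numbers shown later will differ.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): 442 samples, 10 numeric features.
x, y = load_diabetes(return_X_y=True)

# Hold out 15% for testing; random_state makes the split repeatable.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=42)

print(xtrain.shape, xtest.shape)
```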


How to use LARS

We'll define the model with the Lars class (using its default parameters) and fit it on the training data.

lars = linear_model.Lars().fit(xtrain, ytrain)
print(lars)
Lars(copy_X=True, eps=2.220446049250313e-16, fit_intercept=True,
   fit_path=True, n_nonzero_coefs=500, normalize=True, positive=False,
   precompute='auto', verbose=False)
 
And check the model coefficients.

print(lars.coef_)
[-1.16800795e-01  1.02016954e-02 -2.99472206e-01  4.21380667e+00
 -2.18450214e+01  4.01430635e+00 -9.90351759e-03 -1.60916999e+00
 -2.32195752e-01  2.80140313e-02 -1.08077980e+00  1.07377184e-02
 -5.02331702e-01]
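With the default n_nonzero_coefs=500, every coefficient above ends up nonzero. A minimal sketch of how a smaller n_nonzero_coefs stops the path early is shown here. It uses synthetic data from make_regression (an assumption, chosen so the snippet runs without the Boston dataset), and the value 3 is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars

# Synthetic regression problem (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Default settings let the path run to the end.
full = Lars().fit(X, y)

# Stop once three predictors have entered the model.
small = Lars(n_nonzero_coefs=3).fit(X, y)

print("nonzero (default):", np.count_nonzero(full.coef_))
print("nonzero (n_nonzero_coefs=3):", np.count_nonzero(small.coef_))
```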

Next, we'll predict test data and check the MSE and RMSE metrics.

ypred = lars.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)
MSE: 36.96
print("RMSE: %.2f" % sqrt(mse))
RMSE: 6.08 
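For reference, both metrics can be computed by hand: MSE is the mean of the squared errors, and RMSE is its square root. The sketch below uses three made-up values (assumptions, not taken from the Boston data).

```python
import numpy as np

# Toy values (assumption): errors are 1, 0 and -2.
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.0, 5.0, 4.0])

# Mean of squared errors: (1 + 0 + 4) / 3.
mse = np.mean((y_true - y_hat) ** 2)

# RMSE is on the same scale as the target variable.
rmse = np.sqrt(mse)

print("MSE: %.2f" % mse)    # MSE: 1.67
print("RMSE: %.2f" % rmse)  # RMSE: 1.29
```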

Finally, we'll create the plot to visualize the original and predicted data.

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()

 



How to use Lasso Lars

LassoLars fits a Lasso model using the LARS algorithm. We'll define the model with the LassoLars class, setting the alpha parameter to 0.1, and fit it on the training data.

lassolars = linear_model.LassoLars(alpha =.1).fit(xtrain, ytrain)
print(lassolars)
LassoLars(alpha=0.1, copy_X=True, eps=2.220446049250313e-16,
     fit_intercept=True, fit_path=True, max_iter=500, normalize=True,
     positive=False, precompute='auto', verbose=False)
 
We can check the coefficients.

print(lassolars.coef_)
[ 0.          0.          0.          0.          0.          3.00873485
  0.          0.          0.          0.         -0.28423008  0.
 -0.42849354] 
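Only three coefficients are nonzero, so the Lasso penalty has acted as a feature selector. The sketch below pairs the coefficients printed above with the Boston feature names. The name list is written out by hand in the dataset's standard column order (the same order as boston.feature_names), which is an assumption of this snippet.

```python
import numpy as np

# Boston column names in dataset order (typed by hand for this sketch).
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Coefficients copied from the output above.
coef = np.array([0., 0., 0., 0., 0., 3.00873485, 0., 0., 0., 0.,
                 -0.28423008, 0., -0.42849354])

# Keep only the features the Lasso penalty did not zero out.
selected = [(name, c) for name, c in zip(feature_names, coef) if c != 0]
for name, c in selected:
    print("%-8s % .4f" % (name, c))
```

With these values the selected features are RM, PTRATIO and LSTAT.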

Next, we'll predict test data and check the MSE and RMSE metrics.

ypred = lassolars.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)
MSE: 45.59
print("RMSE: %.2f" % sqrt(mse))
RMSE: 6.75
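The error is higher than with plain LARS because alpha=0.1 removed most of the predictors. A rough sketch of how alpha trades sparsity against error follows. It uses synthetic data from make_regression, and the alpha values are arbitrary assumptions, with the largest chosen to be big enough to zero every coefficient.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLars
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem (assumption for illustration).
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.15, random_state=0)

alphas = [0.01, 1.0, 1e4]  # small, moderate, and very large penalty
counts = []
for alpha in alphas:
    model = LassoLars(alpha=alpha).fit(xtrain, ytrain)
    mse = mean_squared_error(ytest, model.predict(xtest))
    counts.append(int(np.count_nonzero(model.coef_)))
    print("alpha=%g nonzero=%d RMSE=%.2f" % (alpha, counts[-1], np.sqrt(mse)))
```

A larger alpha leaves fewer nonzero coefficients. At the extreme, the model predicts only the intercept.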

Finally, we'll create the plot to visualize the original and predicted data.

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()


   In this tutorial, we've briefly learned how to fit and predict regression data with LARS and Lasso Lars algorithms.


Source code listing

from sklearn import linear_model
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
from numpy import sqrt

boston = load_boston()
x, y = boston.data, boston.target
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)

lars = linear_model.Lars().fit(xtrain, ytrain)
print(lars)
print(lars.coef_)

ypred = lars.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)
print("RMSE: %.2f" % sqrt(mse))

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()

lassolars = linear_model.LassoLars(alpha =.1).fit(xtrain, ytrain)
print(lassolars) 
print(lassolars.coef_)

ypred = lassolars.predict(xtest)
mse = mean_squared_error(ytest, ypred)
print("MSE: %.2f" % mse)
print("RMSE: %.2f" % sqrt(mse))

x_ax = range(len(ytest))
plt.scatter(x_ax, ytest, s=5, color="blue", label="original")
plt.plot(x_ax, ypred, lw=0.8, color="red", label="predicted")
plt.legend()
plt.show()


References and further reading:
  1. Least Angle Regression, by Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani (2004)
  2. Least-Angle Regression, Wikipedia
  3. sklearn.linear_model.Lars
  4. sklearn.linear_model.LassoLars
