In this post, we'll briefly learn how to use Lasso regularization in R. A 'glmnet' package provides regularization functions for Lasso. The tutorial covers:

- Preparing the data
- Defining the model
- Predicting test data
- Source code listing

library(glmnet) library(caret)

**Preparing the data**

We'll use Boston housing dataset in this tutorial, we'll load it, split into the train test parts.

set.seed(123) boston = MASS::Boston indexes = createDataPartition(boston$medv, p=.85, list=F) train = boston[indexes, ] test = boston[-indexes, ]

You can check the dataset with 'str(boston)' command. Here, the 'medv' variable is y, label column, and remainings are the x, feature data.

Now, we'll separate x and y parts of the train and test data. An x input data should be in matrix type and we'll convert it.

xtrain = as.matrix(train)[,-14] ytrain = train[,14] xtest = as.matrix(test)[,-14] ytest = test[,14]

**Defining the model**

Next, we'll find out the lambda factor which defines the amount of shrinkage, with the 'glmnet' cross-validation function. We'll run cv.glmnet function with the alpha=1 parameter that defines the Lasso method.

`lasso_cv = cv.glmnet(xtrain, ytrain, family="gaussian", alpha=1)`

We can check the coefficients.

coef(lasso_cv) 14 x 1 sparse Matrix of class "dgCMatrix" 1 (Intercept) 21.7893929324 crim -0.0122937347 zn 0.0098930474 indus . chas 2.5311725502 nox -5.7841042218 rm 3.9974018400 age . dis -0.4873179429 rad . tax -0.0003229633 ptratio -0.8262576443 black 0.0068882072 lstat -0.5161864470

The model can also be plotted.

`plot(lasso_cv)`

best_lambda = lasso_cv$lambda.min cat(best_lambda) 0.01014079

Next, we'll fit the model again by using the best-lambda value.

lasso_mod = glmnet(xtrain, ytrain, family = "gaussian", alpha = 1, lambda = best_lambda)

We'll check the coefficients again.

coef(lasso_mod) 14 x 1 sparse Matrix of class "dgCMatrix" s0 (Intercept) 37.021345010 crim -0.086726277 zn 0.050935275 indus 0.011735446 chas 3.073769966 nox -18.075419693 rm 3.676920080 age 0.004979978 dis -1.446401516 rad 0.270290914 tax -0.011341606 ptratio -0.940283913 black 0.008557049 lstat -0.521339466

**Predicting test data**

Finally, we'll predict the xtest data with the trained model and check the accuracy with MSE, MAE, RMSE, and R-squared metrics.

yhat = predict(lasso_mod, xtest) mse = mean((ytest - yhat) ^ 2) mae = MAE(ytest, yhat) rmse = RMSE(ytest, yhat) r2 = R2(ytest, yhat, form = "traditional") cat(" MAE:", mae, "\n", "MSE:", mse, "\n", "RMSE:", rmse, "\n", "R-squared:", r2) MAE: 3.889155 MSE: 24.04217 RMSE: 4.903281 R-squared: 0.5536334

We can visualize the results in a plot.

x = 1:length(ytest) plot(x, ytest, ylim=c(min(yhat), max(ytest)), pch=20, col="red") lines(x, yhat, lwd="1", col="blue") legend("topleft", legend=c("medv", "pred-medv"), col=c("red", "blue"), lty=1,cex = 0.8, lwd=1, bty='n')

In this tutorial, we've briefly learned how to use the glmnet lasso method to fit and predict the regression data. The full source code is listed below.

**Source code listing**

library(glmnet) library(caret)

` `

```
set.seed(123)
boston = MASS::Boston
indexes = createDataPartition(boston$medv, p=.85, list=F)
train = boston[indexes, ]
test = boston[-indexes, ]
```

` `

```
xtrain = as.matrix(train)[,-14]
ytrain = train[,14]
xtest = as.matrix(test)[,-14]
ytest = test[,14]
```

` `

`lasso_cv = cv.glmnet(xtrain, ytrain, family="gaussian", alpha=1)`

` `

`coef(lasso_cv)`

`plot(lasso_cv) `

` `

```
best_lambda = lasso_cv$lambda.min
cat(best_lambda)
```

```
lasso_mod = glmnet(xtrain, ytrain, family = "gaussian",
alpha = 1, lambda = best_lambda)
```

` `

`coef(lasso_mod) `

` `

```
yhat = predict(lasso_mod, xtest)
mse = mean((ytest - yhat) ^ 2)
mae = MAE(ytest, yhat)
rmse = RMSE(ytest, yhat)
r2 = R2(ytest, yhat, form = "traditional")
cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
"RMSE:", rmse, "\n", "R-squared:", r2)
```

` `

```
x = 1:length(ytest)
plot(x, ytest, ylim=c(min(yhat), max(ytest)), pch=20, col="red")
lines(x, yhat, lwd="1", col="blue")
legend("topleft", legend=c("medv", "pred-medv"),
col=c("red", "blue"), lty=1,cex = 0.8, lwd=1, bty='n')
```

## No comments:

## Post a Comment