## Pages

### Lasso Regression Example with R

LASSO (Least Absolute Shrinkage and Selection Operator) is a regularization method to minimize overfitting in a model. It reduces large coefficients with L1-norm regularization which is the sum of their absolute values. The penalty pushes the coefficients with lower value to be zero, to reduce the model complexity.
In this post, we'll briefly learn how to use Lasso regularization in R. A 'glmnet' package provides regularization functions for Lasso. The tutorial covers:
1. Preparing the data
2. Defining the model
3. Predicting test data
4. Source code listing

```library(glmnet)
library(caret)```

Preparing the data

We'll use Boston housing dataset in this tutorial, we'll load it, split into the train test parts.

```set.seed(123)
boston = MASS::Boston
indexes = createDataPartition(boston\$medv, p=.85, list=F)
train = boston[indexes, ]
test =  boston[-indexes, ]```

You can check the dataset with 'str(boston)' command. Here, the 'medv' variable is y, label column, and remainings are the x, feature data.
Now, we'll separate x and y parts of the train and test data. An x input data should be in matrix type and we'll convert it.

```xtrain = as.matrix(train)[,-14]
ytrain = train[,14]

xtest = as.matrix(test)[,-14]
ytest = test[,14]```

Defining the model

Next, we'll find out the lambda factor which defines the amount of shrinkage, with the 'glmnet' cross-validation function. We'll run cv.glmnet function with the alpha=1 parameter that defines the Lasso method.

`lasso_cv = cv.glmnet(xtrain, ytrain, family="gaussian", alpha=1)`

We can check the coefficients.

```coef(lasso_cv)
14 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 21.7893929324
crim        -0.0122937347
zn           0.0098930474
indus        .
chas         2.5311725502
nox         -5.7841042218
rm           3.9974018400
age          .
dis         -0.4873179429
tax         -0.0003229633
ptratio     -0.8262576443
black        0.0068882072
lstat       -0.5161864470```

The model can also be plotted.

`plot(lasso_cv)`

```best_lambda = lasso_cv\$lambda.min
cat(best_lambda)
0.01014079 ```

Next, we'll fit the model again by using the best-lambda value.

```lasso_mod = glmnet(xtrain, ytrain, family = "gaussian",
alpha = 1, lambda = best_lambda)```

We'll check the coefficients again.

```coef(lasso_mod)
14 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept)  37.021345010
crim         -0.086726277
zn            0.050935275
indus         0.011735446
chas          3.073769966
nox         -18.075419693
rm            3.676920080
age           0.004979978
dis          -1.446401516
tax          -0.011341606
ptratio      -0.940283913
black         0.008557049
lstat        -0.521339466```

Predicting test data

Finally, we'll predict the xtest data with the trained model and check the accuracy with MSE, MAE, RMSE, and R-squared metrics.

```yhat = predict(lasso_mod, xtest)

mse = mean((ytest - yhat) ^ 2)
mae = MAE(ytest, yhat)
rmse = RMSE(ytest, yhat)
r2 = R2(ytest, yhat, form = "traditional")

cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
"RMSE:", rmse, "\n", "R-squared:", r2)
MAE: 3.889155
MSE: 24.04217
RMSE: 4.903281
R-squared: 0.5536334```

We can visualize the results in a plot.

```x = 1:length(ytest)
plot(x, ytest, ylim=c(min(yhat), max(ytest)), pch=20, col="red")
lines(x, yhat, lwd="1", col="blue")
legend("topleft", legend=c("medv", "pred-medv"),
col=c("red", "blue"), lty=1,cex = 0.8, lwd=1, bty='n')```

In this tutorial, we've briefly learned how to use the glmnet lasso method to fit and predict the regression data. The full source code is listed below.

Source code listing

```library(glmnet)
library(caret)```
` `
```set.seed(123)
boston = MASS::Boston
indexes = createDataPartition(boston\$medv, p=.85, list=F)
train = boston[indexes, ]
test =  boston[-indexes, ]```
` `
```xtrain = as.matrix(train)[,-14]
ytrain = train[,14]

xtest = as.matrix(test)[,-14]
ytest = test[,14]```
` `
`lasso_cv = cv.glmnet(xtrain, ytrain, family="gaussian", alpha=1)`
` `
`coef(lasso_cv)`
`plot(lasso_cv) `
` `
```best_lambda = lasso_cv\$lambda.min
cat(best_lambda)
```
```lasso_mod = glmnet(xtrain, ytrain, family = "gaussian",
alpha = 1, lambda = best_lambda)```
` `
`coef(lasso_mod) `
` `
```yhat = predict(lasso_mod, xtest)

mse = mean((ytest - yhat) ^ 2)
mae = MAE(ytest, yhat)
rmse = RMSE(ytest, yhat)
r2 = R2(ytest, yhat, form = "traditional")

cat(" MAE:", mae, "\n", "MSE:", mse, "\n",
"RMSE:", rmse, "\n", "R-squared:", r2)
```
` `
```x = 1:length(ytest)
plot(x, ytest, ylim=c(min(yhat), max(ytest)), pch=20, col="red")
lines(x, yhat, lwd="1", col="blue")
legend("topleft", legend=c("medv", "pred-medv"),
col=c("red", "blue"), lty=1,cex = 0.8, lwd=1, bty='n') ```