# Lasso Regression

Lasso regression (Least Absolute Shrinkage and Selection Operator) is a type of linear regression that uses shrinkage. Shrinkage here means that the coefficient estimates are pulled towards zero; the data values themselves are not changed. The lasso encourages simple, sparse models (i.e., models with fewer nonzero coefficients). It is well suited to automating parts of model selection, such as variable selection/parameter elimination. It is also often used when predictors are highly multicollinear, although in that case it tends to keep one predictor from a correlated group and drop the others somewhat arbitrarily.
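To make the idea of shrinkage concrete, here is a minimal sketch comparing ordinary least squares with lasso on synthetic data. The dataset settings and `alpha=5.0` are assumptions chosen for illustration and are not taken from the examples later in this document.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (assumed for illustration): 10 features, only 3 informative
x, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

ols = LinearRegression().fit(x, y)
lasso = Lasso(alpha=5.0).fit(x, y)

# Compare the total absolute size of the coefficients (their L1 norm)
ols_l1 = np.abs(ols.coef_).sum()
lasso_l1 = np.abs(lasso.coef_).sum()
print("OLS   sum |coef|:", round(ols_l1, 2))
print("Lasso sum |coef|:", round(lasso_l1, 2))
print("Coefficients set exactly to zero by lasso:", int(np.sum(lasso.coef_ == 0)))
```

The lasso coefficients should have a smaller total absolute size than the least-squares ones, and some of them may be exactly zero.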

**Key Features of Lasso Regression:**

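1. **Regularization Term:** The key characteristic of lasso regression is that it adds an L1 penalty to the regression model: the sum of the absolute values of the coefficients. The cost function for lasso regression is:

   $$J(\beta) = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} \left|\beta_j\right|$$

   where $\hat{y}_i$ is the model's prediction for observation $i$, $\beta_j$ are the coefficients, and $\lambda \ge 0$ is the regularization parameter.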

2. **Feature Selection:** One advantage of lasso over ridge regression is that it can produce sparse models: some coefficients become exactly zero, so the corresponding features drop out of the model, whereas ridge only shrinks coefficients towards zero. This property is called automatic feature selection and is an example of an embedded feature selection method.

3. **Parameter Tuning:** The strength of the L1 penalty is determined by a parameter, typically denoted as alpha or lambda. Selecting a good value for this parameter is crucial and is typically done using cross-validation.

4. **Bias-Variance Tradeoff:** Similar to ridge regression, lasso also manages the bias-variance tradeoff in model training. Increasing the regularization strength increases bias but decreases variance, potentially leading to better generalization on unseen data.

5. **Scaling:** Before applying lasso, it is recommended to scale/normalize the data as lasso is sensitive to the scale of input features.
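The feature selection and bias-variance points above can be illustrated by fitting lasso at several regularization strengths and watching how many coefficients survive. This is a sketch only: the synthetic dataset, the list of `alpha` values, and `max_iter=10000` are assumptions made for the illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): 20 features, only 5 informative
x, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10000).fit(x_train, y_train)
    # Count the coefficients that were not driven exactly to zero
    n_nonzero = int(np.count_nonzero(model.coef_))
    nonzero_counts.append(n_nonzero)
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"alpha={alpha:>6}: nonzero coefficients={n_nonzero:2d}, test MSE={test_mse:.1f}")
```

As `alpha` grows, fewer coefficients remain nonzero. The test error typically improves while noise features are being removed and then worsens once informative features are shrunk too far.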

## Implementation in Scikit-Learn:

Lasso regression can be implemented using the Lasso class from Scikit-Learn's linear_model module. Here's a basic example:

```python
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate some regression data
x, y = make_regression(n_samples=100, n_features=2, noise=0.2, random_state=42)

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Create a lasso regression object
lasso = Lasso(alpha=1.0)

# Fit the model on the training set
lasso.fit(x_train, y_train)

# Predict on the test set
y_pred = lasso.predict(x_test)

# Evaluate the model
print("Mean Squared error: ", mean_squared_error(y_test, y_pred))
```

```
Mean Squared error:  1.910991935005195
```
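The example above fits lasso on unscaled features, although scaling is recommended in the feature list. One possible way to add it is sketched below, assuming scikit-learn's `StandardScaler` and `make_pipeline` (neither appears in the original example) and the same synthetic data. Features from `make_regression` are already on a similar scale, so the effect here is likely small; scaling matters more on real data with mixed units.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic data and split as the basic example
x, y = make_regression(n_samples=100, n_features=2, noise=0.2, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# The scaler is fit on the training data only, then reused to transform the test data
scaled_lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
scaled_lasso.fit(x_train, y_train)

y_pred = scaled_lasso.predict(x_test)
print("Mean squared error (scaled pipeline):", mean_squared_error(y_test, y_pred))
```

Putting the scaler inside the pipeline keeps test-set statistics out of the fitting step, which also matters when the pipeline is passed to cross-validation.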


The regularization strength can be tuned with cross-validation using `GridSearchCV`:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV

# Reuses Lasso, x and y from the previous example

# Create a lasso regression object
lasso = Lasso()

# Candidate alpha values for the grid search: 1.0, 1.1, ..., 9.9
param_grid = {'alpha': np.arange(1, 10, 0.1)}

# Use 10-fold cross-validated grid search to find the best value for alpha
lasso_grid = GridSearchCV(lasso, param_grid=param_grid, cv=10)

# Fit the model
lasso_grid.fit(x, y)

# Print the best parameter and its score
print("Tuned lasso regression parameter: {}".format(lasso_grid.best_params_))
print("Best score is: {}".format(lasso_grid.best_score_))
```

```
Tuned lasso regression parameter: {'alpha': 1.0}
Best score is: 0.9997662220479695
```

The best score reported here is the mean cross-validated R², which is the default scoring for regressors in `GridSearchCV`.
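The selected `alpha` of 1.0 is the smallest value in the grid `np.arange(1, 10, 0.1)`, which hints that the best value may lie below the searched range. A minimal sketch of a wider search, assuming a log-spaced grid from 0.001 to 10 (this range and `max_iter=10000` are illustrative choices, not part of the original):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Same synthetic data as the basic example
x, y = make_regression(n_samples=100, n_features=2, noise=0.2, random_state=42)

# Log-spaced candidates from 0.001 to 10 (an assumed range for illustration)
param_grid = {"alpha": np.logspace(-3, 1, 25)}

search = GridSearchCV(Lasso(max_iter=10000), param_grid=param_grid, cv=10)
search.fit(x, y)

best_alpha = search.best_params_["alpha"]
print("Best alpha:", best_alpha)
print("Best mean cross-validated R^2:", search.best_score_)
```

If the best value again lands on an edge of the grid, the range is probably still too narrow and is worth extending in that direction.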


