# "My Linear Regressions Are Looking Irregular"
> "An introductory guide to regularizing linear regressions."

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [linear regression, regularization]
- hide: false
- search_exclude: true

Adapted from [General Assembly's *Data Science Immersive*](https://generalassemb.ly/education/data-science-immersive/).

## Theory

### Why do we need to regularize our linear regressions? Overfitting.
What is overfitting?
- Overfitting means building a model that matches the training data "too closely."
- The model ends up fitting to noise rather than signal.
- The **bias is too low** and the **variance is too high**.

What can cause overfitting?
- Irrelevant features are included in the model.
- The number of features is close to the number of observations.
- The features are correlated to each other.
- The coefficients of the features are large.

What is regularization and how does it help?
- Regularization helps against overfitting by imposing a penalty on the size of the feature coefficients, which shrinks them toward zero.
- Regularization usually lowers the fit on the training data, but it often improves the fit on unseen test data.
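To make the trade-off concrete, here is a minimal sketch on made-up data. The dataset, the `alpha=10.0` penalty, and the variable names are all assumptions chosen for illustration, not values from the original course material. It fits a plain linear regression and a Ridge regression on a small training set with many irrelevant features, then prints the train and test R^2 along with the size of the coefficient vector.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 40 features, only 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 40))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(scale=2.0, size=80)

# 60 training rows for 40 features: the feature count is close to the row count.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=10.0).fit(X_train, y_train)  # alpha picked arbitrarily

for name, model in [("Linear", ols), ("Ridge", ridge)]:
    print(
        name,
        "| train R^2:", round(model.score(X_train, y_train), 3),
        "| test R^2:", round(model.score(X_test, y_test), 3),
        "| coefficient norm:", round(float(np.linalg.norm(model.coef_)), 3),
    )
```

The unregularized model always has the better training score, because it minimizes training error with no penalty. The gap between its training and test scores is the overfitting that regularization tries to close.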

### The two most common types of regularization for linear regressions are *Lasso* and *Ridge*. *Elastic Net* is a mixture of the two.
- Lasso uses the L1 regularization method, Ridge uses L2, and Elastic Net uses both.
- We will be regularizing linear regressions in this post, but other types of regressions can be regularized with L1 and L2 as well.
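Written out, the three penalties look like this. The notation is introduced here for illustration and is not from the original material: $\beta_j$ are the feature coefficients, $\hat{y}_i$ is the model's prediction for observation $i$, $\alpha$ is the penalty strength, and $\rho$ (between 0 and 1) is the mix between L1 and L2. Libraries such as scikit-learn add constant scaling factors to these terms, so treat this as the general shape rather than the exact formula any one library minimizes.

$$
\begin{aligned}
\text{Ridge (L2):}\quad & \min_{\beta}\ \sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2 + \alpha \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso (L1):}\quad & \min_{\beta}\ \sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2 + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\text{Elastic Net:}\quad & \min_{\beta}\ \sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2 + \alpha \Bigl( \rho \sum_{j=1}^{p} \lvert \beta_j \rvert + (1 - \rho) \sum_{j=1}^{p} \beta_j^2 \Bigr)
\end{aligned}
$$

In each case the first term rewards fitting the training data and the second term charges a price for large coefficients. Setting $\alpha = 0$ recovers the ordinary linear regression.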

## Examples

### What kind of situations would call for a regularized linear regression?

- Predicting home prices based on number of bedrooms upstairs, number of bedrooms downstairs, and total number of bedrooms.
- Predicting a wine's rating based on its fixed acidity, volatile acidity, citric acid, residual sugar, chloride, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and percent alcohol.
- Predicting the ratio of scores for the teams in a basketball game based on the player stats and the team stats.
- Estimating a country's life expectancy based on gross domestic product per person, infant mortality, and region of the world.
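The home price example is a good one to look at in code, because the total number of bedrooms is exactly the sum of the other two features. The sketch below uses invented bedroom counts and prices (all numbers are assumptions for illustration) to show that the three columns only carry two columns' worth of information, and that a Ridge regression still produces finite coefficients for them.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
upstairs = rng.integers(1, 4, size=50)    # 1 to 3 bedrooms upstairs
downstairs = rng.integers(0, 3, size=50)  # 0 to 2 bedrooms downstairs
total = upstairs + downstairs             # exact linear combination of the other two

X = np.column_stack([upstairs, downstairs, total]).astype(float)

# Made-up prices: every bedroom adds the same amount, plus noise.
y = 100_000 + 50_000 * total + rng.normal(scale=10_000, size=50)

# Three columns, but the rank shows only two are independent.
print("columns:", X.shape[1], "| rank:", np.linalg.matrix_rank(X))

# The L2 penalty makes the problem well-posed despite the redundancy.
ridge = Ridge(alpha=1.0).fit(X, y)
print("Ridge coefficients:", ridge.coef_.round(0))
```

Without a penalty, infinitely many coefficient combinations fit this data equally well, which is one way correlated features lead to unstable models.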

## Code
### How to run the Ridge, Lasso, and Elastic Net regressions in Python, using example values.

### *Preparation for Running the Regressions*
- Start by importing Pandas and NumPy to create DataFrames and arrays, along with `cross_val_score` and the regression classes we will use: plain linear regression, Ridge, Lasso, and Elastic Net, plus their cross-validated versions.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, Lasso, LassoCV, ElasticNet, ElasticNetCV

- Store your training dataset's features in a Pandas DataFrame as X (named `X_overfit` in the code below).
- Store your training dataset's target variable in an array as y.
- This is the mean R^2 score from 5-fold **cross-validation** for a plain linear regression on the sample dataset:

In [None]:
lr_model = LinearRegression()
cross_val_score(lr_model, X_overfit, y, cv = 5).mean()

> 0.1913277098938003

- The R^2 score is very low. Is the model overfitted, and will regularization improve it?
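The sample dataset behind `X_overfit` and `y` is not included in this post. If you want to run the cells yourself, one option is a synthetic stand-in like the sketch below. Everything about it is an assumption (the row count, the feature count, the noise level, the `feature_i` column names), so your scores will not match the numbers shown here.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data: 100 rows, 60 features, only 5 of them informative.
X_arr, y = make_regression(
    n_samples=100, n_features=60, n_informative=5, noise=50.0, random_state=42
)
X_overfit = pd.DataFrame(
    X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])]
)

lr_model = LinearRegression()
scores = cross_val_score(lr_model, X_overfit, y, cv=5)  # one R^2 score per fold
print("R^2 per fold:", scores.round(3))
print("mean R^2:", scores.mean())
```

Looking at the per-fold scores, not just the mean, is a quick way to see how unstable an overfitted model is from one split to the next.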

### *Running a Ridge Regression*
- Ridge regression is **more computationally efficient than Lasso**, because it has a closed-form solution while Lasso has to be fit iteratively.
- Ridge regression usually works best with larger values of the penalty strength $\alpha$ than the Lasso regression does.

Pick a list of $\alpha$-values, instantiate the model, and fit the regression. Save the optimal model and check its R^2 score from cross-validation:

In [None]:
alpha_list = np.logspace(0, 5, 200)
ridge_model = RidgeCV(alphas = alpha_list) # RidgeCV picks the best alpha value by cross-validation
ridge_model = ridge_model.fit(X_overfit, y)
ridge_model.alpha_

> 821.434358491943

In [None]:
ridge_opt_model = Ridge(alpha = ridge_model.alpha_)
cross_val_score(ridge_opt_model, X_overfit, y, cv = 5).mean()

> 0.2221069221200395

This R^2 score is an improvement over the linear regression.
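To see what the penalty is doing to the coefficients, here is a small sketch that fits a Ridge regression at several $\alpha$-values and prints the length of the coefficient vector each time. The synthetic dataset and the particular $\alpha$-values are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data, used only for illustration.
X, y = make_regression(
    n_samples=100, n_features=20, n_informative=5, noise=20.0, random_state=0
)

alphas = [0.1, 1, 10, 100, 1000]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(float(np.linalg.norm(model.coef_)))  # overall coefficient size
    print(f"alpha={alpha:>6}: coefficient norm = {norms[-1]:.2f}")
```

The norm falls as $\alpha$ grows, but Ridge coefficients generally shrink toward zero without landing exactly on it, so every feature stays in the model.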

### *Running a Lasso Regression*
- If a group of predictors is highly correlated, the Lasso regression tends to pick only one of them and shrink the others to zero.
- The Lasso regression usually works best with smaller values of the penalty strength $\alpha$ than the Ridge regression does.

Pick a list of $\alpha$-values, instantiate the model, and fit the regression. Save the optimal model and check its R^2 score from cross-validation:

In [None]:
l_alpha_list = np.arange(0.001, 0.15, 0.0025)
lasso_model = LassoCV(alphas = l_alpha_list, cv = 5)
lasso_model = lasso_model.fit(X_overfit, y)
lasso_model.alpha_

> 0.011

In [None]:
lasso_opt_model = Lasso(alpha = lasso_model.alpha_)
cross_val_score(lasso_opt_model, X_overfit, y, cv = 5).mean()

> 0.2617417795359766

This R^2 score is an improvement over the Ridge regression.
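One reason Lasso can beat Ridge on a dataset with many irrelevant features is that the L1 penalty sets some coefficients to exactly zero, which removes those features from the model. The sketch below counts the zeroed coefficients for both models on synthetic data. The dataset and the shared `alpha=5.0` are assumptions for illustration, and the two models' $\alpha$-values are not on the same scale, so this compares behavior rather than penalty strength.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 20 features, only 3 of them actually drive the target.
X, y = make_regression(
    n_samples=100, n_features=20, n_informative=3, noise=10.0, random_state=0
)

lasso = Lasso(alpha=5.0).fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients set to exactly zero:", n_zero_lasso, "of", X.shape[1])
print("Ridge coefficients set to exactly zero:", n_zero_ridge, "of", X.shape[1])
```

This built-in feature selection is also why Lasso tends to keep one predictor from a correlated group and drop the rest.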

### *Running an Elastic Net Regression*
- The Elastic Net regression applies both the L1 (Lasso) and L2 (Ridge) penalties at once. In scikit-learn, `alpha` sets the overall penalty strength and `l1_ratio` (default 0.5) sets the mix between the two.
- Elastic Net is most worth trying if the training score is much higher than the test score or if the independent variables are highly correlated.

Pick a list of $\alpha$-values, instantiate the model, and fit the regression. Save the optimal model and check its R^2 score from cross-validation:

In [None]:
enet_alphas = np.arange(0.5, 1.0, 0.005)
enet_model = ElasticNetCV(alphas = enet_alphas)
enet_model = enet_model.fit(X_overfit, y)
enet_model.alpha_

> 0.5

In [None]:
enet_opt_model = ElasticNet(alpha = enet_model.alpha_)
cross_val_score(enet_opt_model, X_overfit, y, cv = 5).mean()

> 0.07522288223865307

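This R^2 score is worse than the linear regression, the Ridge regression, and the Lasso regression, so this model penalized our feature coefficients too much. The selected $\alpha$ of 0.5 is also the smallest value in the list we searched, which suggests that a list extending to smaller $\alpha$-values would be worth trying.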
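The search above only varies $\alpha$ and leaves the L1/L2 mix at its default. `ElasticNetCV` can also search over the mix if you pass a list to `l1_ratio`. Here is a minimal sketch on synthetic data, where the dataset, the wider $\alpha$ grid, and the `l1_ratio` candidates are all assumptions for illustration and not tuned values for the sample dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic data, used only for illustration.
X, y = make_regression(
    n_samples=100, n_features=20, n_informative=5, noise=20.0, random_state=0
)

alpha_grid = np.logspace(-3, 1, 30)  # spans small and large penalties
l1_ratios = [0.1, 0.5, 0.9, 1.0]     # 1.0 means a pure L1 (Lasso) penalty

enet_cv = ElasticNetCV(
    alphas=alpha_grid, l1_ratio=l1_ratios, cv=5, max_iter=10_000
).fit(X, y)

print("best alpha:", enet_cv.alpha_)
print("best l1_ratio:", enet_cv.l1_ratio_)
```

If the best $\alpha$ lands on either end of the grid, that is a hint to widen the grid in that direction and search again.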

### *Conclusions from the Code*
- For this dataset, the best model was the Lasso regression. For your own data, compare all of the models across a range of parameter values to determine the best-fitting model.
- Don't forget to perform feature engineering before modeling, and to re-optimize your model if you go back and engineer features. Feature engineering will affect the optimal model and its optimal parameters.
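A side-by-side comparison can be sketched in a few lines. The dataset here is synthetic and the $\alpha$-values are placeholders chosen for illustration. In practice you would plug in the values found by `RidgeCV`, `LassoCV`, and `ElasticNetCV` for your own data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data, used only for illustration.
X, y = make_regression(
    n_samples=100, n_features=60, n_informative=5, noise=50.0, random_state=42
)

# Placeholder penalty values, not tuned.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=100.0),
    "lasso": Lasso(alpha=5.0),
    "elastic net": ElasticNet(alpha=0.5, l1_ratio=0.5),
}

# Mean 5-fold cross-validated R^2 for each model.
results = {
    name: float(cross_val_score(model, X, y, cv=5).mean())
    for name, model in models.items()
}

for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} mean CV R^2 = {score:.3f}")
```

Rerun the comparison whenever the features change, since the ranking of the models can change along with them.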