# Model Selection (Best Subset Selection, Stagewise Regression, AIC, BIC) & Regularization (Lasso & Ridge)

## Additional Examples

https://towardsdatascience.com/regulate-your-regression-model-with-ridge-lasso-and-elasticnet-92735e192e34

https://towardsdatascience.com/how-to-do-cross-validation-effectively-1bbeb1d69ee8

https://towardsdatascience.com/two-common-pitfalls-to-avoid-when-doing-cross-validation-c68ed79c0e4e

https://towardsdatascience.com/stopping-stepwise-why-stepwise-selection-is-bad-and-what-you-should-use-instead-90818b3f52df

https://mlu-explain.github.io/cross-validation/

https://mlu-explain.github.io/bias-variance/

https://towardsdatascience.com/simple-stepwise-and-weighted-regression-model-53a31d9e4746

https://towardsdatascience.com/visualizing-sklearn-cross-validation-k-fold-shuffle-split-and-time-series-split-a13221eb5a56

https://towardsdatascience.com/python-libraries-for-interpretable-machine-learning-c476a08ed2c7

https://towardsdatascience.com/explain-machine-learning-models-using-shap-library-e05a1583c34f

https://towardsdatascience.com/crafting-one-pipeline-for-machine-learning-steps-373f03e44e1b

https://medium.com/@ali.soleymani.co/stop-using-grid-search-or-random-search-for-hyperparameter-tuning-c2468a2ff887

https://trainindata.medium.com/recursive-feature-elimination-with-python-59bb27e8396a

https://towardsdatascience.com/how-to-tune-multiple-ml-models-with-gridsearchcv-at-once-9fcebfcc6c23

https://towardsdatascience.com/a-guide-to-find-the-best-boosting-model-using-bayesian-hyperparameter-tuning-but-without-c98b6a1ecac8

https://medium.com/@okanyenigun/cross-validation-techniques-for-machine-learning-a-guide-to-improve-model-performance-8748d46281cc

https://towardsdatascience.com/why-you-should-use-scikit-learn-pipelines-8754b4d1e375

https://towardsdatascience.com/benchmarking-machine-learning-models-with-cross-validation-and-matplotlib-in-python-4957a41149e

https://towardsdatascience.com/k-fold-cross-validation-are-you-doing-it-right-e98cdf3e6690

https://towardsdatascience.com/complete-guide-to-regressional-analysis-using-python-bbe76b3e451f

In [None]:
import sys
from pathlib import Path

# Adjust the path based on the location of your notebook
sys.path.append(str(Path().resolve().parent))

# Import PATH from the project's settings module
from utils.settings import PATH

print(PATH)

In [None]:
# Importing packages to develop the model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor as VIF
from scipy import stats
from sklearn import linear_model
# import sklearn.preprocessing as preprocessing
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler, MinMaxScaler

Ridge: L2 penalty.

Lasso: L1 penalty.

Elastic-Net: Combination of L1 and L2 penalty.
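
For reference, the three penalties can be written out as objective functions. This is a sketch that follows scikit-learn's parameterization, with the assumed notation $n$ for the number of samples, $w$ for the coefficient vector, $\alpha$ for the `alpha` argument, and $\rho$ for `l1_ratio`; the intercept is not penalized. Other texts scale the terms differently.

$$
\begin{aligned}
\text{Ridge:} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso:} \quad & \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{Elastic-Net:} \quad & \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
\end{aligned}
$$

Setting $\rho = 1$ in the Elastic-Net objective recovers the Lasso objective, and $\rho = 0$ leaves only the L2 penalty.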

It is important to scale the data (e.g., using a StandardScaler) before performing ridge
regression, as it is sensitive to the scale of the input features. This is true of most
regularized models.
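
As a minimal sketch of the scaling advice above, the cell below chains a `StandardScaler` and a `Ridge` estimator in one pipeline, so the scaler is fitted only on the data passed to `fit`. The dataset is synthetic, and the names `X_demo`, `y_demo`, and `scaled_ridge` are illustrative assumptions rather than part of this notebook's data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_demo[:, 0] *= 1000.0  # put one feature on a much larger scale than the others

# Standardize the features, then fit ridge regression on the scaled values
scaled_ridge = make_pipeline(StandardScaler(), Ridge(alpha=0.1))
scaled_ridge.fit(X_demo, y_demo)

# Coefficients are expressed per standard deviation of each feature
print(scaled_ridge.named_steps["ridge"].coef_)
```

Because the scaler sits inside the pipeline, the same object can be passed to a cross-validation routine and the scaling statistics are recomputed on each training fold.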

The RidgeCV class also performs ridge regression, but it automatically tunes
hyperparameters using cross-validation. It’s roughly equivalent to using GridSearchCV,
but it’s optimized for ridge regression and runs much faster. Several other estimators
(mostly linear) also have efficient CV variants, such as LassoCV and ElasticNetCV.
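
The following is a hedged sketch of `RidgeCV` in use, not a tuned setup. The data are synthetic, and the candidate grid `alphas` and the names `X_demo`, `y_demo`, and `ridge_cv` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Candidate regularization strengths (an arbitrary log-spaced grid)
alphas = np.logspace(-3, 3, 13)

# RidgeCV picks the alpha with the best cross-validated score
ridge_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
ridge_cv.fit(X_demo, y_demo)

best_alpha = ridge_cv.named_steps["ridgecv"].alpha_
print("Selected alpha:", best_alpha)
```

With the default `cv=None`, `RidgeCV` uses an efficient form of leave-one-out cross-validation, which is part of why it runs faster than a generic grid search.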

In [None]:
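from sklearn.linear_model import ElasticNet, Lasso, LogisticRegression, Ridge, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# X (features) and y (target) are assumed to be defined in an earlier cell
m = len(X)  # assumed meaning of m: the number of training instances

# Ridge regression (L2 penalty) solved in closed form
ridge_reg = Ridge(alpha=0.1, solver="cholesky")
ridge_reg.fit(X, y)

# The same L2 penalty, fitted with stochastic gradient descent
sgd_reg = SGDRegressor(penalty="l2", alpha=0.1 / m, tol=None, max_iter=1000, eta0=0.01, random_state=42)
sgd_reg.fit(X, np.ravel(y))  # flattened because fit() expects 1D targets

# Lasso regression (L1 penalty)
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X, y)

# Elastic net: l1_ratio=0 is equivalent to ridge regression, l1_ratio=1 to lasso regression
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X, y)

# L2-penalized logistic regression (C is the inverse of the regularization strength)
LogisticRegression(C=1.0, penalty='l2', tol=0.01)

# Ridge(normalize=True) was removed in scikit-learn 1.2; as a substitute (not an exact
# equivalent), the polynomial features are standardized with StandardScaler
ridge = Ridge()
second_order = make_pipeline(PolynomialFeatures(degree=2, interaction_only=False), StandardScaler())
lm = linear_model.LinearRegression()  # assumption: lm is a plain linear regression baseline
ridge.fit(second_order.fit_transform(X), y)
lm.fit(second_order.fit_transform(X), y)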


In [None]:
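from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Bin the continuous target so that the folds can be stratified on it
edges = np.histogram(y, bins=5)[1]
binning = np.digitize(y, edges)
# Precompute the (train, test) index pairs so the same folds can be reused by later searches
stratified_cv_iterator = list(
    StratifiedKFold(n_splits=10, shuffle=True, random_state=101).split(X, binning))
# scikit-learn maximizes scores, so MSE is exposed as 'neg_mean_squared_error'
search = GridSearchCV(estimator=ridge, param_grid={'alpha': np.logspace(-4, 2, 7)},
                      scoring='neg_mean_squared_error',
                      n_jobs=1, refit=True, cv=stratified_cv_iterator)
search.fit(second_order.fit_transform(X), y)
print('Best alpha: %0.5f' % search.best_params_['alpha'])
print('Best CV mean squared error: %0.3f' % np.abs(search.best_score_))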

In [None]:
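from sklearn.linear_model import Lasso
# The normalize argument was removed in scikit-learn 1.2; scale the features beforehand instead
lasso = Lasso(alpha=1.0, max_iter=10**5)
# The following comment shows an example of L1 logistic regression
# (the L1 penalty needs a compatible solver such as 'liblinear' or 'saga')
# lr_l1 = LogisticRegression(C=1.0, penalty='l1', solver='liblinear', tol=0.01)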

In [None]:
from sklearn.model_selection import RandomizedSearchCV

# Sample 15 alpha values at random from a log-spaced grid;
# random_state makes the sampled candidates reproducible
search_func = RandomizedSearchCV(estimator=lasso, n_jobs=1, refit=True, n_iter=15,
                                 param_distributions={'alpha': np.logspace(-5, 2, 100)},
                                 scoring='neg_mean_squared_error',
                                 cv=stratified_cv_iterator, random_state=101)
search_func.fit(second_order.fit_transform(X), y)
print('Best alpha: %0.5f' % search_func.best_params_['alpha'])
print('Best CV mean squared error: %0.3f' % np.abs(search_func.best_score_))

In [None]:
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

# The normalize argument was removed in scikit-learn 1.2; scale the features beforehand instead
elasticnet = ElasticNet(alpha=1.0, l1_ratio=0.15, max_iter=10**6, random_state=101)
# Search jointly over the penalty strength (alpha) and the L1/L2 mix (l1_ratio)
search_func = RandomizedSearchCV(
    estimator=elasticnet,
    param_distributions={'alpha': np.logspace(-5, 2, 100),
                         'l1_ratio': np.arange(0.0, 1.01, 0.05)},
    n_iter=30, scoring='neg_mean_squared_error', n_jobs=1, refit=True,
    cv=stratified_cv_iterator, random_state=101)
search_func.fit(second_order.fit_transform(X), y)
print('Best alpha: %0.5f' % search_func.best_params_['alpha'])
print('Best l1_ratio: %0.5f' % search_func.best_params_['l1_ratio'])
print('Best CV mean squared error: %0.3f' % np.abs(search_func.best_score_))