#### In this question we analyze the data file **Fertility.csv**, which is available in the homework folder, and try to build a model for fertility. In this dataset, Fertility (the first column) is the response variable, and the other variables are potential predictors. We will use several different statistical modeling techniques. The data set contains 47 rows (samples). Split the data into training and test sets: use the first 30 rows as training samples and rows 31 through 47 as test samples.

In [28]:
import numpy as np
import pandas as pd
from ISLP.models import ModelSpec as MS
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import sklearn.linear_model as skl
from sklearn.linear_model import Ridge, Lasso, LinearRegression, RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error
from l0bnb import fit_path

In [7]:
Data = pd.read_csv('Fertility.csv')
# print(Data)
train = (Data.index < 30)
X = MS(Data.columns.drop(['Fertility'])).fit_transform(Data)
Y = Data['Fertility']
print(X.shape)
y_train, X_train = Y.loc[train] , X.loc[train]
y_test, X_test = Y.loc[~train] , X.loc[~train]

(47, 6)


#### (a) Fit a linear model on the training set, and report the test error (MSE) obtained.

In [8]:
LinearModel = LinearRegression()
LinearModel.fit(X_train, y_train)
y_pred_linear = LinearModel.predict(X_test)
mse_linear = mean_squared_error(y_test, y_pred_linear)
# print(pd.DataFrame({'y': y_test, 'pred': y_pred_linear}))
print('MSE Linear:', mse_linear)

MSE Linear: 183.72179150160574


#### (b) Fit a Ridge regression model on the training set, with λ chosen by cross-validation on a dense grid similar to the example solved in the class. Report the test error obtained.

In [5]:
lambdas = np.logspace(-12, 12, 100)  # Define a range of lambda values
# print(lambdas)
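# RidgeCV uses efficient leave-one-out CV by default; the stored CV values were never used,
# and store_cv_values is no longer accepted by recent scikit-learn releases
ridge_model = RidgeCV(alphas=lambdas).fit(X_train, y_train)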
y_pred_ridge = ridge_model.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print('Best lambda for Ridge:', ridge_model.alpha_)
print('MSE Ridge: ', mse_ridge)

Best lambda for Ridge: 200.92330025650458
MSE Ridge:  190.8866277059537
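
The ridge penalty depends on the scale of each predictor, so it is common to standardize the predictors before choosing λ. The block below is a minimal sketch of that idea, not the fit reported above: it assumes synthetic stand-in data with the same shape as the Fertility set (47 rows, 5 predictors), and the names `X_demo`, `y_demo`, `lambdas_demo`, and `ridge_pipe` are hypothetical. The printed values will not match the results for the real data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Fertility data (assumption: 47 rows, 5 predictors
# measured on very different scales).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(47, 5)) * np.array([1.0, 10.0, 100.0, 5.0, 50.0])
beta = np.array([2.0, 0.3, 0.0, -1.0, 0.02])
y_demo = 70.0 + X_demo @ beta + rng.normal(scale=5.0, size=47)

# Same split rule as in the notebook: first 30 rows train, remaining 17 test.
X_tr, X_te = X_demo[:30], X_demo[30:]
y_tr, y_te = y_demo[:30], y_demo[30:]

# Dense lambda grid (hypothetical range chosen for the synthetic data).
lambdas_demo = np.logspace(-3, 3, 100)

# The scaler is fit on the training rows only, then RidgeCV picks lambda
# by its default efficient leave-one-out cross-validation.
ridge_pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('ridge', RidgeCV(alphas=lambdas_demo)),
])
ridge_pipe.fit(X_tr, y_tr)

best_lambda = ridge_pipe.named_steps['ridge'].alpha_
mse_ridge_demo = mean_squared_error(y_te, ridge_pipe.predict(X_te))
print('Best lambda (standardized Ridge):', best_lambda)
print('Test MSE (standardized Ridge):', mse_ridge_demo)
```

Putting the scaler inside the pipeline means the test rows are transformed with the training means and standard deviations, so no information from the test set leaks into the fit.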


In [29]:
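# Drop the intercept column added by ModelSpec; errors='ignore' keeps the cell re-runnable
X = X.drop(columns='intercept', errors='ignore')
# Standardize each predictor (pandas aligns the column means and stds automatically)
Xs = (X - X.mean(0)) / X.std(0)
lambdas = 10**np.linspace(8, -3, 100) / Y.std()
# print(Xs)
# l1_ratio=0 makes the elastic-net path a pure ridge path
soln_array = skl.ElasticNet.path(Xs,
                                 Y,
                                 l1_ratio=0.,
                                 alphas=lambdas)[1]
# One row per lambda, one column per predictor
soln_path = pd.DataFrame(soln_array.T,
                         columns=X.columns,
                         index=-np.log(lambdas))
soln_path.index.name = 'negative log(lambda)'
soln_path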

#### (c) Fit a LASSO model on the training set, with λ chosen by cross-validation on a dense grid similar to the example solved in the class. Report the test error obtained, along with the number of non-zero coefficient estimates.

In [None]:
# 5-Fold
lasso_model = LassoCV(alphas=lambdas, cv=5).fit(X_train, y_train)

y_pred_lasso = lasso_model.predict(X_test)

mse_lasso = mean_squared_error(y_test, y_pred_lasso)
# print(lasso_model.coef_)
non_zero_coefficients = np.sum(lasso_model.coef_ != 0)

print('Best lambda LASSO:', lasso_model.alpha_)
print('MSE LASSO:', mse_lasso)
print('How many non-zero coefficient estimates: ', non_zero_coefficients)

In [None]:
# Xs = X - np.mean(X, axis = 0)[None,:]
# Xs = X - np.mean(X, axis=0, keepdims=True)
# kfold = KFold(5, random_state=0, shuffle=True)
# scaler = StandardScaler(with_mean=True,  with_std=True)

# lassoCV = skl.ElasticNetCV(n_alphas=100, l1_ratio=1, cv=kfold)
# pipeCV = Pipeline(steps=[('scaler', scaler), ('lasso', lassoCV)])
# pipeCV.fit(X, Y)
# tuned_lasso = pipeCV.named_steps['lasso']
# print(tuned_lasso.alpha_) #printing the lambda that yields the smallest CV error


# lambdas, soln_array = skl.Lasso.path(Xs, Y, n_alphas=100)[:2]
# soln_path = pd.DataFrame(soln_array.T, columns=X.columns, index=-np.log(lambdas))

#### (d) Compare the results of (a), (b), and (c). Which one seems to outperform the others for this specific setup?

#### In this case, the linear model achieved the lowest test MSE (183.72) of the three models. The Ridge model is close, with a slightly higher test MSE (190.89). The LASSO model kept only 2 non-zero coefficients and had the highest test MSE, which suggests that the regularization was too strong for this small training set and led to underfitting.
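
A comparison like this is easier to read when the three fits are collected in one table. The following is a minimal sketch under stated assumptions: it uses synthetic stand-in data rather than Fertility.csv, standardizes the predictors for every model, and the names `X_syn`, `y_syn`, `grid`, and `comparison` are hypothetical. The numbers it prints are not the results discussed above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 47 rows, 5 predictors, 2 of them irrelevant.
rng = np.random.default_rng(1)
X_syn = rng.normal(size=(47, 5))
y_syn = 70.0 + X_syn @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=2.0, size=47)

# First 30 rows for training, remaining 17 for testing.
X_tr, X_te, y_tr, y_te = X_syn[:30], X_syn[30:], y_syn[:30], y_syn[30:]

# One lambda grid shared by Ridge and LASSO (hypothetical range).
grid = np.logspace(-3, 3, 100)
models = {
    'Linear': make_pipeline(StandardScaler(), LinearRegression()),
    'Ridge': make_pipeline(StandardScaler(), RidgeCV(alphas=grid)),
    'LASSO': make_pipeline(StandardScaler(), LassoCV(alphas=grid, cv=5)),
}

rows = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    final = model[-1]  # the fitted regression step at the end of the pipeline
    rows.append({
        'model': name,
        'test_mse': mean_squared_error(y_te, model.predict(X_te)),
        'nonzero_coefs': int(np.sum(final.coef_ != 0)),
    })

# Sort so the model with the lowest test MSE appears first.
comparison = pd.DataFrame(rows).set_index('model').sort_values('test_mse')
print(comparison)
```

With only 17 test rows, small differences in test MSE between the models should be read with caution, since a different split could change the ranking.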

In [None]:
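# Standardize the predictors before fitting ridge
scaler = StandardScaler(with_mean=True, with_std=True)
# l1_ratio=0 turns ElasticNetCV into ridge regression tuned over the lambda grid
ridgeCV = skl.ElasticNetCV(alphas=lambdas,
                           l1_ratio=0,
                           # tol = 0.01,
                           cv=KFold(5, random_state=0, shuffle=True))
pipeCV = Pipeline(steps=[('scaler', scaler),
                         ('ridge', ridgeCV)])
pipeCV.fit(X_train, y_train)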
# print('lambda:', ridgeCV.alpha_)
# print(X)
# print('coef:', ridgeCV.coef_)
# print('intercept:', ridgeCV.intercept_)
# print('mse_ridge:', np.mean(ridgeCV.mse_path_))