**Homework - Lecture 4**

---

*Instructions:* Please complete the following project and submit your solution by the next class session.

---

- Compare the performance of OLS, Ridge, and Lasso regression on the [California housing dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html). 

- Use the same train-test split and evaluation metric for all three models. 

- Use cross-validation to select the best hyperparameters.

- Use appropriate visualizations (e.g., coefficient paths, residual plots) to analyze the results.

- Discuss the results and explain any differences in performance you observe.

- Ensure features are properly standardized and explain why this is necessary for Ridge and Lasso regression.

- Use [Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html).

- Use markdown to write explanations, equations, and interpretations.

---
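
As a reference point for the standardization requirement, the three objectives can be written side by side. This is a sketch under stated assumptions: $\beta_0$ and $\beta$ are the intercept and coefficient vector, $n$ is the number of training samples, $\alpha$ is the penalty strength, and the scaling of the loss term follows the objectives documented for scikit-learn's `Ridge` and `Lasso`.

$$
\begin{aligned}
\text{OLS:}\quad & \min_{\beta_0,\,\beta}\; \lVert y - \beta_0\mathbf{1} - X\beta \rVert_2^2 \\
\text{Ridge:}\quad & \min_{\beta_0,\,\beta}\; \lVert y - \beta_0\mathbf{1} - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2 \\
\text{Lasso:}\quad & \min_{\beta_0,\,\beta}\; \frac{1}{2n}\lVert y - \beta_0\mathbf{1} - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_1
\end{aligned}
$$

Rescaling a feature by a constant $c$ rescales its OLS coefficient by $1/c$ and leaves the fitted values unchanged, so OLS predictions do not depend on the units. The Ridge and Lasso penalties act on the coefficient magnitudes directly, so a feature measured in large units (and therefore carrying a small coefficient) is penalized less than one measured in small units. Standardizing puts all coefficients on a comparable scale before the penalty is applied.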



In [5]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import pandas as pd
from sklearn.preprocessing import StandardScaler, add_dummy_feature
from sklearn.linear_model import LinearRegression, Ridge, Lasso, RidgeCV, LassoCV
from sklearn.model_selection import KFold
from utils.mse import cv_mse
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline


# Set the plotting context and grid style in one call.
sns.set_theme(context="talk", style="whitegrid")
mpl.rcParams["figure.figsize"] = (8, 6)

In [None]:
# Fetch California housing dataset
from sklearn.datasets import fetch_california_housing
# With return_X_y=True the loader returns an (X, y) tuple; unpack it directly.
X, y = fetch_california_housing(as_frame=True, return_X_y=True)


In [None]:
# OLS regression
# X is already 2-D (n_samples, n_features), so no reshape is needed.
X_reshaped = X.to_numpy()
ols = LinearRegression()
ols.fit(X_reshaped, y)
beta_0 = ols.intercept_
beta = ols.coef_
print("OLS coefficients:", beta_0, beta)

OLS coefficients: -36.94192020718454 [ 4.36693293e-01  9.43577803e-03 -1.07322041e-01  6.45065694e-01
 -3.97638942e-06 -3.78654265e-03 -4.21314378e-01 -4.34513755e-01]
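
The cells in this notebook fit each model on the full, unscaled data. A minimal sketch of how the shared train-test split, standardization, and `Pipeline` requirements might fit together is given below. The helper name `compare_on_shared_split`, the 80/20 split, and `random_state=0` are illustrative assumptions; the default penalties mirror the values used in the Ridge and Lasso cells, although their effect on standardized features will differ from their effect on raw features.

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def compare_on_shared_split(X, y, ridge_alpha=3.0, lasso_alpha=0.1,
                            test_size=0.2, random_state=0):
    """Fit OLS, Ridge and Lasso on one shared split; return test MSE per model.

    Hypothetical helper: the name and default arguments are assumptions.
    """
    # One split, reused by every model, so the scores are comparable.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    models = {
        "ols": LinearRegression(),
        "ridge": Ridge(alpha=ridge_alpha),
        "lasso": Lasso(alpha=lasso_alpha),
    }
    scores = {}
    for name, model in models.items():
        # The scaler is fit on the training data only, inside the pipeline,
        # and the same fitted transformation is applied to the test data.
        pipe = Pipeline([("scaler", StandardScaler()), ("model", model)])
        pipe.fit(X_train, y_train)
        scores[name] = mean_squared_error(y_test, pipe.predict(X_test))
    return scores
```

With the `X` and `y` loaded earlier, `compare_on_shared_split(X, y)` would return a dictionary with one test MSE per model.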


In [24]:
# Ridge regression
alpha = 3
ridge_reg = Ridge(alpha=alpha)
ridge_reg.fit(X_reshaped, y)
print("Ridge Regression Coefficients:", ridge_reg.coef_, "with alpha =", alpha)

Ridge Regression Coefficients: [ 4.36397546e-01  9.44062581e-03 -1.06756032e-01  6.42065510e-01
 -3.95826399e-06 -3.78599390e-03 -4.21268628e-01 -4.34426196e-01] with alpha = 3
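
The Ridge coefficients above are almost identical to the OLS ones, which suggests that `alpha = 3` is a very weak penalty relative to the unnormalized squared-error term on this dataset. Rather than fixing the value by hand, the penalty could be chosen by cross-validation. The sketch below is one possible approach, not the only one: it searches over a pipeline with `GridSearchCV` so that the scaler is refit inside every fold. The helper name `select_ridge_alpha`, the logarithmic grid, and the 5-fold shuffled `KFold` are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def select_ridge_alpha(X_train, y_train, alphas=None, n_splits=5, random_state=0):
    """Pick the Ridge penalty by K-fold CV over a scaling pipeline.

    Hypothetical helper: the name, grid and fold settings are assumptions.
    """
    if alphas is None:
        # Illustrative grid; widen it if the selected value sits on an edge.
        alphas = np.logspace(-3, 3, 13)
    pipe = Pipeline([("scaler", StandardScaler()), ("model", Ridge())])
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    search = GridSearchCV(
        pipe,
        param_grid={"model__alpha": alphas},
        cv=cv,
        scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    )
    search.fit(X_train, y_train)
    # best_estimator_ is the pipeline refit on all of the training data.
    return search.best_estimator_, search.best_params_["model__alpha"]
```

The search should only see the training portion of the split; the held-out test set is then used once, to score the returned pipeline.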


In [25]:
# Lasso regression
alpha = 0.1
lasso_reg = Lasso(alpha=alpha)
lasso_reg.fit(X_reshaped, y)
print("Lasso Regression Coefficients:", lasso_reg.coef_, "with alpha =", alpha)

Lasso Regression Coefficients: [ 3.90582557e-01  1.50821512e-02 -0.00000000e+00  0.00000000e+00
  1.75019561e-05 -3.32253135e-03 -1.14214430e-01 -9.92250689e-02] with alpha = 0.1
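
With `alpha = 0.1` on unscaled features, Lasso sets two coefficients exactly to zero. Which coefficients are dropped depends on both the penalty and the feature scales, so a sketch of selecting `alpha` with `LassoCV` on standardized features, plus computing a coefficient path with `lasso_path`, may be useful. The helper names, the 5-fold setting, and the alpha grid are assumptions. Note also that in this simple form the scaler is fit once on the whole training set before `LassoCV` splits it into folds, which is a mild shortcut compared with refitting the scaler per fold.

```python
import numpy as np
from sklearn.linear_model import LassoCV, lasso_path
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_lasso_cv(X_train, y_train, cv=5):
    """Fit a scaler + LassoCV pipeline; return it with the chosen alpha and coefficients.

    Hypothetical helper: the name and the fold count are assumptions.
    """
    pipe = Pipeline([
        ("scaler", StandardScaler()),
        ("model", LassoCV(cv=cv, random_state=0)),
    ])
    pipe.fit(X_train, y_train)
    lasso = pipe.named_steps["model"]
    return pipe, lasso.alpha_, lasso.coef_


def lasso_coefficient_path(X_train, y_train, alphas=None):
    """Compute Lasso coefficients over a grid of penalties on standardized features."""
    if alphas is None:
        # Illustrative grid, from a weak penalty to a strong one.
        alphas = np.logspace(-4, 0, 30)
    X_scaled = StandardScaler().fit_transform(X_train)
    # lasso_path does not fit an intercept, so the target is centered by hand.
    y_centered = np.asarray(y_train) - np.mean(y_train)
    path_alphas, coefs, _ = lasso_path(X_scaled, y_centered, alphas=alphas)
    # coefs has shape (n_features, n_alphas): one row per feature.
    return path_alphas, coefs
```

Plotting each row of `coefs` against `path_alphas` on a logarithmic x-axis gives the coefficient-path figure mentioned in the assignment; coefficients that reach zero at smaller penalties are the ones Lasso drops first.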
