# EDS232 Lab 2: Regularized Regression and Energy Modeling


## Overview
In this lab, you will explore Ridge Regression and Lasso Regression, two common techniques for regularized regression. Both methods add penalties to the standard regression coefficients, which can help prevent overfitting and improve model interpretability.

## Key Concepts

**Types of Regularization:**

- Ridge Regression:

    - Adds an L2 penalty ($\lambda \sum_{j=1}^p \beta_j^2 $) to the regression loss function.
    - Shrinks coefficients toward zero but does not set any coefficients exactly to zero.
    - Best suited for reducing multicollinearity and improving model stability.

- Lasso Regression:

    - Adds an L1 penalty ($\lambda \sum_{j=1}^p |\beta_j|$) to the regression loss function.
    - Can shrink some coefficients to exactly zero, effectively performing feature selection.
    - Useful when you want a simpler, interpretable model.

**Cross-Validation**

- Cross-validation is a method for model evaluation and hyperparameter tuning.  In this lab we will be tuning $\lambda$, which controls the penalty terms.

- Note: Sci-kit learn refers to $\lambda$, the regularization penalty hyperparameter, as "alpha" instead of "lambda" 


## About the data

Buildings account for a significant portion of global energy consumption. Effective energy modeling and analysis are critical for designing energy-efficient buildings, reducing greenhouse gas emissions, and meeting sustainability goals. 

- This dataset contains energy performance data for 768 different building configurations, generated through simulation using Ecotect software. These configurations vary across multiple architectural features:

| Feature                     | Description                                                                                   |
|-----------------------------|-----------------------------------------------------------------------------------------------|
| Relative Compactness        | A measure of how compact the building's shape is.                                             |
| Surface Area                | The total external surface area of the building.                                              |
| Wall Area                   | The total area of the walls.                                                                  |
| Roof Area                   | The area of the roof.                                                                         |
| Overall Height              | The height of the building.                                                                   |
| Orientation                 | The direction the building faces (categorical, encoded numerically).                         |
| Glazing Area                | The percentage of the building’s facade that is glass.                                        |
| Glazing Area Distribution   | The distribution of the glazing area (e.g., evenly distributed or concentrated on one side). |
| Heating Load (Y1)           | Continuous target variable representing heating energy requirements. (This is the one we'll use for today's lab.) |
| Cooling Load (Y2)           | Continuous target variable representing cooling energy requirements. (We won't be using this one.) |


In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import RidgeCV
from sklearn.linear_model import LassoCV
from ucimlrepo import fetch_ucirepo 


### Step 1: Fetch  dataset 
Grab the data from the UC Irvine Machine Learning Repository using fetch_ucirepo.  Note that the object returned contains both tabular data as well as metadata.  We'll need to add column names to the tabular data.

In [4]:
# Fetch dataset
energy_efficiency = fetch_ucirepo(id=242)

#Specify column names as a list of names from the metadata
feature_names = ["Relative Compactness",
    "Surface Area",
    "Wall Area",
    "Roof Area",
    "Overall Height",
    "Orientation",
    "Glazing Area",
    "Glazing Area Distribution",]

#Add feature_names as column names in energy_efficiency.data
energy_efficiency.data['features'].columns = feature_names

# Rename the 'targets' columns and assign Heating Load target to heating_load
energy_efficiency.data['targets'].columns = ['Heating Load', 'Cooling Load']

heating_load = energy_efficiency.data['targets']['Heating Load']

type(energy_efficiency.data)

ucimlrepo.dotdict.dotdict

### Step 2: Preprocess the data
Split the data into training and test sets **using a 70/30 split and a random state value of 42**. Then scale the data using StandardScaler.

In [5]:

# Assign predictor and outcome variables 
X = energy_efficiency.data.features
Y = heating_load

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Standardize the predictors
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)

### Step 3: Initial model fits
We'll start by fitting a simple ordinary least squared regression model for comparison purposes.  Then we'll fit an initial ridge regression model. Let's start by choosing 10 as an arbitrary value for alpha. Recall that we can access coefficients from a model instance with `model_name.coef_`. 


In [6]:
# Create OLS instance and fit it
ols = LinearRegression()
ols.fit(X_train_scaled, y_train)

# Define a fixed alpha (lambda)
alpha_fixed = 10

# Create Ridge regression instance and fit it
ridge = Ridge(alpha = alpha_fixed)
ridge.fit(X_train_scaled, y_train)

# Check coefficients of the OLS and ridge models
print("OLS Coefficients:", ols.coef_)
print("Ridge Coefficients:", ridge.coef_)


OLS Coefficients: [-6.68692391 -3.70147238  0.76114173 -3.99242943  7.21681807 -0.02606662
  2.6700316   0.35604219]
Ridge Coefficients: [-2.47753304 -1.43512945  1.71819798 -2.24012431  6.98517033 -0.02697343
  2.64233342  0.3579727 ]


Is the penalty term doing its job?  How can you tell?

Yes, the penalty term is doing its job because it is reducing our negative coefficients.

### Step 4: Evaluate ridge model
Now let's use our ridge model to make predictions and evaluate it using MSE. **Be sure to print your results.**

In [8]:
# Predictions using ridge model
y_train_pred = ridge.predict(X_train_scaled)
y_test_pred = ridge.predict(X_test_scaled)

# Evaluate MSE
mse_train = mean_squared_error(y_train, y_train_pred)
mse_test = mean_squared_error(y_test, y_test_pred)

print(f"Train MSE (alpha={alpha_fixed}): {mse_train:.4f}")
print(f"Test MSE (alpha={alpha_fixed}): {mse_test:.4f}")

Train MSE (alpha=10): 8.7248
Test MSE (alpha=10): 9.9576


Having these MSE values for our training and test data are what we expect to see. Having a slightly higher MSE for our test data is okay. 

### Step 5: Cross-validation with ridge
Now let's move to a more sophisticated version of a ridge regression model using cross-validation using the RidgeCV class.  Let's try models with 0.1, 1.0, and 10.0 as values of alpha. Of the alpha parameter values we try, the model will save the that yielded the best MSE in the `_cv.alpha_` attribute.

In [10]:
# Define the three alpha values
alphas = [0.1, 1.0, 10.0]

# Fit RidgeCV
ridge_cv = RidgeCV(alphas=alphas, cv=10).fit(X_train_scaled, y_train)

# Print best alpha
print(f"Best alpha {ridge_cv.alpha_}")

print(f"Best alpha {ridge_cv.alphas}")

# Evaluate model with the best alpha
y_test_pred_cv =ridge_cv.predict(X_test_scaled)
mse_test_cv = mean_squared_error(y_test, y_test_pred_cv)
print(f"Test MSE with best alpha:{mse_test_cv:.4f}")


Best alpha 1.0
Best alpha [0.1, 1.0, 10.0]
Test MSE with best alpha:9.5469


### Step 6: Fit ridge models to a large range of alpha values 
Now we are going to dig a little deeper to see alpha's effect on the model coefficients. Let's define a larger range of alphas to try on a log scale with np.logspace(). Let's use 100 values ranging between -4 and 4.  Then using a for loop, we'll fit a Ridge model for each of those values of alpha, appending each model's coefficients (`coefficients.append()`) as we go.

In [23]:
# Define alphas for ridge regression.  Initialize an empty list to hold our coefficients
alphas = np.logspace(-4, 4, 100)  # Alphas from 0.0001 to 10,000
coefficients = []

# Fit a ridge model for each alpha and collect coefficients
for alpha in alphas:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train_scaled, y_train)
    coefficients.append(ridge.coef_)


print(coefficients)

[array([-6.68674872, -3.7013526 ,  0.76117038, -3.99232615,  7.21686693,
       -0.0260662 ,  2.67003207,  0.35604239]), array([-6.6867129 , -3.70132811,  0.76117624, -3.99230503,  7.21687692,
       -0.02606611,  2.67003216,  0.35604243]), array([-6.68666975, -3.70129861,  0.7611833 , -3.99227959,  7.21688895,
       -0.02606601,  2.67003228,  0.35604248]), array([-6.68661778, -3.70126308,  0.7611918 , -3.99224895,  7.21690345,
       -0.02606589,  2.67003242,  0.35604254]), array([-6.68655518, -3.70122028,  0.76120203, -3.99221205,  7.2169209 ,
       -0.02606574,  2.67003259,  0.35604261]), array([-6.68647978, -3.70116873,  0.76121437, -3.9921676 ,  7.21694193,
       -0.02606556,  2.67003279,  0.3560427 ]), array([-6.68638897, -3.70110663,  0.76122922, -3.99211406,  7.21696725,
       -0.02606534,  2.67003303,  0.3560428 ]), array([-6.68627959, -3.70103185,  0.7612471 , -3.99204958,  7.21699775,
       -0.02606508,  2.67003333,  0.35604292]), array([-6.68614784, -3.70094178,  0.761

AttributeError: 'list' object has no attribute 'unique'

### Step 7: Plot ridge coefficients vs. lambda

In [13]:

# Plot coefficients vs. alpha. Another for loop may be in order

plt.scatter()

AttributeError: 'numpy.ndarray' object has no attribute 'plt'

### Step 8: Run a full cross-validated ridge model using RidgeCV

In [None]:
# Fit a RidgeCV with cross-validation
ridge_cv = 


# Print the optimal alpha and coefficients corresponding to that alpha


# Evaluate the model on the test set by predicting..
y_test_pred = 

# ... and printing the MSE
mse = 
print(mse)

### Step 9: Fit lasso regression using LassoCV

In [None]:
# Fit lasso regression with cross-validation
lasso_cv = 

# Print the optimal alpha and associated coefficients



### Step 10: The "one-standard error" rule 
We can use 1-SE rule to trade off a small amount of accuracy for increased model simplicity and interpretability.  To do so, find the highest penalty value that falls within 1 standard error of the minimum MSE.  Hopefully it will reduce the number of features in the model.

In [None]:
# Think: what is this code doing?
lasso_best_alpha_1se = lasso_cv.alphas_[np.where(
    lasso_cv.mse_path_.mean(axis=1) <= (lasso_cv.mse_path_.mean(axis=1).min() + lasso_cv.mse_path_.std(axis=1).mean())
)[0][0]]

# Fit a Lasso with this new alpha
lasso_1se_model = 

# Identify remaining features for 1-SE rule (those with non-zero coefficients)
remaining_features_1se = 
print(remaining_features_1se)

### Step 11: Compare the performance of your ridge and lasso regression models.


*Your answer here.*