# Unit 3 Ridge Regression

## Lesson Introduction

Hello! Today, we're going to talk about **Ridge Regression**. Ridge Regression is a special type of linear regression that helps when we have many features (or variables) in our data, especially when some of them overlap. Imagine you have a lot of different ingredients for a recipe and a few of them threaten to overpower the dish. Ridge Regression keeps every ingredient (or feature) in the recipe but limits how much weight any one of them gets, so the model isn't overloaded.

In this lesson, we'll learn:

  * What Ridge Regression is.
  * How to use Ridge Regression in Python.
  * How to interpret the results.
  * How Ridge Regression compares to regular linear regression.

Ready to dive in? Let's go!

## What is Ridge Regression?

Ridge Regression is like normal linear regression but with a regularization term added. Why do we need this?

Think about building a sandcastle. If you pile up too much sand without structure, it might collapse. Similarly, in regression, too many variables can make our model too complex and perform poorly on new data. This is known as **overfitting**.

Ridge Regression helps by adding a "penalty" to the equation that keeps the coefficients (weights assigned to each feature) smaller. This penalty term is controlled by a parameter called $\alpha$.

This penalty works by adding the sum of the squared values of the coefficients to the cost function. In mathematical terms, the Ridge Regression cost function is:

$J(\theta) = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p}\theta_j^2$

Here:

  * $J(\theta)$ is the cost function, which is a measure of how well the model's predictions match the actual data.
  * $y_i$ are the actual values.
  * $\hat{y}_i$ are the predicted values.
  * $\theta_j$ are the coefficients.
  * $\alpha$ is the regularization parameter.
  * $n$ is the number of observations and $p$ is the number of features.

The term $\alpha \sum_{j=1}^{p}\theta_j^2$ is the regularization term, which penalizes large coefficients to reduce model complexity and prevent overfitting. The higher the value of $\alpha$, the stronger the penalty on large coefficients.
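To make the formula concrete, here is a minimal sketch that evaluates the cost by hand with NumPy. The helper name `ridge_cost` and the tiny dataset are made up for illustration, and the sketch assumes predictions of the form $\hat{y} = X\theta$ with no intercept.

```python
import numpy as np

def ridge_cost(X, y, theta, alpha):
    """Sum of squared residuals plus alpha times the sum of squared coefficients."""
    residuals = y - X @ theta              # y_i - y_hat_i for every observation
    return np.sum(residuals ** 2) + alpha * np.sum(theta ** 2)

# Made-up toy data: 3 observations, 2 features (illustrative values only)
X_toy = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]])
y_toy = np.array([5.0, 4.0, 7.0])
theta_toy = np.array([2.0, 1.0])

cost_no_penalty = ridge_cost(X_toy, y_toy, theta_toy, alpha=0.0)
cost_with_penalty = ridge_cost(X_toy, y_toy, theta_toy, alpha=0.5)

print(f"Cost with alpha=0.0: {cost_no_penalty:.1f}")    # 1.0 (squared error only)
print(f"Cost with alpha=0.5: {cost_with_penalty:.1f}")  # 3.5 (1.0 + 0.5 * (2^2 + 1^2))
```

The same coefficients cost more once $\alpha > 0$, which is what pushes the fitted model toward smaller weights.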

## Example of Ridge Regression: Part 1

Let's see Ridge Regression in action using Python and the `Scikit-Learn` library. We'll use a real dataset to demonstrate this.

First, load and split our dataset. We’ll use a diabetes dataset included in `Scikit-Learn`.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load real dataset
X, y = load_diabetes(return_X_y=True)

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Here:

  * We import the necessary libraries.
  * We load the diabetes dataset using `load_diabetes()`.
  * We split this dataset into training and testing sets using `train_test_split()`, with 80% for training and 20% for testing.

## Example of Ridge Regression: Part 2

Now, let's train our Ridge Regression model using the training data.

```python
# Train a ridge regression model
ridge_model = Ridge(alpha=0.35)
ridge_model.fit(X_train, y_train)

# Make predictions
y_pred_ridge = ridge_model.predict(X_test)

# Calculate Mean Squared Error
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge}")
# Ridge Regression MSE: 2878.4563201253923
```

Here:

  * We create a Ridge Regression model with $\alpha$ set to 0.35. This $\alpha$ value controls the strength of the regularization. Higher values mean stronger regularization.
  * We train (fit) the model using the `fit()` method with our training data (`X_train` and `y_train`).
  * We evaluate the model on the test set using Mean Squared Error (MSE).
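To see what "stronger regularization" does in practice, the sketch below refits the model for a few values of `alpha` and reports the size of the coefficient vector alongside the test MSE. The list of `alpha` values is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same data and split as in the lesson
X_d, y_d = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_d, y_d, test_size=0.2, random_state=42)

alphas = [0.01, 0.1, 1.0, 10.0]  # illustrative values only
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    coef_norms.append(np.linalg.norm(model.coef_))  # overall size of the weights
    print(f"alpha={alpha:<5} coefficient norm={coef_norms[-1]:8.2f}  test MSE={mse:8.2f}")
```

The coefficient norm shrinks as `alpha` grows. The test MSE has no such guarantee: it can improve or get worse depending on the data, which is why `alpha` is usually tuned rather than fixed in advance.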

## Interpreting the Coefficients

Once trained, we can look at the coefficients (weights) and the intercept to understand the model better.

```python
# Print the coefficients
print(f"Coefficients: {ridge_model.coef_}, Intercept: {ridge_model.intercept_}")
# Coefficients: [  44.97986989 -146.87318828  414.52388235  269.57882622  -42.27871117
# -73.50772192 -182.81323752  136.63207571  316.39992559  106.88080884], Intercept: 151.75943045447815
```

Here:

  * We print the coefficients using `ridge_model.coef_` and the intercept using `ridge_model.intercept_`.
  * As with a regular linear regression, each coefficient shows how much the prediction changes when that feature increases by one unit while the others stay fixed. The intercept is the predicted value when all the features are zero.
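A raw array of numbers is hard to read, so one option is to pair each coefficient with its feature name. The sketch below uses the `feature_names` attribute of the object returned by `load_diabetes()`; sorting by absolute size is just one reasonable way to present the result.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

data = load_diabetes()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = Ridge(alpha=0.35).fit(X_tr, y_tr)

# Pair each feature name with its coefficient, largest absolute value first
ranked = sorted(zip(data.feature_names, model.coef_), key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranked:
    print(f"{name:>4}: {coef:9.2f}")
```

Because the features in this dataset are already centered and scaled, comparing coefficient sizes across features is meaningful here. On unscaled data, the sizes would also reflect each feature's units.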

## Comparing Performance: Part 1

Ridge Regression is often better than regular linear regression when:

  * **Multicollinearity**: It handles highly correlated features by reducing the variance of coefficient estimates, leading to better generalization.
  * **Overfitting**: It prevents overfitting by adding regularization, improving model performance on new data.
  * **High-Dimensional Data**: It works well when the number of features is high relative to the number of observations, stabilizing coefficient estimates.
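One way to see why the penalty stabilizes the estimates: ignoring the intercept, the ridge solution can be written as $\theta = (X^TX + \alpha I)^{-1}X^Ty$. With highly correlated features, $X^TX$ is nearly singular, and adding $\alpha I$ makes it better conditioned. The sketch below checks this on made-up data with two almost identical columns; the seed, the noise level, and `alpha = 1.0` are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable
x1 = rng.random(200)
x2 = x1 + rng.normal(0, 0.01, 200)      # nearly a copy of x1
X_demo = np.column_stack([x1, x2])

gram = X_demo.T @ X_demo                # the matrix inverted by ordinary least squares
alpha = 1.0
cond_plain = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + alpha * np.eye(2))  # the matrix inverted by ridge

print(f"cond(X^T X)           = {cond_plain:.1f}")
print(f"cond(X^T X + alpha I) = {cond_ridge:.1f}")
```

A large condition number means small changes in the data can swing the coefficients wildly. The ridge version is always smaller for any positive `alpha`.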

Let's compare the performance of the regular Linear Regression model and the Ridge Regression model using their Mean Squared Error values. For this purpose, we will generate highly correlated data, where Ridge Regression is expected to perform better:

```python
import pandas as pd
import numpy as np

# Step 1: Generate five highly correlated features
n_samples = 100
X1 = np.random.rand(n_samples)
X2 = X1 + np.random.normal(0, 0.05, n_samples)  # X1 plus a little noise
X3 = X1 + X2 + np.random.normal(0, 0.05, n_samples)  # Combination of X1 and X2 plus noise
X4 = X1 + 2*X2 + 0.5*X3 + np.random.normal(0, 0.05, n_samples)
X5 = X2 + 3*X3 - 0.5*X4 + np.random.normal(0, 0.05, n_samples)
X = np.vstack([X1, X2, X3, X4, X5]).T

# Step 2: Generate a target variable with more noise
y = 3 * X1 + 5 * X2 + np.random.normal(0, 1.0, n_samples)  # Increased noise in y

# Convert to DataFrame for easier display (optional)
df = pd.DataFrame(X, columns=['X1', 'X2', 'X3', 'X4', 'X5'])
df['y'] = y
```

Features $x_2, \ldots, x_5$ are linear combinations of the other features plus a small amount of noise, which means the data is highly multicollinear.
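A quick way to confirm the multicollinearity is to look at the correlation matrix. The sketch below rebuilds the first three features with a fixed seed (the seed and the `_s` variable names are assumptions added here so the check is repeatable) and prints their pairwise correlations.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed: an assumption, the lesson code is unseeded
n_samples = 100
x1_s = rng.random(n_samples)
x2_s = x1_s + rng.normal(0, 0.05, n_samples)
x3_s = x1_s + x2_s + rng.normal(0, 0.05, n_samples)
X_check = np.column_stack([x1_s, x2_s, x3_s])

# rowvar=False treats each column as one variable
corr = np.corrcoef(X_check, rowvar=False)
print(np.round(corr, 3))
```

Off-diagonal values close to 1 indicate that the features carry almost the same information.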

## Comparing Performance: Part 2

Now, let's compare the results of Ridge Regression and Linear Regression:

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)
print(f"Linear Regression MSE: {mse_lr:.4f}")

# Ridge Regression
ridge = Ridge(alpha=1.5)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge:.4f}")
```

Here, we train both `Ridge` and `LinearRegression` models on the generated data and print their MSE scores. Here is the result:

```
Linear Regression MSE: 1.1271
Ridge Regression MSE: 1.0578
```

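As you can see, in this case Ridge Regression outperforms the regular linear regression. Because the data is generated without a fixed random seed, your exact numbers will differ from run to run.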
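A single train/test split can be lucky or unlucky for either model, so it can help to repeat the comparison on several freshly generated datasets. The sketch below does this under a few assumptions: the helper `make_data`, the choice of 20 repetitions, and the seeds are all illustrative, while the data recipe and `alpha=1.5` follow the lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def make_data(rng, n_samples=100):
    """Regenerate the correlated features and noisy target from the lesson."""
    x1 = rng.random(n_samples)
    x2 = x1 + rng.normal(0, 0.05, n_samples)
    x3 = x1 + x2 + rng.normal(0, 0.05, n_samples)
    x4 = x1 + 2 * x2 + 0.5 * x3 + rng.normal(0, 0.05, n_samples)
    x5 = x2 + 3 * x3 - 0.5 * x4 + rng.normal(0, 0.05, n_samples)
    features = np.column_stack([x1, x2, x3, x4, x5])
    target = 3 * x1 + 5 * x2 + rng.normal(0, 1.0, n_samples)
    return features, target

results = []
for seed in range(20):  # 20 repetitions is an arbitrary choice
    X_s, y_s = make_data(np.random.default_rng(seed))
    X_tr, X_te, y_tr, y_te = train_test_split(X_s, y_s, test_size=0.2, random_state=seed)
    mse_lr_s = mean_squared_error(y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))
    mse_ridge_s = mean_squared_error(y_te, Ridge(alpha=1.5).fit(X_tr, y_tr).predict(X_te))
    results.append((mse_lr_s, mse_ridge_s))

results = np.array(results)
print(f"Mean Linear Regression MSE: {results[:, 0].mean():.4f}")
print(f"Mean Ridge Regression MSE:  {results[:, 1].mean():.4f}")
print(f"Ridge had the lower MSE in {(results[:, 1] < results[:, 0]).sum()} of {len(results)} runs")
```

Averaging over repetitions gives a steadier picture than any single pair of numbers.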

## Lesson Summary

In this lesson, we learned about Ridge Regression, a special type of linear regression that helps prevent overfitting by adding a regularization term.

We walked through the steps to:

  * Load and split a dataset.
  * Train a regular linear regression model and a Ridge Regression model in Python using Scikit-Learn.
  * Evaluate both models using Mean Squared Error (MSE).
  * Compare the performance of both models.

Next, we’ll move to the practice section where you'll get hands-on experience implementing Ridge Regression on your own.

## Adjust the Regularization Term for Ridge Regression

Hey there, Stellar Navigator!

Let's refine our Ridge Regression model. Modify the starter code to change the alpha parameter from 1.0 to 10.0 when training the Ridge Regression model. This change will help us understand how varying the regularization strength affects the Mean Squared Error (MSE).

Let's code!

```python
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the diabetes dataset
X, y = load_diabetes(return_X_y=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Ridge Regression model with alpha=1.0
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)

# Predict and calculate the Mean Squared Error
y_pred = ridge_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print(f"Ridge Regression MSE: {mse}")

```

I've adjusted the alpha parameter in the Ridge model from 1.0 to 10.0 as you requested. You can now run this code to observe the effect of a stronger regularization term on the Mean Squared Error.

```python
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the diabetes dataset
X, y = load_diabetes(return_X_y=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Ridge Regression model with alpha=10.0 (adjusted from 1.0)
# The 'alpha' parameter controls the strength of the regularization.
# A higher alpha value means stronger regularization, penalizing larger coefficients more.
ridge_model = Ridge(alpha=10.0)
ridge_model.fit(X_train, y_train)

# Predict and calculate the Mean Squared Error
y_pred = ridge_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print(f"Ridge Regression MSE: {mse}")


```

## Complete the Ridge Regression Model

Alright, Galactic Pioneer! Now it's your turn to shine brighter. Complete the missing pieces in this code to train a Ridge Regression model on the diabetes dataset. Remember, we need to split the dataset and train the model with a regularization strength of alpha=0.5. You've got this!

```python
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# TODO: Load diabetes dataset
data = load_diabetes()
X = data.data
y = data.target

# TODO: Split dataset into train and test sets
X_train, X_test, y_train, y_test = ____

# TODO: Train Ridge Regression model with alpha=0.5
ridge_model = Ridge(____)
ridge_model.fit(____)

# Output model coefficients
print(f"Ridge Regression Coefficients: {ridge_model.coef_}")

```

Alright, Galactic Pioneer! Here's the completed code to train your Ridge Regression model on the diabetes dataset, including splitting the data and setting the regularization strength.

```python
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load diabetes dataset
data = load_diabetes()
X = data.data
y = data.target

# Split the dataset into 80% training and 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Ridge Regression model with alpha=0.5
ridge_model = Ridge(alpha=0.5)  # Initialize Ridge model with alpha set to 0.5
ridge_model.fit(X_train, y_train)  # Train the model using the training data

# Output model coefficients
print(f"Ridge Regression Coefficients: {ridge_model.coef_}")
```

## Comparing Performance of Linear Regression and Ridge

Great job, Space Explorer!

In this exercise, you'll tune the model by changing from Linear Regression to Ridge Regression. Modify the starter code to replace `LinearRegression` with `Ridge` and use an α value of 0.05. Afterward, observe how this change affects the Mean Squared Error (MSE) of the model.

Good luck!

```python
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load diabetes dataset
data = load_diabetes()
X = data.data
y = data.target

# Split the dataset into 80% training and 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TODO: Replace LinearRegression with Ridge and set alpha=0.05
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set and calculate the Mean Squared Error
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"MSE: {mse:.4f}")
```


```python
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error # Import mean_squared_error

# Load diabetes dataset
data = load_diabetes()
X = data.data
y = data.target

# Split dataset into train and test sets
# Splitting the dataset into 80% training and 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Ridge Regression model with alpha=0.05
ridge_model = Ridge(alpha=0.05)
ridge_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred_ridge = ridge_model.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)

# Output model coefficients and MSE
print(f"Ridge Regression Coefficients: {ridge_model.coef_}")
print(f"Ridge Regression MSE (alpha=0.05): {mse_ridge:.4f}")

# For comparison, let's also show Linear Regression's MSE (though not explicitly asked, it helps in 'comparing performance')
# from sklearn.linear_model import LinearRegression
# lr_model = LinearRegression()
# lr_model.fit(X_train, y_train)
# y_pred_lr = lr_model.predict(X_test)
# mse_lr = mean_squared_error(y_test, y_pred_lr)
# print(f"Linear Regression MSE: {mse_lr:.4f}")


```
Great job, Space Explorer! Let's tune that model by switching to Ridge Regression with the specified alpha and observe the MSE.

In this updated code, the Ridge model is initialized with alpha=0.05. We then make predictions on the test set and calculate the Mean Squared Error (MSE), which is printed along with the model coefficients. This allows you to directly observe how this specific alpha value impacts the model's performance on unseen data.

## Ridge Regression with Diabetes Dataset

Hey, Space Explorer! Ready for your final challenge on Ridge Regression?

Use the diabetes dataset, split it into training and testing sets, train a Ridge Regression model with alpha=0.5, and print the coefficients, intercept, and the Mean Squared Error (MSE) on the test set.

Let's show those coefficients who's boss!


```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# TODO: Load the diabetes dataset and extract features X and target y

# TODO: Split the dataset into training (80%) and testing (20%) sets

# TODO: Train a Ridge Regression model with alpha set to 0.5

# TODO: Print the coefficients of the model using `ridge_model.coef_`

# TODO: Print the intercept of the model using `ridge_model.intercept_`

# TODO: Predict on the test set using the trained model and calculate the MSE

# TODO: Print the Mean Squared Error (MSE)

```

You got this, Space Explorer! Here's the completed code for your final challenge on Ridge Regression, training the model on the diabetes dataset and displaying all the requested metrics.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

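# Load the diabetes dataset and extract features X and target y
X, y = load_diabetes(return_X_y=True)

# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Ridge Regression model with alpha set to 0.5
ridge_model = Ridge(alpha=0.5)
ridge_model.fit(X_train, y_train)

# Print the coefficients of the model
print(f"Coefficients: {ridge_model.coef_}")

# Print the intercept of the model
print(f"Intercept: {ridge_model.intercept_}")

# Predict on the test set using the trained model and calculate the MSE
y_pred = ridge_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

# Print the Mean Squared Error (MSE)
print(f"Ridge Regression MSE: {mse}")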
```