## Lab Exercise 2: Learning Basics of Regularization
This exercise introduces regularization through a short theoretical overview, followed by a hands-on implementation and evaluation in Python.

**Objective**

The goal of this lab exercise is to understand the fundamentals of regularization, a technique used to prevent overfitting in machine learning models. You will learn how to implement and evaluate regularization techniques such as Ridge (L2) and Lasso (L1) regression.

**Prerequisites**
- Basic knowledge of Python programming
- Basic understanding of linear regression
- Familiarity with the NumPy, pandas, scikit-learn, and Matplotlib libraries

#### Part 1: Understanding the Theory
**Introduction to Regularization**

Regularization adds a penalty on the size of the model's coefficients to the loss function. The penalty discourages overly complex fits and so helps prevent overfitting. The two most common forms for linear models are Ridge (L2) and Lasso (L1) regression.

**Key Concepts**
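- Ridge Regression (L2): Adds the sum of the squared coefficients, scaled by a strength parameter (`alpha` in scikit-learn), as a penalty term to the loss function. This shrinks coefficients toward zero but rarely makes them exactly zero.
- Lasso Regression (L1): Adds the sum of the absolute values of the coefficients, scaled by `alpha`, as a penalty term to the loss function. This can produce sparse models in which some coefficients are exactly zero.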
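
To make the two penalty terms concrete, the following minimal sketch computes them for a small, made-up coefficient vector. The function names `l2_penalty` and `l1_penalty` and the vector `w` are illustrative assumptions and are not part of scikit-learn.

```python
import numpy as np


def l2_penalty(coefficients, alpha):
    """Ridge-style penalty: alpha times the sum of squared coefficients."""
    return alpha * np.sum(np.square(coefficients))


def l1_penalty(coefficients, alpha):
    """Lasso-style penalty: alpha times the sum of absolute coefficients."""
    return alpha * np.sum(np.abs(coefficients))


# Hypothetical coefficient vector, chosen only for easy arithmetic
w = np.array([3.0, -4.0, 0.0])

print("L2 penalty:", l2_penalty(w, alpha=1.0))  # 3^2 + (-4)^2 + 0^2 = 25.0
print("L1 penalty:", l1_penalty(w, alpha=1.0))  # |3| + |-4| + |0| = 7.0
```

Because the L2 penalty squares each coefficient, it punishes large coefficients much more heavily than small ones, while the L1 penalty grows at the same rate for every coefficient regardless of its size.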

#### Part 2: Implementing Regularization
*Step 1: Generate Synthetic Data*

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(0)

# Generate synthetic dataset
num_samples = 100
num_features = 10

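# Features are drawn uniformly from [0, 1); the target is a linear
# combination of the features plus Gaussian noise (standard deviation 0.5)
X = np.random.rand(num_samples, num_features)
true_coefficients = np.random.randn(num_features)
y = X.dot(true_coefficients) + 0.5 * np.random.randn(num_samples)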

# Convert to DataFrame for easier manipulation
feature_names = [f'Feature_{i+1}' for i in range(num_features)]
df = pd.DataFrame(X, columns=feature_names)
df['Target'] = y

# Display the first few rows of the dataset
print(df.head())


*Step 2: Split the Data into Training and Test Sets*

In [None]:
from sklearn.model_selection import train_test_split

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df[feature_names], df['Target'], test_size=0.2, random_state=0)

print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)


#### Part 3: Applying Regularization Techniques
*Step 3: Ridge Regression*

In [None]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Initialize and fit the Ridge regression model (alpha sets the L2 penalty strength)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)

# Make predictions and calculate mean squared error
y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)

print(f'MSE using Ridge regression: {mse_ridge:.4f}')
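
As a sanity check on what `Ridge` computes, the sketch below compares scikit-learn's coefficients with the closed-form ridge solution `w = (XᵀX + alpha·I)⁻¹ Xᵀy`. It assumes no intercept term, so `fit_intercept=False` is set here, which differs from the default used in Step 3. The data is regenerated locally so the snippet runs on its own, and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Standalone synthetic data (same shape as the lab's dataset)
rng = np.random.default_rng(0)
X = rng.random((100, 10))
true_w = rng.standard_normal(10)
y = X @ true_w + 0.5 * rng.standard_normal(100)

alpha = 1.0

# Closed-form ridge solution without an intercept:
# solve (X^T X + alpha * I) w = X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge with the intercept disabled to match the formula
ridge_no_intercept = Ridge(alpha=alpha, fit_intercept=False)
ridge_no_intercept.fit(X, y)

print("Coefficients agree:", np.allclose(w_closed, ridge_no_intercept.coef_))
```

The added `alpha * I` term is what keeps the linear system well conditioned even when features are strongly correlated, which is one reason Ridge tends to give more stable coefficients than OLS.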


*Step 4: Lasso Regression*

In [None]:
from sklearn.linear_model import Lasso

# Initialize and fit the Lasso regression model (alpha sets the L1 penalty strength)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Make predictions and calculate mean squared error
y_pred_lasso = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)

print(f'MSE using Lasso regression: {mse_lasso:.4f}')
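
The sparsity effect of Lasso can be seen by counting how many coefficients are exactly zero as `alpha` grows. The sketch below is a standalone illustration: it regenerates data in the same way as Step 1 and fits on the full dataset rather than the training split. The grid of `alpha` values is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Regenerate synthetic data in the same way as Step 1
np.random.seed(0)
X = np.random.rand(100, 10)
true_coefficients = np.random.randn(10)
y = X.dot(true_coefficients) + 0.5 * np.random.randn(100)

# Arbitrary grid of penalty strengths, from weak to strong
alphas = [0.001, 0.01, 0.1, 1.0]
zero_counts = []

for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))  # coefficients removed by the L1 penalty
    zero_counts.append(n_zero)
    print(f"alpha={alpha}: {n_zero} of {model.coef_.size} coefficients are exactly zero")
```

With a sufficiently large `alpha`, every coefficient is driven to zero and the model predicts only the intercept, so a very strong penalty underfits rather than regularizes.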


*Step 5: Compare with Ordinary Least Squares (OLS)*

In [None]:
from sklearn.linear_model import LinearRegression

# Initialize and fit the OLS regression model (no penalty term)
ols = LinearRegression()
ols.fit(X_train, y_train)

# Make predictions and calculate mean squared error
y_pred_ols = ols.predict(X_test)
mse_ols = mean_squared_error(y_test, y_pred_ols)

print(f'MSE using OLS regression: {mse_ols:.4f}')


#### Part 4: Evaluating and Visualizing Results
*Step 6: Compare Coefficients*

In [None]:
# Compare the coefficients from each model
coefficients_df = pd.DataFrame({
    'Feature': feature_names,
    'OLS': ols.coef_,
    'Ridge': ridge.coef_,
    'Lasso': lasso.coef_
})

print(coefficients_df)
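
Beyond reading the table row by row, the amount of shrinkage can be summarized with the L1 and L2 norms of each coefficient vector. The following sketch is a standalone illustration that regenerates the data and fits on the full dataset for brevity. The `norms` dictionary and its keys are illustrative names, not part of the lab's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Regenerate synthetic data in the same way as Step 1
np.random.seed(0)
X = np.random.rand(100, 10)
y = X.dot(np.random.randn(10)) + 0.5 * np.random.randn(100)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
}

norms = {}
for name, model in models.items():
    coef = model.fit(X, y).coef_
    norms[name] = {
        "l1": float(np.sum(np.abs(coef))),        # sum of absolute values
        "l2": float(np.sqrt(np.sum(coef ** 2))),  # Euclidean length
        "zeros": int(np.sum(coef == 0)),          # exactly-zero coefficients
    }
    print(f"{name:>5}: L1 norm={norms[name]['l1']:.3f}, "
          f"L2 norm={norms[name]['l2']:.3f}, zeros={norms[name]['zeros']}")
```

Both regularized models should show smaller norms than OLS, and only Lasso is expected to report coefficients that are exactly zero.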


*Step 7: Plot Coefficients*

In [None]:
# Plot the coefficients
coefficients_df.set_index('Feature').plot(kind='bar', figsize=(12, 8))
plt.title('Comparison of Coefficients')
plt.xlabel('Features')
plt.ylabel('Coefficient Value')
plt.legend(loc='upper right')
plt.grid(True)
plt.show()


*Step 8: Plot Residuals*

In [None]:
# Plot residuals for each model
plt.figure(figsize=(12, 8))
plt.scatter(y_test, y_test - y_pred_ols, label='OLS Residuals')
plt.scatter(y_test, y_test - y_pred_ridge, label='Ridge Residuals')
plt.scatter(y_test, y_test - y_pred_lasso, label='Lasso Residuals')
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Actual Values')
plt.ylabel('Residuals')
plt.title('Residuals Comparison')
plt.legend()
plt.grid(True)
plt.show()


#### Conclusion
In this lab exercise, you learned the basics of regularization by:

1. Generating a synthetic dataset.
2. Applying Ridge (L2) and Lasso (L1) regression.
3. Comparing the performance and coefficients of regularized models with OLS regression.
4. Visualizing the coefficients and residuals.

As a next step, experiment with different values of `alpha` for Ridge and Lasso and observe how they affect the test MSE and the fitted coefficients.
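
One possible way to run that experiment systematically is to let cross-validation choose `alpha`. The sketch below uses scikit-learn's `RidgeCV` and `LassoCV` on data regenerated as in Step 1. The log-spaced grid of 20 candidate values and the 5-fold setting for Lasso are arbitrary assumptions and should be adapted to the problem at hand.

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Regenerate synthetic data and split it as in Steps 1 and 2
np.random.seed(0)
X = np.random.rand(100, 10)
y = X.dot(np.random.randn(10)) + 0.5 * np.random.randn(100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate penalty strengths on a log scale (assumed grid)
alphas = np.logspace(-3, 2, 20)

# RidgeCV uses efficient leave-one-out cross-validation by default
ridge_cv = RidgeCV(alphas=alphas).fit(X_train, y_train)

# LassoCV evaluates each alpha with 5-fold cross-validation
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10_000).fit(X_train, y_train)

for name, model in [("Ridge", ridge_cv), ("Lasso", lasso_cv)]:
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: selected alpha={model.alpha_:.4f}, test MSE={mse:.4f}")
```

The selected `alpha` is chosen using only the training data, so the test set remains an unbiased check of the final model.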