### **Homework: Regularization in Machine Learning**

#### Objective:
The objective of this homework is to understand the concept of **regularization** in machine learning by applying **Ridge Regression** and **Lasso Regression** to a real-world dataset. You will explore how regularization techniques can reduce overfitting and improve the generalization of models.

#### Dataset:
Use the **California Housing Dataset**: `data/california_housing.csv`. The dataset contains housing data for various neighborhoods in California.

#### Dataset Overview:
Key features:
- **MedInc**: Median income in the neighborhood.
- **HouseAge**: Median age of the houses in the neighborhood.
- **AveRooms**: Average number of rooms per household.
- **AveBedrms**: Average number of bedrooms per household.
- **Population**: Total population in the neighborhood.
- **AveOccup**: Average number of people per household.
- **Latitude**: Latitude of the neighborhood.
- **Longitude**: Longitude of the neighborhood.

Target variable:
- **MedHouseVal**: Median house value for the neighborhood (in $100,000s).


#### Steps to Complete:

1. **Load the Dataset**
   - Load the California Housing dataset.
   - Convert the dataset into a pandas DataFrame for easier handling.
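
A minimal loading sketch, assuming the CSV has a header row whose column names match the Dataset Overview above. The helper name `load_housing` and the column check are illustrative and not part of the assignment:

```python
from pathlib import Path

import pandas as pd

# Column names as listed in the Dataset Overview (assumed to match the CSV header).
FEATURES = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
            "Population", "AveOccup", "Latitude", "Longitude"]
TARGET = "MedHouseVal"


def load_housing(path="data/california_housing.csv"):
    """Read the CSV into a DataFrame and return (X, y)."""
    df = pd.read_csv(Path(path))
    # Fail early with a readable message if the header differs from the overview.
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return df[FEATURES], df[TARGET]
```

Calling `X, y = load_housing()` then gives a feature DataFrame and a target Series for the later steps.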

2. **Data Exploration**
   - Display the first few rows of the dataset.
   - Check for missing values and handle them if necessary.
   - Explore feature correlations with the target variable (`MedHouseVal`) using a heatmap or scatter plots.
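
One possible way to summarize missing values and correlations with the target is sketched below. The function name `explore` is hypothetical, and it assumes `df` holds the features and `MedHouseVal` together:

```python
import pandas as pd


def explore(df, target="MedHouseVal"):
    """Return per-column missing counts and feature correlations with the target."""
    missing = df.isna().sum()
    # Pearson correlation of every numeric column with the target,
    # ordered by absolute strength (strongest first).
    corr = (df.select_dtypes("number").corr()[target]
              .drop(target)
              .sort_values(key=abs, ascending=False))
    return missing, corr
```

`df.head()` shows the first rows. For the heatmap itself, `seaborn.heatmap(df.corr(), annot=True)` is a common choice if seaborn is installed.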

3. **Data Preprocessing**
   - Split the data into training and test sets (80% train, 20% test) using `train_test_split`.
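   - Standardize the features using `StandardScaler`, fitting it on the training set only and applying the same transformation to the test set, so that all features are on the same scale and no test-set statistics leak into training.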
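
A short sketch of the split-then-scale order, assuming `X` and `y` are already loaded. The helper name `split_and_scale` and `random_state=42` are arbitrary choices for illustration:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(X, y, test_size=0.2, random_state=42):
    """80/20 split, then standardize using statistics from the training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    scaler = StandardScaler().fit(X_train)   # mean/std come from the training set
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)      # test set reuses the training statistics
    return X_train_s, X_test_s, y_train, y_test, scaler
```

Scaling matters here because Ridge and Lasso penalize coefficient size, so features measured in larger units would otherwise be penalized differently.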

4. **Baseline Model**
   - Fit a simple **Linear Regression** model to the training data.
   - Evaluate its performance on the test set using:
     - **Mean Squared Error (MSE)**
     - **R-squared (R²)**
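
The baseline could be fitted and scored roughly as follows. `fit_baseline` and `evaluate` are illustrative helper names, and the inputs are assumed to be the scaled arrays from the preprocessing step:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def fit_baseline(X_train, y_train):
    """Ordinary least squares with no regularization."""
    return LinearRegression().fit(X_train, y_train)


def evaluate(model, X_test, y_test):
    """Return test-set MSE and R² for any fitted regressor."""
    pred = model.predict(X_test)
    return {"mse": mean_squared_error(y_test, pred),
            "r2": r2_score(y_test, pred)}
```

The same `evaluate` helper works unchanged for the Ridge and Lasso models, which keeps the later comparison consistent.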

5. **Ridge Regression**
   - Apply **Ridge Regression** with different values of the regularization parameter (`alpha`).
   - Use cross-validation to find the optimal `alpha`.
   - Plot the model coefficients as a function of `alpha`.
   - Evaluate the Ridge Regression model and compare it to the baseline model.
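
A sketch of the Ridge step under a few assumptions: the alpha grid `np.logspace(-3, 3, 13)` is an arbitrary starting range, the helper names are hypothetical, and the inputs are the standardized training arrays. `RidgeCV` is used here as one convenient way to cross-validate `alpha`:

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

ALPHAS = np.logspace(-3, 3, 13)  # assumed grid; widen it if the best alpha sits on an edge


def fit_ridge_cv(X_train, y_train, alphas=ALPHAS, cv=5):
    """Pick alpha by k-fold cross-validation; the choice is stored in model.alpha_."""
    return RidgeCV(alphas=alphas, cv=cv).fit(X_train, y_train)


def ridge_path(X_train, y_train, alphas=ALPHAS):
    """Coefficients for each alpha, shape (n_alphas, n_features)."""
    return np.array([Ridge(alpha=a).fit(X_train, y_train).coef_ for a in alphas])


def plot_path(alphas, coefs, names, title="Coefficients vs. alpha"):
    """Plot one line per feature on a logarithmic alpha axis."""
    import matplotlib.pyplot as plt  # imported lazily so the helpers above work without it
    fig, ax = plt.subplots()
    for j, name in enumerate(names):
        ax.plot(alphas, coefs[:, j], label=name)
    ax.set_xscale("log")
    ax.set_xlabel("alpha")
    ax.set_ylabel("coefficient")
    ax.set_title(title)
    ax.legend()
    return fig
```

In the resulting plot, Ridge coefficients shrink smoothly toward zero as `alpha` grows but generally do not reach exactly zero.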

6. **Lasso Regression**
   - Apply **Lasso Regression** with different values of `alpha`.
   - Use cross-validation to find the optimal `alpha`.
   - Plot the model coefficients as a function of `alpha`.
   - Evaluate the Lasso Regression model and compare it to the baseline and Ridge models.
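
The Lasso step can be sketched in the same style. The default grid `np.logspace(-4, 1, 30)` and `max_iter=10_000` are assumptions chosen to reduce convergence warnings, not values prescribed by the assignment:

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV


def fit_lasso_cv(X_train, y_train, alphas=None, cv=5):
    """Pick alpha by k-fold cross-validation; the choice is stored in model.alpha_."""
    if alphas is None:
        alphas = np.logspace(-4, 1, 30)  # assumed search range
    return LassoCV(alphas=alphas, cv=cv, max_iter=10_000).fit(X_train, y_train)


def lasso_coefs(X_train, y_train, alphas):
    """Coefficients for each alpha, shape (n_alphas, n_features)."""
    return np.array([Lasso(alpha=a, max_iter=10_000).fit(X_train, y_train).coef_
                     for a in alphas])
```

Unlike Ridge, the L1 penalty drives some coefficients exactly to zero, so `np.count_nonzero(model.coef_)` reports how many features the fitted model keeps.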

7. **Model Comparison**
   - Compare the performance of Linear Regression, Ridge Regression, and Lasso Regression in terms of:
     - MSE
     - R²
     - Number of non-zero coefficients in each model (most informative for Lasso, which can set coefficients exactly to zero).
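
One way to collect the three metrics into a single table is sketched below. `compare_models` is a hypothetical helper that takes a dict of already fitted models:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score


def compare_models(models, X_test, y_test):
    """Build a table of test MSE, R², and non-zero coefficient counts."""
    rows = []
    for name, model in models.items():
        pred = model.predict(X_test)
        rows.append({
            "model": name,
            "MSE": mean_squared_error(y_test, pred),
            "R2": r2_score(y_test, pred),
            "nonzero_coefs": int(np.count_nonzero(model.coef_)),
        })
    return pd.DataFrame(rows).set_index("model")
```

A call such as `compare_models({"linear": lin, "ridge": ridge, "lasso": lasso}, X_test, y_test)` returns one row per model, where `lin`, `ridge`, and `lasso` stand for whatever fitted models you produced in the earlier steps.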

8. **Insights**
   - Discuss the impact of regularization on model performance and feature selection.
   - Reflect on how Ridge and Lasso handle overfitting differently.
   - Identify the most important features for predicting house prices based on the Lasso model (for example, the features with the largest non-zero coefficients on the standardized data).

#### Bonus Challenges (Optional):

1. **ElasticNet Regularization**
   - Apply **ElasticNet**, which combines Ridge and Lasso regularization, and tune the `l1_ratio` parameter.
   - Compare its performance with Ridge and Lasso.
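
For the ElasticNet challenge, `ElasticNetCV` can search over `alpha` and `l1_ratio` together. The candidate ratios below are an assumed starting grid and `fit_elasticnet_cv` is an illustrative name:

```python
from sklearn.linear_model import ElasticNetCV


def fit_elasticnet_cv(X_train, y_train, l1_ratios=(0.1, 0.5, 0.9, 1.0), cv=5):
    """Cross-validate alpha and l1_ratio jointly.

    l1_ratio=1.0 corresponds to a pure Lasso penalty; values closer to 0
    put more weight on the Ridge (L2) part.
    """
    model = ElasticNetCV(l1_ratio=list(l1_ratios), cv=cv, max_iter=10_000)
    return model.fit(X_train, y_train)
```

After fitting, `model.alpha_` and `model.l1_ratio_` hold the selected values.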

2. **Real-World Dataset**
   - Use a different dataset (e.g., a Kaggle dataset) and repeat the analysis to gain more experience.

#### Deliverables:
- A Python script or Jupyter Notebook containing:
  - Data exploration, preprocessing, and model implementation.
  - Visualizations of model coefficients and performance metrics.
  - Comparison of models and discussion of results.
- A brief report discussing:
  - The impact of regularization on model performance.
  - How Ridge and Lasso handle overfitting.
  - Insights into feature importance and model interpretability.

#### Useful Hints:
- Use `GridSearchCV` or `cross_val_score` to tune hyperparameters.
- Explore the regularization parameter (`alpha`) over several orders of magnitude on a logarithmic grid (e.g., `np.logspace(-3, 3, 13)`).
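
A hedged sketch of the `GridSearchCV` route: wrapping the scaler and the model in a pipeline means the scaler is refit inside each cross-validation fold, so `X_train` here is the unscaled training data. The helper name `tune_alpha`, the grid, and the scoring choice are assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def tune_alpha(estimator, X_train, y_train, alphas=None, cv=5):
    """Grid-search alpha for a scaler + estimator pipeline."""
    if alphas is None:
        alphas = np.logspace(-3, 3, 13)  # logarithmic grid, assumed range
    pipe = make_pipeline(StandardScaler(), estimator)
    step = pipe.steps[-1][0]  # auto-generated step name, e.g. "ridge" or "lasso"
    search = GridSearchCV(pipe, {f"{step}__alpha": alphas}, cv=cv,
                          scoring="neg_mean_squared_error")
    return search.fit(X_train, y_train)
```

For example, `tune_alpha(Ridge(), X_train, y_train).best_params_` returns the selected `alpha`; passing `Lasso()` instead reuses the same search.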