<a href="https://colab.research.google.com/github/yadavrishikesh/Deep-Learning-Slides-Code/blob/main/code/DL_OptimizationAlgorithm_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise: Training Linear Regression with Custom Loss Functions

In this exercise, you will train a **linear regression model** on a synthetic dataset and experiment with **different loss functions**.  

The goal is to understand how the **choice of loss function affects the training process** and the resulting model.

## 1. Dataset

We generate a simple 2D dataset where:

- $y = 3 x_1 - 2 x_2 + \text{noise}$, where the noise is drawn from a standard normal distribution
- Students will try to fit a linear model $\hat{y} = w_0 + w_1 x_1 + w_2 x_2$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)

# Generate synthetic data
n_samples = 200
X1 = np.random.rand(n_samples,1)*10
X2 = np.random.rand(n_samples,1)*5
noise = np.random.randn(n_samples,1)

y = 3*X1 - 2*X2 + noise
X = np.hstack([np.ones((n_samples,1)), X1, X2])  # Add bias term

# Plot the data (projection)
plt.figure(figsize=(6,6))
plt.scatter(X1, y, label='y vs x1', alpha=0.5)
plt.scatter(X2, y, label='y vs x2', alpha=0.5)
plt.xlabel('Feature')
plt.ylabel('Target y')
plt.title('Synthetic Linear Regression Data')
plt.legend()
plt.show()

## 2. Define Linear Regression Model

The predicted output is:

$$
\hat{y} = w_0 + w_1 x_1 + w_2 x_2
$$

where $w_0$ is the bias, and $w_1, w_2$ are the weights.
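Because the design matrix `X` built above already carries a leading column of ones, the model reduces to a single matrix product. The following is a minimal sketch; the function name `predict` and the demo values are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def predict(X, w):
    """Return y_hat = X @ w, where X has a leading column of ones for the bias."""
    return X @ w

# Tiny illustration with hypothetical values
X_demo = np.array([[1.0, 2.0, 1.0],
                   [1.0, 0.0, 3.0]])        # columns: bias, x1, x2
w_demo = np.array([[0.0], [3.0], [-2.0]])   # w0, w1, w2 as a column vector
y_hat_demo = predict(X_demo, w_demo)
print(y_hat_demo.ravel())
```

Keeping `w` as a column vector of shape `(3, 1)` matches the shape of `y` in the dataset cell, which avoids accidental broadcasting when residuals are computed.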

## 3. Custom Loss Function

Instead of always using **Mean Squared Error (MSE)**, we can define a **generalized loss function**:

$$
\mathcal{L}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \ell(y_i - \hat{y}_i)
$$

where $\ell(\cdot)$ is a **per-sample loss** applied to the residual $e = y_i - \hat{y}_i$. Examples:

1. **MSE (standard)**: $\ell(e) = e^2$  
2. **MAE (Mean Absolute Error)**: $\ell(e) = |e|$  
3. **Huber Loss**:

$$
\ell_\delta(e) =
\begin{cases}
\frac{1}{2} e^2 & \text{if } |e| \leq \delta \\
\delta (|e| - \frac{1}{2} \delta) & \text{otherwise}
\end{cases}
$$

> **Task for students**: implement MSE first, then MAE or Huber, and observe how the convergence and weights change.
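One possible way to write the three losses in NumPy is sketched below. The function names (`mse_loss`, `mae_loss`, `huber_loss`), the default `delta=1.0`, and the demo arrays are assumptions chosen for illustration.

```python
import numpy as np

def mse_loss(y, y_hat):
    """Mean of squared residuals."""
    e = y - y_hat
    return np.mean(e ** 2)

def mae_loss(y, y_hat):
    """Mean of absolute residuals."""
    return np.mean(np.abs(y - y_hat))

def huber_loss(y, y_hat, delta=1.0):
    """Quadratic for |e| <= delta, linear beyond it."""
    e = y - y_hat
    abs_e = np.abs(e)
    quadratic = 0.5 * e ** 2
    linear = delta * (abs_e - 0.5 * delta)
    return np.mean(np.where(abs_e <= delta, quadratic, linear))

# Hypothetical targets and predictions; the last residual is large on purpose
y_demo = np.array([[1.0], [2.0], [10.0]])
y_hat_demo = np.array([[1.5], [2.0], [4.0]])

print("MSE  :", mse_loss(y_demo, y_hat_demo))
print("MAE  :", mae_loss(y_demo, y_hat_demo))
print("Huber:", huber_loss(y_demo, y_hat_demo))
```

In this small example the single large residual dominates the MSE far more than it does the MAE or Huber values, which is the effect the exercise asks you to observe.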


## 4. Gradient Computation

For linear regression, the **gradient of the loss w.r.t. weights** is:

- **MSE**:

$$
\frac{\partial \mathcal{L}}{\partial w} = -\frac{2}{n} X^T (y - \hat{y})
$$

- **MAE** (a subgradient, because $|e|$ is not differentiable at $e = 0$):

$$
\frac{\partial \mathcal{L}}{\partial w} = -\frac{1}{n} X^T \operatorname{sign}(y - \hat{y})
$$

> Students can implement the gradient for any custom loss function they choose.
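The two formulas above translate almost directly into NumPy. The sketch below also includes a Huber gradient, which is not given in the text: since the derivative of the Huber loss with respect to the residual is $e$ when $|e| \leq \delta$ and $\delta \operatorname{sign}(e)$ otherwise, it can be written as the residual clipped to $[-\delta, \delta]$. All function names here are assumptions for illustration.

```python
import numpy as np

def grad_mse(X, y, w):
    """Gradient of the MSE loss: -(2/n) X^T (y - X w)."""
    n = X.shape[0]
    return -(2.0 / n) * X.T @ (y - X @ w)

def grad_mae(X, y, w):
    """Subgradient of the MAE loss: -(1/n) X^T sign(y - X w)."""
    n = X.shape[0]
    return -(1.0 / n) * X.T @ np.sign(y - X @ w)

def grad_huber(X, y, w, delta=1.0):
    """Gradient of the Huber loss: residuals are clipped to [-delta, delta]."""
    n = X.shape[0]
    e = y - X @ w
    return -(1.0 / n) * X.T @ np.clip(e, -delta, delta)

# Hypothetical two-sample check, starting from zero weights
X_demo = np.array([[1.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0]])
y_demo = np.array([[2.0], [-3.0]])
w_demo = np.zeros((3, 1))
print(grad_mse(X_demo, y_demo, w_demo).ravel())
```

A quick way to validate any custom gradient is to compare it against a finite-difference approximation of the corresponding loss.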


## 5. Optimization Algorithm

Students will implement one or more of:

- Gradient Descent (full-batch)
- Stochastic Gradient Descent
- Adam optimizer  
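The three optimizers differ mainly in how a gradient is turned into a weight update. The sketch below shows one update rule for each; the helper names (`gd_step`, `sgd_epoch`, `init_adam_state`, `adam_step`) and the default hyperparameters are assumptions for illustration. The Adam defaults follow the commonly used values $\beta_1 = 0.9$, $\beta_2 = 0.999$.

```python
import numpy as np

def gd_step(w, grad, lr=0.01):
    """Full-batch gradient descent: one step along the negative gradient."""
    return w - lr * grad

def sgd_epoch(w, X, y, grad_fn, lr=0.01, batch_size=32, rng=None):
    """One pass over shuffled mini-batches, updating w after each batch."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(X.shape[0])
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        w = w - lr * grad_fn(X[batch], y[batch], w)
    return w

def init_adam_state(w):
    """Step counter plus first and second moment estimates, all zero."""
    return {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Hypothetical gradient, just to show the call pattern
w = np.zeros((2, 1))
g = np.array([[1.0], [-2.0]])
state = init_adam_state(w)
print(gd_step(w, g, lr=0.1).ravel())
print(adam_step(w, g, state).ravel())
```

Here `grad_fn` is any function with the signature `grad_fn(X, y, w)`, so the same optimizer code can be reused with MSE, MAE, or Huber gradients.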

The training steps:

1. Initialize weights $w = [0, 0, 0]$
2. Compute predicted outputs $\hat{y} = X w$
3. Compute loss using the chosen loss function
4. Compute gradient
5. Update weights: $w \leftarrow w - \text{lr} \cdot \frac{\partial \mathcal{L}}{\partial w}$
6. Record loss at each epoch
7. Repeat steps 2 to 6 for `epochs` iterations
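The steps above can be sketched for full-batch gradient descent with the MSE loss. The block regenerates the dataset from Section 1 so that it runs on its own; the function name `gradient_descent` and the values `lr=0.01` and `epochs=3000` are assumptions that happened to be stable for this feature scale, not recommended settings.

```python
import numpy as np

# Same synthetic data as in Section 1
np.random.seed(42)
n_samples = 200
X1 = np.random.rand(n_samples, 1) * 10
X2 = np.random.rand(n_samples, 1) * 5
y = 3 * X1 - 2 * X2 + np.random.randn(n_samples, 1)
X = np.hstack([np.ones((n_samples, 1)), X1, X2])

def gradient_descent(X, y, lr=0.01, epochs=3000):
    n, d = X.shape
    w = np.zeros((d, 1))                        # step 1: initialize weights
    losses = []
    for _ in range(epochs):                     # step 7: repeat
        y_hat = X @ w                           # step 2: predictions
        residual = y - y_hat
        losses.append(np.mean(residual ** 2))   # steps 3 and 6: MSE loss, recorded
        grad = -(2.0 / n) * X.T @ residual      # step 4: MSE gradient
        w -= lr * grad                          # step 5: update
    return w, losses

w_gd, losses_gd = gradient_descent(X, y)
print("learned weights:", w_gd.ravel())
print("final loss:", losses_gd[-1])
```

Because $x_1$ ranges up to 10, a much larger learning rate makes this loop diverge; that sensitivity is worth exploring as part of the exercise.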


## 6. Visualization Tasks

After training:

1. **Plot convergence**: loss vs epochs for different loss functions or optimizers.
2. **Compare final weights** with ground truth $[w_0, w_1, w_2] = [0, 3, -2]$.
3. **Plot predicted vs true values**: scatter plot of $y$ vs $\hat{y}$.
4. Optionally, **plot the fitted regression plane** in 3D using `matplotlib` for visual inspection.
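The first and third plots could look roughly like the sketch below. It regenerates the data and runs a short MSE gradient descent with two learning rates only to have curves to draw; the helper names `plot_results` and `run_gd` and the chosen learning rates are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_results(loss_histories, y, y_hat):
    """Plot loss curves (left) and predicted vs true targets (right)."""
    fig, (ax_loss, ax_fit) = plt.subplots(1, 2, figsize=(11, 4))
    for name, losses in loss_histories.items():
        ax_loss.plot(losses, label=name)
    ax_loss.set_yscale("log")
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Training loss")
    ax_loss.set_title("Convergence")
    ax_loss.legend()

    ax_fit.scatter(y, y_hat, alpha=0.5)
    lo, hi = float(np.min(y)), float(np.max(y))
    ax_fit.plot([lo, hi], [lo, hi], "k--", label="perfect prediction")
    ax_fit.set_xlabel("True y")
    ax_fit.set_ylabel("Predicted y")
    ax_fit.set_title("Predicted vs true")
    ax_fit.legend()
    fig.tight_layout()
    return fig

# Same synthetic data as in Section 1
np.random.seed(42)
n_samples = 200
X1 = np.random.rand(n_samples, 1) * 10
X2 = np.random.rand(n_samples, 1) * 5
y = 3 * X1 - 2 * X2 + np.random.randn(n_samples, 1)
X = np.hstack([np.ones((n_samples, 1)), X1, X2])

def run_gd(lr, epochs=500):
    """Short MSE gradient descent run that returns weights and loss history."""
    w = np.zeros((3, 1))
    losses = []
    for _ in range(epochs):
        residual = y - X @ w
        losses.append(float(np.mean(residual ** 2)))
        w += lr * (2.0 / n_samples) * X.T @ residual
    return w, losses

w_a, losses_a = run_gd(0.01)
w_b, losses_b = run_gd(0.002)
fig = plot_results({"GD, lr=0.01": losses_a, "GD, lr=0.002": losses_b}, y, X @ w_a)
plt.show()
```

Passing a dictionary of loss histories makes it easy to overlay additional curves, for example one per loss function or optimizer.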


## 7. Advanced Extension

- Try **Huber Loss** and observe its effect on **robustness to outliers**.
- Compare **GD vs SGD vs Adam** for the same loss function.
- Record **execution time** for each optimizer, as in previous exercises.
- Discuss which combination of **loss + optimizer** converges faster or is more stable.
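As a starting point for the robustness experiment, the sketch below corrupts a few targets and fits the same model with MSE and Huber gradients, timing each run with `time.perf_counter`. The way the outliers are created (adding 50 to the targets of the 10 samples with the largest $x_1$) is an assumption chosen to make the effect easy to see, and the names `fit`, `grad_mse`, and `grad_huber` are illustrative.

```python
import time
import numpy as np

# Same synthetic data as in Section 1
np.random.seed(42)
n_samples = 200
X1 = np.random.rand(n_samples, 1) * 10
X2 = np.random.rand(n_samples, 1) * 5
y = 3 * X1 - 2 * X2 + np.random.randn(n_samples, 1)
X = np.hstack([np.ones((n_samples, 1)), X1, X2])

# Assumption for illustration: corrupt the 10 samples with the largest x1
y_out = y.copy()
outlier_idx = np.argsort(X1.ravel())[-10:]
y_out[outlier_idx] += 50.0

def grad_mse(X, y, w):
    return -(2.0 / X.shape[0]) * X.T @ (y - X @ w)

def grad_huber(X, y, w, delta=1.0):
    # Clipping the residual caps the influence of any single sample
    return -(1.0 / X.shape[0]) * X.T @ np.clip(y - X @ w, -delta, delta)

def fit(X, y, grad_fn, lr=0.01, epochs=5000):
    """Full-batch gradient descent with a pluggable gradient function."""
    w = np.zeros((X.shape[1], 1))
    for _ in range(epochs):
        w -= lr * grad_fn(X, y, w)
    return w

results = {}
for name, grad_fn in [("MSE", grad_mse), ("Huber", grad_huber)]:
    start = time.perf_counter()
    w_fit = fit(X, y_out, grad_fn)
    results[name] = (w_fit, time.perf_counter() - start)

for name, (w_fit, elapsed) in results.items():
    print(f"{name:5s} weights: {w_fit.ravel()}  time: {elapsed:.3f} s")

w_mse, w_huber = results["MSE"][0], results["Huber"][0]
```

Comparing the fitted $w_1$ against the ground-truth value of 3 shows how strongly each loss is pulled by the corrupted targets. Swapping `fit` for an SGD or Adam loop gives the optimizer comparison on the same footing.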
