# 📘 Notebook 3: Loss, Optimization, and Geometry
In this notebook, we go beyond the closed-form solution and investigate how optimization works.

We focus on:
- The role of the loss function in learning
- The geometry of least squares
- Gradient Descent as an optimization method
- Visualizing the loss surface and convergence
- Deriving gradients manually, step by step


## 🎯 Step 1: Define the Loss Function Explicitly
We start with the squared loss:
$$ L(\beta) = \frac{1}{2} \| y - X\beta \|^2 $$
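The factor $\frac{1}{2}$ is a convenience: it cancels the 2 that appears when differentiating the square. The code cells in this notebook average the squared residuals instead of summing them, which rescales the loss by $\frac{1}{n}$ but leaves the minimizer unchanged.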
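
As a quick illustration, the loss can be written as a small function. This is a minimal sketch: `squared_loss` and the toy arrays are hypothetical names and values chosen for this example, not part of the notebook's data.

```python
import numpy as np

def squared_loss(X, y, beta):
    """Half the squared Euclidean norm of the residual y - X @ beta."""
    residual = y - X @ beta
    return 0.5 * residual @ residual

# Toy data (illustrative values): two observations, one feature
X_toy = np.array([[1.0], [2.0]])
y_toy = np.array([2.0, 4.0])

loss_at_1 = squared_loss(X_toy, y_toy, np.array([1.0]))  # residuals are [1, 2]
loss_at_2 = squared_loss(X_toy, y_toy, np.array([2.0]))  # perfect fit, residuals are [0, 0]
print(loss_at_1, loss_at_2)
```

With these toy values the loss is 2.5 at $\beta = 1$ and 0 at $\beta = 2$, where the line passes through both points.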


## 🔍 Step 2: Visualize the Loss Surface
We fix the intercept at zero for simplicity and visualize how the loss changes with the slope.

In [None]:
import numpy as np
import plotly.graph_objects as go

# Simulate data
np.random.seed(1)
X = np.linspace(0, 5, 30).reshape(-1, 1)
true_beta = 2.5
y = true_beta * X.flatten() + np.random.normal(0, 1, X.shape[0])

# Try different beta values
betas = np.linspace(0, 5, 100)
losses = []
for b in betas:
    y_pred = X.flatten() * b
    loss = 0.5 * np.mean((y - y_pred)**2)
    losses.append(loss)

fig = go.Figure()
fig.add_trace(go.Scatter(x=betas, y=losses, mode='lines', name='Loss'))
fig.update_layout(title='Loss vs Beta (Slope)', xaxis_title='Beta', yaxis_title='Loss')
fig.show()
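
Because the intercept is fixed at zero, the loss is a parabola in the slope, and setting its derivative to zero gives the minimizer $\beta^* = \sum_i x_i y_i / \sum_i x_i^2$. The sketch below regenerates the same simulated data and compares this value with the lowest point on the grid of candidate slopes. The names `beta_star`, `grid_losses`, and `beta_grid_min` are introduced here for illustration only.

```python
import numpy as np

# Same simulated data as in the cell above
np.random.seed(1)
X = np.linspace(0, 5, 30).reshape(-1, 1)
y = 2.5 * X.flatten() + np.random.normal(0, 1, X.shape[0])
x = X.flatten()

# Setting dL/dbeta = -sum(x * (y - beta * x)) to zero gives the minimizer
beta_star = np.sum(x * y) / np.sum(x ** 2)

# Grid search over the same candidate slopes used for the plot
betas = np.linspace(0, 5, 100)
grid_losses = np.array([0.5 * np.mean((y - b * x) ** 2) for b in betas])
beta_grid_min = betas[np.argmin(grid_losses)]

print(f"closed-form slope: {beta_star:.4f}")
print(f"best grid slope:   {beta_grid_min:.4f}")
```

The two values should agree to within the grid spacing, which is one way to confirm that the bottom of the plotted curve is the least-squares slope.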

## 🧮 Step 3: Derive the Gradient
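We compute the gradient of the loss function step by step. Expanding the squared norm gives:
$$ L(\beta) = \frac{1}{2} \left( y^T y - 2\beta^T X^T y + \beta^T X^T X \beta \right) $$
Differentiating term by term with respect to $\beta$ yields $-X^T y + X^T X \beta$, which factors as:
$$ \nabla_\beta L = -X^T(y - X\beta) $$
Setting this to zero recovers the normal equations $X^T X \beta = X^T y$ behind the closed-form solution. The implementation in Step 4 divides this gradient by $n$, matching the averaged loss used in the code.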
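
A hand-derived gradient can be sanity-checked against central finite differences. The following is a minimal sketch on random data. The helper names (`loss`, `analytic_gradient`, `numerical_gradient`) and the step size `eps` are assumptions made for this example.

```python
import numpy as np

def loss(X, y, beta):
    """Sum-form squared loss: 0.5 * ||y - X beta||^2."""
    residual = y - X @ beta
    return 0.5 * residual @ residual

def analytic_gradient(X, y, beta):
    """Gradient from the formula above: -X^T (y - X beta)."""
    return -X.T @ (y - X @ beta)

def numerical_gradient(X, y, beta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (loss(X, y, beta + step) - loss(X, y, beta - step)) / (2 * eps)
    return grad

# Random test problem (illustrative sizes)
rng = np.random.default_rng(0)
X_check = rng.normal(size=(20, 3))
y_check = rng.normal(size=20)
beta_check = rng.normal(size=3)

g_analytic = analytic_gradient(X_check, y_check, beta_check)
g_numeric = numerical_gradient(X_check, y_check, beta_check)
print("max abs difference:", np.max(np.abs(g_analytic - g_numeric)))
```

Because the loss is quadratic, the central difference has no truncation error, so any remaining gap comes from floating-point rounding.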

## 🔁 Step 4: Implement Gradient Descent by Hand

In [None]:
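def gradient_descent(X, y, lr=0.01, epochs=100):
    """Fit intercept and slope by batch gradient descent on the mean squared loss."""
    n = X.shape[0]
    X_b = np.c_[np.ones((n, 1)), X]  # Prepend a column of ones for the intercept
    beta = np.zeros(X_b.shape[1])
    history = []
    for epoch in range(epochs):
        y_pred = X_b @ beta
        error = y - y_pred
        loss = 0.5 * np.mean(error ** 2)
        # Record beta together with the loss evaluated at that same beta
        history.append((beta.copy(), loss))
        gradient = -X_b.T @ error / n  # Gradient of the mean squared loss
        beta -= lr * gradient
    return beta, history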
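
One way to check the hand-written loop is to compare its result with a least-squares solve on the same design matrix. The sketch below is self-contained, so it repeats the update rule in a hypothetical helper `gd_fit` rather than calling `gradient_descent`. The longer run's `lr=0.1` and `epochs=5000` are illustrative values assumed for this example, not settings from the notebook.

```python
import numpy as np

# Same simulated data as in Step 2
np.random.seed(1)
X = np.linspace(0, 5, 30).reshape(-1, 1)
y = 2.5 * X.flatten() + np.random.normal(0, 1, X.shape[0])

def gd_fit(X, y, lr, epochs):
    """Hypothetical helper: same update rule as gradient_descent, without history."""
    n = X.shape[0]
    X_b = np.c_[np.ones((n, 1)), X]
    beta = np.zeros(X_b.shape[1])
    for _ in range(epochs):
        beta -= lr * (-X_b.T @ (y - X_b @ beta) / n)
    return beta

# Least-squares reference solution on the same design matrix
X_b = np.c_[np.ones((X.shape[0], 1)), X]
beta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)

beta_short = gd_fit(X, y, lr=0.01, epochs=100)  # the defaults used above
beta_long = gd_fit(X, y, lr=0.1, epochs=5000)   # illustrative, assumed values

print("least squares:", beta_ls)
print("GD (defaults):", beta_short)
print("GD (longer)  :", beta_long)
```

With the default settings the estimate may still be some distance from the least-squares solution, typically in the intercept, because the loss surface is much flatter in one direction than the other. The longer run should match the reference closely.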


## 📈 Step 5: Track Convergence

In [None]:
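import plotly.express as px

beta_gd, history = gradient_descent(X, y)
betas_list = [step[0] for step in history]  # Coefficient vector at each epoch
losses = [step[1] for step in history]      # Loss at each epoch

# Pass x explicitly so the 'x' and 'y' label keys match the plotted columns
fig = px.line(x=list(range(len(losses))), y=losses,
              labels={'x': 'Epoch', 'y': 'Loss'},
              title='Loss Convergence During Gradient Descent')
fig.show()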
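
How quickly the loss curve flattens depends on the learning rate. For a quadratic loss, plain gradient descent is stable only when the step size is below $2 / \lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian $X_b^T X_b / n$. The sketch below illustrates this on the same simulated data. The helper `loss_trace` and the two step sizes are assumptions chosen for illustration.

```python
import numpy as np

# Same simulated data as in Step 2
np.random.seed(1)
X = np.linspace(0, 5, 30).reshape(-1, 1)
y = 2.5 * X.flatten() + np.random.normal(0, 1, X.shape[0])
n = X.shape[0]
X_b = np.c_[np.ones((n, 1)), X]

# Curvature of the mean squared loss: Hessian = X_b^T X_b / n
hessian = X_b.T @ X_b / n
lam_max = np.linalg.eigvalsh(hessian).max()
lr_limit = 2.0 / lam_max  # above this step size, plain GD on a quadratic diverges

def loss_trace(lr, epochs=50):
    """Hypothetical helper: loss recorded before each update for a given step size."""
    beta = np.zeros(X_b.shape[1])
    trace = []
    for _ in range(epochs):
        error = y - X_b @ beta
        trace.append(0.5 * np.mean(error ** 2))
        beta -= lr * (-X_b.T @ error / n)
    return trace

stable = loss_trace(0.5 * lr_limit)    # comfortably below the limit
unstable = loss_trace(1.1 * lr_limit)  # slightly above the limit

print(f"step-size limit: {lr_limit:.4f}")
print(f"stable run:   {stable[0]:.3f} -> {stable[-1]:.3f}")
print(f"unstable run: {unstable[0]:.3f} -> {unstable[-1]:.3f}")
```

The run below the limit should show a falling loss, while the run above it should show the loss growing from epoch to epoch.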