# 10. Regularization: Fighting Overfitting

Deep neural networks are powerful, but they can easily **overfit**.
Overfitting means the model memorizes the training data but fails to generalize to new data.

We use **Regularization** to reduce this.
Two common techniques:
1. **Weight Decay (L2 Regularization)**: Penalizes large weights.
2. **Dropout**: Randomly turns off neurons during training.
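
To make "penalizes large weights" concrete, here is a minimal sketch (not part of the original notebook) that checks, for plain SGD, that the optimizer's `weight_decay` argument gives the same update as adding `(lam / 2) * sum(w**2)` to the loss by hand. The names `lam`, `layer_a`, and `layer_b` and the toy data are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
lam = 0.1  # penalty strength (illustrative value)
lr = 0.5

# Two copies of the same tiny layer, so both start from identical weights.
layer_a = nn.Linear(3, 1)
layer_b = nn.Linear(3, 1)
layer_b.load_state_dict(layer_a.state_dict())

inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)
criterion = nn.MSELoss()

# (a) Let the optimizer apply the penalty through `weight_decay`.
opt_a = optim.SGD(layer_a.parameters(), lr=lr, weight_decay=lam)
opt_a.zero_grad()
criterion(layer_a(inputs), targets).backward()
opt_a.step()

# (b) Add (lam / 2) * sum(w^2) to the loss by hand, with no weight_decay.
opt_b = optim.SGD(layer_b.parameters(), lr=lr)
opt_b.zero_grad()
l2_penalty = sum(p.pow(2).sum() for p in layer_b.parameters())
loss_b = criterion(layer_b(inputs), targets) + 0.5 * lam * l2_penalty
loss_b.backward()
opt_b.step()

# Both routes should land on the same parameters after one step.
same_update = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(layer_a.parameters(), layer_b.parameters())
)
print("Same update:", same_update)
```

The check uses SGD because the equivalence is exact there. With `optim.Adam`, the same term is added to the gradient before Adam's adaptive rescaling, so the effective shrinkage differs; `optim.AdamW` applies the decay separately from the gradient.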

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

torch.manual_seed(42)

## 1. The Problem: Overfitting

Let's create a small dataset and a huge model to force overfitting.

In [None]:
# 1. Data: noisy cubic curve
x = torch.unsqueeze(torch.linspace(-1, 1, 20), dim=1)  # Only 20 points!
y = x.pow(3) + 0.3 * torch.rand(x.size())              # Cubic + uniform noise in [0, 0.3)

# Test data (to check generalization)
x_test = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y_test = x_test.pow(3) + 0.3 * torch.rand(x_test.size())

## 2. Model without Regularization

We'll use a large network that can easily memorize these 20 points.
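
As a rough sense of scale, the sketch below counts the trainable parameters of a network with the same layer sizes as the one defined in this section. The names `big_net`, `n_params`, and `n_train_points` are illustrative, not from the original notebook.

```python
import torch.nn as nn

# Same layer sizes as the unregularized model in this section.
big_net = nn.Sequential(
    nn.Linear(1, 200),
    nn.ReLU(),
    nn.Linear(200, 200),
    nn.ReLU(),
    nn.Linear(200, 1),
)

# Each Linear layer contributes in_features * out_features weights plus out_features biases.
n_params = sum(p.numel() for p in big_net.parameters())
n_train_points = 20
print(f"{n_params} parameters for {n_train_points} training points")
```

That is 400 + 40,200 + 201 = 40,801 parameters fitted to 20 points, which is why this network has more than enough capacity to memorize the training set.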

In [None]:
model_overfit = nn.Sequential(
    nn.Linear(1, 200),
    nn.ReLU(),
    nn.Linear(200, 200),
    nn.ReLU(),
    nn.Linear(200, 1)
)

optimizer = optim.Adam(model_overfit.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Train for many epochs
for epoch in range(2000):
    optimizer.zero_grad()
    output = model_overfit(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()

## 3. Model WITH Regularization

We'll add:
- **Dropout**: `nn.Dropout(p=0.5)` zeroes each activation with probability 0.5 during training and scales the survivors by `1 / (1 - p)`.
- **Weight Decay**: Added in the optimizer (`weight_decay=1e-2`).
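
To see what Dropout does to a tensor, the following sketch (an illustration, not part of the original notebook) passes a vector of ones through `nn.Dropout(p=0.5)` in training mode and in evaluation mode. The names `drop`, `ones`, `train_out`, and `eval_out` are made up for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
drop.train()
train_out = drop(ones)

# Evaluation mode: Dropout is a no-op and returns its input unchanged.
drop.eval()
eval_out = drop(ones)

zero_fraction = (train_out == 0).float().mean().item()
print("train-mode values:", train_out.unique().tolist())
print("fraction zeroed:", zero_fraction)
print("eval-mode unchanged:", torch.equal(eval_out, ones))
```

The scaling keeps the expected value of each activation the same in both modes, so no extra rescaling is needed at test time.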

In [None]:
model_reg = nn.Sequential(
    nn.Linear(1, 200),
    nn.Dropout(0.5),  # 50% drop probability
    nn.ReLU(),
    nn.Linear(200, 200),
    nn.Dropout(0.5),
    nn.ReLU(),
    nn.Linear(200, 1)
)

# weight_decay adds an L2 penalty term to each parameter's gradient
optimizer_reg = optim.Adam(model_reg.parameters(), lr=0.01, weight_decay=1e-2)

# Train
model_reg.train() # Important for Dropout!
for epoch in range(2000):
    optimizer_reg.zero_grad()
    output = model_reg(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer_reg.step()

## 4. Comparison

Let's see which model generalizes better to the test data.
**Note**: We must call `.eval()` before testing to turn off Dropout!

In [None]:
model_overfit.eval()
model_reg.eval()

# Inference only, so skip gradient tracking
with torch.no_grad():
    pred_overfit = model_overfit(x_test)
    pred_reg = model_reg(x_test)

plt.figure(figsize=(10, 5))
plt.scatter(x, y, c='red', s=50, label='Training Data')
plt.scatter(x_test, y_test, c='gray', s=10, alpha=0.4, label='Test Data')
plt.plot(x_test, pred_overfit, 'b-', label='Overfitted Model', alpha=0.5)
plt.plot(x_test, pred_reg, 'g-', label='Regularized Model', linewidth=2)
plt.legend()
plt.title("Overfitting vs Regularization")
plt.show()
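
The plot gives a visual impression; a number makes the comparison easier to state. Below is a minimal sketch of a hypothetical helper, `evaluate_mse`, that computes the mean squared error in evaluation mode without tracking gradients. The helper and the small untrained `demo_model` are assumptions for illustration and are not part of the original notebook.

```python
import torch
import torch.nn as nn


def evaluate_mse(model, inputs, targets):
    """Return the MSE of `model` on (inputs, targets), computed in eval mode."""
    was_training = model.training  # remember the current mode
    model.eval()                   # turn Dropout off for evaluation
    with torch.no_grad():          # no gradients needed for a metric
        mse = nn.functional.mse_loss(model(inputs), targets).item()
    model.train(was_training)      # restore the previous mode
    return mse


# Stand-in model and data, only to show that the helper runs.
torch.manual_seed(0)
demo_model = nn.Sequential(
    nn.Linear(1, 8), nn.Dropout(0.5), nn.ReLU(), nn.Linear(8, 1)
)
demo_x = torch.linspace(-1, 1, 10).unsqueeze(1)
demo_y = demo_x.pow(3)

first = evaluate_mse(demo_model, demo_x, demo_y)
second = evaluate_mse(demo_model, demo_x, demo_y)
print("Deterministic in eval mode:", first == second)
```

In this notebook, the calls would be `evaluate_mse(model_overfit, x_test, y_test)` and `evaluate_mse(model_reg, x_test, y_test)`, each compared with the same call on `x, y`. A training error far below the test error is the numerical signature of overfitting.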

## 5. Key Takeaways

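1. **Overfitting** shows up as a wiggly curve that bends to pass through every noisy training point.
2. **Regularization** constrains the model, which yields a smoother function that follows the underlying trend.
3. **Dropout**: Add `nn.Dropout(p)` layers between the layers of your model.
4. **Weight Decay**: Pass the `weight_decay` argument to your optimizer.
5. **Modes**: Call `model.train()` before training and `model.eval()` before evaluation whenever the model contains Dropout or BatchNorm layers!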