# When Simple Beats Complex: Classical Forecasting Methods

*Part 4 of 8: Moving Averages & Exponential Smoothing*

David's LSTM: 18% MAPE, 3 months of work, GPU clusters.  
Elena's exponential smoothing: 12% MAPE, 5 minutes, laptop.

Today: When (and why) simple methods win.

![Complexity Ladder](forecasting_complexity_ladder.png)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt, ExponentialSmoothing
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (16, 8)
np.random.seed(42)

## The Philosophy

**Start simple. Add complexity only when simple fails.**

1. Naive (baseline)
2. Moving Average
3. Exponential Smoothing
4. Holt's Method (+trend)
5. Holt-Winters (+seasonality)
6. ARIMA (Part 5)
7. Deep Learning (Part 8)

In [None]:
# Generate sample data
n = 200
trend = np.linspace(100, 120, n)
noise = np.random.normal(0, 5, n)
data = trend + noise + np.cumsum(np.random.normal(0, 2, n))

dates = pd.date_range('2023-01-01', periods=n, freq='D')
df = pd.DataFrame({'value': data}, index=dates)

# Split
train_size = int(0.8 * len(df))
train, test = df[:train_size], df[train_size:]

print(f"Train: {len(train)}, Test: {len(test)}")

## Method 1: Naive Forecast

**Formula**: ŷ(t+1) = y(t)

Your baseline. Every method must beat this.
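
To make "must beat this" concrete, here is a minimal sketch on a toy series. The numbers, the helper name `mape_fraction`, and the "candidate" forecasts are all made up for illustration; the naive forecast is assumed to be the same flat, last-value forecast used in the next cell.

```python
import numpy as np

def mape_fraction(actual, forecast):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs((actual - forecast) / actual))

# Toy numbers, purely illustrative
toy_train = np.array([100.0, 102.0, 104.0])
toy_test = np.array([105.0, 110.0, 100.0])

# Naive: repeat the last observed value over the whole horizon
toy_naive = np.full(len(toy_test), toy_train[-1])

# Forecasts from a hypothetical candidate model that extrapolates the drift
toy_candidate = np.array([105.0, 106.0, 107.0])

naive_score = mape_fraction(toy_test, toy_naive)
candidate_score = mape_fraction(toy_test, toy_candidate)

print(f"Naive MAPE:     {naive_score*100:.2f}%")
print(f"Candidate MAPE: {candidate_score*100:.2f}%")
print("Candidate beats baseline:", candidate_score < naive_score)
```

In this toy case the candidate follows the upward drift and still scores slightly worse than the naive forecast, which is exactly what the baseline is there to catch.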

In [None]:
# Naive forecast: repeat the last training value across the whole test horizon
naive_forecast = [train['value'].iloc[-1]] * len(test)

mae = mean_absolute_error(test['value'], naive_forecast)
mape = mean_absolute_percentage_error(test['value'], naive_forecast)

print(f"Naive Forecast - MAE: {mae:.2f}, MAPE: {mape*100:.2f}%")

plt.figure(figsize=(16, 6))
plt.plot(train.index, train['value'], label='Train', linewidth=2)
plt.plot(test.index, test['value'], label='Actual', linewidth=2)
plt.plot(test.index, naive_forecast, label='Naive', linewidth=2, linestyle='--')
plt.axvline(x=train.index[-1], color='red', linestyle=':', alpha=0.5)
plt.title('Naive Forecast', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Method 2: Simple Moving Average

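**Formula**: ŷ(t+1) = (1/N) × Σ y(t-i), summed over i = 0 … N-1

Smooths noise by averaging the last N observations. A larger N gives a smoother forecast that reacts more slowly to change.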
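
As a sketch, the same one-step-ahead moving average can be written with pandas' `rolling`, assuming each forecast is allowed to use every observation up to the previous step. The series `sma_demo` and the window of 3 are illustrative values, not data from this notebook.

```python
import pandas as pd

# Small illustrative series
sma_demo = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
window = 3

# Mean of the last `window` observations, shifted one step so the value
# at position t only uses data up to t-1 (no look-ahead)
sma_forecast = sma_demo.rolling(window).mean().shift(1)

print(pd.DataFrame({"actual": sma_demo, "sma_forecast": sma_forecast}))
```

The first `window` entries are NaN because there is not yet enough history to fill the window.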

In [None]:
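def simple_moving_average(data, window):
    """Mean of the last `window` observations."""
    return data.iloc[-window:].mean()

# Try different windows.
# Note: this is a rolling one-step-ahead forecast. Each prediction sees the
# actual test values before it, so these MAPEs are not directly comparable
# to the multi-step naive and SES forecasts.
windows = [3, 7, 14]
for window in windows:
    forecasts = []
    for i in range(len(test)):
        # History = all training data plus the test points observed so far
        history = pd.concat([train['value'], test['value'].iloc[:i]])
        forecast = simple_moving_average(history, window)
        forecasts.append(forecast)
    
    mape = mean_absolute_percentage_error(test['value'], forecasts)
    print(f"SMA({window}): MAPE = {mape*100:.2f}%")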

## Method 3: Exponential Smoothing

![Alpha Effect](alpha_parameter_effect.png)

**Formula**: ŷ(t+1) = α·y(t) + (1-α)·ŷ(t)

Recent data matters more: the weight on each older observation shrinks by a factor of (1-α) per step back in time.
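
The recursion is short enough to write by hand. The sketch below is not how statsmodels fits the model: it assumes the first observation as the starting level and a fixed, illustrative α of 0.5, whereas the library estimates both. The function name `ses_one_step` is made up for this example.

```python
import numpy as np

def ses_one_step(y, alpha, initial_level=None):
    """One-step-ahead SES forecasts.

    Element t is the forecast for y[t] made before seeing it; the final
    extra element is the forecast for the next, unseen point.
    """
    y = np.asarray(y, dtype=float)
    # Assumed initialization: start the level at the first observation
    level = y[0] if initial_level is None else initial_level
    forecasts = [level]
    for obs in y:
        # New level = alpha * latest observation + (1 - alpha) * old level
        level = alpha * obs + (1 - alpha) * level
        forecasts.append(level)
    return np.array(forecasts)

alpha = 0.5
ses_demo = ses_one_step([10.0, 20.0, 30.0], alpha)
print(ses_demo)

# Unrolling the recursion puts weight alpha * (1 - alpha)**k on the
# observation k steps back, so the weights decay geometrically
weights = alpha * (1 - alpha) ** np.arange(5)
print(weights)
```

With α close to 1 the weights collapse onto the latest observation and SES approaches the naive forecast; with α close to 0 it behaves like a long average.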

In [None]:
# Simple Exponential Smoothing
ses_model = SimpleExpSmoothing(train['value'])
ses_fit = ses_model.fit(optimized=True)
ses_forecast = ses_fit.forecast(steps=len(test))

mape_ses = mean_absolute_percentage_error(test['value'], ses_forecast)
print(f"\nSES: MAPE = {mape_ses*100:.2f}%")
print(f"Optimized α = {ses_fit.params['smoothing_level']:.3f}")

plt.figure(figsize=(16, 6))
plt.plot(train.index, train['value'], label='Train', linewidth=2)
plt.plot(test.index, test['value'], label='Actual', linewidth=2)
plt.plot(test.index, ses_forecast, label='SES', linewidth=2, linestyle='--')
plt.axvline(x=train.index[-1], color='red', linestyle=':', alpha=0.5)
plt.title(f'Simple Exponential Smoothing - MAPE: {mape_ses*100:.2f}%', 
         fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Method 4: Holt's Method (Double ES)

Adds a **trend** component.

**Use when**: Data has trend but no seasonality.
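
As a rough sketch of what the trend component does, the two update equations of Holt's linear method can be written out directly. The initialization (first value as level, first difference as trend), the fixed α and β, and the name `holt_forecast_sketch` are assumptions for illustration; statsmodels estimates these quantities rather than fixing them.

```python
import numpy as np

def holt_forecast_sketch(y, alpha, beta, horizon):
    """Holt's linear trend method with a simple, assumed initialization."""
    y = np.asarray(y, dtype=float)
    level = y[0]            # assumed starting level
    slope = y[1] - y[0]     # assumed starting trend
    for obs in y[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step projection
        level = alpha * obs + (1 - alpha) * (prev_level + slope)
        # Trend: blend the latest change in level with the previous trend
        slope = beta * (level - prev_level) + (1 - beta) * slope
    steps = np.arange(1, horizon + 1)
    # Forecast h steps ahead: last level plus h times the last trend
    return level + steps * slope

# A perfectly linear toy series: the forecast should continue the line
linear_demo = [2.0, 4.0, 6.0, 8.0, 10.0]
holt_demo = holt_forecast_sketch(linear_demo, alpha=0.8, beta=0.2, horizon=3)
print(holt_demo)
```

Unlike SES, whose forecast is flat, the forecast here keeps climbing at the last estimated slope.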

In [None]:
# Generate trending data
n = 200
trend_data = np.linspace(100, 200, n) + np.random.normal(0, 10, n)
df_trend = pd.DataFrame({'value': trend_data}, 
                        index=pd.date_range('2023-01-01', periods=n, freq='D'))

train_trend = df_trend[:int(0.8*len(df_trend))]
test_trend = df_trend[int(0.8*len(df_trend)):]

# Holt's method
holt_model = Holt(train_trend['value'])
holt_fit = holt_model.fit(optimized=True)
holt_forecast = holt_fit.forecast(steps=len(test_trend))

mape_holt = mean_absolute_percentage_error(test_trend['value'], holt_forecast)
print(f"\nHolt's Method: MAPE = {mape_holt*100:.2f}%")
print(f"α (level) = {holt_fit.params['smoothing_level']:.3f}")
print(f"β (trend) = {holt_fit.params['smoothing_trend']:.3f}")

plt.figure(figsize=(16, 6))
plt.plot(train_trend.index, train_trend['value'], label='Train', linewidth=2)
plt.plot(test_trend.index, test_trend['value'], label='Actual', linewidth=2)
plt.plot(test_trend.index, holt_forecast, label="Holt's", linewidth=2, linestyle='--')
plt.title(f"Holt's Method - MAPE: {mape_holt*100:.2f}%", fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Method 5: Holt-Winters (Triple ES)

Handles **level + trend + seasonality**.

Elena's secret weapon.
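
To show how the three components interact, here is a hand-rolled sketch of the additive Holt-Winters updates. It uses a crude initialization from the first two cycles and fixed smoothing parameters, both of which are assumptions for illustration; statsmodels estimates the initial states and parameters differently, so results will not match the library. The function name is hypothetical.

```python
import numpy as np

def holt_winters_additive_sketch(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters with seasonal period m and a crude initialization."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasonal cycles")

    # Assumed initialization from the first two cycles
    level = y[:m].mean()
    slope = (y[m:2 * m].mean() - y[:m].mean()) / m
    seasonals = list(y[:m] - level)  # one offset per position in the cycle

    for t in range(m, len(y)):
        s_prev = seasonals[t % m]
        prev_level, prev_slope = level, slope
        # Level: deseasonalized observation vs. previous projection
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_slope)
        # Trend: latest change in level vs. previous trend
        slope = beta * (level - prev_level) + (1 - beta) * prev_slope
        # Seasonal: detrended observation vs. last offset for this position
        seasonals[t % m] = (gamma * (y[t] - prev_level - prev_slope)
                            + (1 - gamma) * s_prev)

    n = len(y)
    # Forecast = level + h * trend + the latest offset for that cycle position
    return np.array([level + h * slope + seasonals[(n + h - 1) % m]
                     for h in range(1, horizon + 1)])

# Toy series: a repeating 4-step pattern around 20, no trend, no noise
pattern_demo = np.tile([23.0, 19.0, 17.0, 21.0], 3)
hw_demo = holt_winters_additive_sketch(pattern_demo, m=4, alpha=0.5,
                                       beta=0.1, gamma=0.3, horizon=4)
print(hw_demo)
```

On this noise-free toy pattern the forecast simply repeats the cycle. On trending or noisy data the crude initialization introduces some bias, which is one reason to let the library optimize the starting states.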

In [None]:
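# Generate seasonal data
# Three years, so the 80% training split holds at least two full 365-day cycles
# (recent statsmodels versions need that to initialize the seasonal component)
n = 365 * 3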
time = np.arange(n)
trend = 100 + 0.1 * time
seasonal = 30 * np.sin(2 * np.pi * time / 365)
noise = np.random.normal(0, 5, n)
seasonal_data = trend + seasonal + noise

df_seasonal = pd.DataFrame({'value': seasonal_data},
                           index=pd.date_range('2022-01-01', periods=n, freq='D'))

train_seasonal = df_seasonal[:int(0.8*len(df_seasonal))]
test_seasonal = df_seasonal[int(0.8*len(df_seasonal)):]

# Holt-Winters
hw_model = ExponentialSmoothing(
    train_seasonal['value'],
    seasonal_periods=365,
    trend='add',
    seasonal='add'
)
hw_fit = hw_model.fit(optimized=True)
hw_forecast = hw_fit.forecast(steps=len(test_seasonal))

mape_hw = mean_absolute_percentage_error(test_seasonal['value'], hw_forecast)
print(f"\nHolt-Winters: MAPE = {mape_hw*100:.2f}%")

plt.figure(figsize=(16, 6))
plt.plot(train_seasonal.index, train_seasonal['value'], 
        label='Train', linewidth=1.5, alpha=0.7)
plt.plot(test_seasonal.index, test_seasonal['value'], 
        label='Actual', linewidth=2)
plt.plot(test_seasonal.index, hw_forecast, 
        label='Holt-Winters', linewidth=2, linestyle='--')
plt.title(f'Holt-Winters - MAPE: {mape_hw*100:.2f}%', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Complete Pipeline

![Decision Tree](classical_methods_decision_tree.png)

Elena's winning strategy:

In [None]:
def forecast_pipeline(train, test, seasonal_period=None):
    """Fit each method on `train` (a Series) and return test-set MAPE as fractions."""
    results = {}
    
    # 1. Naive
    naive = [train.iloc[-1]] * len(test)
    results['Naive'] = mean_absolute_percentage_error(test, naive)
    
    # 2. SES
    ses = SimpleExpSmoothing(train).fit(optimized=True)
    results['SES'] = mean_absolute_percentage_error(test, ses.forecast(len(test)))
    
    # 3. Holt
    holt = Holt(train).fit(optimized=True)
    results['Holt'] = mean_absolute_percentage_error(test, holt.forecast(len(test)))
    
    # 4. Holt-Winters (if seasonal)
    if seasonal_period:
        hw = ExponentialSmoothing(train, seasonal_periods=seasonal_period,
                                  trend='add', seasonal='add').fit(optimized=True)
        results['HW'] = mean_absolute_percentage_error(test, hw.forecast(len(test)))
    
    # Results
    print("\nMethod Comparison:")
    for name, mape in sorted(results.items(), key=lambda x: x[1]):
        print(f"{name:12s}: {mape*100:6.2f}%")
    
    return results

# Run pipeline
results = forecast_pipeline(train_seasonal['value'], test_seasonal['value'], 365)

## Key Takeaways

1. **Start simple** - Naive is your baseline
2. **Add complexity incrementally** - Don't skip steps
3. **Classical often wins** - Especially with limited data
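4. **Understand parameters** (each lies between 0 and 1):
   - α: Level smoothing. Values near 1 (e.g. 0.8-0.9) react quickly to recent data; values near 0 smooth more heavily
   - β: Trend smoothing (0.1-0.2 is typical, so the trend changes slowly)
   - γ: Seasonal smoothing (usually left to the optimizer)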
5. **Check residuals** - They tell you what's missing
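
One simple residual check, sketched here with made-up residuals, is the lag-1 autocorrelation: values near zero suggest the model has captured the structure, while a large value suggests something (such as a trend) is left over. The helper name `lag_autocorr` and the simulated residuals are assumptions for illustration.

```python
import numpy as np

def lag_autocorr(residuals, lag=1):
    """Sample autocorrelation of the residuals at the given lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return np.sum(r[lag:] * r[:-lag]) / np.sum(r * r)

rng = np.random.default_rng(0)

# Simulated "good" residuals: pure noise
white_resid = rng.normal(0, 1, 500)

# Simulated "bad" residuals: the same noise plus a trend the model missed
trending_resid = white_resid + np.linspace(-3, 3, 500)

print(f"Noise only:     {lag_autocorr(white_resid):.2f}")
print(f"Leftover trend: {lag_autocorr(trending_resid):.2f}")
```

In practice the residuals would come from the fitted model, for example actual values minus fitted values on the training set.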

## When Classical Beats Deep Learning

- Limited data (roughly under 1,000 points)
- Clear patterns (trend/seasonality)
- Need interpretability
- Real-time requirements
- Standard hardware

## What's Next

**Part 5: ARIMA Models** - Mathematical rigor for complex patterns.

The series so far:
1. Fundamentals ✓
2. Decomposition ✓
3. Stationarity ✓
4. Classical Methods ✓
5. ARIMA → Next!