# Backtesting & Evaluation

Forecasting models should be evaluated **across time**, not on a single split.
Backtesting simulates rolling-origin forecasts to estimate real-world performance.


## Windowing strategies

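- **Expanding window**: the training set always starts at the same point and grows with each new forecast origin.
- **Sliding window**: the training set keeps a fixed size and moves forward with each new forecast origin.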

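Let $w$ be the window length, $s$ the step size, and $h$ the forecast horizon.
With a sliding window, the training window at forecast origin $t$ covers $[t-w+1, t]$ and the forecasts target $t+1,\dots,t+h$; the origin then advances by $s$.
With an expanding window, the training set instead starts at the first observation and ends at $t$.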
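
The index arithmetic above can be sketched in plain Python. `backtest_windows` is a hypothetical helper, not part of any library, and it assumes 0-based integer time indices and that only origins with a full horizon of $h$ points are kept.

```python
def backtest_windows(n, w, h, s, expanding=False):
    """Return (train_range, test_range) pairs for a series of length n.

    Hypothetical helper: origins t start at w - 1 and advance by s
    for as long as a full horizon of h points still fits in the series.
    """
    windows = []
    for t in range(w - 1, n - h, s):
        # Expanding windows always start at 0; sliding windows keep length w.
        train_start = 0 if expanding else t - w + 1
        train = range(train_start, t + 1)  # indices train_start..t
        test = range(t + 1, t + h + 1)     # indices t+1..t+h
        windows.append((train, test))
    return windows


sliding = backtest_windows(n=100, w=40, h=8, s=10)
expanding = backtest_windows(n=100, w=40, h=8, s=10, expanding=True)

for train, test in sliding[:2]:
    print(f"train {train.start}..{train.stop - 1}, test {test.start}..{test.stop - 1}")
```

Both strategies share the same origins and test windows; only the start of the training range differs.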


In [None]:
import numpy as np
import plotly.graph_objects as go

np.random.seed(33)

# Synthetic series
n = 100
t = np.arange(n)
y = 5 + 0.03 * t + np.sin(2 * np.pi * t / 20) + np.random.normal(0, 0.4, n)

w = 40
h = 8
step = 10
origins = list(range(w - 1, n - h, step))

fig = go.Figure()
fig.add_trace(go.Scatter(x=t, y=y, mode="lines", name="Series", line=dict(color="#2a3f5f")))

for i, origin in enumerate(origins):
    train_start = origin - w + 1
    train_end = origin
    test_start = origin + 1
    test_end = origin + h
    fig.add_vrect(
        x0=train_start,
        x1=train_end,
        fillcolor="rgba(99,110,250,0.15)",
        line_width=0,
    )
    fig.add_vrect(
        x0=test_start,
        x1=test_end,
        fillcolor="rgba(239,85,59,0.15)",
        line_width=0,
    )

fig.update_layout(
    title="Backtesting windows (blue=train, orange=test)",
    height=420,
)
fig

## Metrics (quick intuition)

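Common time-series metrics include:
- **MAE**: the average absolute error, in the units of the series.
- **sMAPE**: a symmetric percentage error, where each absolute error is divided by the mean of the absolute actual and absolute forecast values.
- **MASE**: the MAE divided by the MAE of a seasonal naive benchmark, so values below 1 indicate a forecast that beats the benchmark on average.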
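
As a sketch, the three metrics can be written as follows. The notation is assumed here rather than taken from a specific reference: $y_i$ and $\hat{y}_i$ are the $N$ actual and forecast values pooled over all test windows, $m$ is the seasonal period, and $T$ is the length of the series used for scaling.

$$
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert
$$

$$
\mathrm{sMAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{2\,\lvert y_i - \hat{y}_i \rvert}{\lvert y_i \rvert + \lvert \hat{y}_i \rvert}
$$

$$
\mathrm{MASE} = \frac{\mathrm{MAE}}{\dfrac{1}{T-m}\sum_{t=m+1}^{T} \lvert y_t - y_{t-m} \rvert}
$$

In this form sMAPE is a fraction between 0 and 2; multiply by 100 to report it as a percentage.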


In [None]:
import pandas as pd

# Naive last-value forecast, repeated across all h steps of each test window
errors = []
y_true_all = []
y_pred_all = []

for origin in origins:
    y_train = y[:origin + 1]
    y_test = y[origin + 1: origin + 1 + h]
    y_pred = np.repeat(y_train[-1], h)
    errors.append(y_test - y_pred)
    y_true_all.append(y_test)
    y_pred_all.append(y_pred)

errors = np.concatenate(errors)
y_true_all = np.concatenate(y_true_all)
y_pred_all = np.concatenate(y_pred_all)

mae = np.mean(np.abs(errors))
smape = np.mean(2 * np.abs(y_pred_all - y_true_all) / (np.abs(y_true_all) + np.abs(y_pred_all)))

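# MASE scale: seasonal naive MAE with period 12, computed over the full series
# for simplicity (NaN if the series is too short). Note that the synthetic
# series above has a 20-step cycle, so 12 is only an illustrative choice.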
seasonal_period = 12
if len(y) > seasonal_period:
    naive_diff = np.abs(y[seasonal_period:] - y[:-seasonal_period]).mean()
    mase = mae / naive_diff
else:
    mase = np.nan

pd.DataFrame({"MAE": [mae], "sMAPE": [smape], "MASE": [mase]})
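
The cell above scales MASE by seasonal differences over the full series, test periods included. A common alternative is to compute the scale from each fold's training data only. The sketch below shows that variant for a single fold; `mase_in_sample` is a hypothetical helper and the toy numbers are made up for illustration.

```python
import numpy as np


def mase_in_sample(y_train, y_test, y_pred, m=1):
    """MASE for one fold, scaled by the in-sample seasonal naive MAE.

    Hypothetical helper; m is the seasonal period (m=1 is the plain naive).
    Returns nan when the training window is too short to form the scale.
    """
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if len(y_train) <= m:
        return np.nan
    # Mean absolute seasonal difference within the training window only
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_test - y_pred)) / scale


# Toy fold: the training series rises by 1 per step, so the naive scale is 1.0
y_train_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_test_toy = np.array([6.0, 7.0])
y_pred_toy = np.repeat(y_train_toy[-1], 2)  # last-value forecast: [5, 5]
fold_mase = mase_in_sample(y_train_toy, y_test_toy, y_pred_toy, m=1)
print(fold_mase)  # 1.5
```

Averaging this value over folds gives one possible backtest-level MASE. A constant training window makes the scale zero, which this sketch does not guard against.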


## sktime mapping (practical pointers)

Key utilities in sktime for evaluation and backtesting include:
- `temporal_train_test_split`
- `SlidingWindowSplitter` / `ExpandingWindowSplitter`
- `ForecastingGridSearchCV` for tuning with time-aware validation

Use these to keep evaluation aligned with temporal order.
