# sktime Model Selection & Time-Series Cross-Validation

Time-series validation must respect **temporal order**. This notebook covers **sliding vs. expanding windows**, visualizes splits, and shows how to tune models with `ForecastingGridSearchCV`.


## Forecasting horizon

For a forecast origin $T$, the **forecasting horizon** is

$$\text{fh} = \{T+1, T+2, \ldots, T+h\}$$

sktime uses a `ForecastingHorizon` object to define these steps explicitly.
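
To make the set notation concrete, the following plain-Python sketch maps relative steps to absolute positions for a given forecast origin. The helper `absolute_horizon` and the values `T = 119` and `h = 3` are illustrative assumptions, not part of sktime; `ForecastingHorizon` handles the same relative/absolute distinction on real time indices.

```python
def absolute_horizon(cutoff: int, relative_steps: list) -> list:
    """Map relative steps (1 = one step after the cutoff) to absolute positions."""
    return [cutoff + step for step in relative_steps]


T = 119  # hypothetical forecast origin: position of the last training observation
h = 3    # hypothetical number of steps to forecast

fh_relative = list(range(1, h + 1))            # steps counted from the origin
fh_absolute = absolute_horizon(T, fh_relative)  # positions T+1, ..., T+h

print("relative:", fh_relative)
print("absolute:", fh_absolute)
```

The relative form is convenient for cross-validation, where the origin moves from fold to fold; the absolute form is convenient when scoring against a fixed test set.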


## Mathematical Foundation

### Cross-Validation Window Strategies

**Sliding Window CV**: For each fold $k$, train on a fixed-length window and test on the subsequent horizon:

$$\text{Train}^{(k)}: [t_k - w, t_k), \quad \text{Test}^{(k)}: [t_k, t_k + h)$$

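where $t_k$ is the index of the first test observation in fold $k$ (so the last training observation is $t_k - 1$), $w$ is the window size, and $h$ is the forecast horizon. All intervals are half-open.

**Expanding Window CV**: The training set always starts at the first observation and grows as the cutoff advances through time: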

$$\text{Train}^{(k)}: [0, t_k), \quad \text{Test}^{(k)}: [t_k, t_k + h)$$

### Cross-Validation Error Estimation

The **cross-validation error** aggregates performance across $K$ folds:

$$\bar{e} = \frac{1}{K}\sum_{k=1}^{K} L(\hat{y}^{(k)}, y^{(k)})$$

where $L$ is the loss function (e.g., MSE, MAE).

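**Variance of the CV estimate** (assuming independent folds):

$$\text{Var}(\bar{e}) = \frac{1}{K}\sigma^2_e$$

where $\sigma^2_e$ is the variance of the per-fold loss. In practice, time-series folds overlap and their errors are **dependent** (typically positively correlated), so this formula tends to understate the true variance.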
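
To see how much dependence can matter, consider fold errors with common variance $\sigma^2_e$ and lag-$j$ correlation $\rho_j$. The variance of their mean is then $\frac{\sigma^2_e}{K}\left[1 + 2\sum_{j=1}^{K-1}\left(1 - \frac{j}{K}\right)\rho_j\right]$. The sketch below evaluates the bracketed inflation factor under the assumption $\rho_j = \rho^j$ (geometric decay); this correlation structure and the function name `variance_inflation` are illustrative choices, not something estimated from the notebook's data.

```python
def variance_inflation(n_folds: int, rho: float) -> float:
    """Ratio of Var(e_bar) to sigma^2 / K when corr(e_i, e_j) = rho ** |i - j|."""
    correction = sum(
        (1 - lag / n_folds) * rho**lag for lag in range(1, n_folds)
    )
    return 1 + 2 * correction


independent = variance_inflation(n_folds=5, rho=0.0)
correlated = variance_inflation(n_folds=5, rho=0.5)

print("inflation with independent folds:", independent)
print("inflation with rho = 0.5:", round(correlated, 3))
```

With five folds and $\rho = 0.5$ the factor is about 2.2, so a confidence interval built from the independence formula would be noticeably too narrow.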

### Information Criteria for Model Selection

**Akaike Information Criterion (AIC)**:

$$\text{AIC} = 2k - 2\ln(\hat{L})$$

**Bayesian Information Criterion (BIC)**:

$$\text{BIC} = k\ln(n) - 2\ln(\hat{L})$$

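where $k$ is the number of estimated parameters (not the fold index used above), $n$ is the sample size, and $\hat{L}$ is the maximized likelihood. For both criteria, lower values indicate a better trade-off between fit and complexity.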
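
As a worked example, the sketch below evaluates both criteria for two hypothetical models fitted to $n = 144$ observations. The log-likelihood values and parameter counts are made up for illustration; they are not results from any model in this notebook.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


n = 144  # sample size
# Hypothetical fits: (maximized log-likelihood, number of parameters)
models = {"small": (-520.0, 2), "large": (-516.5, 5)}

scores = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, n)}
    for name, (ll, k) in models.items()
}
for name, s in scores.items():
    print(f"{name:5s}  AIC = {s['aic']:.2f}  BIC = {s['bic']:.2f}")
```

Here AIC slightly favors the larger model while BIC favors the smaller one, because the BIC penalty per parameter, $\ln(144) \approx 4.97$, is heavier than the AIC penalty of 2.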

In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

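from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon

# Splitters live in `sktime.split` in newer releases; older releases expose
# them from `sktime.forecasting.model_selection`.
try:
    from sktime.split import (
        ExpandingWindowSplitter,
        SlidingWindowSplitter,
        temporal_train_test_split,
    )
except ImportError:
    from sktime.forecasting.model_selection import (
        ExpandingWindowSplitter,
        SlidingWindowSplitter,
        temporal_train_test_split,
    )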

# Load a classic monthly series
y = load_airline()

# Train/test split + forecasting horizon
y_train, y_test = temporal_train_test_split(y, test_size=24)
fh = ForecastingHorizon(y_test.index, is_relative=False)


## Sliding vs. expanding windows

- **Sliding window** keeps a fixed training length.
- **Expanding window** grows the training set as time advances.

Both avoid leakage, but they answer different questions:
- Sliding: "How does the model perform on *recent* history?"
- Expanding: "How does the model improve with *more data*?"


In [None]:
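def plot_cv_splits(y, splitter, max_splits=6, title=""):
    """Plot train/test positions of the first `max_splits` folds, one row per fold."""
    index = y.index
    # Convert a PeriodIndex to timestamps so Plotly can place points on a date axis
    if hasattr(index, "to_timestamp"):
        index = index.to_timestamp()
    fig = go.Figure()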

    for split, (train_idx, test_idx) in enumerate(splitter.split(y)):
        if split >= max_splits:
            break
        fig.add_trace(
            go.Scatter(
                x=index[train_idx],
                y=[split] * len(train_idx),
                mode="markers",
                marker=dict(color="rgba(120,120,120,0.6)", size=6),
                name="train" if split == 0 else None,
                showlegend=split == 0,
            )
        )
        fig.add_trace(
            go.Scatter(
                x=index[test_idx],
                y=[split] * len(test_idx),
                mode="markers",
                marker=dict(color="rgba(255,127,14,0.9)", size=6),
                name="test" if split == 0 else None,
                showlegend=split == 0,
            )
        )

    fig.update_layout(
        title=title,
        xaxis_title="Time",
        yaxis=dict(title="Split #", autorange="reversed"),
        height=320 + 40 * max_splits,
    )
    return fig

fh_steps = [1, 2, 3, 6, 12]
cv_sliding = SlidingWindowSplitter(fh=fh_steps, window_length=60, step_length=12)
cv_expanding = ExpandingWindowSplitter(fh=fh_steps, initial_window=60, step_length=12)

fig = plot_cv_splits(y, cv_sliding, title="Sliding Window CV")
fig

In [None]:
fig = plot_cv_splits(y, cv_expanding, title="Expanding Window CV")
fig

## Hyperparameter tuning with time-aware CV

`sktime` provides `ForecastingGridSearchCV` to tune parameters while **respecting time order**.
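
Conceptually, a grid search scores every combination of the listed parameter values on each CV fold and keeps the combination with the best average score. The plain-Python sketch below only enumerates the candidates for a grid with two parameters; it is an illustration of the idea, not sktime's implementation, and the name `expand_grid` is hypothetical.

```python
from itertools import product


def expand_grid(param_grid: dict) -> list:
    """Return one dict per combination of parameter values."""
    names = list(param_grid)
    return [
        dict(zip(names, values))
        for values in product(*(param_grid[name] for name in names))
    ]


param_grid = {
    "strategy": ["last", "mean", "drift"],
    "window_length": [3, 6, 12],
}

candidates = expand_grid(param_grid)
print(len(candidates), "candidates")
print(candidates[0])
```

The number of candidates is the product of the list lengths, and each candidate is refit on every fold, so the total number of fits grows quickly as parameters are added.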


In [None]:
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.performance_metrics.forecasting import MeanAbsoluteError

forecaster = NaiveForecaster()
# Every combination of these values is evaluated (3 x 3 = 9 candidates)
param_grid = {
    "strategy": ["last", "mean", "drift"],
    "window_length": [3, 6, 12],
}

# Use expanding windows for tuning
cv = ExpandingWindowSplitter(fh=fh_steps, initial_window=60, step_length=12)

gscv = ForecastingGridSearchCV(
    forecaster=forecaster,
    param_grid=param_grid,
    cv=cv,
    # Pass a metric object rather than the bare function; lower MAE is better
    scoring=MeanAbsoluteError(),
)

gscv.fit(y_train)

best_forecaster = gscv.best_forecaster_
print("Best params:", gscv.best_params_)


## Pitfalls checklist

- **Leakage**: never use future data to compute features or scalers.
- **Horizon mismatch**: ensure `fh` aligns with how you evaluate.
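- **Changing seasonality**: prefer sliding-window CV when regimes drift, so that old regimes drop out of the training set.
- **Short series**: keep `window_length` large enough to cover at least one full seasonal cycle (e.g., 12 for monthly data).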
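
The leakage item is the easiest to get wrong in practice. The sketch below contrasts a scaler fitted on the training window only with one fitted on the full series; the toy `series`, the split point, and the helper names are illustrative assumptions.

```python
from statistics import mean, pstdev


def fit_scaler(values: list) -> tuple:
    """Return (mean, population std) computed from the given values only."""
    return mean(values), pstdev(values)


def transform(values: list, mu: float, sigma: float) -> list:
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


series = [10.0, 12.0, 14.0, 16.0, 40.0, 44.0]  # toy data with a late level shift
train, test = series[:4], series[4:]

# Correct: statistics come from the training window only
mu, sigma = fit_scaler(train)
test_scaled = transform(test, mu, sigma)

# Leaky: statistics also see the test observations
leaky_mu, leaky_sigma = fit_scaler(series)
test_scaled_leaky = transform(test, leaky_mu, leaky_sigma)

print("train-only scaling:", [round(v, 2) for v in test_scaled])
print("leaky scaling:     ", [round(v, 2) for v in test_scaled_leaky])
```

With train-only statistics the level shift shows up as very large standardized values, which is what a model would face at prediction time. The leaky version makes the test points look far less extreme because the scaler has already seen them.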


---

## Low-Level NumPy Implementation

The split generators below use only NumPy and do not depend on sktime; the evaluation and plotting helpers additionally use SciPy and Plotly.

In [None]:
from typing import Generator, Tuple, Callable
import numpy as np
import scipy.stats as stats


def sliding_window_cv(
    n: int, window_size: int, horizon: int, step: int = 1
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """
    Generate sliding window cross-validation splits.
    
    Train window is fixed-length, slides forward by `step` each fold.
    
    Parameters
    ----------
    n : int
        Total number of observations.
    window_size : int
        Fixed training window length.
    horizon : int
        Forecast horizon (test set length).
    step : int
        Step size between consecutive folds.
        
    Yields
    ------
    train_idx, test_idx : Tuple[np.ndarray, np.ndarray]
        Index arrays for train and test sets.
    """
    start = window_size
    while start + horizon <= n:
        train_idx = np.arange(start - window_size, start)
        test_idx = np.arange(start, start + horizon)
        yield train_idx, test_idx
        start += step


def expanding_window_cv(
    n: int, min_train: int, horizon: int, step: int = 1
) -> Generator[Tuple[np.ndarray, np.ndarray], None, None]:
    """
    Generate expanding window cross-validation splits.
    
    Training window starts at index 0 and grows each fold.
    
    Parameters
    ----------
    n : int
        Total number of observations.
    min_train : int
        Minimum initial training set size.
    horizon : int
        Forecast horizon (test set length).
    step : int
        Step size between consecutive folds.
        
    Yields
    ------
    train_idx, test_idx : Tuple[np.ndarray, np.ndarray]
        Index arrays for train and test sets.
    """
    start = min_train
    while start + horizon <= n:
        train_idx = np.arange(0, start)
        test_idx = np.arange(start, start + horizon)
        yield train_idx, test_idx
        start += step


# Example usage
n_obs = 120
print("Sliding Window CV splits (first 5):")
for i, (tr, te) in enumerate(sliding_window_cv(n_obs, window_size=60, horizon=12, step=12)):
    if i >= 5:
        break
    print(f"  Fold {i+1}: train [{tr[0]:3d}, {tr[-1]:3d}], test [{te[0]:3d}, {te[-1]:3d}]")

print("\nExpanding Window CV splits (first 5):")
for i, (tr, te) in enumerate(expanding_window_cv(n_obs, min_train=60, horizon=12, step=12)):
    if i >= 5:
        break
    print(f"  Fold {i+1}: train [{tr[0]:3d}, {tr[-1]:3d}], test [{te[0]:3d}, {te[-1]:3d}]")

In [None]:
def cross_validate_forecaster(
    y: np.ndarray,
    cv_splits: Generator,
    forecaster_fn: Callable[[np.ndarray], Callable[[int], np.ndarray]],
    metric_fn: Callable[[np.ndarray, np.ndarray], float] = None,
) -> np.ndarray:
    """
    Evaluate a forecaster across CV splits.
    
    Parameters
    ----------
    y : np.ndarray
        Time series values (1D array).
    cv_splits : Generator
        Generator yielding (train_idx, test_idx) tuples.
    forecaster_fn : Callable
        Function that takes training data and returns a forecast function.
        The forecast function takes horizon length and returns predictions.
    metric_fn : Callable, optional
        Error metric function(y_true, y_pred) -> float.
        Defaults to Mean Absolute Error.
        
    Returns
    -------
    errors : np.ndarray
        Array of error values for each fold.
    """
    if metric_fn is None:
        metric_fn = lambda y_true, y_pred: np.mean(np.abs(y_true - y_pred))
    
    errors = []
    for train_idx, test_idx in cv_splits:
        y_train = y[train_idx]
        y_test = y[test_idx]
        
        # Fit forecaster and predict
        predict_fn = forecaster_fn(y_train)
        y_pred = predict_fn(len(test_idx))
        
        # Compute error
        error = metric_fn(y_test, y_pred)
        errors.append(error)
    
    return np.array(errors)


def compute_cv_statistics(
    errors: np.ndarray, confidence: float = 0.95
) -> dict:
    """
    Compute summary statistics for cross-validation errors.
    
    Parameters
    ----------
    errors : np.ndarray
        Array of error values from each CV fold.
    confidence : float
        Confidence level for interval (default 0.95).
        
    Returns
    -------
    stats : dict
        Dictionary with mean, std, se, ci_lower, ci_upper, n_folds.
    """
    n = len(errors)
    mean = np.mean(errors)
    std = np.std(errors, ddof=1)
    se = std / np.sqrt(n)
    
    # t-distribution for small samples (treats folds as independent)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    ci_lower = mean - t_crit * se
    ci_upper = mean + t_crit * se
    
    return {
        "mean": mean,
        "std": std,
        "se": se,
        "ci_lower": ci_lower,
        "ci_upper": ci_upper,
        "n_folds": n,
    }


# Define simple forecasters (NumPy only)
def naive_last_forecaster(y_train: np.ndarray) -> Callable[[int], np.ndarray]:
    """Naive forecaster: predict last observed value."""
    last_value = y_train[-1]
    return lambda h: np.full(h, last_value)


def naive_mean_forecaster(y_train: np.ndarray) -> Callable[[int], np.ndarray]:
    """Mean forecaster: predict mean of training data."""
    mean_value = np.mean(y_train)
    return lambda h: np.full(h, mean_value)


def naive_drift_forecaster(y_train: np.ndarray) -> Callable[[int], np.ndarray]:
    """Drift forecaster: extend linear trend from first to last observation."""
    n = len(y_train)
    slope = (y_train[-1] - y_train[0]) / (n - 1)
    last_value = y_train[-1]
    return lambda h: last_value + slope * np.arange(1, h + 1)


# Example: Cross-validate on synthetic data
np.random.seed(42)
y_synth = 100 + 0.5 * np.arange(120) + 10 * np.sin(np.arange(120) * 2 * np.pi / 12) + np.random.randn(120) * 5

# Evaluate naive_last forecaster
cv_gen = sliding_window_cv(len(y_synth), window_size=60, horizon=12, step=12)
errors = cross_validate_forecaster(y_synth, cv_gen, naive_last_forecaster)

cv_stats = compute_cv_statistics(errors)
print("Cross-Validation Statistics (Naive Last):")
print(f"  Mean MAE:  {cv_stats['mean']:.3f}")
print(f"  Std:       {cv_stats['std']:.3f}")
print(f"  95% CI:    [{cv_stats['ci_lower']:.3f}, {cv_stats['ci_upper']:.3f}]")
print(f"  # Folds:   {cv_stats['n_folds']}")

### Plotly Visualization: CV Split Diagram

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots


def plot_cv_splits_numpy(
    n: int,
    cv_generator_fn: Callable,
    max_splits: int = 8,
    title: str = "Cross-Validation Splits",
) -> go.Figure:
    """
    Visualize CV splits as train/test windows using Plotly.
    
    Parameters
    ----------
    n : int
        Total number of observations.
    cv_generator_fn : Callable
        Function that returns a CV split generator.
    max_splits : int
        Maximum number of splits to display.
    title : str
        Plot title.
        
    Returns
    -------
    fig : go.Figure
        Plotly figure object.
    """
    fig = go.Figure()
    
    splits_shown = 0
    for fold, (train_idx, test_idx) in enumerate(cv_generator_fn()):
        if splits_shown >= max_splits:
            break
        
        # Train markers
        fig.add_trace(
            go.Scatter(
                x=train_idx,
                y=[fold] * len(train_idx),
                mode="markers",
                marker=dict(color="steelblue", size=6, symbol="square"),
                name="Train" if fold == 0 else None,
                showlegend=fold == 0,
                legendgroup="train",
            )
        )
        
        # Test markers
        fig.add_trace(
            go.Scatter(
                x=test_idx,
                y=[fold] * len(test_idx),
                mode="markers",
                marker=dict(color="darkorange", size=8, symbol="circle"),
                name="Test" if fold == 0 else None,
                showlegend=fold == 0,
                legendgroup="test",
            )
        )
        splits_shown += 1
    
    fig.update_layout(
        title=dict(text=title, font=dict(size=16)),
        xaxis=dict(title="Time Index", gridcolor="lightgray"),
        yaxis=dict(
            title="Fold #",
            tickmode="linear",
            dtick=1,
            autorange="reversed",
            gridcolor="lightgray",
        ),
        height=300 + 35 * max_splits,
        template="plotly_white",
        legend=dict(orientation="h", yanchor="bottom", y=1.02),
    )
    return fig


# Sliding window visualization
fig_sliding = plot_cv_splits_numpy(
    n=120,
    cv_generator_fn=lambda: sliding_window_cv(120, window_size=60, horizon=12, step=12),
    title="Sliding Window CV (NumPy Implementation)",
)
fig_sliding.show()

# Expanding window visualization
fig_expanding = plot_cv_splits_numpy(
    n=120,
    cv_generator_fn=lambda: expanding_window_cv(120, min_train=60, horizon=12, step=12),
    title="Expanding Window CV (NumPy Implementation)",
)
fig_expanding.show()

### Error Distribution Across Folds

In [None]:
def plot_error_distribution(errors: np.ndarray, title: str = "CV Error Distribution") -> go.Figure:
    """
    Plot histogram and box plot of CV errors.
    
    Parameters
    ----------
    errors : np.ndarray
        Array of error values from each fold.
    title : str
        Plot title.
        
    Returns
    -------
    fig : go.Figure
    """
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=("Error Histogram", "Error Box Plot"),
        column_widths=[0.6, 0.4],
    )
    
    # Histogram
    fig.add_trace(
        go.Histogram(
            x=errors,
            nbinsx=max(5, len(errors) // 2),
            marker_color="steelblue",
            opacity=0.7,
            name="Errors",
        ),
        row=1, col=1,
    )
    
    # Mean line
    mean_err = np.mean(errors)
    fig.add_vline(
        x=mean_err,
        line_dash="dash",
        line_color="red",
        annotation_text=f"Mean: {mean_err:.2f}",
        row=1, col=1,
    )
    
    # Box plot
    fig.add_trace(
        go.Box(
            y=errors,
            marker_color="darkorange",
            name="CV Errors",
            boxmean="sd",
        ),
        row=1, col=2,
    )
    
    fig.update_layout(
        title=dict(text=title, font=dict(size=16)),
        showlegend=False,
        height=350,
        template="plotly_white",
    )
    fig.update_xaxes(title_text="MAE", row=1, col=1)
    fig.update_yaxes(title_text="Frequency", row=1, col=1)
    fig.update_yaxes(title_text="MAE", row=1, col=2)
    
    return fig


# Collect errors for multiple forecasters
forecasters = {
    "Naive Last": naive_last_forecaster,
    "Naive Mean": naive_mean_forecaster,
    "Naive Drift": naive_drift_forecaster,
}

all_errors = {}
for name, forecaster_fn in forecasters.items():
    cv_gen = sliding_window_cv(len(y_synth), window_size=60, horizon=12, step=6)
    all_errors[name] = cross_validate_forecaster(y_synth, cv_gen, forecaster_fn)

# Plot error distribution for drift forecaster
fig_dist = plot_error_distribution(all_errors["Naive Drift"], title="Naive Drift - CV Error Distribution")
fig_dist.show()

### Model Comparison with Error Bars

In [None]:
def plot_model_comparison(
    model_errors: dict,
    confidence: float = 0.95,
    title: str = "Model Comparison (Cross-Validation)",
) -> go.Figure:
    """
    Bar chart comparing models with error bars (confidence intervals).
    
    Parameters
    ----------
    model_errors : dict
        Dictionary mapping model names to error arrays.
    confidence : float
        Confidence level for error bars.
    title : str
        Plot title.
        
    Returns
    -------
    fig : go.Figure
    """
    model_names = list(model_errors.keys())
    means = []
    ci_lowers = []
    ci_uppers = []
    
    for name, errors in model_errors.items():
        # Avoid the name `stats`, which refers to the scipy.stats module
        model_stats = compute_cv_statistics(errors, confidence)
        means.append(model_stats["mean"])
        ci_lowers.append(model_stats["mean"] - model_stats["ci_lower"])
        ci_uppers.append(model_stats["ci_upper"] - model_stats["mean"])
    
    # Highlight the model with the lowest mean error (lower is better)
    colors = ["#2ecc71" if m == min(means) else "#3498db" for m in means]
    
    fig = go.Figure()
    
    fig.add_trace(
        go.Bar(
            x=model_names,
            y=means,
            error_y=dict(
                type="data",
                symmetric=False,
                array=ci_uppers,
                arrayminus=ci_lowers,
                color="black",
                thickness=1.5,
                width=4,
            ),
            marker_color=colors,
            text=[f"{m:.2f}" for m in means],
            textposition="outside",
        )
    )
    
    fig.update_layout(
        title=dict(text=title, font=dict(size=16)),
        xaxis=dict(title="Model", tickangle=0),
        yaxis=dict(title="Mean Absolute Error", gridcolor="lightgray"),
        height=400,
        template="plotly_white",
        showlegend=False,
    )
    
    # Add annotation for best model
    best_model = model_names[means.index(min(means))]
    fig.add_annotation(
        x=best_model,
        y=min(means),
        text="★ Best",
        showarrow=True,
        arrowhead=2,
        arrowcolor="#27ae60",
        font=dict(color="#27ae60", size=12),
        yshift=30,
    )
    
    return fig


# Compare all forecasters
fig_compare = plot_model_comparison(
    all_errors,
    title="Naive Forecaster Comparison (95% CI)",
)
fig_compare.show()

# Print detailed statistics
print("\nDetailed Cross-Validation Statistics:")
print("-" * 60)
for name, errors in all_errors.items():
    # Do not rebind `stats` here: it would shadow scipy.stats and break
    # compute_cv_statistics on the next iteration.
    model_stats = compute_cv_statistics(errors)
    print(f"{name:15s} | MAE: {model_stats['mean']:6.2f} ± {model_stats['std']:5.2f} | "
          f"95% CI: [{model_stats['ci_lower']:6.2f}, {model_stats['ci_upper']:6.2f}]")