# sktime Model Selection & Time-Series Cross-Validation

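Time-series validation must respect **temporal order**: a model may only be trained on observations that precede the period it is evaluated on. This notebook compares **sliding vs. expanding windows**, visualizes the resulting splits, and shows how to tune models with `ForecastingGridSearchCV`.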
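Ordinary k-fold cross-validation breaks this rule even without shuffling. The stdlib sketch below builds unshuffled 5-fold splits over 100 time points. It then flags every fold whose training set contains observations from after its test block begins. The helper `contiguous_kfold` is hypothetical and written only for illustration.

```python
def contiguous_kfold(n, k):
    """Hypothetical helper: unshuffled k-fold CV, each test fold one contiguous block."""
    size = n // k
    for i in range(k):
        stop = n if i == k - 1 else (i + 1) * size  # last fold absorbs any remainder
        test = list(range(i * size, stop))
        train = [t for t in range(n) if t < test[0] or t > test[-1]]
        yield train, test


folds = list(contiguous_kfold(100, 5))

# A fold leaks if it trains on observations from after its test block starts
leaky = [i for i, (train, test) in enumerate(folds) if max(train) > min(test)]
print(leaky)  # [0, 1, 2, 3]
```

Only the last fold is clean. The other four scores would be computed by a model that has already seen the future.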


## Forecasting horizon

For a forecast origin $T$ (the last observed time point, which sktime calls the *cutoff*) and $h$ steps ahead, the **forecasting horizon** is

\[
\text{fh} = \{T+1, T+2, \ldots, T+h\}
\]

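sktime represents these steps with a `ForecastingHorizon` object. The steps can be given relative to the cutoff (`is_relative=True`, e.g. `[1, 2, 3]`) or as absolute time points (`is_relative=False`, e.g. the index of a test set).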
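The relative/absolute distinction is plain period arithmetic. The sketch below uses pandas `Period` objects and two hypothetical helpers, `to_absolute` and `to_relative`. They only mimic the idea behind the `ForecastingHorizon` methods of the same name and are not sktime code. For the airline data, the training set ends at the cutoff 1958-12, so a 24-month horizon runs from 1959-01 to 1960-12.

```python
import pandas as pd


def to_absolute(cutoff: pd.Period, steps: list[int]) -> pd.PeriodIndex:
    """Hypothetical helper: map relative steps k to absolute periods T + k."""
    # Adding an integer k to a monthly Period moves it k months forward
    return pd.PeriodIndex([cutoff + k for k in steps])


def to_relative(cutoff: pd.Period, periods: pd.PeriodIndex) -> list[int]:
    """Hypothetical helper: map absolute periods back to steps after the cutoff."""
    # Period ordinals count periods since the epoch, so their difference is a step count
    return [p.ordinal - cutoff.ordinal for p in periods]


cutoff = pd.Period("1958-12", freq="M")  # last training month, T
fh_abs = to_absolute(cutoff, list(range(1, 25)))  # h = 24 steps ahead

print(fh_abs[0], fh_abs[-1])  # 1959-01 1960-12
print(to_relative(cutoff, fh_abs[:3]))  # [1, 2, 3]
```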


In [None]:
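import numpy as np
import pandas as pd
import plotly.graph_objects as go

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon

# Splitters live in `sktime.split` in recent sktime releases;
# older releases exposed them from `sktime.forecasting.model_selection`.
try:
    from sktime.split import (
        ExpandingWindowSplitter,
        SlidingWindowSplitter,
        temporal_train_test_split,
    )
except ImportError:
    from sktime.forecasting.model_selection import (
        ExpandingWindowSplitter,
        SlidingWindowSplitter,
        temporal_train_test_split,
    )

# Load a classic monthly series (144 observations, 1949-01 to 1960-12)
y = load_airline()

# Hold out the last 24 months as the test set
y_train, y_test = temporal_train_test_split(y, test_size=24)

# Absolute horizon: the exact periods covered by the test set
fh = ForecastingHorizon(y_test.index, is_relative=False)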


## Sliding vs. expanding windows

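- **Sliding window** keeps a fixed training length: as the cutoff advances, the oldest observations drop out.
- **Expanding window** always starts at the first observation, so the training set grows as time advances.

Both avoid leakage because every training window ends at the cutoff, before the points it is scored on. They answer different questions, though:
- Sliding: "How well does the model forecast when it sees only the most *recent* `window_length` observations?"
- Expanding: "How well does the model forecast with *all* history up to each cutoff, and does it improve as that history grows?"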
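The difference between the two schemes comes down to where each training window starts. The sketch below is simplified and library-free, and it generates integer (position) splits for both schemes. `window_splits` is a hypothetical helper, not sktime's implementation. It assumes each split must fit the full horizon `max(fh)` inside the series, and it uses the same window, horizon, and step values as the splitters further down.

```python
def window_splits(n, window, fh, step, expanding=False):
    """Hypothetical helper: yield (train, test) integer positions for a series of length n.

    The first cutoff is the last position of the first window; each later cutoff
    moves `step` positions forward. Sliding windows keep `window` points;
    expanding windows always start at position 0.
    """
    cutoff = window - 1
    while cutoff + max(fh) <= n - 1:  # the whole horizon must fit in the series
        start = 0 if expanding else cutoff - window + 1
        train = list(range(start, cutoff + 1))
        test = [cutoff + k for k in fh]  # positions T + k for each step k
        yield train, test
        cutoff += step


fh_steps = [1, 2, 3, 6, 12]
sliding = list(window_splits(144, 60, fh_steps, 12))
expanding = list(window_splits(144, 60, fh_steps, 12, expanding=True))

print([len(train) for train, _ in sliding])    # [60, 60, 60, 60, 60, 60, 60]
print([len(train) for train, _ in expanding])  # [60, 72, 84, 96, 108, 120, 132]
print(sliding[0][1])                           # [60, 61, 62, 65, 71]
```

In both schemes every training position precedes every test position. Only the start of the training window differs.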


In [None]:
import plotly.graph_objects as go


def plot_cv_splits(y, splitter, max_splits=6, title=""):
    """Plot train (grey) and test (orange) points for the first `max_splits` CV splits."""
    index = y.index
    # Convert a PeriodIndex to timestamps, which Plotly handles more reliably
    if hasattr(index, "to_timestamp"):
        index = index.to_timestamp()
    fig = go.Figure()

    # `splitter.split(y)` yields integer (iloc) positions of each train/test window
    for split, (train_idx, test_idx) in enumerate(splitter.split(y)):
        if split >= max_splits:
            break
        fig.add_trace(
            go.Scatter(
                x=index[train_idx],
                y=[split] * len(train_idx),
                mode="markers",
                marker=dict(color="rgba(120,120,120,0.6)", size=6),
                name="train",
                legendgroup="train",
                showlegend=split == 0,  # one legend entry per role
            )
        )
        fig.add_trace(
            go.Scatter(
                x=index[test_idx],
                y=[split] * len(test_idx),
                mode="markers",
                marker=dict(color="rgba(255,127,14,0.9)", size=6),
                name="test",
                legendgroup="test",
                showlegend=split == 0,
            )
        )

    fig.update_layout(
        title=title,
        xaxis_title="Time",
        yaxis=dict(title="Split #", autorange="reversed"),  # split 0 at the top
        height=320 + 40 * max_splits,
    )
    return fig


# Score 1, 2, 3, 6 and 12 months ahead of each cutoff; move the cutoff 12 months per split
fh_steps = [1, 2, 3, 6, 12]
cv_sliding = SlidingWindowSplitter(fh=fh_steps, window_length=60, step_length=12)
cv_expanding = ExpandingWindowSplitter(fh=fh_steps, initial_window=60, step_length=12)

fig = plot_cv_splits(y, cv_sliding, title="Sliding Window CV")
fig.show()


In [None]:
fig = plot_cv_splits(y, cv_expanding, title="Expanding Window CV")
fig.show()


## Hyperparameter tuning with time-aware CV

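`sktime` provides `ForecastingGridSearchCV` to tune parameters while **respecting time order**. Every parameter combination is scored on the folds produced by a temporal splitter such as `ExpandingWindowSplitter`. With the default `refit=True`, the best combination is then refit on the full training series.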
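Conceptually, the search is a loop over the parameter grid with a time-ordered CV loop inside it. The plain-Python sketch below spells that out under simplifying assumptions:

- `naive_forecast` and `grid_search` are hypothetical helpers.
- The folds are expanding windows.
- The score is MAE averaged over folds.
- No refit step is shown.

This is not how sktime implements `ForecastingGridSearchCV`. On a perfectly linear toy series, the "drift" strategy should win.

```python
from itertools import product
from statistics import mean


def naive_forecast(train, steps, strategy, window_length=None):
    """Hypothetical stand-in for a naive forecaster (not sktime's NaiveForecaster)."""
    window = train[-window_length:] if window_length else train
    if strategy == "last":
        return [train[-1] for _ in steps]  # window_length is ignored here
    if strategy == "mean":
        level = mean(window)
        return [level for _ in steps]
    if strategy == "drift":
        # Straight line through the first and last points of the window
        slope = (window[-1] - window[0]) / (len(window) - 1)
        return [train[-1] + slope * k for k in steps]
    raise ValueError(f"unknown strategy: {strategy}")


def grid_search(y, grid, fh, initial_window, step):
    """Score every parameter combination on expanding-window folds."""
    keys = list(grid)
    scores = {}
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        fold_mae = []
        cutoff = initial_window - 1
        while cutoff + max(fh) < len(y):  # expanding window: train = y[0..cutoff]
            preds = naive_forecast(y[: cutoff + 1], fh, **params)
            actual = [y[cutoff + k] for k in fh]
            fold_mae.append(mean(abs(a - p) for a, p in zip(actual, preds)))
            cutoff += step
        scores[values] = mean(fold_mae)  # average MAE across folds
    best = min(scores, key=scores.get)  # first combination with the lowest score
    return dict(zip(keys, best)), scores


y_lin = [5 + 2 * t for t in range(100)]  # toy series with a perfectly linear trend
grid = {"strategy": ["last", "mean", "drift"], "window_length": [3, 6, 12]}
best, scores = grid_search(y_lin, grid, fh=[1, 2, 3, 6, 12], initial_window=60, step=12)
print(best)  # {'strategy': 'drift', 'window_length': 3}
```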


In [None]:
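from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import MeanAbsoluteError

forecaster = NaiveForecaster()

# `window_length` only affects "mean" and "drift" ("last" ignores it),
# so tune it only where it matters (a list of grids, as in scikit-learn)
param_grid = [
    {"strategy": ["last"]},
    {"strategy": ["mean", "drift"], "window_length": [3, 6, 12]},
]

# Use expanding windows for tuning; all folds are drawn from y_train only
cv = ExpandingWindowSplitter(fh=fh_steps, initial_window=60, step_length=12)

gscv = ForecastingGridSearchCV(
    forecaster=forecaster,
    param_grid=param_grid,
    cv=cv,
    scoring=MeanAbsoluteError(),
)

# Score every candidate on the CV folds, then refit the best one on all of y_train
gscv.fit(y_train)

best_forecaster = gscv.best_forecaster_
print("Best params:", gscv.best_params_)

# Evaluate the tuned model once on the untouched test set
y_pred = gscv.predict(fh)
print("Test MAE:", MeanAbsoluteError()(y_test, y_pred))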


## Pitfalls checklist

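- **Leakage**: never use future data to compute features or scalers. Fit transformations inside each training window, e.g. by combining them with the forecaster in a `TransformedTargetForecaster`, so they are refit on every fold.
- **Horizon mismatch**: tune on the same horizon you evaluate on. Here the CV splitter scores `fh=[1, 2, 3, 6, 12]` while the test set spans 24 months, so the tuned model is never validated beyond 12 steps ahead.
- **Changing seasonality**: prefer sliding-window CV when regimes drift, so stale history does not dominate the score.
- **Short series**: keep `window_length` (or `initial_window`) long enough to cover at least one full seasonal cycle, e.g. 12 observations for monthly data with yearly seasonality.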
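The leakage pitfall is easy to reproduce with a simple scaler. In the stdlib sketch below, `fit_standardizer` is a hypothetical helper. The leaky version estimates mean and standard deviation from the whole series, so the test period shapes the statistics used to transform the data. The honest version uses the training window only.

```python
from statistics import mean, pstdev


def fit_standardizer(train):
    """Hypothetical helper: z-score values with statistics estimated from `train` only."""
    mu, sigma = mean(train), pstdev(train)
    return lambda values: [(v - mu) / sigma for v in values]


series = [float(t) for t in range(1, 21)]  # steadily trending toy series
train, test = series[:15], series[15:]

leaky_scale = fit_standardizer(series)  # wrong: statistics include the test period
honest_scale = fit_standardizer(train)  # right: statistics from the training window only

print(mean(series), mean(train))  # 10.5 8.0
# The honest scaler shows the test period lies well above the training level;
# the leaky one hides that, because it has already "seen" the future values.
print(honest_scale(test)[0] > leaky_scale(test)[0])  # True
```

In a CV loop, the honest scaler has to be refit on every training window, never once on the full series.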
