# Lag Features for Time Series

Lag features turn a time series into a supervised learning table by using past values as predictors.
They are the inputs to autoregressive models, and they let tabular regressors such as linear models and gradient-boosted trees be used for forecasting.

We will build lagged features, visualize the alignment, and see how lags capture dependency structure.


In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

np.random.seed(7)

n = 120
t = np.arange(n)
y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 24) + np.random.normal(0, 0.6, n)
series = pd.Series(y, index=pd.RangeIndex(n), name="y")

series.head()


## Definition

For lag order $p$, create feature vectors from the past values:

$$\mathbf{x}_t = [y_{t-1}, y_{t-2}, \dots, y_{t-p}]$$

The target is usually $y_t$ (one step ahead of the most recent lag $y_{t-1}$) or, for a longer horizon $h$, $y_{t+h-1}$.
The first $p$ rows have at least one missing lag and are dropped before training.
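
The definition above can be sketched as a small helper that returns a feature matrix and a target vector. The name `make_supervised`, its arguments, and the toy series are hypothetical and used only for illustration. The sketch assumes that `h` counts steps ahead of the most recent lag, so `h=1` makes the target the row's own value.

```python
import numpy as np
import pandas as pd

def make_supervised(s, p, h=1):
    """Return (X, y): lags 1..p as features, and as target the value
    h steps after the most recent lag (h=1 is the row's own value)."""
    df = pd.DataFrame({f"lag_{k}": s.shift(k) for k in range(1, p + 1)})
    # A negative shift pulls future values up to the current row.
    df["target"] = s.shift(-(h - 1))
    # Rows with any missing lag (start) or missing target (end) are dropped.
    df = df.dropna()
    return df.drop(columns="target"), df["target"]

toy = pd.Series(np.arange(10, dtype=float), name="y")
X, y_target = make_supervised(toy, p=3, h=1)
print(X.head())
print(y_target.head())
```

With `p=3` the first three rows are lost, and each extra step of horizon removes one more row from the end of the table.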


In [None]:
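def make_lags(s, lags):
    """Return a DataFrame holding s as "y" plus one shifted copy per lag."""
    df = pd.DataFrame({"y": s})
    for lag in lags:
        # shift(lag) moves values down by `lag` rows, so row t holds y_{t-lag}
        df[f"lag_{lag}"] = s.shift(lag)
    return df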

lags = [1, 2, 3, 12]
df_lag = make_lags(series, lags)
df_lag.head(10)


## Alignment intuition

A lag feature is simply the same series shifted right. Overlaying $y_t$ and $y_{t-1}$
shows how past values line up with the present.
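
To make the shift concrete before plotting, here is a minimal sketch on a four-point toy series (the values and column names are made up for illustration). Each row of the lag column holds the value from the previous row, and the first row has nothing to look back at.

```python
import pandas as pd

toy = pd.Series([10.0, 11.0, 12.0, 13.0], name="y")

# shift(1) moves every value one row down; the first row becomes NaN.
aligned = pd.DataFrame({"y_t": toy, "y_t_minus_1": toy.shift(1)})
print(aligned)
```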


In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=series.index, y=series, mode="lines", name="y_t", line=dict(color="#1f77b4")))
fig.add_trace(go.Scatter(x=series.index, y=df_lag["lag_1"], mode="lines", name="y_{t-1} (shifted)", line=dict(color="#ff7f0e", dash="dash")))
fig.update_layout(title="Lag-1 feature aligned with y_t", xaxis_title="t", yaxis_title="value", height=420)
fig

## Lag scatter (persistence check)

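Plotting $y_t$ against $y_{t-1}$ reveals persistence. A tight cloud along the diagonal means
consecutive values are strongly correlated, which is the structure an autoregressive model exploits.
In this series both the trend and the slow seasonal cycle contribute to that correlation.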
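
The visual check can be backed by a number. The sketch below rebuilds the synthetic series from the setup cell so that it runs on its own, then uses pandas' `Series.autocorr`, which computes the Pearson correlation between the series and its shifted copy. The choice of lags simply mirrors the lag columns used in this notebook.

```python
import numpy as np
import pandas as pd

# Same recipe as the setup cell, repeated so the block is self-contained.
np.random.seed(7)
n = 120
t = np.arange(n)
y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 24) + np.random.normal(0, 0.6, n)
series = pd.Series(y, name="y")

# Correlation between y_t and y_{t-lag} for each lag of interest.
acf_by_lag = {lag: series.autocorr(lag=lag) for lag in (1, 2, 3, 12)}
for lag, r in acf_by_lag.items():
    print(f"lag {lag:>2}: r = {r:.3f}")
```

A value near 1 at lag 1 corresponds to the tight diagonal cloud. Lag 12 sits half a seasonal period away, so the seasonal component pulls its correlation down.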


In [None]:
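# Drop only the row where lag_1 is missing; a bare dropna() would also
# discard the first 12 rows because of lag_12.
scatter_df = df_lag[["y", "lag_1"]].dropna()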
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=scatter_df["lag_1"],
    y=scatter_df["y"],
    mode="markers",
    marker=dict(size=6, opacity=0.7, color="#2ca02c")
))
fig.update_layout(title="Lag-1 relationship", xaxis_title="y_{t-1}", yaxis_title="y_t", height=420)
fig

## sktime mapping (where this lives)

sktime provides transformers for lagged features and window summaries:

- `Lag` (in `sktime.transformations.series.lag`) builds explicit lag columns.
- `WindowSummarizer` (in `sktime.transformations.series.summarize`) creates rolling statistics (mean, std, quantiles, ...) over lagged windows.

These are typically combined with tabular regressors through reduction forecasters such as `make_reduction`.
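
As a rough picture of what window summaries contain, the sketch below builds a rolling mean and standard deviation in plain pandas. It does not use sktime's API, and the toy values, column names, and window length of 3 are illustrative assumptions. The series is shifted by one step before the rolling window is applied, so each row only summarizes values strictly before it.

```python
import pandas as pd

toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="y")

# Shift first so the window for row t covers y_{t-3}, y_{t-2}, y_{t-1}
# and never includes y_t itself.
past = toy.shift(1)
features = pd.DataFrame({
    "y": toy,
    "lag_1": past,
    "roll_mean_3": past.rolling(window=3).mean(),
    "roll_std_3": past.rolling(window=3).std(),
})
print(features)
```

Skipping the shift would let the current value leak into its own features, which inflates apparent accuracy during training.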
