# Machine Learning Forecasting — Overview

## Purpose
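- Reframe forecasting as supervised learning: predict each value from features built from the series' own past (lags, rolling statistics) and from known covariates such as the calendar.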
- Use tabular models on lagged and engineered features.
- Scale to many series with global models, i.e. one model trained on the pooled rows of all series.
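A minimal sketch of a global model follows. It assumes several synthetic series stored in long format, with hypothetical `series_id`, `t`, and `y` columns. Lags are computed per series with `groupby(...).shift`, so no lag crosses a series boundary. One least-squares model is then fit on the pooled rows. Plain NumPy stands in here for the tabular models named in the references.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical long-format panel: 3 series, 100 steps each, at different levels.
frames = []
for sid, level in enumerate([10.0, 20.0, 30.0]):
    t = np.arange(100)
    values = level + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, 100)
    frames.append(pd.DataFrame({"series_id": sid, "t": t, "y": values}))
panel = pd.concat(frames, ignore_index=True)

# Lags are computed within each series, so series 1 never sees series 0's tail.
for lag in (1, 2):
    panel[f"lag_{lag}"] = panel.groupby("series_id")["y"].shift(lag)
panel = panel.dropna().reset_index(drop=True)

# One pooled ("global") linear model shared by every series.
X = np.c_[np.ones(len(panel)), panel[["lag_1", "lag_2"]].to_numpy()]
coef, *_ = np.linalg.lstsq(X, panel["y"].to_numpy(), rcond=None)
panel["fitted"] = X @ coef
```

A single shared intercept cannot absorb series that sit at very different levels. In practice, each series is often scaled before pooling (for example, divided by its training mean). That step is omitted here.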

## Key questions this section answers
- Which lags and rolling features matter?
- How do we avoid leakage in validation?
- Which models handle nonlinearity best?
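One common answer to the leakage question is an expanding-window split. Every test block lies strictly after its training window. An optional gap can also be left between them, so that lagged features or multi-step targets cannot straddle the boundary. The helper `expanding_window_splits` is a hypothetical sketch. scikit-learn's `TimeSeriesSplit` (with its `n_splits`, `test_size`, and `gap` parameters) provides a comparable splitter.

```python
import numpy as np


def expanding_window_splits(n_samples, n_splits, test_size, gap=0):
    """Yield (train_idx, test_idx) pairs in time order.

    Each training window starts at 0 and ends `gap` samples before its
    test block, so no test observation ever precedes a training one.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = np.arange(0, test_start - gap)
        test_idx = np.arange(test_start, test_start + test_size)
        yield train_idx, test_idx


splits = list(expanding_window_splits(n_samples=100, n_splits=4, test_size=10, gap=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
# train 0..57  test 60..69
# train 0..67  test 70..79
# train 0..77  test 80..89
# train 0..87  test 90..99
```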

## Topics
- Lag, rolling-window, and calendar features
- Tree-based models and linear baselines
- Multi-step forecasting strategies
- Feature importance and explainability
- Cross-validation for time series
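The following sketch shows lag, rolling-window, and calendar features on a synthetic hourly series. The timestamps, the 24-hour window, and the column names are illustrative assumptions. The detail that matters most is shifting before rolling. With that order, the window for row t covers only t-24 through t-1 and never contains the target itself.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series over 10 days with a daily cycle.
idx = pd.date_range("2024-01-01", periods=240, freq=pd.offsets.Hour(1))
rng = np.random.default_rng(1)
y = pd.Series(np.sin(2 * np.pi * np.arange(240) / 24) + rng.normal(0, 0.2, 240), index=idx)

feats = pd.DataFrame({"y": y})
# Lag features: the values observed 1 and 24 hours earlier.
feats["lag_1"] = y.shift(1)
feats["lag_24"] = y.shift(24)
# Rolling statistics over the previous 24 hours; shift(1) keeps the
# current target out of its own feature window.
past = y.shift(1)
feats["roll_mean_24"] = past.rolling(24).mean()
feats["roll_std_24"] = past.rolling(24).std()
# Calendar features come from the timestamp and are known in advance.
hour = idx.hour.to_numpy()
feats["hour"] = hour
feats["dayofweek"] = idx.dayofweek.to_numpy()
# Cyclical encoding places hour 23 next to hour 0.
feats["hour_sin"] = np.sin(2 * np.pi * hour / 24)
feats["hour_cos"] = np.cos(2 * np.pi * hour / 24)
feats = feats.dropna()
```

Calendar features can be computed for future timestamps, so they remain valid inputs at any horizon. Lag and rolling features exist only up to the forecast origin.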

## References
- scikit-learn, LightGBM, XGBoost; sktime pipelines


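```python
import numpy as np
import pandas as pd
import plotly.graph_objects as go

# Synthetic series: linear trend + 24-step seasonality + Gaussian noise.
rng = np.random.default_rng(4)

n = 240
t = np.arange(n)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, n)

# Lag features: lag_k holds the value observed k steps earlier.
df = pd.DataFrame({"y": y})
for lag in range(1, 4):
    df[f"lag_{lag}"] = df["y"].shift(lag)
df = df.dropna()  # the first 3 rows have incomplete lags

X = df[["lag_1", "lag_2", "lag_3"]].to_numpy()
y_target = df["y"].to_numpy()

# Chronological split: fit on the first 80%, evaluate on the rest.
# Scoring on the training rows would hide overfitting and leakage.
split = int(0.8 * len(df))
X_design = np.c_[np.ones(len(X)), X]  # prepend an intercept column
coef, *_ = np.linalg.lstsq(X_design[:split], y_target[:split], rcond=None)
pred = X_design[split:] @ coef  # one-step-ahead forecasts from observed lags
mae = np.mean(np.abs(y_target[split:] - pred))

fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index, y=y_target, name="actual"))
fig.add_trace(go.Scatter(x=df.index[split:], y=pred, name="linear regression (test)"))
fig.update_layout(
    title=f"Lag-feature regression, one-step-ahead test MAE = {mae:.2f}",
    xaxis_title="t",
    yaxis_title="value",
)
fig.show()
```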

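The lag regression above predicts only one step ahead, because every row is built from observed lags. For longer horizons, one option is the recursive strategy sketched below, in which each prediction is fed back in as the newest lag. `fit_lag_regression` and `recursive_forecast` are hypothetical helpers. The main alternative is the direct strategy, which trains a separate model per horizon h. It avoids compounding errors, at the cost of fitting more models.

```python
import numpy as np


def fit_lag_regression(y, n_lags):
    """Least-squares fit of y_t on an intercept and y_{t-1}, ..., y_{t-n_lags}."""
    # Each row lists the lags most recent first: [y_{t-1}, y_{t-2}, ...].
    rows = np.array([y[t - n_lags : t][::-1] for t in range(n_lags, len(y))])
    X = np.c_[np.ones(len(rows)), rows]
    coef, *_ = np.linalg.lstsq(X, y[n_lags:], rcond=None)
    return coef


def recursive_forecast(history, coef, horizon):
    """Forecast `horizon` steps, feeding each prediction back in as lag_1."""
    n_lags = len(coef) - 1
    window = list(history[-n_lags:])  # oldest to newest
    preds = []
    for _ in range(horizon):
        lags = np.array(window[::-1])  # lag_1 first
        y_hat = coef[0] + coef[1:] @ lags
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the prediction
    return np.array(preds)


# Same synthetic series as the code cell above; hold out the last 24 steps.
rng = np.random.default_rng(4)
n = 240
t = np.arange(n)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, n)

train, test = y[:-24], y[-24:]
coef = fit_lag_regression(train, n_lags=3)
preds = recursive_forecast(train, coef, horizon=len(test))
print(f"recursive 24-step MAE: {np.mean(np.abs(test - preds)):.2f}")
```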
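Permutation importance is one model-agnostic way to check which lags matter. It shuffles one feature column on held-out data and measures how much the error grows. The sketch below uses a hypothetical helper, `permutation_importance_mae`, with MAE as an assumed metric. It also adds a pure-noise column as a sanity check. For fitted estimators, scikit-learn offers `sklearn.inspection.permutation_importance`.

```python
import numpy as np


def permutation_importance_mae(X, y, coef, n_repeats=20, seed=0):
    """Mean increase in MAE when each column of X is shuffled.

    Assumes a fitted linear model y_hat = coef[0] + X @ coef[1:].
    """
    rng = np.random.default_rng(seed)

    def predict(M):
        return coef[0] + M @ coef[1:]

    base_mae = np.mean(np.abs(y - predict(X)))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean(np.abs(y - predict(X_perm))) - base_mae
    return scores / n_repeats


# Same synthetic series as the code cell above.
rng = np.random.default_rng(4)
n = 240
t = np.arange(n)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, n)

# Columns lag_1..lag_3 for targets y[3:], plus an unrelated noise column.
names = ["lag_1", "lag_2", "lag_3", "noise"]
target = y[3:]
lags = np.column_stack([y[3 - k : n - k] for k in (1, 2, 3)])
X = np.column_stack([lags, rng.normal(0, 1, len(target))])

# Fit on the first 80% of rows, then score importance on the held-out rest.
split = int(0.8 * len(X))
coef, *_ = np.linalg.lstsq(np.c_[np.ones(split), X[:split]], target[:split], rcond=None)
importance = dict(zip(names, permutation_importance_mae(X[split:], target[split:], coef)))
for name, score in importance.items():
    print(f"{name}: {score:.3f}")
```

Neighbouring lags are strongly correlated, so the fitted model may spread weight across them. As a result, each lag's individual importance can look smaller than the information it carries. Shuffling also produces feature combinations that never occur in the real series.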