# Global and Hierarchical Forecasting

Global models learn shared patterns across many related series, while hierarchical forecasting
enforces coherence across aggregation levels (store -> region -> total). This notebook builds
a fully reproducible example (seeded synthetic data) with Plotly visuals and NumPy least-squares baselines.

## What you will learn
- The difference between local and global forecasting models.
- How hierarchical aggregation and reconciliation work.
- How to evaluate bottom-up vs top-down forecasts.


## Problem setup

Suppose we have series $y_{i,t}$ for series $i$ and time $t$, with optional covariates $x_{i,t}$.

- **Local model**: fit a separate model, with its own parameters $\theta_i$, for each series.
  $$\hat{y}_{i,t+h} = f_i(y_{i,1:t}, x_{i,1:t}; \theta_i).$$
- **Global model**: fit one model with shared parameters $\theta$ across all series, so patterns learned from one series can inform forecasts for the others.
  $$\hat{y}_{i,t+h} = f(y_{i,1:t}, x_{i,1:t}; \theta).$$
- **Hierarchical forecasting**: predictions must be **coherent** across aggregation levels.
  For example, store-level forecasts must sum to their region's forecast, and regional forecasts must sum to the total forecast.
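Coherence can be written compactly with a summing matrix $S$ that maps the bottom-level (store) values $\mathbf{b}_t$ to every node of the hierarchy: $\mathbf{y}_t = S\mathbf{b}_t$. Below is a minimal NumPy sketch for the store -> region -> total hierarchy simulated in this notebook. The function name `summing_matrix` and the row order (total, regions, stores) are illustrative choices, not something the rest of the notebook depends on.

```python
import numpy as np

def summing_matrix(regions):
    """Build S for a total -> region -> store hierarchy (illustrative helper).

    Rows: total, then one row per region, then one row per store.
    Columns: stores, in the same order as the store rows.
    """
    stores = [s for region_stores in regions.values() for s in region_stores]
    n_bottom = len(stores)
    rows = [np.ones(n_bottom)]  # total = sum of all stores
    for region_stores in regions.values():
        # Region row: 1 for the stores in this region, 0 elsewhere
        rows.append(np.array([1.0 if s in region_stores else 0.0 for s in stores]))
    rows.extend(np.eye(n_bottom))  # each store row maps to itself
    labels = ["Total", *regions.keys(), *stores]
    return np.vstack(rows), labels

regions = {"North": ["N-1", "N-2"], "South": ["S-1", "S-2", "S-3"]}
S, labels = summing_matrix(regions)
print(S.shape)  # (8, 5)

# Any bottom-level vector b yields a coherent full vector y = S @ b
b = np.array([10.0, 12.0, 8.0, 9.0, 11.0])
y = S @ b
print(dict(zip(labels, y)))
```

Because every aggregate row is a fixed linear combination of the store columns, any forecast produced as `S @ b_hat` is coherent automatically; the strategies compared later differ only in how they obtain the bottom-level vector.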


In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

rng = np.random.default_rng(42)


## Simulate hierarchical retail data

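We create two regions, North (two stores) and South (three stores), with 72 months of data per store.
Every store shares a global linear trend and a 12-month seasonal pattern, plus region- and store-level
offsets and independent noise.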
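Written out, the simulation below generates, for store $i$ in region $r(i)$ and month index $t = 0, \dots, 71$:

$$
y_{i,t} = 20 + \alpha_{r(i)} + \gamma_i + 0.15\,t + 3\sin\!\left(\frac{2\pi t}{12}\right) + \varepsilon_{i,t},
$$

where $\alpha_r \sim \mathcal{N}(2, 0.4^2)$ is drawn once per region (`region_bias`), $\gamma_i \sim \mathcal{N}(0, 0.8^2)$ once per store (`store_bias`), and $\varepsilon_{i,t} \sim \mathcal{N}(0, 0.8^2)$ independently for each month (`noise`).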


In [None]:
regions = {
    "North": ["N-1", "N-2"],
    "South": ["S-1", "S-2", "S-3"],
}

periods = 72
period_index = pd.period_range("2018-01", periods=periods, freq="M")
t = np.arange(periods)

global_trend = 0.15 * t
global_season = 3.0 * np.sin(2 * np.pi * t / 12)

records = []
for region, stores in regions.items():
    region_bias = rng.normal(2.0, 0.4)
    for store in stores:
        store_bias = rng.normal(0.0, 0.8)
        noise = rng.normal(0, 0.8, size=periods)
        y = 20 + region_bias + store_bias + global_trend + global_season + noise
        for idx, period in enumerate(period_index):
            records.append({
                "region": region,
                "store": store,
                "period": period,
                "t": idx,
                "y": y[idx],
            })

df = pd.DataFrame.from_records(records)
df["series_id"] = df["region"] + "/" + df["store"]
df["timestamp"] = df["period"].dt.to_timestamp()
df.head()


In [None]:
tree_df = df.groupby(["region", "store"], as_index=False)["y"].mean()
fig = px.sunburst(tree_df, path=["region", "store"], values="y",
                 title="Hierarchy (average series level)")
fig.show()


In [None]:
fig = go.Figure()
for series_id in sorted(df["series_id"].unique()):
    s = df[df["series_id"] == series_id]
    fig.add_trace(go.Scatter(
        x=s["timestamp"],
        y=s["y"],
        mode="lines",
        name=series_id
    ))
fig.update_layout(title="Store-level series", xaxis_title="time", yaxis_title="y", height=420)
fig.show()


## Local vs global baselines

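We fit simple linear-trend + seasonal models by least squares. Local models fit each series independently;
the global model shares the same slope and seasonal coefficients across all series while allowing
series-specific intercepts. Because the simulated stores share slope and seasonality by construction,
the global model is correctly specified here and pools all five series to estimate the shared coefficients.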
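In equation form, writing $s_t = \sin(2\pi t/12)$ and $c_t = \cos(2\pi t/12)$ as built by `make_features`, the two baselines are:

$$
\text{local:}\quad \hat{y}_{i,t} = \beta_{0,i} + \beta_{1,i}\,t + \beta_{2,i}\,s_t + \beta_{3,i}\,c_t,
\qquad
\text{global:}\quad \hat{y}_{i,t} = \beta_{0,i} + \beta_{1}\,t + \beta_{2}\,s_t + \beta_{3}\,c_t.
$$

With $N = 5$ stores, the local models estimate $4N = 20$ coefficients (4 per series, each set from that series' 60 training points), while the global model estimates $N + 3 = 8$ coefficients jointly from all 300 training points.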


In [None]:
def make_features(t_values, include_intercept=True):
    t_values = np.asarray(t_values)
    sin_term = np.sin(2 * np.pi * t_values / 12)
    cos_term = np.cos(2 * np.pi * t_values / 12)
    cols = []
    if include_intercept:
        cols.append(np.ones_like(t_values))
    cols.extend([t_values, sin_term, cos_term])
    return np.column_stack(cols)

def fit_linear(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

horizon = 12
t_train = t[:-horizon]
t_test = t[-horizon:]

series_ids = sorted(df["series_id"].unique())


In [None]:
# Local models
local_preds = {}
local_mae = {}

for series_id in series_ids:
    s = df[df["series_id"] == series_id].sort_values("t")
    y = s["y"].values
    y_train, y_test = y[:-horizon], y[-horizon:]

    X_train = make_features(t_train, include_intercept=True)
    X_test = make_features(t_test, include_intercept=True)
    beta = fit_linear(X_train, y_train)
    y_pred = X_test @ beta

    local_preds[series_id] = y_pred
    local_mae[series_id] = mae(y_test, y_pred)

local_mae


In [None]:
# Global model with shared slope/seasonality and series-specific intercepts
n_series = len(series_ids)
X_blocks = []
y_blocks = []

for idx, series_id in enumerate(series_ids):
    s = df[df["series_id"] == series_id].sort_values("t")
    y = s["y"].values
    y_train = y[:-horizon]
    X_shared = make_features(t_train, include_intercept=False)

    intercept_cols = np.zeros((len(t_train), n_series))
    intercept_cols[:, idx] = 1.0
    X = np.hstack([intercept_cols, X_shared])

    X_blocks.append(X)
    y_blocks.append(y_train)

X_train_all = np.vstack(X_blocks)
y_train_all = np.concatenate(y_blocks)

beta_global = fit_linear(X_train_all, y_train_all)
beta_intercepts = beta_global[:n_series]
beta_shared = beta_global[n_series:]

global_preds = {}
global_mae = {}

for idx, series_id in enumerate(series_ids):
    s = df[df["series_id"] == series_id].sort_values("t")
    y = s["y"].values
    y_test = y[-horizon:]

    X_test_shared = make_features(t_test, include_intercept=False)
    y_pred = beta_intercepts[idx] + X_test_shared @ beta_shared

    global_preds[series_id] = y_pred
    global_mae[series_id] = mae(y_test, y_pred)

global_mae


In [None]:
score_df = pd.DataFrame({
    "series_id": series_ids,
    "local_mae": [local_mae[sid] for sid in series_ids],
    "global_mae": [global_mae[sid] for sid in series_ids],
})
score_df


In [None]:
score_long = score_df.melt(id_vars="series_id", var_name="model", value_name="mae")
fig = px.bar(score_long, x="series_id", y="mae", color="model",
             barmode="group", title="Local vs global MAE (lower is better)")
fig.show()


In [None]:
example_id = series_ids[0]
s = df[df["series_id"] == example_id].sort_values("t")
y = s["y"].values
y_train, y_test = y[:-horizon], y[-horizon:]

train_idx = s["timestamp"].iloc[:-horizon]
test_idx = s["timestamp"].iloc[-horizon:]

fig = go.Figure()
fig.add_trace(go.Scatter(x=train_idx, y=y_train, mode="lines", name="Train"))
fig.add_trace(go.Scatter(x=test_idx, y=y_test, mode="lines", name="Test", line=dict(color="black")))
fig.add_trace(go.Scatter(x=test_idx, y=local_preds[example_id], mode="lines", name="Local pred"))
fig.add_trace(go.Scatter(x=test_idx, y=global_preds[example_id], mode="lines", name="Global pred"))
fig.update_layout(title=f"Example series: {example_id}", xaxis_title="time", yaxis_title="y")
fig.show()


## Hierarchical reconciliation

We compare two simple strategies, both of which produce coherent forecasts by construction:

- **Bottom-up**: forecast each store and sum to region/total.
- **Top-down**: forecast the total, then allocate it to stores using each store's share of the total over the training window.
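Both strategies can be expressed with a summing matrix $S$ (rows: total, regions, stores; columns: stores). Bottom-up sets $\tilde{\mathbf{y}} = S\hat{\mathbf{b}}$ from store forecasts $\hat{\mathbf{b}}$; top-down with proportions $\mathbf{p}$ (summing to 1) sets $\tilde{\mathbf{y}} = S\,\mathbf{p}\,\hat{y}_{\text{total}}$. The sketch below uses made-up base forecasts for a single period, so the numbers and the helper `is_coherent` are illustrative only.

```python
import numpy as np

# Summing matrix: rows Total, North, South, N-1, N-2, S-1, S-2, S-3; columns are the 5 stores
S = np.vstack([
    [1, 1, 1, 1, 1],  # Total
    [1, 1, 0, 0, 0],  # North
    [0, 0, 1, 1, 1],  # South
    np.eye(5),        # one row per store
])

# Hypothetical one-period base forecasts
b_hat = np.array([30.0, 31.0, 29.5, 30.5, 32.0])  # store-level forecasts
total_hat = 150.0                                 # direct forecast of the total
p = np.array([0.19, 0.21, 0.19, 0.20, 0.21])      # historical store shares (sum to 1)

bottom_up = S @ b_hat            # aggregate the store forecasts upward
top_down = S @ (p * total_hat)   # split the total, then aggregate back up

def is_coherent(y_full, S):
    """True if every aggregate row equals the sum implied by the bottom rows."""
    bottom = y_full[-S.shape[1]:]
    return bool(np.allclose(y_full, S @ bottom))

print(is_coherent(bottom_up, S), is_coherent(top_down, S))  # True True
print(bottom_up[0], top_down[0])
```

Coherence alone does not settle accuracy: top-down with static proportions cannot follow a store whose trend drifts away from its historical share, while bottom-up carries the noise of every individual store forecast into the aggregates.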


In [None]:
# Build forecast DataFrame (store-level) using the global model
pred_rows = []
for series_id in series_ids:
    s = df[df["series_id"] == series_id].sort_values("t")
    test_periods = s["period"].iloc[-horizon:]
    region, store = series_id.split("/")
    for period, pred in zip(test_periods, global_preds[series_id]):
        pred_rows.append({
            "region": region,
            "store": store,
            "period": period,
            "timestamp": period.to_timestamp(),
            "y_pred": pred,
        })

pred_df = pd.DataFrame(pred_rows)

# Actuals for test horizon
actual_df = df[df["t"] >= t_test[0]].copy()
actual_df = actual_df[["region", "store", "period", "timestamp", "y"]]

# Bottom-up aggregation
bu_region = pred_df.groupby(["region", "period"], as_index=False)["y_pred"].sum().sort_values("period")
bu_total = pred_df.groupby(["period"], as_index=False)["y_pred"].sum().sort_values("period")


In [None]:
# Top-down: forecast total series, then allocate by historical share
total_series = df.groupby("period", as_index=False)["y"].sum().sort_values("period")
total_train = total_series.iloc[:-horizon]
total_test = total_series.iloc[-horizon:]

beta_total = fit_linear(make_features(t_train, include_intercept=True), total_train["y"].values)
total_pred = make_features(t_test, include_intercept=True) @ beta_total

train_df = df[df["t"] < t_test[0]]
shares = train_df.groupby("series_id")["y"].sum() / train_df["y"].sum()

td_rows = []
for series_id in series_ids:
    region, store = series_id.split("/")
    share = shares.loc[series_id]
    for period, pred in zip(total_test["period"], total_pred):
        td_rows.append({
            "region": region,
            "store": store,
            "period": period,
            "timestamp": period.to_timestamp(),
            "y_pred": pred * share,
        })

td_df = pd.DataFrame(td_rows)
td_region = td_df.groupby(["region", "period"], as_index=False)["y_pred"].sum().sort_values("period")
td_total = td_df.groupby(["period"], as_index=False)["y_pred"].sum().sort_values("period")


In [None]:
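# Compare total-level accuracy.
# The store shares sum to 1, so the top-down total equals the direct total forecast
# (total_pred); the two strategies differ here only through their base forecasts.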
actual_total = total_test["y"].values
bu_total_mae = mae(actual_total, bu_total["y_pred"].values)
td_total_mae = mae(actual_total, td_total["y_pred"].values)

bu_total_mae, td_total_mae


In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=total_test["period"].dt.to_timestamp(), y=actual_total,
                         mode="lines", name="Actual total", line=dict(color="black")))
fig.add_trace(go.Scatter(x=bu_total["period"].dt.to_timestamp(), y=bu_total["y_pred"],
                         mode="lines", name="Bottom-up"))
fig.add_trace(go.Scatter(x=td_total["period"].dt.to_timestamp(), y=td_total["y_pred"],
                         mode="lines", name="Top-down"))
fig.update_layout(title="Total forecast: bottom-up vs top-down", xaxis_title="time", yaxis_title="y")
fig.show()


## sktime mapping (high level)

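- sktime represents hierarchical data as **mtypes** built on pandas MultiIndex containers (for example `pd_multiindex_hier`).
- Estimators declare which mtypes they handle internally through tags such as `y_inner_mtype`; forecasters
  without native hierarchical support are generally fitted series-by-series (vectorized) over the hierarchy.
- Reconciliation strategies (bottom-up, top-down, middle-out) can be implemented as
  preprocessing or post-processing steps in a pipeline.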
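On the pandas side, a long frame like `df` becomes a (region, store, time) MultiIndex frame, and aggregate levels are group sums over index levels. The sketch below uses only pandas and a tiny made-up frame (`long_df` is hypothetical). It illustrates the general MultiIndex shape, not sktime's exact mtype specification, which the datatypes notebook covers.

```python
import pandas as pd

# Hypothetical long-format data with the same columns as `df` in this notebook
long_df = pd.DataFrame({
    "region": ["North", "North", "North", "North", "South", "South"],
    "store": ["N-1", "N-1", "N-2", "N-2", "S-1", "S-1"],
    "timestamp": pd.to_datetime(["2018-01-01", "2018-02-01"] * 3),
    "y": [21.0, 22.5, 19.0, 20.0, 23.0, 24.0],
})

# Hierarchical layout: one row per (region, store, timestamp), a single value column
hier = long_df.set_index(["region", "store", "timestamp"]).sort_index()

# Region-level and total series are sums over the lower index levels
region_level = hier.groupby(level=["region", "timestamp"]).sum()
total_level = hier.groupby(level="timestamp").sum()

print(hier.index.nlevels)          # 3
print(total_level["y"].tolist())   # [63.0, 66.5]
```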

See the datatypes and registry notebooks in
`data_science/time_series/sktime_algorithms/` for a version-specific catalog.


## Exercises
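- Swap the global model for a lag-based regression (add $y_{i,t-1}$ and $y_{i,t-12}$ as features); multi-step forecasts then need a recursive strategy, because $y_{i,t-1}$ is unknown beyond the first test step.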
- Compare bottom-up vs top-down at the **region** level, not just total.
- Try different allocation rules (last-year share vs average share).
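For the last exercise, one possible starting point is a helper that computes store shares either over the full training window (the rule used in the top-down cell) or over only the most recent `window` periods. The function name `allocation_shares` and its `window` parameter are hypothetical; it assumes a long frame with `series_id`, `t`, and `y` columns, like `train_df`.

```python
import pandas as pd

def allocation_shares(train_df, window=None):
    """Share of the total attributable to each series (hypothetical helper).

    window=None uses every training period (the notebook's rule);
    window=12 uses only the last 12 periods ("last-year share").
    """
    if window is not None:
        cutoff = train_df["t"].max() - window + 1
        train_df = train_df[train_df["t"] >= cutoff]
    per_series = train_df.groupby("series_id")["y"].sum()
    return per_series / per_series.sum()

# Tiny made-up example: series A grows relative to B in the later periods
toy = pd.DataFrame({
    "series_id": ["A", "B"] * 4,
    "t": [0, 0, 1, 1, 2, 2, 3, 3],
    "y": [10.0, 10.0, 10.0, 10.0, 30.0, 10.0, 30.0, 10.0],
})
print(allocation_shares(toy))            # shares over all periods
print(allocation_shares(toy, window=2))  # shares over the last 2 periods
```

The recent-window rule reacts faster when a store's share is drifting, at the cost of noisier shares when the window is short.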


## Further reading
- Hyndman & Athanasopoulos, *Forecasting: Principles and Practice* (chapter on hierarchical and grouped time series).
- Global forecasting surveys and benchmark papers on pooled learning across series.
