# Changepoint Detection (Classical)

## Purpose
- Detect regime shifts in mean, variance, or seasonal structure.
- Separate structural breaks from random noise.
- Build interpretable baselines before more complex detectors.

## What is a changepoint?
A changepoint is a time index $\tau$ where the data-generating process changes. For a piecewise-constant mean model:

$$
y_t = \mu_k + \epsilon_t, \quad t \in (\tau_{k-1}, \tau_k]
$$

A typical goal is to estimate the changepoints $\{\tau_k\}$ and segment-specific parameters $\mu_k$.
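
As a minimal sketch of the second half of that goal, the snippet below estimates the segment means once the changepoints are taken as known. The helper name `fit_piecewise_mean` and the toy array are illustrative assumptions; changepoints are treated as 0-based indices where a new segment starts.

```python
import numpy as np

def fit_piecewise_mean(y, changepoints):
    """Return per-segment means and the fitted step function.

    `changepoints` holds 0-based indices at which a new segment starts
    (hypothetical helper, for illustration only).
    """
    y = np.asarray(y, dtype=float)
    bounds = [0, *sorted(changepoints), len(y)]
    means = []
    fitted = np.empty_like(y)
    for start, end in zip(bounds[:-1], bounds[1:]):
        mu = y[start:end].mean()  # least-squares estimate of mu_k
        means.append(mu)
        fitted[start:end] = mu
    return np.array(means), fitted

toy = np.array([1.0, 1.0, 1.0, 4.0, 4.0, 6.0])
seg_means, fitted = fit_piecewise_mean(toy, [3, 5])
print(seg_means)  # one mean per segment: toy[0:3], toy[3:5], toy[5:6]
```

With the changepoints fixed, the sample mean of each segment is the least-squares estimate of $\mu_k$, so the hard part of the problem is locating the changepoints themselves.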


## Offline vs online detection
- **Offline**: entire series is available (batch segmentation).
- **Online**: detect changes as data arrives (streaming).

This notebook covers both with simple, transparent methods.


In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

rng = np.random.default_rng(7)

n = 240
true_cps = [60, 140, 190]
means = [0.0, 1.5, -0.5, 0.8]
stds = [0.3, 0.3, 0.6, 0.25]

segments = [0] + true_cps + [n]
values = np.empty(n)
for idx, (start, end) in enumerate(zip(segments[:-1], segments[1:])):
    values[start:end] = rng.normal(means[idx], stds[idx], end - start)

series = pd.Series(values, index=pd.RangeIndex(n), name="y")

fig = go.Figure()
fig.add_trace(go.Scatter(x=series.index, y=series, mode="lines", name="series"))
for cp in true_cps:
    fig.add_vline(x=cp, line_dash="dash", line_color="red", annotation_text=f"cp={cp}")
fig.update_layout(
    title="Synthetic series with true changepoints",
    xaxis_title="time",
    yaxis_title="value",
    template="plotly_white",
)
fig


## Single changepoint via sum of squared errors (SSE)
For a candidate changepoint $\tau$, define the segmentation cost:

$$
C(\tau) = \sum_{t=1}^{\tau} (y_t - \bar{y}_{1:\tau})^2 + \sum_{t=\tau+1}^{n} (y_t - \bar{y}_{\tau+1:n})^2
$$

We choose the $\tau$ that minimizes $C(\tau)$.
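
To make the definition concrete, here is a brute-force evaluation of the cost on a six-point toy series. The function name `split_cost` and the data are assumptions for illustration; `tau` counts the points assigned to the left segment.

```python
import numpy as np

def split_cost(y, tau):
    """C(tau): SSE of the first `tau` points plus SSE of the remaining points."""
    left, right = y[:tau], y[tau:]
    return float(((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum())

toy = np.array([0.0, 0.2, -0.2, 3.0, 3.1, 2.9])

# Evaluate every split that leaves at least one point on each side.
toy_costs = {tau: split_cost(toy, tau) for tau in range(1, len(toy))}
best_tau = min(toy_costs, key=toy_costs.get)
print(best_tau, toy_costs[best_tau])
```

The minimum falls at the split between the low and high groups, where each side is close to constant. This direct version recomputes both means for every candidate, which is fine for a toy series but motivates a faster formulation on longer ones.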


In [None]:
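def make_sse(y: np.ndarray):
    """Return sse(start, end): the SSE of y[start:end] around its own mean, in O(1)."""
    y = np.asarray(y, dtype=float)
    # Prefix sums of y and y^2 let any segment sum be read off by subtraction.
    c1 = np.cumsum(y)
    c2 = np.cumsum(y ** 2)

    def sse(start: int, end: int) -> float:
        if end <= start:
            return 0.0
        s1 = c1[end - 1] - (c1[start - 1] if start > 0 else 0.0)
        s2 = c2[end - 1] - (c2[start - 1] if start > 0 else 0.0)
        n = end - start
        # sum((y - mean)^2) = sum(y^2) - (sum(y))^2 / n; clip tiny negative round-off.
        return float(max(s2 - (s1 * s1) / n, 0.0))

    return sse


def best_single_cp(y: np.ndarray, min_size: int = 10):
    """Return the split minimizing left SSE + right SSE, plus all candidates and their costs."""
    n = len(y)
    sse = make_sse(y)
    # A split at cp yields y[:cp] and y[cp:]; both sides keep at least min_size points.
    candidates = np.arange(min_size, n - min_size + 1)
    if len(candidates) == 0:
        raise ValueError("series is too short for the requested min_size")
    costs = np.empty_like(candidates, dtype=float)
    for i, cp in enumerate(candidates):
        costs[i] = sse(0, cp) + sse(cp, n)
    best_idx = int(np.argmin(costs))
    return int(candidates[best_idx]), candidates, costs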

best_cp, candidates, costs = best_single_cp(series.values, min_size=12)

fig = go.Figure()
fig.add_trace(go.Scatter(x=candidates, y=costs, mode="lines", name="segmentation cost"))
fig.add_vline(x=best_cp, line_dash="dash", line_color="green", annotation_text=f"best={best_cp}")
for cp in true_cps:
    fig.add_vline(x=cp, line_dash="dot", line_color="red")
fig.update_layout(
    title="SSE cost for single changepoint",
    xaxis_title="candidate changepoint",
    yaxis_title="cost",
    template="plotly_white",
)
fig
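
The `sse` helper above never forms a segment mean explicitly. It relies on a standard identity, written out here for a segment running from $t=a$ to $t=b$ with $m = b - a + 1$ points (the symbols $a$, $b$, and $m$ are introduced only for this derivation):

$$
\begin{aligned}
\sum_{t=a}^{b} (y_t - \bar{y}_{a:b})^2
&= \sum_{t=a}^{b} y_t^2 - 2\bar{y}_{a:b} \sum_{t=a}^{b} y_t + m\,\bar{y}_{a:b}^2 \\
&= \sum_{t=a}^{b} y_t^2 - \frac{1}{m}\left(\sum_{t=a}^{b} y_t\right)^2
\end{aligned}
$$

Both sums on the right are differences of prefix sums, which is why each cost lookup is constant time. One caveat: when the mean is very large relative to the spread, the subtraction can lose precision, so centering the series first may be worthwhile.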


## Binary segmentation for multiple changepoints
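Binary segmentation applies the single-changepoint search recursively: it finds the best split of the current segment, accepts it only if the reduction in SSE exceeds a penalty, and then repeats on the two resulting subsegments. The procedure is greedy, so it is fast but not guaranteed to find the globally optimal segmentation.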


In [None]:
def binary_segmentation(y: np.ndarray, min_size: int = 12, penalty: float = 12.0):
    """Greedy recursive splitting; a split is kept only if it lowers the SSE by more than `penalty`."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    sse = make_sse(y)
    cps = []

    def segment_cost(start: int, end: int) -> float:
        return sse(start, end)

    def best_cp_in_segment(start: int, end: int):
        # Scan every split that leaves at least min_size points on each side.
        best_cp = None
        best_cost = np.inf
        for cp in range(start + min_size, end - min_size + 1):
            cost = segment_cost(start, cp) + segment_cost(cp, end)
            if cost < best_cost:
                best_cost = cost
                best_cp = cp
        return best_cp, best_cost

    def recurse(start: int, end: int):
        # Too short to hold two segments of min_size points each.
        if end - start < 2 * min_size:
            return
        base_cost = segment_cost(start, end)
        cp, split_cost = best_cp_in_segment(start, end)
        if cp is None:
            return
        improvement = base_cost - split_cost
        if improvement > penalty:
            cps.append(cp)
            # Search each half independently for further changepoints.
            recurse(start, cp)
            recurse(cp, end)

    recurse(0, n)
    return sorted(cps)

found_cps = binary_segmentation(series.values, min_size=12, penalty=10.0)

fig = go.Figure()
fig.add_trace(go.Scatter(x=series.index, y=series, mode="lines", name="series"))
for cp in found_cps:
    fig.add_vline(x=cp, line_dash="dash", line_color="green", annotation_text=f"found={cp}")
for cp in true_cps:
    fig.add_vline(x=cp, line_dash="dot", line_color="red")
fig.update_layout(
    title="Binary segmentation changepoints",
    xaxis_title="time",
    yaxis_title="value",
    template="plotly_white",
)
fig
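
To see how the penalty governs the number of accepted splits, the sketch below sweeps it over a few values. It uses a compact, slower re-implementation named `binseg` rather than the notebook's `binary_segmentation`, and the toy series, seed, penalty grid, and `min_size=5` are all assumptions chosen for illustration.

```python
import numpy as np

def seg_sse(seg):
    """Sum of squared deviations of a segment from its own mean."""
    return float(((seg - seg.mean()) ** 2).sum())

def binseg(y, penalty, min_size=5):
    """Greedy binary segmentation with an SSE cost (illustrative re-implementation)."""
    cps = []

    def recurse(start, end):
        if end - start < 2 * min_size:
            return
        base = seg_sse(y[start:end])
        best_cp, best_cost = None, np.inf
        # Each side of a candidate split keeps at least min_size points.
        for cp in range(start + min_size, end - min_size + 1):
            cost = seg_sse(y[start:cp]) + seg_sse(y[cp:end])
            if cost < best_cost:
                best_cp, best_cost = cp, cost
        if best_cp is not None and base - best_cost > penalty:
            cps.append(best_cp)
            recurse(start, best_cp)
            recurse(best_cp, end)

    recurse(0, len(y))
    return sorted(cps)

rng = np.random.default_rng(0)
# One true mean shift at index 50.
y = np.concatenate([rng.normal(0.0, 0.3, 50), rng.normal(2.0, 0.3, 50)])

results = {p: binseg(y, p) for p in (0.5, 2.0, 10.0, 1e6)}
for p, cps in results.items():
    print(f"penalty={p:g}: {len(cps)} changepoint(s)")
```

Because the best split of a segment does not depend on the penalty, raising the penalty can only prune splits: the changepoints found at a higher penalty are always a subset of those found at a lower one. A very small penalty tends to split pure noise, while a very large one rejects even the real shift.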


## Online detection with CUSUM
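CUSUM detects shifts in the mean by accumulating deviations of each observation $x_t$ from a reference mean $\mu_0$, after subtracting a slack (allowance) $k$ that absorbs small fluctuations: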

$$
\begin{aligned}
G_t^+ &= \max\left(0, G_{t-1}^+ + x_t - (\mu_0 + k)\right) \\
G_t^- &= \max\left(0, G_{t-1}^- + (\mu_0 - k) - x_t\right)
\end{aligned}
$$

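An alarm triggers when either statistic exceeds a threshold $h$. The implementation below resets both statistics to zero after each alarm but keeps $\mu_0$ fixed, so alarms recur for as long as the series stays away from the baseline.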


In [None]:
def cusum(y: np.ndarray, target_mean: float, k: float = 0.2, h: float = 4.0):
    """Two-sided CUSUM. Returns both statistic paths and the alarm indices.

    k is the slack subtracted at each step; h is the alarm threshold.
    """
    y = np.asarray(y, dtype=float)
    gpos = np.zeros_like(y)
    gneg = np.zeros_like(y)
    alarms = []
    for i, x in enumerate(y):
        gpos[i] = max(0.0, (gpos[i - 1] if i > 0 else 0.0) + x - (target_mean + k))
        gneg[i] = max(0.0, (gneg[i - 1] if i > 0 else 0.0) + (target_mean - k) - x)
        if gpos[i] > h or gneg[i] > h:
            alarms.append(i)
            # Restart both statistics so the next alarm reflects fresh evidence.
            gpos[i] = 0.0
            gneg[i] = 0.0
    return gpos, gneg, alarms

baseline = series.iloc[:40].mean()

gpos, gneg, alarms = cusum(series.values, target_mean=baseline, k=0.2, h=4.0)

fig = go.Figure()
fig.add_trace(go.Scatter(x=series.index, y=gpos, mode="lines", name="CUSUM +"))
fig.add_trace(go.Scatter(x=series.index, y=gneg, mode="lines", name="CUSUM -"))
fig.add_hline(y=4.0, line_dash="dash", line_color="gray")
for alarm in alarms:
    fig.add_vline(x=alarm, line_dash="dot", line_color="purple")
fig.update_layout(
    title="CUSUM statistics with alarms",
    xaxis_title="time",
    yaxis_title="statistic",
    template="plotly_white",
)
fig
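
The `cusum` function above loops over a stored array, but the recursion only needs the two running statistics, so it can also be applied one observation at a time. The sketch below shows one way to do that. The class name `OnlineCusum`, its `update` method, and the toy stream are assumptions for illustration, not part of any library.

```python
from dataclasses import dataclass

@dataclass
class OnlineCusum:
    """Two-sided CUSUM that keeps O(1) state between observations."""
    target_mean: float
    k: float = 0.2
    h: float = 4.0
    gpos: float = 0.0
    gneg: float = 0.0

    def update(self, x: float) -> bool:
        """Consume one observation; return True if an alarm fires."""
        self.gpos = max(0.0, self.gpos + x - (self.target_mean + self.k))
        self.gneg = max(0.0, self.gneg + (self.target_mean - self.k) - x)
        alarm = self.gpos > self.h or self.gneg > self.h
        if alarm:
            # Restart both statistics, mirroring the reset in the batch loop.
            self.gpos = 0.0
            self.gneg = 0.0
        return alarm

detector = OnlineCusum(target_mean=0.0, k=0.5, h=2.0)
stream = [0.0, 0.0, 2.0, 2.0, 2.0]
alarm_times = [t for t, x in enumerate(stream) if detector.update(x)]
print(alarm_times)
```

Tracing the upward statistic by hand: it stays at 0 for the first two points, rises to 1.5 at the third, and reaches 3.0 at the fourth, which exceeds `h = 2.0` and fires the alarm. The shift began at index 2 and was flagged at index 3, which illustrates the detection delay that `k` and `h` trade off against false alarms.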


## Practical tips
- **Minimum segment length** prevents overfitting to noise.
- **Penalties** control how many changepoints you accept.
- **Seasonality** can mask changes; remove seasonal components first.
- **Variance shifts** might require different costs or robust losses.
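
As one hedged illustration of the last tip, a Gaussian likelihood cost with a free mean and variance per segment can react to a pure variance shift. The names `gaussian_cost` and `best_split`, the small `eps` guard, and the synthetic series are assumptions for this sketch.

```python
import numpy as np

def gaussian_cost(seg, eps=1e-8):
    """m * log(variance): -2 x Gaussian log-likelihood, dropping terms that are
    the same for every segmentation. `eps` guards against log(0)."""
    seg = np.asarray(seg, dtype=float)
    return len(seg) * np.log(seg.var() + eps)

def best_split(y, cost, min_size=10):
    """Single split minimizing cost(left) + cost(right)."""
    candidates = range(min_size, len(y) - min_size + 1)
    return min(candidates, key=lambda cp: cost(y[:cp]) + cost(y[cp:]))

rng = np.random.default_rng(1)
# Same mean throughout; only the standard deviation changes at index 100.
y = np.concatenate([rng.normal(0.0, 0.2, 100), rng.normal(0.0, 1.5, 100)])

cp_var = best_split(y, gaussian_cost)
print(cp_var)  # should land near the true change at 100
```

The SSE cost only rewards differences in segment means, so it has little to latch onto when the mean is constant. The log-variance term is what lets this cost separate a quiet regime from a noisy one.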

## Exercises
1. Increase the noise level and adjust the binary segmentation penalty.
2. Change the baseline period used for CUSUM and observe alarm timing.
3. Replace the SSE cost with an absolute-error cost (more robust to outliers).

## Further reading
- Basseville & Nikiforov (1993), *Detection of Abrupt Changes*.
- Killick, Fearnhead, & Eckley (2012), PELT algorithm for optimal segmentation.
- Adams & MacKay (2007), Bayesian online changepoint detection.
