# sktime Series Annotation (Anomalies, Change Points, Segmentation)

Series annotation attaches labels to time points or intervals: where behavior shifts, where anomalies appear, or how the series divides into regimes over time.

## What is series annotation?
An annotator learns a function $g$ that maps the observed series $y_{1:T}$ to a sequence of labels:

$$
a_{1:T} = g(y_{1:T})
$$

where $a_t$ can be binary (anomaly vs normal), categorical (regime labels), or interval-based (start/end indices).
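
For contiguous events, the binary and interval-based forms carry the same information. The following is a minimal sketch in plain NumPy of converting between them. The helper names `mask_to_intervals` and `intervals_to_mask` are illustrative (not part of sktime), and intervals are assumed to be half-open `[start, end)` index pairs.

```python
import numpy as np


def mask_to_intervals(mask):
    """Convert a binary label vector into half-open (start, end) index pairs."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def intervals_to_mask(intervals, length):
    """Convert half-open (start, end) pairs back into a binary label vector."""
    mask = np.zeros(length, dtype=int)
    for start, end in intervals:
        mask[start:end] = 1
    return mask


labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
intervals = mask_to_intervals(labels)
print(intervals)  # [(2, 5), (6, 7)]
```

An isolated point anomaly becomes an interval of length one, so the same representation covers both point and collective events.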

## Tasks and notation
- **Point anomalies**: $a_t \in \{0,1\}$ flags outlier timestamps.
- **Collective anomalies**: detect abnormal *intervals* $[s_j, e_j]$.
- **Change points**: detect $\tau_1, \tau_2, \dots$ where the data-generating process shifts.
- **Segmentation**: assign regime labels $\ell_t \in \{1,\dots,K\}$ for contiguous regions.

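A simple change point model assumes that all observations within segment $k$ share one distribution $P_k$ (with the convention $\tau_0 = 0$):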

$$
y_t \sim P_k \quad \text{for} \quad \tau_{k-1} < t \le \tau_k.
$$
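
Under this model, a set of change points determines a segmentation. A small sketch of that mapping, assuming 1-based time indices $t = 1, \dots, T$ and $\tau_0 = 0$, is shown here; the function name `change_points_to_labels` is hypothetical.

```python
import numpy as np


def change_points_to_labels(change_points, length):
    """Label each t = 1..length with the index k of the segment containing it."""
    t = np.arange(1, length + 1)
    taus = np.sort(np.asarray(change_points))
    # side="left" counts the change points strictly below t, so t == tau_k
    # still belongs to segment k, matching tau_{k-1} < t <= tau_k.
    return np.searchsorted(taus, t, side="left") + 1


segment_labels = change_points_to_labels([3, 6], length=8)
print(segment_labels)  # [1 1 1 2 2 2 3 3]
```

Code that uses 0-based array slicing (for example `y[80:]`) places the boundary one step differently, so it helps to state the indexing convention whenever change points are compared.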


```python
import numpy as np
import plotly.graph_objects as go

rng = np.random.default_rng(9)
t = np.arange(220)
y = 0.6 * np.sin(t / 10) + 0.25 * rng.normal(size=t.size)

# Regime shifts
y[80:] += 1.2
y[150:] -= 1.6
change_points = [80, 150]

# Point anomalies
anoms = np.array([30, 112, 190])
y[anoms] += np.array([3.5, -3.0, 4.0])

fig = go.Figure()
fig.add_trace(go.Scatter(x=t, y=y, mode="lines", name="series"))
fig.add_trace(
    go.Scatter(
        x=anoms,
        y=y[anoms],
        mode="markers",
        marker=dict(color="crimson", size=9),
        name="point anomalies",
    )
)
for cp in change_points:
    fig.add_vline(x=cp, line_dash="dash", line_color="gray")

fig.update_layout(
    title="Synthetic series with anomalies and change points",
    height=320,
    margin=dict(l=20, r=20, t=50, b=20),
)
fig.show()
```


## sktime mapping
sktime exposes series annotation via the **series-annotator** scitype. Annotators typically provide:
- `fit` to learn thresholds or model parameters,
- `predict` (or `transform`) to return anomaly labels, change points, or segment intervals.

Estimator tags describe whether an annotator supports multivariate series, returns point labels vs intervals, or requires exogenous variables.
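
To make the `fit`/`predict` pattern concrete, here is a toy point-anomaly annotator written in plain NumPy. It is a sketch of the interface shape only: `RobustZScoreAnnotator` is a hypothetical class, not an sktime estimator, and the robust z-score rule (median and MAD with a fixed threshold) is an assumption chosen for brevity.

```python
import numpy as np


class RobustZScoreAnnotator:
    """Toy point-anomaly annotator; illustrative only, not an sktime class."""

    def __init__(self, threshold=4.0):
        self.threshold = threshold

    def fit(self, y):
        """Learn a robust center and scale from the training series."""
        y = np.asarray(y, dtype=float)
        self.median_ = np.median(y)
        mad = np.median(np.abs(y - self.median_))
        # 1.4826 rescales the MAD to match the standard deviation of a normal
        # distribution; the eps floor avoids division by zero on flat series.
        self.scale_ = max(1.4826 * mad, np.finfo(float).eps)
        return self

    def predict(self, y):
        """Return 1 where the robust z-score exceeds the threshold, else 0."""
        y = np.asarray(y, dtype=float)
        scores = np.abs(y - self.median_) / self.scale_
        return (scores > self.threshold).astype(int)


rng = np.random.default_rng(0)
series = rng.normal(size=200)
series[[20, 120]] = 10.0  # inject two obvious outliers

annotator = RobustZScoreAnnotator(threshold=4.0)
point_labels = annotator.fit(series).predict(series)
print(np.flatnonzero(point_labels))
```

A single global median is a poor baseline for a series with level shifts, so this example only illustrates the interface; a real annotator would model the local level or the regime structure.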

## Evaluation notes
- Score change points with a tolerance window, so that a detection within a few time steps of the true change point counts as a hit.
- For anomalies, prefer precision/recall/F1 over raw accuracy when events are rare.
- For segmentation, compare overlap of predicted vs true intervals (IoU-style metrics).
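
The first and third points can be sketched in a few lines. The functions `change_point_f1` and `interval_iou` are illustrative helpers rather than sktime metrics; the matching is a simple greedy one-to-one assignment (an assumption, since other matching schemes exist), and intervals are taken as half-open `(start, end)` pairs.

```python
def change_point_f1(true_cps, pred_cps, tolerance):
    """Precision, recall, and F1 with one-to-one matching within +/- tolerance."""
    unmatched = sorted(true_cps)
    tp = 0
    for p in sorted(pred_cps):
        if not unmatched:
            break
        # Greedily pair each prediction with the nearest unused true change point.
        nearest = min(unmatched, key=lambda c: abs(c - p))
        if abs(nearest - p) <= tolerance:
            unmatched.remove(nearest)
            tp += 1
    precision = tp / len(pred_cps) if pred_cps else 0.0
    recall = tp / len(true_cps) if true_cps else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def interval_iou(a, b):
    """Intersection over union of two half-open (start, end) intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


precision, recall, f1 = change_point_f1([80, 150], [78, 120, 153], tolerance=5)
iou = interval_iou((80, 150), (90, 160))
print(precision, recall, f1, iou)
```

In this example two of the three predictions fall within five steps of a true change point, giving precision 2/3, recall 1, and F1 of 0.8, while the two intervals overlap on 60 of 80 covered steps for an IoU of 0.75. Setting `tolerance=0` reduces the first function to exact-match precision/recall/F1, which is the form suggested for rare point anomalies.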

## Next steps
- Explore the dynamic catalog in `data_science/time_series/sktime_algorithms/registry/06_annotation_catalog.ipynb`.
- Combine annotators with transformers (detrending, smoothing) for cleaner signals.