# sktime Time Series Clustering

Time series clustering groups sequences by shape, dynamics, or derived features so you can discover regimes, archetypes, or segments without labels.

## What you will learn
- The clustering setup and core notation for time series.
- Why distance choices (e.g., elastic distances) matter for sequence shape.
- How sktime exposes clusterers via a unified estimator API.


## Clustering setup and notation
Given a dataset of $n$ time series $X^{(i)} \in \mathbb{R}^{T_i \times m}$ (series $i$ has length $T_i$ and $m$ channels), a clusterer assigns each series a label
$\hat{z}_i \in \{1,\dots,K\}$. Many algorithms minimize within-cluster dispersion:

$$
\sum_{i=1}^n d\bigl(X^{(i)}, \mu_{\hat{z}_i}\bigr),
$$

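where $d$ is a time-series distance (Euclidean, DTW, or a distance in feature space) and $\mu_k$ is the prototype of cluster $k$, for example a centroid or a medoid. The minimization runs over both the assignments $\hat{z}_i$ and the prototypes $\mu_k$.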
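As a minimal sketch of the objective above, the following evaluates the dispersion for a given labelling. It assumes equal-length univariate series, Euclidean distance for $d$, and the pointwise mean as the prototype; the function name `within_cluster_dispersion` and the toy data are illustrative, and labels are 0-based here rather than $1,\dots,K$.

```python
import numpy as np


def within_cluster_dispersion(X, labels, prototypes):
    """Sum of Euclidean distances from each series to its assigned prototype.

    X          : array (n, T), one equal-length univariate series per row
    labels     : integer array (n,), values in {0, ..., K-1}
    prototypes : array (K, T), one prototype per cluster
    """
    residuals = X - prototypes[labels]  # (n, T) difference to the assigned prototype
    return float(np.linalg.norm(residuals, axis=1).sum())


def mean_prototypes(X, labels, n_clusters):
    """Pointwise mean of the series in each cluster (an assumed prototype choice)."""
    return np.stack([X[labels == k].mean(axis=0) for k in range(n_clusters)])


# Toy data: two flat series at 0 and two flat series at 1.
X = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])

good_labels = np.array([0, 0, 1, 1])  # groups identical series together
bad_labels = np.array([0, 1, 0, 1])   # mixes the two levels in each cluster

good = within_cluster_dispersion(X, good_labels, mean_prototypes(X, good_labels, 2))
bad = within_cluster_dispersion(X, bad_labels, mean_prototypes(X, bad_labels, 2))
```

The labelling that groups identical series has zero dispersion, while the mixed labelling does not, which is the signal a clusterer exploits when it searches over assignments.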


```python
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

rng = np.random.default_rng(10)
t = np.linspace(0, 1, 80)  # shared time grid for every series


def _make_group(kind, n=6):
    """Return n noisy series of one archetype: "sine", "trend", or anything else for a square wave."""
    series = []
    for _ in range(n):
        noise = 0.15 * rng.normal(size=t.size)
        if kind == "sine":
            # Sine wave with a small random phase shift per series.
            y = np.sin(2 * np.pi * (t * 1.5 + rng.normal(scale=0.02)))
        elif kind == "trend":
            # Linear trend with a mild periodic wiggle.
            y = 1.5 * t + 0.2 * np.sin(2 * np.pi * t)
        else:
            # Square wave with three cycles over the time grid.
            y = np.sign(np.sin(2 * np.pi * t * 3))
        series.append(y + noise)
    return np.array(series)


groups = {
    "shape A": _make_group("sine"),
    "shape B": _make_group("trend"),
    "shape C": _make_group("square"),
}

# One subplot per archetype, with a shared y-axis so amplitudes are comparable.
fig = make_subplots(rows=1, cols=3, subplot_titles=list(groups.keys()), shared_yaxes=True)
for col, (name, data) in enumerate(groups.items(), start=1):
    for y in data:  # each y is one series of the group
        fig.add_trace(
            go.Scatter(x=t, y=y, mode="lines", line=dict(width=1), showlegend=False),
            row=1, col=col,
        )
fig.update_layout(
    title="Three intuitive cluster archetypes (synthetic)",
    height=320,
    margin=dict(l=20, r=20, t=50, b=20),
)
fig.update_xaxes(showticklabels=False)
fig.show()
```


## Distance vs representation
Common clustering design choices:
- **Distance-based**: cluster using elastic distances (e.g., DTW) or shape-aware metrics.
- **Feature-based**: transform series to features (summary stats, spectra, learned features) then cluster.
- **Model-based**: fit generative models per cluster and compare likelihoods.

These choices define *what counts as similar* in your domain (shape, phase, frequency, or dynamics).
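To make the effect of the distance choice concrete, here is a small sketch that compares Euclidean distance with a textbook dynamic time warping (DTW) distance on two sine waves that differ only by a phase shift. The `dtw_distance` function is a plain NumPy illustration written for this example (no window constraint, squared local cost); it is not sktime's implementation, and the shift of 0.05 is an arbitrary assumption.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook DTW between two 1-D series, O(len(a) * len(b)) time.

    Uses squared pointwise differences as the local cost and returns the
    square root of the cheapest accumulated alignment cost.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


t = np.linspace(0, 1, 80)
base = np.sin(2 * np.pi * 1.5 * t)
shifted = np.sin(2 * np.pi * 1.5 * (t + 0.05))  # same shape, small phase shift

euclid = float(np.linalg.norm(base - shifted))  # compares point i with point i only
dtw = dtw_distance(base, shifted)               # may align point i with a nearby point j
```

For equal-length series the diagonal alignment is one admissible warping path, so this DTW value can never exceed the Euclidean distance. On phase-shifted copies it is noticeably smaller, which is why elastic distances tend to group series by shape rather than by exact timing.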

## sktime mapping
sktime clusterers follow a unified API (scitype = `clusterer`). In practice you:
- Prepare a collection of series (equal or unequal length depending on estimator tags).
- Call `fit` to learn prototypes or parameters.
- Use `predict` or `fit_predict` to obtain cluster labels.
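The steps above can be sketched with `TimeSeriesKMeans` from `sktime.clustering.k_means`. This is a minimal illustration under stated assumptions: the synthetic data, the helper name `cluster_with_kmeans`, and the choice of `metric="euclidean"` are made up for this example, and constructor defaults can differ between sktime versions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 40)

# Two well-separated synthetic archetypes, five noisy series each.
sines = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=(5, t.size))
trends = 3.0 * t + 2.0 + 0.05 * rng.normal(size=(5, t.size))

# Stack into a 3-D array (n_instances, n_channels, n_timepoints), one of the
# collection formats sktime accepts for equal-length series.
X = np.concatenate([sines, trends])[:, np.newaxis, :]


def cluster_with_kmeans(X, n_clusters=2, random_state=0):
    """Fit sktime's TimeSeriesKMeans and return one integer label per series."""
    # Imported here so the data preparation above runs even without sktime installed.
    from sktime.clustering.k_means import TimeSeriesKMeans

    clusterer = TimeSeriesKMeans(
        n_clusters=n_clusters,
        metric="euclidean",  # assumption: chosen for speed; "dtw" is an elastic alternative
        random_state=random_state,
    )
    # fit_predict learns the prototypes and returns the labels in one call.
    return clusterer.fit_predict(X)
```

Calling `labels = cluster_with_kmeans(X)` should return one label per series, ten in total for this toy collection. To label new series with already learned prototypes, keep the fitted clusterer and call `predict` on the new collection.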

The registry catalog (next notebook) lists every clusterer available in your local sktime install.
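For a quick look at the available clusterers without opening the catalog notebook, sktime's registry can be queried directly. The helper name `list_clusterers` is hypothetical, and the result depends on your sktime version and installed optional dependencies.

```python
def list_clusterers():
    """Return the class names of all clusterers registered in the installed sktime."""
    # Imported here so that defining the helper does not require sktime.
    from sktime.registry import all_estimators

    # By default all_estimators returns a list of (name, class) tuples.
    return [name for name, _ in all_estimators(estimator_types="clusterer")]
```

Calling `list_clusterers()` returns a list of names such as `"TimeSeriesKMeans"`; passing `as_dataframe=True` to `all_estimators` gives the same information as a pandas DataFrame instead.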

## Next steps
- Explore the dynamic catalog in `data_science/time_series/sktime_algorithms/registry/05_clusterer_catalog.ipynb`.
- Pair clusterers with sktime transformers (e.g., smoothing, feature extraction), which can improve cluster separability.