# sktime Time Series Clustering

Time series clustering groups sequences by shape, dynamics, or derived features so you can discover regimes, archetypes, or segments without labels.

## What you will learn
- The clustering setup and core notation for time series.
- Why distance choices (e.g., elastic distances) matter for sequence shape.
- How sktime exposes clusterers via a unified estimator API.


## Clustering setup and notation
Given a dataset of $N$ time series $X^{(i)} \in \mathbb{R}^{T_i \times c}$ (length $T_i$, $c$ channels), a clusterer assigns each series a label
$\hat{z}_i \in \{1,\dots,K\}$. Many algorithms minimize within-cluster dispersion:

$$
\sum_{i=1}^N d\bigl(X^{(i)}, \mu_{\hat{z}_i}\bigr),
$$

where $d$ is a time-series distance (Euclidean, DTW, or feature-space distance) and $\mu_k$ is a cluster prototype.
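
The dispersion sum can be evaluated directly once labels and prototypes are fixed. The sketch below is a minimal illustration under stated assumptions: the helper names `within_cluster_dispersion` and `euclidean`, the toy series, and the prototypes are all hypothetical, and labels are 0-based here (rather than $1,\dots,K$) to match NumPy indexing.

```python
import numpy as np

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    return float(np.linalg.norm(a - b))

def within_cluster_dispersion(X, labels, prototypes, dist):
    """Sum of dist(X[i], prototypes[labels[i]]) over all series."""
    return float(sum(dist(x, prototypes[z]) for x, z in zip(X, labels)))

# Toy data (illustrative values): four univariate series of length 3.
X = np.array([[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [5.0, 5.0, 5.0],
              [5.0, 4.0, 5.0]])
prototypes = np.array([[0.0, 0.5, 0.0],
                       [5.0, 4.5, 5.0]])
labels = np.array([0, 0, 1, 1])  # 0-based cluster index per series

dispersion = within_cluster_dispersion(X, labels, prototypes, euclidean)
print(dispersion)  # every series lies 0.5 from its prototype, so the total is 2.0
```

Passing a different `dist` callable (for example a DTW function) changes what "dispersion" means without changing the structure of the sum.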


## Enhanced Mathematical Foundation

### Dynamic Time Warping (DTW) Distance
DTW finds the optimal alignment between two time series $\mathbf{x} = (x_1, \dots, x_m)$ and $\mathbf{y} = (y_1, \dots, y_n)$ by solving the recurrence:

$$
D(i, j) = d(x_i, y_j) + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\}
$$

where $d(x_i, y_j) = (x_i - y_j)^2$ is the pointwise squared distance, with boundary conditions $D(0, 0) = 0$ and $D(i, 0) = D(0, j) = \infty$ for $i, j \ge 1$. The DTW distance is $\sqrt{D(m, n)}$.
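
As a small worked instance of the recurrence, take $\mathbf{x} = (0, 2)$ and $\mathbf{y} = (0, 1, 2)$ (values chosen only for illustration), starting from $D(0,0) = 0$ with every other border cell set to $\infty$, as the NumPy implementation in this notebook does. With rows indexed by $i$ and columns by $j$, the local costs and accumulated costs are:

$$
d = \begin{pmatrix} 0 & 1 & 4 \\ 4 & 1 & 0 \end{pmatrix},
\qquad
D = \begin{pmatrix} 0 & 1 & 5 \\ 4 & 1 & 1 \end{pmatrix}
$$

For example, $D(2,2) = d(x_2, y_2) + \min\{D(1,2),\, D(2,1),\, D(1,1)\} = 1 + \min\{1, 4, 0\} = 1$, and the DTW distance is $\sqrt{D(2,3)} = 1$.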

### k-Means Objective for Time Series
Given $N$ time series $\{X^{(1)}, \dots, X^{(N)}\}$, the k-means objective minimizes:

$$
\min_{\boldsymbol{\mu}, \mathbf{z}} \sum_{i=1}^{N} d\bigl(X^{(i)}, \mu_{z_i}\bigr)
$$

where $\mathbf{z} = (z_1, \dots, z_N)$ are cluster assignments and $\boldsymbol{\mu} = (\mu_1, \dots, \mu_K)$ are cluster centroids.

### Centroid Update (DTW Barycenter Averaging)
For DTW-based clustering, centroids are updated using **DBA** (DTW Barycenter Averaging):

$$
\mu_k = \text{DBA}\bigl(\{X^{(i)} : z_i = k\}\bigr)
$$

DBA iteratively refines the barycenter by aligning all cluster members and averaging aligned points.
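
The align-then-average idea can be sketched in a few lines. This is a simplified illustration, not a reference implementation: it assumes univariate series, initializes the barycenter from the first cluster member, and runs a fixed number of iterations. The names `dtw_path`, `dba_update`, and `dba` are hypothetical helpers introduced here.

```python
import numpy as np

def dtw_path(x, y):
    """Return the optimal DTW alignment as a list of (i, j) index pairs."""
    m, n = len(x), len(y)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (m, n) to (1, 1), moving to the cheapest predecessor.
    i, j = m, n
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1, j - 1], i - 1, j - 1),  # diagonal step
            (D[i - 1, j], i - 1, j),          # step in x only
            (D[i, j - 1], i, j - 1),          # step in y only
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]

def dba_update(barycenter, series):
    """One DBA iteration: align each series to the barycenter, then average."""
    sums = np.zeros(len(barycenter))
    counts = np.zeros(len(barycenter))
    for s in series:
        for i, j in dtw_path(barycenter, s):
            sums[i] += s[j]    # point s[j] is matched to barycenter position i
            counts[i] += 1
    return sums / counts       # every position is matched at least once

def dba(series, n_iter=10):
    """Run a fixed number of DBA iterations, starting from the first series."""
    barycenter = np.asarray(series[0], dtype=float).copy()
    for _ in range(n_iter):
        barycenter = dba_update(barycenter, series)
    return barycenter

# Two bumps at different positions (illustrative data).
members = [np.array([0.0, 0.0, 1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0, 0.0, 0.0])]
center = dba(members)
print(center)
```

Unlike the arithmetic mean, each update averages points that DTW has matched to the same barycenter position, so a shifted feature is not smeared across time steps. The barycenter keeps the length of its initial series.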

### Silhouette Score
The silhouette score measures cluster cohesion vs. separation for each sample $i$:

$$
s_i = \frac{b_i - a_i}{\max(a_i, b_i)}
$$

where:
- $a_i$ = mean intra-cluster distance (to other points in same cluster)
- $b_i$ = mean nearest-cluster distance (to points in closest other cluster)

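Values range from $-1$ (the sample is likely assigned to the wrong cluster) to $+1$ (the sample sits well inside its cluster); values near $0$ indicate a sample on the boundary between two clusters.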
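
To make the formula concrete (the numbers are illustrative only), a sample with $a_i = 0.5$ and $b_i = 2.0$ gets

$$
s_i = \frac{2.0 - 0.5}{\max(0.5,\, 2.0)} = 0.75,
$$

while swapping the two values ($a_i = 2.0$, $b_i = 0.5$) gives $s_i = (0.5 - 2.0)/2.0 = -0.75$.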

## Low-Level NumPy Implementation

Below we implement DTW distance, pairwise distance matrix, k-means clustering for time series, and silhouette score computation from scratch using NumPy.

In [None]:
import numpy as np

def dtw_distance(x: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray]:
    """
    Compute DTW distance between two 1D time series with full cost matrix.
    
    Parameters
    ----------
    x : np.ndarray, shape (m,)
        First time series
    y : np.ndarray, shape (n,)
        Second time series
    
    Returns
    -------
    distance : float
        DTW distance (square root of accumulated cost)
    cost_matrix : np.ndarray, shape (m, n)
        Full accumulated cost matrix D(i,j)
    """
    m, n = len(x), len(y)
    
    # Initialize cost matrix with infinity
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    
    # Fill cost matrix using dynamic programming
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    
    # Return distance and trimmed cost matrix (exclude padding row/col)
    return np.sqrt(D[m, n]), D[1:, 1:]


def compute_pairwise_dtw(X: np.ndarray) -> np.ndarray:
    """
    Compute pairwise DTW distance matrix for a collection of time series.
    
    Parameters
    ----------
    X : np.ndarray, shape (N, T)
        Collection of N time series, each of length T
    
    Returns
    -------
    dist_matrix : np.ndarray, shape (N, N)
        Symmetric distance matrix where dist_matrix[i,j] = DTW(X[i], X[j])
    """
    N = X.shape[0]
    dist_matrix = np.zeros((N, N))
    
    for i in range(N):
        for j in range(i + 1, N):
            dist, _ = dtw_distance(X[i], X[j])
            dist_matrix[i, j] = dist
            dist_matrix[j, i] = dist  # Symmetric
    
    return dist_matrix


def kmeans_timeseries(
    X: np.ndarray, 
    n_clusters: int, 
    metric: str = "dtw",
    max_iter: int = 100,
    random_state: int = 42
) -> tuple[np.ndarray, np.ndarray, list[float]]:
    """
    k-Means clustering for time series using Lloyd's algorithm.
    
    Parameters
    ----------
    X : np.ndarray, shape (N, T)
        Collection of N time series, each of length T
    n_clusters : int
        Number of clusters K
    metric : str
        Distance metric: 'dtw' or 'euclidean'
    max_iter : int
        Maximum number of iterations
    random_state : int
        Random seed for centroid initialization
    
    Returns
    -------
    labels : np.ndarray, shape (N,)
        Cluster assignments for each time series
    centroids : np.ndarray, shape (K, T)
        Cluster centroids (mean of assigned series)
    inertias : list[float]
        Inertia (sum of distances to centroids) at each iteration
    """
    rng = np.random.default_rng(random_state)
    N, T = X.shape
    
    # Initialize centroids randomly from data points
    centroid_idx = rng.choice(N, size=n_clusters, replace=False)
    centroids = X[centroid_idx].copy()
    
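    # Start with -1 (unassigned) so the first convergence check cannot pass
    # spuriously when every series is assigned to cluster 0.
    labels = np.full(N, -1, dtype=int)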
    inertias = []
    
    def _distance(a, b):
        if metric == "dtw":
            dist, _ = dtw_distance(a, b)
            return dist
        else:  # euclidean
            return np.linalg.norm(a - b)
    
    for iteration in range(max_iter):
        # Assignment step: assign each series to nearest centroid
        old_labels = labels.copy()
        distances_to_centroids = np.zeros((N, n_clusters))
        
        for i in range(N):
            for k in range(n_clusters):
                distances_to_centroids[i, k] = _distance(X[i], centroids[k])
        
        labels = np.argmin(distances_to_centroids, axis=1)
        inertia = np.sum(np.min(distances_to_centroids, axis=1))
        inertias.append(inertia)
        
        # Check convergence
        if np.array_equal(labels, old_labels):
            break
        
        # Update step: recompute centroids as mean of assigned series
        # (Note: For true DTW clustering, DBA should be used; here we use arithmetic mean)
        for k in range(n_clusters):
            mask = labels == k
            if np.any(mask):
                centroids[k] = X[mask].mean(axis=0)
    
    return labels, centroids, inertias


def compute_silhouette(
    X: np.ndarray, 
    labels: np.ndarray, 
    distance_matrix: np.ndarray | None = None
) -> tuple[np.ndarray, float]:
    """
    Compute silhouette scores for clustering evaluation.
    
    Parameters
    ----------
    X : np.ndarray, shape (N, T)
        Collection of N time series
    labels : np.ndarray, shape (N,)
        Cluster assignments
    distance_matrix : np.ndarray, shape (N, N), optional
        Precomputed pairwise distance matrix. If None, computed using DTW.
    
    Returns
    -------
    silhouette_samples : np.ndarray, shape (N,)
        Silhouette score for each sample
    silhouette_avg : float
        Mean silhouette score across all samples
    """
    N = len(labels)
    unique_labels = np.unique(labels)
    n_clusters = len(unique_labels)
    
    if distance_matrix is None:
        distance_matrix = compute_pairwise_dtw(X)
    
    silhouette_samples = np.zeros(N)
    
    for i in range(N):
        # a_i: mean distance to other points in same cluster
        same_cluster = labels == labels[i]
        same_cluster[i] = False  # Exclude self
        
        if np.sum(same_cluster) > 0:
            a_i = np.mean(distance_matrix[i, same_cluster])
        else:
            a_i = 0.0
        
        # b_i: mean distance to points in nearest other cluster
        b_i = np.inf
        for k in unique_labels:
            if k != labels[i]:
                other_cluster = labels == k
                if np.sum(other_cluster) > 0:
                    mean_dist = np.mean(distance_matrix[i, other_cluster])
                    b_i = min(b_i, mean_dist)
        
        if b_i == np.inf:
            b_i = 0.0
        
        # Silhouette score; by convention a singleton cluster scores 0
        if np.sum(same_cluster) > 0 and max(a_i, b_i) > 0:
            silhouette_samples[i] = (b_i - a_i) / max(a_i, b_i)
        else:
            silhouette_samples[i] = 0.0
    
    return silhouette_samples, np.mean(silhouette_samples)


print("✓ DTW distance, pairwise matrix, k-means, and silhouette functions defined")

### Example: Apply Clustering to Synthetic Data

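Let's apply our NumPy implementation to the synthetic time series data. The cell below uses the `groups` dictionary built in the synthetic-data cell (the one that defines `_make_group`), so run that cell first.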

In [None]:
# Combine synthetic groups into a single dataset
X_all = np.vstack([groups["shape A"], groups["shape B"], groups["shape C"]])
true_labels = np.array([0]*6 + [1]*6 + [2]*6)

print(f"Dataset shape: {X_all.shape} (N={X_all.shape[0]} series, T={X_all.shape[1]} time points)")

# Run k-means clustering with DTW metric
labels_pred, centroids, inertias = kmeans_timeseries(X_all, n_clusters=3, metric="dtw", max_iter=50)

# Compute pairwise DTW distance matrix
print("Computing pairwise DTW distance matrix...")
dist_matrix = compute_pairwise_dtw(X_all)

# Compute silhouette scores
sil_samples, sil_avg = compute_silhouette(X_all, labels_pred, dist_matrix)

print(f"Converged in {len(inertias)} iterations")
print(f"Final inertia: {inertias[-1]:.4f}")
print(f"Mean silhouette score: {sil_avg:.4f}")

## Plotly Visualizations

### DTW Cost Matrix Heatmap
Visualize the accumulated cost matrix between two example time series.

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Compute DTW between two series from different clusters
series_a = X_all[0]   # From shape A (sine)
series_b = X_all[12]  # From shape C (square)
dtw_dist, cost_matrix = dtw_distance(series_a, series_b)

# Create DTW cost matrix heatmap
fig_dtw = make_subplots(
    rows=2, cols=2,
    specs=[[{"colspan": 2}, None], [{"type": "xy"}, {"type": "xy"}]],
    row_heights=[0.6, 0.4],
    subplot_titles=["DTW Accumulated Cost Matrix", "Series A (Sine)", "Series B (Square)"],
    vertical_spacing=0.12,
    horizontal_spacing=0.1
)

# Cost matrix heatmap
fig_dtw.add_trace(
    go.Heatmap(
        z=cost_matrix,
        colorscale="Viridis",
        colorbar=dict(title="Cost", x=1.02),
        hovertemplate="i=%{y}<br>j=%{x}<br>D(i,j)=%{z:.2f}<extra></extra>"
    ),
    row=1, col=1
)

# Series A
fig_dtw.add_trace(
    go.Scatter(x=list(range(len(series_a))), y=series_a, mode="lines", 
               line=dict(color="#1f77b4", width=2), name="Series A"),
    row=2, col=1
)

# Series B
fig_dtw.add_trace(
    go.Scatter(x=list(range(len(series_b))), y=series_b, mode="lines",
               line=dict(color="#ff7f0e", width=2), name="Series B"),
    row=2, col=2
)

fig_dtw.update_layout(
    title=f"DTW Distance = {dtw_dist:.4f}",
    height=550,
    showlegend=False,
    margin=dict(l=50, r=50, t=80, b=40)
)
fig_dtw.update_xaxes(title_text="j (Series B index)", row=1, col=1)
fig_dtw.update_yaxes(title_text="i (Series A index)", row=1, col=1)
fig_dtw.show()

### Cluster Assignments with Centroids
Visualize the clustering results showing all series colored by cluster assignment, with centroids overlaid.

In [None]:
# Cluster assignment visualization
cluster_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]
cluster_names = ["Cluster 0", "Cluster 1", "Cluster 2"]

fig_clusters = make_subplots(
    rows=1, cols=3,
    subplot_titles=[f"Cluster {k} (n={np.sum(labels_pred==k)})" for k in range(3)],
    shared_yaxes=True
)

for k in range(3):
    mask = labels_pred == k
    cluster_series = X_all[mask]
    
    # Plot individual series (faded)
    for i, series in enumerate(cluster_series):
        fig_clusters.add_trace(
            go.Scatter(
                x=list(range(len(series))), y=series,
                mode="lines",
                line=dict(color=cluster_colors[k], width=1),
                opacity=0.4,
                showlegend=False,
                hoverinfo="skip"
            ),
            row=1, col=k+1
        )
    
    # Plot centroid (bold)
    fig_clusters.add_trace(
        go.Scatter(
            x=list(range(len(centroids[k]))), y=centroids[k],
            mode="lines",
            line=dict(color=cluster_colors[k], width=4),
            name=f"Centroid {k}",
            showlegend=True
        ),
        row=1, col=k+1
    )

fig_clusters.update_layout(
    title="K-Means Clustering Results with DTW Distance",
    height=350,
    margin=dict(l=40, r=40, t=80, b=40),
    legend=dict(orientation="h", yanchor="bottom", y=-0.2, xanchor="center", x=0.5)
)
fig_clusters.update_xaxes(title_text="Time", row=1, col=2)
fig_clusters.update_yaxes(title_text="Value", row=1, col=1)
fig_clusters.show()

### Silhouette Plot
Visualize the silhouette score for each sample, grouped by cluster assignment.

In [None]:
# Silhouette plot
fig_sil = go.Figure()

y_lower = 0
for k in range(3):
    cluster_sil = sil_samples[labels_pred == k]
    cluster_sil_sorted = np.sort(cluster_sil)
    cluster_size = len(cluster_sil_sorted)
    
    y_upper = y_lower + cluster_size
    y_range = np.arange(y_lower, y_upper)
    
    fig_sil.add_trace(
        go.Bar(
            x=cluster_sil_sorted,
            y=y_range,
            orientation="h",
            marker=dict(color=cluster_colors[k]),
            name=f"Cluster {k}",
            hovertemplate="Sample<br>Silhouette: %{x:.3f}<extra></extra>"
        )
    )
    
    # Add cluster label
    fig_sil.add_annotation(
        x=-0.05, y=(y_lower + y_upper) / 2,
        text=f"Cluster {k}",
        showarrow=False,
        font=dict(size=11, color=cluster_colors[k]),
        xanchor="right"
    )
    
    y_lower = y_upper + 2  # Gap between clusters

# Add vertical line for average silhouette
fig_sil.add_vline(
    x=sil_avg, 
    line=dict(color="red", width=2, dash="dash"),
    annotation_text=f"Avg: {sil_avg:.3f}",
    annotation_position="top right"
)

fig_sil.update_layout(
    title="Silhouette Plot for Time Series Clustering",
    xaxis_title="Silhouette Coefficient",
    yaxis_title="Sample Index (sorted within cluster)",
    height=400,
    showlegend=True,
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="center", x=0.5),
    margin=dict(l=80, r=40, t=80, b=40),
    xaxis=dict(range=[-0.2, 1.0])
)
fig_sil.update_yaxes(showticklabels=False)
fig_sil.show()

### Pairwise DTW Distance Matrix Heatmap
Visualize the full pairwise distance matrix to see cluster structure.

In [None]:
# Reorder distance matrix by cluster assignment for better visualization
sorted_idx = np.argsort(labels_pred)
dist_matrix_sorted = dist_matrix[sorted_idx][:, sorted_idx]
labels_sorted = labels_pred[sorted_idx]

# Create annotations for cluster boundaries
cluster_boundaries = []
for k in range(3):
    cluster_end = np.sum(labels_sorted <= k)
    cluster_boundaries.append(cluster_end - 0.5)

fig_dist = go.Figure()

fig_dist.add_trace(
    go.Heatmap(
        z=dist_matrix_sorted,
        colorscale="Blues",
        colorbar=dict(title="DTW Distance"),
        hovertemplate="Series %{y} vs Series %{x}<br>DTW Distance: %{z:.3f}<extra></extra>"
    )
)

# Add cluster boundary lines
for boundary in cluster_boundaries[:-1]:
    fig_dist.add_hline(y=boundary, line=dict(color="red", width=2))
    fig_dist.add_vline(x=boundary, line=dict(color="red", width=2))

fig_dist.update_layout(
    title="Pairwise DTW Distance Matrix (sorted by cluster)",
    xaxis_title="Series Index",
    yaxis_title="Series Index",
    height=500,
    width=550,
    margin=dict(l=60, r=60, t=80, b=60)
)
fig_dist.update_yaxes(autorange="reversed")
fig_dist.show()

print(f"\n📊 Summary:")
print(f"   • Number of clusters: 3")
print(f"   • Cluster sizes: {[np.sum(labels_pred==k) for k in range(3)]}")
print(f"   • Mean silhouette score: {sil_avg:.4f}")

In [None]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

rng = np.random.default_rng(10)
t = np.linspace(0, 1, 80)

def _make_group(kind, n=6):
    series = []
    for _ in range(n):
        noise = 0.15 * rng.normal(size=t.size)
        if kind == "sine":
            y = np.sin(2 * np.pi * (t * 1.5 + rng.normal(scale=0.02)))
        elif kind == "trend":
            y = 1.5 * t + 0.2 * np.sin(2 * np.pi * t)
        else:
            y = np.sign(np.sin(2 * np.pi * t * 3))
        series.append(y + noise)
    return np.array(series)

groups = {"shape A": _make_group("sine"), "shape B": _make_group("trend"), "shape C": _make_group("square")}

fig = make_subplots(rows=1, cols=3, subplot_titles=list(groups.keys()), shared_yaxes=True)
for col, (name, data) in enumerate(groups.items(), start=1):
    for row in data:
        fig.add_trace(
            go.Scatter(x=t, y=row, mode="lines", line=dict(width=1), showlegend=False),
            row=1, col=col,
        )
fig.update_layout(
    title="Three intuitive cluster archetypes (synthetic)",
    height=320,
    margin=dict(l=20, r=20, t=50, b=20),
)
fig.update_xaxes(showticklabels=False)
fig.show()

## Distance vs representation
Common clustering design choices:
- **Distance-based**: cluster using elastic distances (e.g., DTW) or shape-aware metrics.
- **Feature-based**: transform series to features (summary stats, spectra, learned features) then cluster.
- **Model-based**: fit generative models per cluster and compare likelihoods.

These choices define *what counts as similar* in your domain (shape, phase, frequency, or dynamics).
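
The feature-based route can be sketched as follows. This is a minimal illustration under stated assumptions: the feature set (mean, standard deviation, dominant frequency bin), the helper name `summary_features`, and the toy sine data are all hypothetical choices. Because the toy features separate exactly, grouping on the dominant-frequency column stands in for a real clusterer.

```python
import numpy as np

def summary_features(X):
    """Map each series (row of X) to [mean, std, dominant frequency bin]."""
    centered = X - X.mean(axis=1, keepdims=True)
    spectrum = np.abs(np.fft.rfft(centered, axis=1))
    dominant = spectrum[:, 1:].argmax(axis=1) + 1  # skip the zero-frequency bin
    return np.column_stack([X.mean(axis=1), X.std(axis=1), dominant])

# Toy data (illustrative): sines at 2 and 8 cycles with random phase shifts.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 64, endpoint=False)
slow = np.array([np.sin(2 * np.pi * 2 * t + rng.uniform(0, 2 * np.pi))
                 for _ in range(5)])
fast = np.array([np.sin(2 * np.pi * 8 * t + rng.uniform(0, 2 * np.pi))
                 for _ in range(5)])
X = np.vstack([slow, fast])

features = summary_features(X)

# These features ignore phase, so series with the same frequency coincide in
# feature space even when their raw pointwise distance is large.
_, labels = np.unique(features[:, 2], return_inverse=True)
print(labels)
```

Here "similar" means "same frequency content", regardless of phase. A pointwise distance on the raw series would instead treat phase-shifted copies as far apart.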

## sktime mapping
sktime clusterers follow a unified API (scitype = `clusterer`). In practice you:
- Prepare a collection of series (equal or unequal length depending on estimator tags).
- Call `fit` to learn prototypes or parameters.
- Use `predict` or `fit_predict` to obtain cluster labels.

The registry catalog (next notebook) lists every clusterer available in your local sktime install.
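
The prepare, fit, and predict steps above can be sketched with sktime's k-means clusterer. Treat this as a hedged example rather than a definitive recipe: it assumes an sktime version that provides `sktime.clustering.k_means.TimeSeriesKMeans` (which may also need the optional `numba` dependency), and the wrappers `to_panel` and `cluster_with_sktime` are hypothetical helpers introduced here.

```python
import numpy as np

def to_panel(X):
    """Reshape (N, T) univariate series into a 3D array of shape (N, 1, T)."""
    X = np.asarray(X, dtype=float)
    return X[:, np.newaxis, :]  # (instances, channels, time points)

def cluster_with_sktime(X, n_clusters=3, metric="dtw", random_state=0):
    """Fit an sktime k-means clusterer and return one label per series."""
    # Imported lazily so this sketch can be loaded without sktime installed.
    from sktime.clustering.k_means import TimeSeriesKMeans

    clusterer = TimeSeriesKMeans(
        n_clusters=n_clusters,
        metric=metric,
        random_state=random_state,
    )
    return clusterer.fit_predict(to_panel(X))
```

With the synthetic data from this notebook, `cluster_with_sktime(X_all, n_clusters=3)` would return an array of 18 labels. Calling `fit` and `predict` separately on the clusterer follows the same pattern and allows labeling new series later.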

## Next steps
- Explore the dynamic catalog in `data_science/time_series/sktime_algorithms/registry/05_clusterer_catalog.ipynb`.
- Pair clusterers with sktime transformers (e.g., smoothing, feature extraction) for better separability.