# Dynamic Time Warping (DTW) for Time Series Classification

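DTW aligns two sequences by allowing local stretching and compression of the time axis. It is a classic dissimilarity measure for time-series classification and clustering, especially when patterns have similar shapes but are out of phase. Note that DTW is not a metric in the strict sense: it can violate the triangle inequality.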

**Learning goals**
- Understand the DTW objective and path constraints.
- Implement DTW with a warping window.
- Visualize the cost matrix and alignment path.
- Use DTW in a simple k-NN classifier.


## Intuition: align shapes, not timestamps

Two sequences can have the same shape but evolve at different speeds. DTW finds a lowest-cost alignment path through a cost matrix so that similar shapes line up even if they are shifted or stretched in time.


## DTW objective and recurrence

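Let $x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_m)$. Define a pointwise cost
$c_{i,j}=|x_i-y_j|$ (or the squared error $(x_i-y_j)^2$). DTW looks for the warping path $\pi$ (a sequence of index pairs $(i,j)$ satisfying the constraints below) with the lowest total cost,

$$
\mathrm{DTW}(x,y)=\min_{\pi}\sum_{(i,j)\in\pi} c_{i,j},
$$

and computes it with the dynamic-programming recurrence

$$
D_{i,j}=c_{i,j}+\min\{D_{i-1,j},\;D_{i,j-1},\;D_{i-1,j-1}\}
$$

with $D_{0,0}=0$ and $D_{i,0}=D_{0,j}=\infty$ for $i,j\ge 1$, which gives $D_{1,1}=c_{1,1}$. Each $D_{i,j}$ is the cost of the cheapest path from $(1,1)$ to $(i,j)$, so the DTW distance is $D_{n,m}$.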
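A hand-sized run of the recurrence makes the table concrete. The sketch below is plain NumPy and independent of the implementation later in the notebook. It pads $D$ with an extra row and column ($D_{0,0}=0$, every other padding cell $+\infty$), which reproduces the boundary $D_{1,1}=c_{1,1}$. The inputs `x_small` and `y_small` are made up for illustration: `y_small` repeats some samples of `x_small`, so DTW should be 0 even though the lengths differ.

```python
import numpy as np

# Made-up inputs: y_small is x_small with some samples repeated.
x_small = np.array([0.0, 1.0, 2.0])
y_small = np.array([0.0, 0.0, 1.0, 2.0, 2.0])

# Pointwise cost c[i, j] = |x_i - y_j| via broadcasting.
c_small = np.abs(x_small[:, None] - y_small[None, :])
ns, ms = c_small.shape

# Padded table: D[0, 0] = 0, other padding cells +inf (so D[1, 1] = c[1, 1]).
D_small = np.full((ns + 1, ms + 1), np.inf)
D_small[0, 0] = 0.0
for i in range(1, ns + 1):
    for j in range(1, ms + 1):
        D_small[i, j] = c_small[i - 1, j - 1] + min(
            D_small[i - 1, j], D_small[i, j - 1], D_small[i - 1, j - 1]
        )

print(D_small[1:, 1:])
print(f'DTW distance: {D_small[ns, ms]}')  # 0.0: every x_i finds an exact match
```

The accumulated table is `[[0, 0, 1, 3, 5], [1, 1, 0, 1, 2], [3, 3, 1, 0, 0]]`. The zero-cost path repeats `x_small[0]` and `x_small[2]` to absorb the extra samples of `y_small`.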

**Path constraints**
- **Boundary**: starts at $(1,1)$ and ends at $(n,m)$.
- **Monotonicity**: indices never decrease.
- **Continuity**: steps are $(1,0)$, $(0,1)$, or $(1,1)$.
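The three constraints are easy to check mechanically. The sketch below uses `is_valid_warping_path`, a hypothetical helper that is not part of any library. It takes a list of 0-based `(i, j)` tuples, as Python indexing would produce, and tests each constraint in turn.

```python
def is_valid_warping_path(path, n, m):
    """Check boundary, monotonicity and continuity for a 0-based list of (i, j) tuples."""
    # Boundary: start at (0, 0) and end at (n - 1, m - 1).
    if not path or path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False
    # Only these steps are allowed; this enforces both monotonicity and continuity.
    allowed = {(1, 0), (0, 1), (1, 1)}
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if (i1 - i0, j1 - j0) not in allowed:
            return False
    return True

print(is_valid_warping_path([(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)], 3, 5))  # True
print(is_valid_warping_path([(0, 0), (1, 2), (2, 3), (2, 4)], 3, 5))          # False: skips a column
print(is_valid_warping_path([(0, 0), (1, 1), (0, 2), (2, 3), (2, 4)], 3, 5))  # False: i decreases
```

The path returned by `dtw_distance` is a NumPy array. It can be converted for this check with `[tuple(map(int, p)) for p in path]`.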

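A **Sakoe-Chiba band** restricts the path to cells with $|i-j|\le w$. This cuts the work from $nm$ cells to roughly $n(2w+1)$ and prevents pathological alignments in which a single point is matched to a long stretch of the other series. For sequences of different lengths, $w$ must be at least $|n-m|$; otherwise the end cell $(n,m)$ is unreachable.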
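To see how much work the band saves, the sketch below builds a boolean mask of the admissible cells. `sakoe_chiba_mask` is a hypothetical helper. It widens `w` to at least $|n-m|$ for unequal lengths, the same rule `dtw_distance` applies further down, because otherwise the end cell would fall outside the band.

```python
import numpy as np

def sakoe_chiba_mask(n, m, w):
    """Boolean (n, m) mask of cells with |i - j| <= w, using 0-based indices."""
    w = max(w, abs(n - m))          # widen so the end cell (n-1, m-1) stays inside the band
    i = np.arange(n)[:, None]
    j = np.arange(m)[None, :]
    return np.abs(i - j) <= w

mask = sakoe_chiba_mask(80, 80, 20)
print(mask.sum(), mask.size)        # 2860 6400
```

For $n=m=80$ and $w=20$, only 2860 of the 6400 cells (about 45%) are evaluated, matching the count $n(2w+1)-w(w+1)$ for a square table.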


In [None]:
import numpy as np
import plotly.graph_objects as go

rng = np.random.default_rng(7)
n = 80
t = np.linspace(0, 2 * np.pi, n)

x = np.sin(t) + 0.10 * rng.normal(size=n)
y = np.sin(t + 0.8) + 0.10 * rng.normal(size=n)

fig = go.Figure()
fig.add_trace(go.Scatter(x=np.arange(n), y=x, mode='lines', name='x'))
fig.add_trace(go.Scatter(x=np.arange(n), y=y, mode='lines', name='y (shifted)'))
fig.update_layout(title='Same shape, different timing', xaxis_title='Time index', yaxis_title='Value')
fig

## DTW from scratch (with a warping window)

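We fill the dynamic programming table inside the warping window. Optionally, we then backtrack the alignment path: starting at $(n,m)$, we repeatedly step to the cheapest of the three predecessor cells until we reach $(1,1)$.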


In [None]:
def dtw_distance(x, y, window=None, return_path=False):
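    """DTW distance between 1D sequences, using the absolute-error cost |x_i - y_j|.

    window: Sakoe-Chiba half-width w (None = no constraint). It is widened to at
        least |n - m| so that the end cell (n, m) stays reachable.
    return_path: if True, also return the alignment path as 0-based (i, j) index
        pairs and the accumulated cost matrix D[1:, 1:] (cells outside the band are inf).
    """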
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    if window is None:
        w = max(n, m)
    else:
        w = max(int(window), abs(n - m))

    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0

    for i in range(1, n + 1):
        j_start = max(1, i - w)
        j_end = min(m, i + w)
        for j in range(j_start, j_end + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])

    if not return_path:
        return D[n, m]

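    # Backtrack from (n, m): step to the cheapest predecessor until row 0 or column 0.
    # On ties, min() keeps the first entry, so the diagonal step is preferred.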
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (D[i - 1, j - 1], i - 1, j - 1),
            (D[i - 1, j], i - 1, j),
            (D[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda v: v[0])

    path.reverse()
    return D[n, m], np.array(path), D[1:, 1:]

In [None]:
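euclid = np.linalg.norm(x - y)        # lock-step L2 norm
lockstep_l1 = np.abs(x - y).sum()     # lock-step sum of |x_i - y_i| (= DTW with window=0)
dtw, path, cost = dtw_distance(x, y, window=20, return_path=True)

# DTW uses the absolute-error cost, so the like-for-like baseline is the lock-step L1 sum
print(f'Euclidean distance          : {euclid:.3f}')
print(f'Lock-step L1 (DTW, window=0): {lockstep_l1:.3f}')
print(f'DTW distance (window=20)    : {dtw:.3f}')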

## Cost matrix and alignment path

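The heatmap shows the accumulated cost $D_{i,j}$ returned by `dtw_distance`. Cells outside the warping window are infinite and typically render as blank. The white line is the optimal alignment path.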
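The local cost matrix $c_{i,j}=|x_i-y_j|$ can be built in one line with NumPy broadcasting. In the sketch below, `s1` and `s2` are stand-in series (a sine and a phase-shifted copy) rather than the notebook's `x` and `y`. The matrix's main diagonal is the lock-step alignment with no warping, so summing it gives the plain sum of absolute differences.

```python
import numpy as np

# Stand-in series: a sine and a phase-shifted copy.
t_demo = np.linspace(0, 2 * np.pi, 40)
s1 = np.sin(t_demo)
s2 = np.sin(t_demo + 0.8)

# Local cost matrix by broadcasting: C[i, j] = |s1[i] - s2[j]|.
C = np.abs(s1[:, None] - s2[None, :])

# The main diagonal is the lock-step (no-warping) alignment.
print(np.isclose(np.trace(C), np.abs(s1 - s2).sum()))   # True
```

Passing a matrix like `C` as `z` to `go.Heatmap` plots local costs instead of accumulated ones.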


In [None]:
fig = go.Figure()
fig.add_trace(go.Heatmap(z=cost, colorscale='Viridis', showscale=True))
fig.add_trace(go.Scatter(
    x=path[:, 1],
    y=path[:, 0],
    mode='lines',
    line=dict(color='white', width=2),
    name='Alignment path'
))
fig.update_yaxes(autorange='reversed')
fig.update_layout(title='DTW accumulated cost matrix and optimal path', xaxis_title='y index', yaxis_title='x index')
fig

## Visualizing the matched pairs

We can draw a few alignment links to see how points correspond across sequences.


In [None]:
def plot_alignment(x, y, path, stride=6, offset=1.6):
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=np.arange(len(x)), y=x, mode='lines', name='x'))
    fig.add_trace(go.Scatter(x=np.arange(len(y)), y=y + offset, mode='lines', name='y (offset)'))

    for i, j in path[::stride]:
        fig.add_trace(go.Scatter(
            x=[i, j],
            y=[x[i], y[j] + offset],
            mode='lines',
            line=dict(color='rgba(0,0,0,0.25)', width=1),
            showlegend=False
        ))

    fig.update_layout(title='DTW alignment links', xaxis_title='Time index')
    return fig

plot_alignment(x, y, path)

## Normalization matters

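DTW compares raw values, so differences in offset and amplitude can dominate the distance even when two series share the same shape. A common practice is to **z-normalize** each series (subtract its mean, divide by its standard deviation) before computing distances.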
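Here is a quick check of the effect. A sine wave and a scaled, shifted copy have the same shape but a large raw distance. The sketch measures the lock-step sum of absolute differences. For equal-length series the diagonal path is always admissible, so DTW is never larger than this quantity, and a near-zero z-normalized gap bounds DTW as well. `znorm` is a hypothetical stand-in for the `z_normalize` helper defined next, without its `1e-8` guard.

```python
import numpy as np

t_demo = np.linspace(0, 2 * np.pi, 50)
base_wave = np.sin(t_demo)
scaled_wave = 3.0 * np.sin(t_demo) + 5.0    # same shape, new amplitude and offset

def znorm(s):
    # Zero mean, unit variance (assumes s is not constant).
    return (s - s.mean()) / s.std()

raw_gap = np.abs(base_wave - scaled_wave).sum()
norm_gap = np.abs(znorm(base_wave) - znorm(scaled_wave)).sum()
print(f'lock-step L1, raw      : {raw_gap:.2f}')
print(f'lock-step L1, z-normed : {norm_gap:.2e}')
```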


In [None]:
def z_normalize(a):
    a = np.asarray(a, dtype=float)
    return (a - a.mean()) / (a.std() + 1e-8)

## k-NN classification with DTW

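We build a small synthetic dataset: class 0 is a sine wave, class 1 a square wave, and every series gets a random monotone time warp plus Gaussian noise. We then compare 1-NN accuracy under Euclidean distance and DTW.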


In [None]:
def warp_time(t, rng, knots=5, sigma=0.15):
    knot_x = np.linspace(0, 1, knots)
    knot_y = np.cumsum(np.clip(rng.normal(1.0, sigma, size=knots), 0.05, None))
    knot_y = (knot_y - knot_y[0]) / (knot_y[-1] - knot_y[0])
    return np.interp(t, knot_x, knot_y)

def make_series(label, length, rng):
    t = np.linspace(0, 1, length)
    t_warp = warp_time(t, rng)
    if label == 0:
        base = np.sin(2 * np.pi * t_warp)
    else:
        base = np.sign(np.sin(2 * np.pi * t_warp))
    noise = rng.normal(0, 0.15, size=length)
    return base + noise

def make_dataset(n_per_class=30, length=60, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append(make_series(label, length, rng))
            y.append(label)
    return np.array(X), np.array(y)

rng = np.random.default_rng(42)
X, y = make_dataset(rng=rng)
perm = rng.permutation(len(y))
train_size = int(0.7 * len(y))
train_idx, test_idx = perm[:train_size], perm[train_size:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

In [None]:
def knn_predict(X_train, y_train, x, distance):
    distances = [distance(x, x_tr) for x_tr in X_train]
    return y_train[int(np.argmin(distances))]

def euclidean(a, b):
    # z-normalize here too, so both distances see the same preprocessing
    return np.linalg.norm(z_normalize(a) - z_normalize(b))

def dtw_norm(a, b):
    return dtw_distance(z_normalize(a), z_normalize(b), window=8)

def accuracy(distance):
    preds = [knn_predict(X_train, y_train, x, distance) for x in X_test]
    return float(np.mean(np.array(preds) == y_test))

acc_euclid = accuracy(euclidean)
acc_dtw = accuracy(dtw_norm)

print(f'1-NN accuracy (Euclidean): {acc_euclid:.3f}')
print(f'1-NN accuracy (DTW)      : {acc_dtw:.3f}')

In [None]:
fig = go.Figure()
fig.add_trace(go.Bar(x=['Euclidean', 'DTW'], y=[acc_euclid, acc_dtw], marker_color=['#9bb7d4', '#2f4b7c']))
fig.update_layout(title='1-NN accuracy on time-warped synthetic data', yaxis_title='Accuracy', yaxis_range=[0, 1])
fig

## Complexity and practical tips

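- **Complexity**: naive DTW is $\mathcal{O}(nm)$ in time and memory. With a Sakoe-Chiba band of half-width $w$, time drops to $\mathcal{O}(n\,w)$. If only the distance (not the path) is needed, memory drops to $\mathcal{O}(m)$ by keeping two rows of $D$.
- **Windowing**: use a Sakoe-Chiba band to reduce runtime and avoid over-warping.
- **Normalization**: z-normalize each series before DTW.
- **Lower bounds**: for large datasets, use cheap lower bounds on DTW (e.g., LB_Keogh) to skip candidates whose bound already exceeds the best distance found so far.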
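When only the distance is needed, the full $(n+1)\times(m+1)$ table is unnecessary, because row $i$ of $D$ depends only on row $i-1$. The sketch below, `dtw_two_rows` (a hypothetical name), keeps two rows and the same Sakoe-Chiba band handling as `dtw_distance`, so it uses $\mathcal{O}(m)$ memory. The trade-off is that the alignment path can no longer be backtracked.

```python
import numpy as np

def dtw_two_rows(x, y, window=None):
    """DTW distance (absolute-error cost) keeping only two rows of D: O(m) memory.

    Same recurrence and band rule as the full-table version; no path is returned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    w = max(n, m) if window is None else max(int(window), abs(n - m))

    prev = np.full(m + 1, np.inf)      # row i-1 of D
    prev[0] = 0.0                      # D[0, 0] = 0
    for i in range(1, n + 1):
        curr = np.full(m + 1, np.inf)  # D[i, 0] = inf; cells outside the band stay inf
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

demo_rng = np.random.default_rng(1)
u = demo_rng.normal(size=50)
v = demo_rng.normal(size=60)
print(dtw_two_rows(u, v, window=10))
```

Because it evaluates the same cells with the same recurrence, it should return the same value as `dtw_distance` with the same window. That makes it a drop-in option inside `dtw_norm` when series are long.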


## Exercises

1. Try squared error instead of absolute error in the cost definition. How does it change sensitivity to outliers?
2. Increase the warping window and observe accuracy/runtime changes.
3. Replace the square wave class with a sawtooth wave and re-evaluate.
4. Implement a simple LB_Keogh bound and measure speedups for k-NN.


## Further reading

- Berndt, D. J., & Clifford, J. (1994). Using dynamic time warping to find patterns in time series.
- Keogh, E., & Ratanamahatana, C. A. (2005). Exact indexing of dynamic time warping.
- Rakthanmanon, T., et al. (2012). Searching and mining trillions of time series subsequences under dynamic time warping.
