# KNeighborsTimeSeriesClassifier

K-Nearest Neighbors (KNN) for time series classification adapts the classic KNN algorithm by using **distance metrics designed for sequential data**. The most widely used is **Dynamic Time Warping (DTW)**, which allows elastic alignment between series of different lengths or with temporal distortions.

**Key Intuition**: Instead of comparing time series point-by-point (which fails when signals are shifted or stretched), DTW finds the optimal alignment that minimizes the total distance between matched points.

## Mathematical Foundation

### KNN Classification Rule

Given a query time series $\mathbf{x}$, KNN finds the $k$ nearest neighbors in the training set and assigns the majority class:

$$\hat{y} = \text{mode}\{y_i : i \in \mathcal{N}_k(\mathbf{x})\}$$

where $\mathcal{N}_k(\mathbf{x})$ is the set of indices of the $k$ training samples closest to $\mathbf{x}$.

### Distance Metrics

**Euclidean Distance** (point-to-point, rigid alignment):

$$d_E(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{t=1}^{T}(x_t - y_t)^2}$$

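This requires series of equal length $T$ and works well when they are aligned in time, but it degrades under phase shifts or speed variations because each point is compared only with the point at the same index.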
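
A minimal sketch of this failure mode, using two toy series invented for illustration (a single unit spike, and the same spike one step earlier). The shapes are identical, yet the rigid index-by-index comparison reports a nonzero distance.

```python
import numpy as np

# Two toy series (illustrative values, not from any dataset):
# the same unit spike, shifted by one time step.
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

# Rigid alignment compares x[t] only with y[t], so the spike is
# penalized twice: once where x has it and once where y has it.
d_euclid = float(np.sqrt(np.sum((x - y) ** 2)))
print(d_euclid)  # sqrt(2), about 1.414
```

An elastic measure can match the two spikes to each other instead, which is the motivation for DTW.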

**Dynamic Time Warping (DTW)** allows non-linear alignment via a recurrence relation:

$$D(i,j) = d(x_i, y_j) + \min\begin{cases} D(i-1, j) & \text{(insertion)} \\ D(i, j-1) & \text{(deletion)} \\ D(i-1, j-1) & \text{(match)} \end{cases}$$

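where $d(x_i, y_j) = (x_i - y_j)^2$ is the local cost. The recurrence is initialized with $D(0,0) = 0$ and $D(i,0) = D(0,j) = \infty$ for $i, j \geq 1$, and the DTW distance between series of lengths $T_x$ and $T_y$ is $\sqrt{D(T_x, T_y)}$.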
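
As a small worked example (values chosen only for illustration), take $\mathbf{x} = (1, 2, 3)$ and $\mathbf{y} = (1, 3)$, and assume the usual initialization with $D(0,0) = 0$ and infinite cost elsewhere in row 0 and column 0. Rows are indexed by $i$ and columns by $j$. The local costs $d$ and the accumulated costs $D$ are:

$$d = \begin{pmatrix} 0 & 4 \\ 1 & 1 \\ 4 & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 0 & 4 \\ 1 & 1 \\ 5 & 1 \end{pmatrix}$$

For instance, the middle-right entry follows from the recurrence as

$$D(2,2) = d(x_2, y_2) + \min\{D(1,2),\, D(2,1),\, D(1,1)\} = 1 + \min\{4, 1, 0\} = 1$$

The final accumulated cost is $D(3,2) = 1$, so taking the square root gives a DTW distance of $1$.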

**Warping Path Constraints**:
- **Boundary conditions**: Path starts at $(1,1)$ and ends at $(T_x, T_y)$
- **Monotonicity**: Indices can only increase (no going back in time)
- **Continuity**: Steps limited to adjacent cells
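- **Sakoe-Chiba band** (optional): $|i - j| \leq w$, a window constraint that keeps the path within a band of radius $w$ around the diagonal and reduces the cost per distance from $O(T^2)$ to $O(T \cdot w)$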
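
The constraints above can be sketched as a small validity check. The helper name `is_valid_warping_path` is hypothetical and introduced only for illustration. It assumes 0-based `(i, j)` index pairs, so the boundary cells are `(0, 0)` and `(n - 1, m - 1)`.

```python
def is_valid_warping_path(path, n, m, window=None):
    """Check boundary, monotonicity, continuity and an optional band.

    path   : list of 0-based (i, j) index pairs
    n, m   : lengths of the two series
    window : optional Sakoe-Chiba radius w (None means unconstrained)
    """
    # Boundary conditions: start in the first cell, end in the last one
    if not path or path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False
    # Monotonicity and continuity: every step moves by exactly one cell
    # down, right, or diagonally
    allowed_steps = {(1, 0), (0, 1), (1, 1)}
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if (i1 - i0, j1 - j0) not in allowed_steps:
            return False
    # Sakoe-Chiba band: stay within |i - j| <= w of the diagonal
    if window is not None and any(abs(i - j) > window for i, j in path):
        return False
    return True


good = [(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (4, 4)]
backwards = [(0, 0), (1, 1), (0, 1), (1, 2), (2, 2)]

print(is_valid_warping_path(good, 5, 5))             # True
print(is_valid_warping_path(good, 5, 5, window=0))   # False: leaves the diagonal
print(is_valid_warping_path(backwards, 3, 3))        # False: goes back in time
```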

## Low-Level NumPy Implementation

Building DTW and KNN from scratch helps understand the algorithm mechanics. These implementations prioritize clarity over optimization.

In [None]:
import numpy as np


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    """
    Compute Euclidean distance between two time series.
    
    This is the L2 norm of the difference vector. Works best when 
    series are perfectly aligned in time.
    
    Parameters
    ----------
    x, y : np.ndarray of shape (T,)
        Time series of equal length T
        
    Returns
    -------
    float : Euclidean distance
    """
    return np.sqrt(np.sum((x - y) ** 2))


def dtw_distance(x: np.ndarray, y: np.ndarray, return_matrix: bool = False):
    """
    Compute Dynamic Time Warping distance with full cost matrix.
    
    DTW finds the optimal alignment between two sequences by computing
    a cumulative cost matrix where each cell (i,j) represents the 
    minimum cost to align x[:i+1] with y[:j+1].
    
    Parameters
    ----------
    x : np.ndarray of shape (n,)
        First time series
    y : np.ndarray of shape (m,)
        Second time series  
    return_matrix : bool
        If True, also return the cost matrix and warping path
        
    Returns
    -------
    float : DTW distance (square root of accumulated cost)
    D : np.ndarray (optional) : Full cost matrix
    path : list (optional) : Optimal warping path as list of (i,j) tuples
    """
    n, m = len(x), len(y)
    
    # Initialize cost matrix with infinity
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0
    
    # Fill the cost matrix using dynamic programming
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: squared difference
            cost = (x[i-1] - y[j-1]) ** 2
            # Cumulative cost: local + minimum of three predecessors
            D[i, j] = cost + min(D[i-1, j],      # insertion (vertical step)
                                  D[i, j-1],      # deletion (horizontal step)
                                  D[i-1, j-1])    # match (diagonal step)
    
    # Backtrack from cell (n, m) to recover the optimal warping path
    if return_matrix:
        path = []
        i, j = n, m
        while i > 0 and j > 0:
            path.append((i-1, j-1))
            # Find which predecessor was used (ties favor the diagonal).
            # Row 0 and column 0 are infinite except D[0, 0], so the walk
            # always finishes with a diagonal step out of cell (1, 1).
            argmin = np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]])
            if argmin == 0:
                i, j = i-1, j-1
            elif argmin == 1:
                i -= 1
            else:
                j -= 1
        path.reverse()
        return np.sqrt(D[n, m]), D[1:, 1:], path
    
    return np.sqrt(D[n, m])

In [None]:
def compute_distance_matrix(X_train: np.ndarray, X_test: np.ndarray, 
                            metric: str = "dtw") -> np.ndarray:
    """
    Compute pairwise distance matrix between test and training samples.
    
    This is the computational bottleneck of KNN - we need O(n_test * n_train)
    distance computations, each of which is O(T^2) for DTW.
    
    Parameters
    ----------
    X_train : np.ndarray of shape (n_train, T)
        Training time series
    X_test : np.ndarray of shape (n_test, T)
        Test time series
    metric : str
        Distance metric: "euclidean" or "dtw"
        
    Returns
    -------
    np.ndarray of shape (n_test, n_train) : Distance matrix
    """
    n_test, n_train = len(X_test), len(X_train)
    dist_matrix = np.zeros((n_test, n_train))
    
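    # Select distance function; reject unknown names instead of
    # silently falling back to DTW
    if metric not in ("euclidean", "dtw"):
        raise ValueError(f"Unknown metric: {metric!r}")
    dist_fn = euclidean_distance if metric == "euclidean" else dtw_distance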
    
    for i in range(n_test):
        for j in range(n_train):
            dist_matrix[i, j] = dist_fn(X_test[i], X_train[j])
            
    return dist_matrix


def knn_classify(distance_matrix: np.ndarray, y_train: np.ndarray, 
                 k: int = 1) -> np.ndarray:
    """
    Classify test samples using k-nearest neighbors voting.
    
    For each test sample, find k training samples with smallest distance
    and return the most common class among them.
    
    Parameters
    ----------
    distance_matrix : np.ndarray of shape (n_test, n_train)
        Precomputed distances from test to training samples
    y_train : np.ndarray of shape (n_train,)
        Training labels
    k : int
        Number of neighbors to consider
        
    Returns
    -------
    np.ndarray of shape (n_test,) : Predicted labels
    """
    n_test = distance_matrix.shape[0]
    predictions = []
    
    for i in range(n_test):
        # Find indices of k nearest neighbors
        neighbor_indices = np.argsort(distance_matrix[i])[:k]
        # Get their labels
        neighbor_labels = y_train[neighbor_indices]
        # Majority vote (mode)
        unique, counts = np.unique(neighbor_labels, return_counts=True)
        predictions.append(unique[np.argmax(counts)])
        
    return np.array(predictions)

## Visualizing DTW Alignment

**Intuition**: The DTW cost matrix is like a terrain map where we're finding the cheapest path from corner to corner. Dark cells = cheap alignment, bright cells = expensive. The optimal path snakes through the valleys.

In [None]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create two synthetic time series with different phase/speed
np.random.seed(42)
t = np.linspace(0, 2*np.pi, 50)

# Series 1: sine wave
series1 = np.sin(t) + 0.1 * np.random.randn(len(t))

# Series 2: sine wave with phase shift and stretch
t2 = np.linspace(0.5, 2.5*np.pi, 50)
series2 = np.sin(t2) + 0.1 * np.random.randn(len(t2))

# Compute DTW with cost matrix and path
dtw_dist, cost_matrix, warping_path = dtw_distance(series1, series2, return_matrix=True)
eucl_dist = euclidean_distance(series1, series2)

print(f"Euclidean distance: {eucl_dist:.3f}")
print(f"DTW distance: {dtw_dist:.3f}")

In [None]:
# Plot DTW cost matrix with optimal warping path
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        "DTW Cost Matrix with Optimal Path",
        "Time Series Alignment via DTW",
        "Series 1 (Query)",
        "Series 2 (Reference)"
    ),
    specs=[[{}, {}], [{}, {}]],
    vertical_spacing=0.12,
    horizontal_spacing=0.1
)

# Cost matrix heatmap
fig.add_trace(
    go.Heatmap(
        z=cost_matrix, 
        colorscale="Viridis",
        colorbar=dict(title="Cost", x=0.45, len=0.45, y=0.78),
        showscale=True
    ), 
    row=1, col=1
)

# Warping path overlay
path_x = [p[1] for p in warping_path]
path_y = [p[0] for p in warping_path]
fig.add_trace(
    go.Scatter(
        x=path_x, y=path_y, 
        mode="lines+markers",
        line=dict(color="red", width=2),
        marker=dict(size=4),
        name="Optimal Path"
    ),
    row=1, col=1
)

# DTW alignment connections
for i, j in warping_path[::3]:  # Sample every 3rd point for clarity
    fig.add_trace(
        go.Scatter(
            x=[i, j], y=[series1[i], series2[j]],
            mode="lines",
            line=dict(color="rgba(255,0,0,0.3)", width=1),
            showlegend=False
        ),
        row=1, col=2
    )

# Series on alignment plot
fig.add_trace(
    go.Scatter(x=list(range(len(series1))), y=series1, name="Series 1", 
               line=dict(color="blue")),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=list(range(len(series2))), y=series2, name="Series 2",
               line=dict(color="green")),
    row=1, col=2
)

# Individual series plots
fig.add_trace(
    go.Scatter(x=list(range(len(series1))), y=series1, 
               line=dict(color="blue"), showlegend=False),
    row=2, col=1
)
fig.add_trace(
    go.Scatter(x=list(range(len(series2))), y=series2,
               line=dict(color="green"), showlegend=False),
    row=2, col=2
)

fig.update_layout(
    height=700, 
    title_text="Dynamic Time Warping: Cost Matrix & Alignment",
    showlegend=True
)
fig.update_xaxes(title_text="Series 2 index", row=1, col=1)
fig.update_yaxes(title_text="Series 1 index", row=1, col=1)
fig.update_xaxes(title_text="Time", row=1, col=2)
fig.update_yaxes(title_text="Value", row=1, col=2)

fig

## K-Nearest Neighbors Visualization

**Intuition**: For a query time series, we compute distances to all training samples and select the k closest ones. The query is assigned the majority class among its neighbors. Choosing k is a bias-variance tradeoff: small k = flexible but noisy, large k = smooth but may miss local patterns.

In [None]:
# Generate synthetic classification dataset
np.random.seed(123)

def generate_class_samples(n_samples, base_freq, noise=0.1):
    """Generate time series samples for a class with given frequency."""
    samples = []
    for _ in range(n_samples):
        t = np.linspace(0, 4*np.pi, 60)
        phase = np.random.uniform(0, np.pi)
        amp = np.random.uniform(0.8, 1.2)
        series = amp * np.sin(base_freq * t + phase) + noise * np.random.randn(len(t))
        samples.append(series)
    return np.array(samples)

# Create training data: 2 classes with different frequencies
X_class0 = generate_class_samples(15, base_freq=1.0)  # Low frequency
X_class1 = generate_class_samples(15, base_freq=2.0)  # High frequency
X_train_synth = np.vstack([X_class0, X_class1])
y_train_synth = np.array([0]*15 + [1]*15)

# Create a query sample (class 1)
query = generate_class_samples(1, base_freq=2.0)[0]

# Compute distances
distances = np.array([dtw_distance(query, x) for x in X_train_synth])
k = 5
neighbor_indices = np.argsort(distances)[:k]
neighbor_labels = y_train_synth[neighbor_indices]

print(f"Query's {k} nearest neighbors:")
print(f"  Indices: {neighbor_indices}")
print(f"  Labels: {neighbor_labels}")
print(f"  Distances: {distances[neighbor_indices].round(3)}")
print(f"  Predicted class: {np.bincount(neighbor_labels).argmax()}")

In [None]:
# Visualize KNN classification
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        "Query Time Series",
        f"K={k} Nearest Neighbors",
        "Distance to All Training Samples",
        "Training Data (Sample per Class)"
    ),
    vertical_spacing=0.15
)

# Query series
fig.add_trace(
    go.Scatter(y=query, mode="lines", line=dict(color="red", width=3), name="Query"),
    row=1, col=1
)

# K nearest neighbors
colors = ["blue" if y == 0 else "green" for y in neighbor_labels]
for idx, (ni, color) in enumerate(zip(neighbor_indices, colors)):
    fig.add_trace(
        go.Scatter(
            y=X_train_synth[ni], mode="lines",
            line=dict(color=color, width=1.5),
            name=f"Neighbor {idx+1} (class {y_train_synth[ni]})",
            opacity=0.7
        ),
        row=1, col=2
    )

# Distance bar chart
bar_colors = ["blue" if y == 0 else "green" for y in y_train_synth]
fig.add_trace(
    go.Bar(
        x=list(range(len(distances))), y=distances,
        marker_color=bar_colors,
        name="Distances",
        showlegend=False
    ),
    row=2, col=1
)
# Highlight k nearest
fig.add_trace(
    go.Scatter(
        x=neighbor_indices.tolist(), y=distances[neighbor_indices],
        mode="markers", marker=dict(color="red", size=12, symbol="circle-open", line=dict(width=3)),
        name=f"K={k} Nearest"
    ),
    row=2, col=1
)

# Sample training data
fig.add_trace(
    go.Scatter(y=X_class0[0], mode="lines", line=dict(color="blue"), name="Class 0 (low freq)"),
    row=2, col=2
)
fig.add_trace(
    go.Scatter(y=X_class1[0], mode="lines", line=dict(color="green"), name="Class 1 (high freq)"),
    row=2, col=2
)

fig.update_layout(height=700, title_text="KNN Time Series Classification")
fig.update_xaxes(title_text="Time", row=1, col=1)
fig.update_xaxes(title_text="Time", row=1, col=2)
fig.update_xaxes(title_text="Training Sample Index", row=2, col=1)
fig.update_yaxes(title_text="DTW Distance", row=2, col=1)

fig

## Effect of K on Classification Accuracy

**Intuition**: The choice of $k$ controls the smoothness of decision boundaries:
- **k=1**: Most flexible, uses single nearest neighbor (prone to noise)
- **Large k**: Smoother decisions, but may miss local patterns
- **Optimal k**: Usually found via cross-validation
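
A minimal sketch of choosing $k$ by leave-one-out cross-validation on the training set only, so that the test set is not used for model selection. The helper `loo_accuracy` and the toy data are assumptions made for illustration. Euclidean distance keeps the example short, but any precomputed train-vs-train distance matrix (including DTW) could be substituted.

```python
import numpy as np

def loo_accuracy(dist, y, k):
    """Leave-one-out accuracy from a square train-vs-train distance matrix."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # a sample must not vote for itself
    correct = 0
    for i in range(len(y)):
        nn = np.argsort(d[i])[:k]
        labels, counts = np.unique(y[nn], return_counts=True)
        correct += int(labels[np.argmax(counts)] == y[i])
    return correct / len(y)

# Toy training set: two classes of noisy sine waves with different frequency
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 60)
freqs = [1.0] * 10 + [2.0] * 10
X = np.vstack([
    np.sin(f * t + rng.uniform(0, np.pi)) + 0.1 * rng.standard_normal(t.size)
    for f in freqs
])
y = np.array([0] * 10 + [1] * 10)

# Pairwise Euclidean distances between all training series
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Score each candidate k; max() keeps the first best k on ties
scores = {k: loo_accuracy(dist, y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the distance matrix is computed once and reused for every $k$, the search over $k$ adds little cost on top of the distance computations.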

In [None]:
# Create train/test split for accuracy evaluation
np.random.seed(456)

# Generate more samples for proper evaluation
X_c0_train = generate_class_samples(20, base_freq=1.0)
X_c1_train = generate_class_samples(20, base_freq=2.0)
X_c0_test = generate_class_samples(10, base_freq=1.0)
X_c1_test = generate_class_samples(10, base_freq=2.0)

X_tr = np.vstack([X_c0_train, X_c1_train])
y_tr = np.array([0]*20 + [1]*20)
X_te = np.vstack([X_c0_test, X_c1_test])
y_te = np.array([0]*10 + [1]*10)

# Compute distance matrix once (reuse for different k values)
dist_mat = compute_distance_matrix(X_tr, X_te, metric="dtw")

# Evaluate accuracy for different k values
k_values = list(range(1, 21, 2))
accuracies = []

for k in k_values:
    preds = knn_classify(dist_mat, y_tr, k=k)
    acc = np.mean(preds == y_te)
    accuracies.append(acc)
    
print("K vs Accuracy:")
for k, acc in zip(k_values, accuracies):
    print(f"  k={k:2d}: {acc:.2%}")

In [None]:
# Plot accuracy vs k
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=k_values, y=accuracies,
    mode="lines+markers",
    line=dict(color="royalblue", width=3),
    marker=dict(size=10),
    name="Test Accuracy"
))

# Highlight the k with the highest test accuracy (argmax keeps the first on ties)
best_k = k_values[np.argmax(accuracies)]
best_acc = max(accuracies)
fig.add_trace(go.Scatter(
    x=[best_k], y=[best_acc],
    mode="markers",
    marker=dict(color="red", size=15, symbol="star"),
    name=f"Best k={best_k}"
))

fig.update_layout(
    title="Classification Accuracy vs Number of Neighbors (k)",
    xaxis_title="k (Number of Neighbors)",
    yaxis_title="Accuracy",
    yaxis=dict(range=[0, 1.05], tickformat=".0%"),
    height=400,
    showlegend=True
)

fig.add_annotation(
    x=best_k, y=best_acc,
    text=f"Optimal k={best_k}<br>Acc={best_acc:.1%}",
    showarrow=True,
    arrowhead=2,
    ax=40, ay=-40
)

fig

## Using sktime's KNeighborsTimeSeriesClassifier

Now let's use the optimized implementation from sktime on a real dataset, the BasicMotions data bundled with sktime. The `KNeighborsTimeSeriesClassifier` provides:
- Multiple distance metrics (DTW, Euclidean, MSM, TWE, etc.)
- Window constraints for faster DTW
- Automatic handling of multivariate series

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from sktime.datasets import load_basic_motions, load_unit_test

In [None]:
X_train, y_train = load_basic_motions(split="train", return_X_y=True)
X_test, y_test = load_basic_motions(split="test", return_X_y=True)

In [None]:
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sklearn.metrics import classification_report

clf = KNeighborsTimeSeriesClassifier(n_neighbors=1, distance="dtw")
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(classification_report(y_test, pred))


## Key Takeaways & Practical Notes

### When to Use KNN-DTW
✅ **Use when**: Small-to-medium datasets, interpretability needed, time warping expected  
❌ **Avoid when**: Large datasets (classifying n_test queries against n_train series of length T costs O(n_test · n_train · T²) with unconstrained DTW), real-time requirements
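
To put the cost in numbers, here is a back-of-the-envelope count for the synthetic experiment in this notebook (20 test series, 40 training series, length 60). The window radius of 6 is an assumed value, equal to 10% of the series length.

```python
# Sizes taken from the synthetic experiment in this notebook
n_test, n_train, T = 20, 40, 60

# Unconstrained DTW fills a T x T matrix for every (test, train) pair
full_cells = n_test * n_train * T * T

# Assumed Sakoe-Chiba radius: 10% of the series length
w = 6
# Each row has at most 2*w + 1 cells inside the band (an upper bound,
# since rows near the corners are truncated)
band_cells = n_test * n_train * T * (2 * w + 1)

print(full_cells)  # 2880000
print(band_cells)  # 624000
```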

### Performance Tips
1. **Window constraints**: Use `distance_params={"window": 0.1}` to limit DTW warping (10% of series length)
2. **Early abandoning**: Stop distance computation once threshold exceeded
3. **Downsampling**: Reduce series length for faster computation
4. **Lower bounding**: Use LB_Keogh to prune candidates before full DTW
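
A minimal sketch of tips 1 and 4 combined, assuming equal-length univariate series and the squared local cost used earlier in this notebook. The function names `dtw_band`, `lb_keogh` and `nearest_neighbor_pruned` are hypothetical helpers written for this example, not part of sktime. LB_Keogh compares the query against the upper and lower envelope of a candidate within the band, so it never exceeds the banded DTW distance and can be used to skip candidates safely.

```python
import numpy as np

def dtw_band(x, y, w):
    """DTW distance restricted to a Sakoe-Chiba band of radius w."""
    n = len(x)
    D = np.full((n + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, n]))

def lb_keogh(query, candidate, w):
    """Lower bound on dtw_band(query, candidate, w), computed in O(T * w)."""
    n = len(query)
    total = 0.0
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        upper, lower = candidate[lo:hi].max(), candidate[lo:hi].min()
        # Only points outside the envelope contribute to the bound
        if query[i] > upper:
            total += (query[i] - upper) ** 2
        elif query[i] < lower:
            total += (lower - query[i]) ** 2
    return float(np.sqrt(total))

def nearest_neighbor_pruned(query, candidates, w):
    """1-NN search that skips full DTW when the bound cannot beat the best."""
    best_dist, best_idx, n_skipped = np.inf, -1, 0
    for idx, c in enumerate(candidates):
        if lb_keogh(query, c, w) >= best_dist:
            n_skipped += 1  # true distance is at least the bound
            continue
        d = dtw_band(query, c, w)
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_idx, best_dist, n_skipped

# Toy data: sine waves with three frequencies and random phases
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 40)
candidates = np.array([
    np.sin(f * t + rng.uniform(0, np.pi)) for f in (1.0, 1.0, 2.0, 2.0, 3.0, 3.0)
])
query = np.sin(2.0 * t + 0.3)

best_idx, best_dist, n_skipped = nearest_neighbor_pruned(query, candidates, w=4)
brute = [dtw_band(query, c, 4) for c in candidates]
print(best_idx, round(best_dist, 3), n_skipped)
```

The pruned search returns the same neighbor as the brute-force scan. How many candidates are skipped depends on the data and on the order in which candidates are visited.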

### Distance Metric Selection
| Metric | Best For | Complexity |
|--------|----------|------------|
| Euclidean | Aligned series, fast baseline | O(T) |
| DTW | Phase shifts, speed variations | O(T²) |
| DTW + window | Large-scale DTW | O(T·w) |
| MSM (Move-Split-Merge) | Spike patterns | O(T²) |

### Hyperparameter Guidelines
- Start with **k=1** for DTW (1-NN with DTW is a strong baseline for time series)
- For binary problems, use **odd k** values to avoid tied votes; with more than two classes, ties can still occur
- Cross-validate both k and window size
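
A short sketch of how ties arise and how they are resolved, using the same `np.unique` plus `argmax` vote as the `knn_classify` function in this notebook. The helper name `majority_vote` is hypothetical and introduced only for this example.

```python
import numpy as np

def majority_vote(neighbor_labels):
    """Return the most common label; ties go to the smallest label value."""
    # np.unique returns sorted labels, and argmax keeps the first maximum
    unique, counts = np.unique(neighbor_labels, return_counts=True)
    return unique[np.argmax(counts)]

# Binary problem, even k: a 2-2 split is possible and is resolved
# by label order, not by distance
print(majority_vote(np.array([1, 0, 1, 0])))  # 0

# Binary problem, odd k: no tie is possible
print(majority_vote(np.array([1, 0, 1])))     # 1

# Three classes, odd k: a 1-1-1 split is still a tie
print(majority_vote(np.array([2, 1, 0])))     # 0
```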