# RocketClassifier

ROCKET (RandOm Convolutional KErnel Transform) turns a time series into a high‑dimensional feature vector using many random convolutional kernels. A fast linear classifier on these features often achieves strong accuracy with minimal tuning.


## Mathematical Foundation

ROCKET transforms time series into a rich feature space using **random convolutional kernels**. Understanding the math reveals why this simple idea works so well.

### Random Convolutional Kernel

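A kernel with weight vector $w$ of length $l$ slides along a time series $x$. At each position $t$ it takes the dot product with the window of $l$ values starting there:

$$z_t = w \cdot x_{t:t+l-1} = \sum_{j=0}^{l-1} w_j \, x_{t+j}$$

where the weights $w_j \sim \mathcal{N}(0, 1)$ are drawn from a standard normal distribution.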
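
As a quick illustration of the sliding dot product, the sketch below uses hand-picked weights instead of random ones (an assumption made only so the result is easy to check by hand), with no dilation or bias.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy time series
w = np.array([1.0, 0.0, -1.0])            # illustrative weights, not random
l = len(w)

# One dot product per window of l consecutive values
z = np.array([w @ x[t:t + l] for t in range(len(x) - l + 1)])
print(z)  # [-2. -2. -2.]

# NumPy's cross-correlation in 'valid' mode computes the same thing
assert np.allclose(z, np.correlate(x, w, mode="valid"))
```

Each output equals $x_t - x_{t+2}$, which is $-2$ everywhere on this linearly increasing series.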

### Kernel Parameters

Each random kernel is defined by:

| Parameter | Description | Typical Values |
|-----------|-------------|----------------|
| **Length** $l$ | Number of weights in kernel | $l \in \{7, 9, 11\}$ (sampled uniformly) |
| **Dilation** $d$ | Spacing between the input elements a kernel reads | $d \in \{1, 2, ..., \lfloor \frac{T-1}{l-1} \rfloor\}$, sampled on a log scale |
| **Bias** $b$ | Additive constant | $b \sim \mathcal{U}(-1, 1)$ |
| **Padding** | Zero-padding at boundaries | On or off with equal probability |
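
The dilation range in the table can be sampled as $d = \lfloor 2^u \rfloor$ with $u \sim \mathcal{U}(0, \log_2 \frac{T-1}{l-1})$, which is how the original ROCKET method is usually described. The sketch below is a minimal illustration under that assumption; `sample_dilation` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def sample_dilation(T, length, rng):
    """Sample a dilation on a log2 scale so the dilated kernel still fits in T points."""
    upper = np.log2((T - 1) / (length - 1))
    return int(2 ** rng.uniform(0, upper))  # floor of 2^u

rng = np.random.default_rng(0)
T, length = 100, 9
samples = [sample_dilation(T, length, rng) for _ in range(1000)]

# Every sampled kernel span (l - 1) * d + 1 stays within the series length
print(min(samples), max(samples))
print(all((length - 1) * d + 1 <= T for d in samples))
```

Sampling on a log scale means small dilations are drawn more often than large ones, while large ones still appear.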

### Convolution Operation

For a kernel with weights $w = [w_0, w_1, ..., w_{l-1}]$ and dilation $d$, the convolution output at position $i$ is:

$$z_i = \sum_{j=0}^{l-1} w_j \cdot x_{i + j \cdot d} + b$$

The dilation parameter allows the kernel to capture patterns at different temporal scales:
- **Small dilation** ($d=1$): Captures fine-grained local patterns
- **Large dilation** ($d=8$): Captures patterns spanning wider time intervals
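
To see what dilation does to the indices a kernel reads, the following sketch lists them for a few settings and computes how many output positions remain without padding. The helper names `dilated_indices` and `output_length` are illustrative only.

```python
def dilated_indices(i, length, dilation):
    """Input positions read by a kernel placed at output position i."""
    return [i + j * dilation for j in range(length)]

def output_length(T, length, dilation):
    """Number of valid (unpadded) output positions for a series of length T."""
    span = (length - 1) * dilation + 1   # distance covered by the dilated kernel
    return max(T - span + 1, 0)

print(dilated_indices(0, 3, 1))    # [0, 1, 2]
print(dilated_indices(0, 3, 4))    # [0, 4, 8]
print(output_length(100, 7, 1))    # 94
print(output_length(100, 7, 4))    # 76
print(output_length(100, 11, 32))  # 0: the kernel no longer fits without padding
```

Larger dilations widen the span a kernel covers, so fewer unpadded output positions remain.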

### Feature Extraction

ROCKET extracts **two features** from each kernel's convolution output $z$:

#### 1. Proportion of Positive Values (PPV)

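$$\text{PPV}(z) = \frac{1}{n}\sum_{t=1}^{n} \mathbf{1}_{z_t > 0}$$

where $n$ is the length of $z$ (equal to $T$ when padding keeps the output length, shorter otherwise).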

**Intuition**: PPV measures "how often" the pattern matched. A high PPV means the time series frequently exhibits the pattern captured by this kernel.

#### 2. Maximum Value (Max Pooling)

$$\text{max}(z) = \max_t z_t$$

**Intuition**: The max captures the "best match" — how strongly the pattern appeared at its peak location.

With $N$ kernels, ROCKET produces $2N$ features (PPV + max for each kernel).
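
A tiny worked example of the two features, using a made-up convolution output `z` (the values are arbitrary and chosen only for illustration):

```python
import numpy as np

z = np.array([-1.5, 0.2, 0.0, 3.1, -0.4])  # hypothetical convolution output

ppv = np.mean(z > 0)   # fraction of strictly positive values: 2 of 5
max_val = np.max(z)    # strongest response

print(ppv)      # 0.4
print(max_val)  # 3.1
```

Note that the comparison is strict, so an output of exactly zero does not count toward PPV.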

## Visualizing Random Kernels

Let's build intuition by visualizing what random kernels look like and how they interact with time series data.

In [None]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

np.random.seed(42)

# Generate sample random kernels with different lengths
kernel_lengths = [7, 9, 11]
kernels = []
for length in kernel_lengths:
    weights = np.random.randn(length)
    kernels.append(weights)

# Visualize kernel shapes
fig = make_subplots(rows=1, cols=3, subplot_titles=[f"Kernel (length={l})" for l in kernel_lengths])

for i, (kernel, length) in enumerate(zip(kernels, kernel_lengths)):
    fig.add_trace(
        go.Scatter(
            x=list(range(length)),
            y=kernel,
            mode='lines+markers',
            name=f'L={length}',
            line=dict(width=2),
            marker=dict(size=8)
        ),
        row=1, col=i+1
    )
    fig.add_hline(y=0, line_dash="dash", line_color="gray", opacity=0.5, row=1, col=i+1)

fig.update_layout(
    title="Random Convolutional Kernels (weights ~ N(0,1))",
    height=350,
    showlegend=False,
    template="plotly_white"
)
fig.update_xaxes(title_text="Position")
fig.update_yaxes(title_text="Weight")
fig

### Convolution Output Visualization

When we convolve a kernel with a time series, we get an output signal. Let's see how different kernels respond to a sample time series containing distinct patterns.

In [None]:
# Create a sample time series with distinct patterns
T = 100
t = np.linspace(0, 4*np.pi, T)
time_series = np.sin(t) + 0.5 * np.sin(3*t) + 0.2 * np.random.randn(T)

# Helper: valid (unpadded) dilated convolution with a single kernel
def convolve_with_kernel(x, kernel, dilation=1, bias=0):
    """Apply a single kernel to a time series with given dilation."""
    l = len(kernel)
    # Calculate effective kernel span
    effective_length = (l - 1) * dilation + 1
    output_length = len(x) - effective_length + 1
    
    if output_length <= 0:
        return np.array([])
    
    output = np.zeros(output_length)
    for i in range(output_length):
        indices = np.arange(l) * dilation + i
        output[i] = np.sum(kernel * x[indices]) + bias
    return output

# Apply the same kernel at several dilations
kernel = kernels[0]  # Length 7 kernel
dilations = [1, 2, 4]
outputs = [convolve_with_kernel(time_series, kernel, d, bias=0) for d in dilations]

# Visualization
fig = make_subplots(
    rows=4, cols=1,
    subplot_titles=["Original Time Series", 
                    "Convolution Output (dilation=1)",
                    "Convolution Output (dilation=2)", 
                    "Convolution Output (dilation=4)"],
    vertical_spacing=0.08
)

# Original time series
fig.add_trace(
    go.Scatter(x=list(range(T)), y=time_series, mode='lines', 
               name='Input', line=dict(color='blue', width=2)),
    row=1, col=1
)

# Convolution outputs
colors = ['green', 'orange', 'red']
for i, (output, dilation) in enumerate(zip(outputs, dilations)):
    fig.add_trace(
        go.Scatter(x=list(range(len(output))), y=output, mode='lines',
                   name=f'd={dilation}', line=dict(color=colors[i], width=2)),
        row=i+2, col=1
    )
    # Add a dashed zero line (PPV counts the points above it)
    fig.add_hline(y=0, line_dash="dash", line_color="gray", opacity=0.5, row=i+2, col=1)

fig.update_layout(
    title="Kernel Convolution at Different Dilations",
    height=700,
    showlegend=True,
    template="plotly_white"
)
fig

### PPV and Max Feature Distribution

The PPV (Proportion of Positive Values) and Max features extracted from convolution outputs form the feature space for classification. Let's visualize how these features differ across time series.

In [None]:
# Generate multiple random time series of two classes
np.random.seed(42)
n_samples = 100
T = 100

# Class A: Sine-dominant patterns
class_a = [np.sin(np.linspace(0, 4*np.pi, T)) + 0.3*np.random.randn(T) for _ in range(n_samples//2)]
# Class B: Square-wave patterns (sign of a sine)
class_b = [np.sign(np.sin(np.linspace(0, 4*np.pi, T))) + 0.3*np.random.randn(T) for _ in range(n_samples//2)]

all_series = class_a + class_b
labels = ['Class A']*len(class_a) + ['Class B']*len(class_b)

# Generate random kernels and compute features
n_kernels = 50
kernels_for_viz = []
for _ in range(n_kernels):
    length = np.random.choice([7, 9, 11])
    weights = np.random.randn(length)
    dilation = np.random.randint(1, 5)
    bias = np.random.uniform(-1, 1)
    kernels_for_viz.append({'weights': weights, 'dilation': dilation, 'bias': bias})

# Extract PPV and Max features for all series
def extract_features(series, kernels):
    ppvs = []
    maxs = []
    for k in kernels:
        output = convolve_with_kernel(series, k['weights'], k['dilation'], k['bias'])
        if len(output) > 0:
            ppv = np.mean(output > 0)
            max_val = np.max(output)
        else:
            ppv, max_val = 0.5, 0
        ppvs.append(ppv)
        maxs.append(max_val)
    return ppvs, maxs

all_ppvs = []
all_maxs = []
for series in all_series:
    ppvs, maxs = extract_features(series, kernels_for_viz)
    all_ppvs.append(ppvs)
    all_maxs.append(maxs)

all_ppvs = np.array(all_ppvs)
all_maxs = np.array(all_maxs)

# Visualize PPV distribution for first kernel
fig = make_subplots(rows=1, cols=2, subplot_titles=["PPV Feature (Kernel 1)", "Max Feature (Kernel 1)"])

# PPV histogram by class
for i, cls in enumerate(['Class A', 'Class B']):
    mask = np.array(labels) == cls
    fig.add_trace(
        go.Histogram(x=all_ppvs[mask, 0], name=cls, opacity=0.7, nbinsx=20),
        row=1, col=1
    )
    fig.add_trace(
        go.Histogram(x=all_maxs[mask, 0], name=cls, opacity=0.7, nbinsx=20, showlegend=False),
        row=1, col=2
    )

fig.update_layout(
    title="Feature Distributions by Class (Single Kernel)",
    height=400,
    barmode='overlay',
    template="plotly_white"
)
fig.update_xaxes(title_text="PPV Value", row=1, col=1)
fig.update_xaxes(title_text="Max Value", row=1, col=2)
fig.update_yaxes(title_text="Count", row=1, col=1)
fig

### Classification Decision Boundary

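With many kernels, ROCKET creates a high-dimensional feature space in which a linear classifier separates the classes with a hyperplane. To visualize this, we project the features to 2D with PCA and fit a classifier on the projection. The result only approximates the boundary learned in the full feature space.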

In [None]:
from sklearn.decomposition import PCA
from sklearn.linear_model import RidgeClassifierCV

# Combine PPV and Max features
all_features = np.hstack([all_ppvs, all_maxs])
y = np.array([0 if l == 'Class A' else 1 for l in labels])

# Reduce to 2D for visualization
pca = PCA(n_components=2)
features_2d = pca.fit_transform(all_features)

# Fit classifier in 2D space
clf_2d = RidgeClassifierCV()
clf_2d.fit(features_2d, y)

# Create decision boundary mesh
x_min, x_max = features_2d[:, 0].min() - 1, features_2d[:, 0].max() + 1
y_min, y_max = features_2d[:, 1].min() - 1, features_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))
Z = clf_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot
fig = go.Figure()

# Decision boundary contour
fig.add_trace(go.Contour(
    x=np.linspace(x_min, x_max, 100),
    y=np.linspace(y_min, y_max, 100),
    z=Z,
    showscale=False,
    colorscale=[[0, 'rgba(66, 133, 244, 0.3)'], [1, 'rgba(234, 67, 53, 0.3)']],
    contours=dict(showlines=False),
    name='Decision Boundary'
))

# Scatter points
for i, (cls, color, symbol) in enumerate([('Class A', 'blue', 'circle'), ('Class B', 'red', 'diamond')]):
    mask = y == i
    fig.add_trace(go.Scatter(
        x=features_2d[mask, 0],
        y=features_2d[mask, 1],
        mode='markers',
        name=cls,
        marker=dict(color=color, size=10, symbol=symbol, line=dict(width=1, color='white'))
    ))

fig.update_layout(
    title=f"ROCKET Features in 2D (PCA) with Decision Boundary<br><sub>Training accuracy in 2D: {clf_2d.score(features_2d, y):.1%}</sub>",
    xaxis_title="PC1",
    yaxis_title="PC2",
    height=500,
    template="plotly_white"
)
fig

---

## Low-Level NumPy Implementation

Implementing ROCKET from scratch is a good way to understand it. Below we build a complete ROCKET classifier using only NumPy, with a closed-form ridge solution as the linear classifier.

### Step 1: Generate Random Kernels

Each kernel has randomly sampled:
- **Weights**: Drawn from $\mathcal{N}(0, 1)$
- **Length**: Chosen from $\{7, 9, 11\}$
- **Dilation**: Sampled as $\lfloor 2^u \rfloor$ with $u$ uniform, so both small and large time scales are covered
- **Bias**: Uniform on $[-1, 1]$
- **Padding**: Random boolean

In [None]:
def generate_random_kernels(n_kernels: int, max_length: int = 11, random_state: int = None) -> list:
    """
    Generate random convolutional kernels for ROCKET.
    
    Parameters
    ----------
    n_kernels : int
        Number of kernels to generate.
    max_length : int
        Maximum kernel length. Lengths are sampled from {7, 9, 11} ∩ [1, max_length].
    random_state : int, optional
        Random seed for reproducibility.
    
    Returns
    -------
    kernels : list of dict
        Each kernel contains: weights, length, dilation, bias, padding.
    """
    if random_state is not None:
        np.random.seed(random_state)
    
    # Possible kernel lengths
    candidate_lengths = np.array([7, 9, 11])
    lengths = candidate_lengths[candidate_lengths <= max_length]
    if len(lengths) == 0:
        lengths = np.array([max_length])
    
    kernels = []
    for _ in range(n_kernels):
        # Random length
        length = np.random.choice(lengths)
        
        # Random weights ~ N(0, 1)
        weights = np.random.randn(length)
        
        # Random dilation, sampled uniformly on a log2 scale to cover multiple time scales
        # Fixed cap for this demo; a kernel whose dilated span exceeds the series
        # length only produces informative output when padding is enabled
        max_dilation = 32  # Configurable
        dilation = 2 ** np.random.uniform(0, np.log2(max_dilation + 1))
        dilation = int(np.floor(dilation))
        dilation = max(1, dilation)
        
        # Random bias ~ Uniform(-1, 1)
        bias = np.random.uniform(-1, 1)
        
        # Random padding (True/False)
        padding = np.random.choice([True, False])
        
        kernels.append({
            'weights': weights,
            'length': length,
            'dilation': dilation,
            'bias': bias,
            'padding': padding
        })
    
    return kernels


# Example: Generate and inspect 5 kernels
example_kernels = generate_random_kernels(5, random_state=42)
for i, k in enumerate(example_kernels):
    print(f"Kernel {i+1}: length={k['length']}, dilation={k['dilation']}, "
          f"bias={k['bias']:.3f}, padding={k['padding']}")
    print(f"  weights: {k['weights'].round(3)}\n")

### Step 2: Apply Kernel to Time Series

The core operation: convolve a kernel with a time series using the specified dilation. This implements:

$$z_i = \sum_{j=0}^{l-1} w_j \cdot x_{i + j \cdot d} + b$$
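
For comparison with an explicit loop, the same dilated sum can be computed in one step with NumPy fancy indexing. This is a sketch for a single 1D series without padding; `dilated_conv_vectorized` is a hypothetical name, and the second-difference weights are chosen only so the output is easy to verify.

```python
import numpy as np

def dilated_conv_vectorized(x, w, dilation=1, bias=0.0):
    """Valid (unpadded) dilated convolution of a 1D series, without a Python loop."""
    span = (len(w) - 1) * dilation + 1
    out_len = len(x) - span + 1
    if out_len <= 0:
        return np.empty(0)
    # Row i holds the input positions i, i + d, i + 2d, ... read at output i
    idx = np.arange(out_len)[:, None] + np.arange(len(w)) * dilation
    return x[idx] @ w + bias

x = np.arange(10, dtype=float)       # linear ramp
w = np.array([1.0, -2.0, 1.0])       # second-difference weights (illustrative)
z = dilated_conv_vectorized(x, w, dilation=2)

print(z.shape)  # (6,)
print(z)        # all zeros: a second difference vanishes on a linear ramp
```

The index matrix costs `out_len * len(w)` integers of memory, which is a reasonable trade for short kernels.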

In [None]:
def apply_kernel(X: np.ndarray, kernel: np.ndarray, dilation: int = 1, 
                  bias: float = 0.0, padding: bool = False) -> np.ndarray:
    """
    Apply a single convolutional kernel to a time series.
    
    Parameters
    ----------
    X : np.ndarray
        Input time series of shape (T,) or (N, T) for batch processing.
    kernel : np.ndarray
        Kernel weights of shape (L,).
    dilation : int
        Dilation factor (spacing between kernel elements).
    bias : float
        Bias term added to convolution output.
    padding : bool
        If True, zero-pad the input to maintain output length.
    
    Returns
    -------
    output : np.ndarray
        Convolution output.
    """
    # Handle 1D input
    if X.ndim == 1:
        X = X.reshape(1, -1)
    
    N, T = X.shape
    L = len(kernel)
    
    # Effective kernel span with dilation
    effective_length = (L - 1) * dilation + 1
    
    # Calculate padding
    if padding:
        pad_total = effective_length - 1
        pad_left = pad_total // 2
        pad_right = pad_total - pad_left
        X = np.pad(X, ((0, 0), (pad_left, pad_right)), mode='constant', constant_values=0)
        T = X.shape[1]
    
    # Output length
    output_length = T - effective_length + 1
    
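    if output_length <= 0:
        # Kernel span exceeds the (unpadded) series: no valid positions
        return np.zeros(1) if N == 1 else np.zeros((N, 1))
    
    # Compute convolution via explicit loop (educational, not optimized)
    output = np.zeros((N, output_length))
    
    for i in range(output_length):
        # Gather dilated indices
        indices = np.arange(L) * dilation + i
        # Dot product with kernel weights
        output[:, i] = X[:, indices] @ kernel + bias
    
    # Drop the batch axis for a single series; unlike squeeze(), this keeps
    # a 1D array even when output_length == 1
    return output[0] if N == 1 else output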


# Demo: Apply a kernel to our sample time series
demo_kernel = example_kernels[0]
demo_output = apply_kernel(
    time_series, 
    demo_kernel['weights'], 
    demo_kernel['dilation'], 
    demo_kernel['bias'],
    demo_kernel['padding']
)

print(f"Input length: {len(time_series)}")
print(f"Kernel length: {demo_kernel['length']}, dilation: {demo_kernel['dilation']}")
print(f"Output length: {len(demo_output)}")
print(f"Output range: [{demo_output.min():.3f}, {demo_output.max():.3f}]")

### Step 3: Extract ROCKET Features

From each kernel's convolution output, we extract two features:
1. **PPV** (Proportion of Positive Values): $\text{PPV}(z) = \frac{1}{n}\sum_{t=1}^{n} \mathbf{1}_{z_t > 0}$, where $n$ is the length of $z$
2. **Max**: $\text{max}(z) = \max_t z_t$

This gives us $2 \times n\_kernels$ features per time series.

In [None]:
def extract_rocket_features(X: np.ndarray, kernels: list) -> np.ndarray:
    """
    Extract ROCKET features (PPV and Max) from time series using given kernels.
    
    Parameters
    ----------
    X : np.ndarray
        Input time series of shape (N, T) where N is number of samples.
    kernels : list
        List of kernel dictionaries from generate_random_kernels().
    
    Returns
    -------
    features : np.ndarray
        Feature matrix of shape (N, 2 * n_kernels).
        First n_kernels columns are PPV features, next n_kernels are Max features.
    """
    if X.ndim == 1:
        X = X.reshape(1, -1)
    
    N = X.shape[0]
    n_kernels = len(kernels)
    
    # Pre-allocate feature arrays
    ppv_features = np.zeros((N, n_kernels))
    max_features = np.zeros((N, n_kernels))
    
    for k_idx, kernel in enumerate(kernels):
        # Apply kernel to all samples
        for sample_idx in range(N):
            output = apply_kernel(
                X[sample_idx],
                kernel['weights'],
                kernel['dilation'],
                kernel['bias'],
                kernel['padding']
            )
            
            if len(output) > 0:
                # PPV: proportion of positive values
                ppv_features[sample_idx, k_idx] = np.mean(output > 0)
                # Max: maximum value
                max_features[sample_idx, k_idx] = np.max(output)
            else:
                ppv_features[sample_idx, k_idx] = 0.5
                max_features[sample_idx, k_idx] = 0.0
    
    # Concatenate PPV and Max features
    features = np.hstack([ppv_features, max_features])
    
    return features


# Demo: Extract features from a batch of time series
demo_series = np.array(class_a[:5] + class_b[:5])  # 10 samples
demo_kernels = generate_random_kernels(100, random_state=123)
demo_features = extract_rocket_features(demo_series, demo_kernels)

print(f"Input shape: {demo_series.shape}")
print(f"Number of kernels: {len(demo_kernels)}")
print(f"Feature shape: {demo_features.shape}")
print(f"Features per sample: {demo_features.shape[1]} (100 PPV + 100 Max)")

### Step 4: Complete ROCKET Classifier

Now we combine everything into a complete classifier. We fit ridge regression to class-indicator targets on the extracted features, which gives a fast linear classifier.
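
As a standalone illustration of ridge regression used as a classifier, the sketch below solves the regularized normal equations on centered data and recovers an unpenalized intercept. The function name `ridge_fit` and the synthetic two-cluster data are assumptions made for this example.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean(axis=0)
    Xc = X - x_mean
    # w = (Xc^T Xc + alpha * I)^{-1} Xc^T (y - mean(y))
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 3)), rng.normal(2, 1, (20, 3))])
y = np.array([0.0] * 20 + [1.0] * 20)   # 0/1 class indicators

w, b = ridge_fit(X, y, alpha=1.0)
scores = X @ w + b
pred = (scores > 0.5).astype(int)       # 0.5 is halfway between the two targets
accuracy = np.mean(pred == y)
print(accuracy)
```

The intercept matters: without it, scores on centered features average zero, and a 0.5 threshold would push almost every sample into class 0.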

In [None]:
class SimpleROCKETClassifier:
    """
    A from-scratch ROCKET classifier implementation.
    
    ROCKET = Random Convolutional Kernel Transform + Linear Classifier
    
    Parameters
    ----------
    n_kernels : int
        Number of random kernels (default: 10,000).
    alpha : float
        Ridge regularization strength (default: 1.0).
    random_state : int, optional
        Random seed for reproducibility.
    """
    
    def __init__(self, n_kernels: int = 10000, alpha: float = 1.0, random_state: int = None):
        self.n_kernels = n_kernels
        self.alpha = alpha
        self.random_state = random_state
        self.kernels_ = None
        self.weights_ = None
        self.bias_ = None
        self.classes_ = None
    
    def fit(self, X: np.ndarray, y: np.ndarray):
        """Fit the ROCKET classifier."""
        if X.ndim == 1:
            X = X.reshape(1, -1)
        
        # Store classes
        self.classes_ = np.unique(y)
        
        # Generate random kernels
        self.kernels_ = generate_random_kernels(
            self.n_kernels, 
            random_state=self.random_state
        )
        
        # Extract features
        features = extract_rocket_features(X, self.kernels_)
        
        # Standardize features (important for Ridge)
        self.mean_ = features.mean(axis=0)
        self.std_ = features.std(axis=0) + 1e-8
        features_scaled = (features - self.mean_) / self.std_
        
        # Encode labels to 0/1 for binary, or one-hot for multiclass
        if len(self.classes_) == 2:
            y_encoded = (y == self.classes_[1]).astype(float)
        else:
            # One-hot encoding
            y_encoded = np.zeros((len(y), len(self.classes_)))
            for i, cls in enumerate(self.classes_):
                y_encoded[y == cls, i] = 1
        
        # Intercept: the scaled features are centered, so the unpenalized
        # intercept is simply the mean of the targets
        self.bias_ = y_encoded.mean(axis=0)
        
        # Ridge regression: w = (X^T X + αI)^{-1} X^T (y - mean(y))
        n_features = features_scaled.shape[1]
        XtX = features_scaled.T @ features_scaled
        XtY = features_scaled.T @ (y_encoded - self.bias_)
        
        # Solve normal equations with regularization
        self.weights_ = np.linalg.solve(
            XtX + self.alpha * np.eye(n_features),
            XtY
        )
        
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict class labels."""
        if X.ndim == 1:
            X = X.reshape(1, -1)
        
        # Extract and scale features
        features = extract_rocket_features(X, self.kernels_)
        features_scaled = (features - self.mean_) / self.std_
        
        # Compute scores (add the intercept so that 0.5 is the right threshold)
        scores = features_scaled @ self.weights_ + self.bias_
        
        if len(self.classes_) == 2:
            # Binary: threshold at 0.5
            predictions = (scores > 0.5).astype(int)
            return self.classes_[predictions.ravel()]
        else:
            # Multiclass: argmax
            return self.classes_[np.argmax(scores, axis=1)]
    
    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """Compute classification accuracy."""
        return np.mean(self.predict(X) == y)

### Step 5: Test Our Implementation

Let's test our from-scratch ROCKET classifier on the synthetic data and visualize the results with a confusion matrix.

In [None]:
# Prepare train/test split from synthetic data
np.random.seed(42)
n_train = 80

# Shuffle data
all_X = np.array(all_series)
all_y = np.array([0 if l == 'Class A' else 1 for l in labels])
indices = np.random.permutation(len(all_X))
all_X, all_y = all_X[indices], all_y[indices]

X_train_synth = all_X[:n_train]
y_train_synth = all_y[:n_train]
X_test_synth = all_X[n_train:]
y_test_synth = all_y[n_train:]

# Train our from-scratch ROCKET classifier (using fewer kernels for speed)
print("Training SimpleROCKETClassifier...")
rocket_clf = SimpleROCKETClassifier(n_kernels=500, alpha=1.0, random_state=42)
rocket_clf.fit(X_train_synth, y_train_synth)

# Evaluate
train_acc = rocket_clf.score(X_train_synth, y_train_synth)
test_acc = rocket_clf.score(X_test_synth, y_test_synth)

print(f"\nResults (500 kernels):")
print(f"  Train Accuracy: {train_acc:.1%}")
print(f"  Test Accuracy:  {test_acc:.1%}")

In [None]:
# Confusion Matrix Visualization
from sklearn.metrics import confusion_matrix

y_pred = rocket_clf.predict(X_test_synth)
cm = confusion_matrix(y_test_synth, y_pred)

# Create annotated heatmap
fig = go.Figure(data=go.Heatmap(
    z=cm,
    x=['Pred: Class A', 'Pred: Class B'],
    y=['True: Class A', 'True: Class B'],
    colorscale='Blues',
    showscale=True,
    text=cm,
    texttemplate="%{text}",
    textfont={"size": 20}
))

fig.update_layout(
    title=f"Confusion Matrix (Test Accuracy: {test_acc:.1%})",
    xaxis_title="Predicted",
    yaxis_title="Actual",
    height=400,
    width=500,
    template="plotly_white"
)
fig

### Comparison: Our Implementation vs. sktime

Let's compare our from-scratch implementation with sktime's optimized ROCKET transformer.

In [None]:
import time

# Create DataFrames for sktime (required format)
import pandas as pd

def to_sktime_format(X):
    """Convert numpy array to sktime nested DataFrame format."""
    n_samples, n_timepoints = X.shape
    data = {'dim_0': [pd.Series(X[i]) for i in range(n_samples)]}
    return pd.DataFrame(data)

X_train_sk = to_sktime_format(X_train_synth)
X_test_sk = to_sktime_format(X_test_synth)

# sktime ROCKET
from sktime.transformations.panel.rocket import Rocket
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline

# Time our implementation
start = time.time()
our_clf = SimpleROCKETClassifier(n_kernels=500, random_state=42)
our_clf.fit(X_train_synth, y_train_synth)
our_time = time.time() - start
our_acc = our_clf.score(X_test_synth, y_test_synth)

# Time sktime implementation
start = time.time()
sktime_clf = make_pipeline(
    Rocket(num_kernels=500, random_state=42),
    RidgeClassifierCV()
)
sktime_clf.fit(X_train_sk, y_train_synth)
sktime_time = time.time() - start
sktime_acc = sktime_clf.score(X_test_sk, y_test_synth)

# Results comparison
print("="*50)
print("COMPARISON: Our Implementation vs sktime ROCKET")
print("="*50)
print(f"\n{'Metric':<20} {'Ours':<15} {'sktime':<15}")
print("-"*50)
print(f"{'Test Accuracy':<20} {our_acc:<15.1%} {sktime_acc:<15.1%}")
print(f"{'Training Time':<20} {f'{our_time:.3f}s':<15} {f'{sktime_time:.3f}s':<15}")
print(f"{'Num Kernels':<20} {500:<15} {500:<15}")
print("\nNote: sktime's transform is Numba-compiled, so its first call includes JIT compilation time.")

In [None]:
import numpy as np

from sktime.datasets import load_basic_motions



## Data


In [None]:
X_train, y_train = load_basic_motions(split="train", return_X_y=True)
X_test, y_test = load_basic_motions(split="test", return_X_y=True)
print(X_train.shape, y_train.shape)


## Model: ROCKET + RidgeClassifier


In [None]:
from sktime.transformations.panel.rocket import Rocket
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

clf = make_pipeline(
    Rocket(num_kernels=10_000, random_state=42),
    RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
)

clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(classification_report(y_test, pred))


## Why ROCKET works

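ROCKET creates features that capture local shape patterns at many scales. The linear classifier then finds a decision boundary in this feature space. Because the kernels are random and never trained, only the linear classifier has to be fitted, which keeps training **fast**. The resulting accuracy is often strong with minimal tuning, making ROCKET a solid baseline for time‑series classification.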
