# RocketRegressor

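ROCKET (RandOm Convolutional KErnel Transform) features can be used for time series regression by pairing the transform with a linear regressor (e.g., Ridge). ROCKET was introduced for time series classification, where Dempster et al. (2020) reported state-of-the-art accuracy at a fraction of the computational cost of existing methods. The same transform carries over to regression because only the final linear model changes.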

## Key Concepts

- **Random Kernels**: Generate thousands of random convolutional kernels with varying lengths, weights, biases, dilations, and paddings
- **Feature Extraction**: Apply each kernel to time series, extracting PPV (proportion of positive values) and max pooling features
- **Linear Regression**: Use Ridge regression on the extracted features to predict continuous targets



## Mathematical Foundation

### 1. Random Kernel Convolution

Each kernel produces a convolution output $z$ by sliding across the input time series $x$:

$$z_i = \sum_{j=0}^{l-1} w_j \cdot x_{i + j \cdot d} + b$$

Where:
- $w_j$ are the kernel weights
- $l$ is the kernel length
- $d$ is the dilation factor (spacing between elements)
- $b$ is the bias term
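
To make the indexing concrete, here is a minimal sketch of a dilated convolution on a toy series, assuming no padding. Output position $i$ reads the inputs at $i, i + d, i + 2d, \dots$, so the dilation spreads the kernel taps apart without adding weights. The helper name `dilated_conv_valid` is hypothetical and exists only for this illustration.

```python
import numpy as np

def dilated_conv_valid(x, w, b, d):
    """Dilated convolution without padding (hypothetical helper for illustration)."""
    span = (len(w) - 1) * d + 1      # width of the receptive field
    n_out = len(x) - span + 1        # number of positions where the kernel fits
    taps = np.arange(len(w)) * d     # tap offsets 0, d, 2d, ...
    return np.array([w @ x[i + taps] for i in range(n_out)]) + b

x = np.arange(10, dtype=float)       # toy series 0, 1, ..., 9
w = np.array([1.0, 0.0, -1.0])       # difference between first and last tap
z = dilated_conv_valid(x, w, b=0.5, d=2)

print(z.shape)   # (6,)
print(z)         # every entry is x[i] - x[i + 4] + 0.5 = -3.5
```

With $l = 3$ and $d = 2$ the kernel spans 5 input points, so a series of length 10 yields 6 outputs.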

### 2. Kernel Parameter Distributions

ROCKET generates kernels with random parameters:

| Parameter | Distribution |
|-----------|-------------|
| Weights | $w \sim \mathcal{N}(0, 1)$ (standard normal), then mean-centered per kernel |
| Length | $l \in \{7, 9, 11\}$ (uniform random choice) |
| Bias | $b \sim \mathcal{U}(-1, 1)$ (uniform) |
| Dilation | $d = \lfloor 2^{\mathcal{U}(0, \log_2(\frac{T-1}{l-1}))} \rfloor$ |
| Padding | Valid or same (random choice) |
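
The dilation row is the least intuitive entry in the table, so here is a small sketch of how it could be sampled. The series length `T = 100` and the seed are arbitrary assumptions made for the example. Drawing the exponent uniformly makes small dilations more common than large ones, while the upper bound keeps every dilated kernel inside the series.

```python
import numpy as np

rng = np.random.default_rng(0)       # seed chosen arbitrarily
T = 100                              # example series length (assumption)

lengths = rng.choice([7, 9, 11], size=10_000)
max_exponent = np.log2((T - 1) / (lengths - 1))
dilations = np.floor(2.0 ** rng.uniform(0.0, max_exponent)).astype(int)

# Every dilated kernel still fits inside the series
print(((lengths - 1) * dilations <= T - 1).all())

# Counts for d = 1, 2, 3, 4: the uniform exponent favors small dilations
print(np.bincount(dilations)[1:5])
```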

### 3. Feature Extraction

For each kernel convolution output $z$, ROCKET extracts two features:

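**Proportion of Positive Values (PPV)**:
$$\text{PPV}(z) = \frac{1}{n}\sum_{i=0}^{n-1} \mathbf{1}[z_i > 0]$$

**Maximum Value**:
$$\text{Max}(z) = \max_{i} z_i$$

Here $n$ is the length of the convolution output $z$, which equals the series length $T$ only when "same" padding is used.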

This gives $2 \times n_{kernels}$ features per time series.
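
As a quick illustration on a made-up convolution output (the array `z` below is arbitrary, not produced by a real kernel), each feature reduces to one NumPy call. The sketch also shows that PPV ignores positive rescaling while Max does not, which is one way to see that the two features carry different information.

```python
import numpy as np

z = np.array([-1.0, 0.5, 2.0, -0.2, 0.0, 3.0])   # toy convolution output

ppv = np.mean(z > 0)      # 3 of 6 values are strictly positive
max_val = np.max(z)
print(ppv, max_val)       # 0.5 3.0

# Rescaling by a positive constant leaves PPV unchanged but scales Max
ppv_scaled = np.mean(2 * z > 0)
max_scaled = np.max(2 * z)
print(ppv_scaled, max_scaled)   # 0.5 6.0
```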

### 4. Ridge Regression

The final prediction uses Ridge regression on extracted features:

$$\hat{y} = X\beta$$

Where the coefficients are computed as:

$$\beta = (X^TX + \lambda I)^{-1}X^Ty$$

- $X$ is the feature matrix (n_samples × n_features)
- $\lambda$ is the regularization parameter
- $I$ is the identity matrix
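
The closed form can be checked numerically. The sketch below uses synthetic data and a hypothetical helper `ridge_closed_form`, and it omits the intercept for simplicity. It verifies that the gradient of the penalized loss vanishes at the solution and that a larger $\lambda$ shrinks the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                       # toy feature matrix
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y, without an intercept."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lam = 1.0
beta = ridge_closed_form(X, y, lam)

# At the solution, the gradient of ||y - X b||^2 + lam * ||b||^2 is zero
grad = 2 * (X.T @ (X @ beta - y) + lam * beta)
print(np.allclose(grad, 0.0, atol=1e-8))

# Larger lam shrinks the coefficient vector toward zero
norms = [np.linalg.norm(ridge_closed_form(X, y, l)) for l in (0.1, 10.0, 1000.0)]
print(norms)
```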



In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from sktime.datasets import load_basic_motions, load_unit_test



In [None]:
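# unit_test has string class labels ("1", "2"); cast to float as a toy regression target
X_train, y_train = load_unit_test(split="train", return_X_y=True)
X_test, y_test = load_unit_test(split="test", return_X_y=True)
y_train, y_test = y_train.astype(float), y_test.astype(float)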



## Fit model


In [None]:
from sktime.transformations.panel.rocket import Rocket
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_error

model = make_pipeline(Rocket(num_kernels=10_000, random_state=42), Ridge())
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, pred))


---

## Low-Level NumPy Implementation

Let's build ROCKET from scratch to understand how it works internally. This implementation focuses on clarity over optimization.



In [None]:
def generate_kernels(n_kernels: int, series_length: int, random_state: int = 42) -> list:
    """
    Generate random convolutional kernels for ROCKET.
    
    Each kernel is a dictionary containing:
    - weights: kernel weights drawn from N(0,1)
    - length: kernel length (7, 9, or 11)
    - bias: bias term from U(-1, 1)
    - dilation: dilation factor for convolution
    - padding: amount of zero-padding
    
    Parameters
    ----------
    n_kernels : int
        Number of kernels to generate
    series_length : int
        Length of input time series (needed for dilation calculation)
    random_state : int
        Random seed for reproducibility
        
    Returns
    -------
    list of dict
        List of kernel parameter dictionaries
    """
    np.random.seed(random_state)
    kernels = []
    
    candidate_lengths = [7, 9, 11]
    
    for _ in range(n_kernels):
        # Random kernel length
        length = np.random.choice(candidate_lengths)
        
        # Random weights from standard normal
        weights = np.random.randn(length)
        # Mean-center the weights (as in original ROCKET)
        weights = weights - weights.mean()
        
        # Random bias from uniform distribution
        bias = np.random.uniform(-1, 1)
        
        # Random dilation, log-uniform: d = floor(2^x), x ~ U(0, log2((T-1)/(l-1)))
        max_exponent = np.log2(max(1.0, (series_length - 1) / (length - 1)))
        dilation = int(2 ** np.random.uniform(0, max_exponent))
        
        # Random padding (valid or same)
        use_padding = np.random.choice([True, False])
        if use_padding:
            padding = ((length - 1) * dilation) // 2
        else:
            padding = 0
            
        kernels.append({
            'weights': weights,
            'length': length,
            'bias': bias,
            'dilation': dilation,
            'padding': padding
        })
    
    return kernels

# Generate example kernels
series_length = 24  # Example series length
kernels = generate_kernels(n_kernels=100, series_length=series_length)

print(f"Generated {len(kernels)} kernels")
print(f"\nExample kernel:")
for key, value in kernels[0].items():
    if isinstance(value, np.ndarray):
        print(f"  {key}: {value.round(3)}")
    else:
        print(f"  {key}: {value}")



In [None]:
def apply_kernel_to_series(x: np.ndarray, kernel: dict) -> np.ndarray:
    """
    Apply a single kernel to a time series using convolution.
    
    Performs dilated convolution: z_i = sum(w_j * x[i + j*d]) + b
    
    Parameters
    ----------
    x : np.ndarray
        Input time series of shape (T,)
    kernel : dict
        Kernel parameters (weights, bias, dilation, padding)
        
    Returns
    -------
    np.ndarray
        Convolution output
    """
    weights = kernel['weights']
    bias = kernel['bias']
    dilation = kernel['dilation']
    padding = kernel['padding']
    
    # Apply padding
    if padding > 0:
        x_padded = np.pad(x, padding, mode='constant', constant_values=0)
    else:
        x_padded = x
    
    # Compute effective kernel length with dilation
    length = len(weights)
    effective_length = (length - 1) * dilation + 1
    
    # Number of valid positions
    n_positions = len(x_padded) - effective_length + 1
    
    if n_positions <= 0:
        return np.array([bias])  # Fallback for very short series
    
    # Perform dilated convolution
    output = np.zeros(n_positions)
    for i in range(n_positions):
        # Extract dilated receptive field
        indices = np.arange(length) * dilation + i
        receptive_field = x_padded[indices]
        # Compute convolution at this position
        output[i] = np.dot(weights, receptive_field) + bias
    
    return output


def extract_features(X: np.ndarray, kernels: list) -> np.ndarray:
    """
    Extract ROCKET features from time series dataset.
    
    For each kernel, extracts two features per series:
    1. PPV (Proportion of Positive Values): fraction of output > 0
    2. Max: maximum value of convolution output
    
    Parameters
    ----------
    X : np.ndarray
        Time series data of shape (n_samples, series_length)
    kernels : list
        List of kernel dictionaries from generate_kernels()
        
    Returns
    -------
    np.ndarray
        Feature matrix of shape (n_samples, 2 * n_kernels)
    """
    n_samples = X.shape[0]
    n_kernels = len(kernels)
    
    # 2 features per kernel: PPV and max
    features = np.zeros((n_samples, 2 * n_kernels))
    
    for i in range(n_samples):
        x = X[i]
        for k, kernel in enumerate(kernels):
            # Apply convolution
            conv_output = apply_kernel_to_series(x, kernel)
            
            # Extract PPV (proportion of positive values)
            ppv = np.mean(conv_output > 0)
            features[i, 2*k] = ppv
            
            # Extract max value
            max_val = np.max(conv_output)
            features[i, 2*k + 1] = max_val
    
    return features


# Test on a simple example
np.random.seed(42)
X_example = np.random.randn(5, 24)  # 5 samples, length 24

features = extract_features(X_example, kernels)
print(f"Input shape: {X_example.shape}")
print(f"Output features shape: {features.shape}")
print(f"Features per sample: {features.shape[1]} (2 × {len(kernels)} kernels)")



In [None]:
def fit_ridge_regression(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> tuple:
    """
    Fit Ridge regression using the normal equation.
    
    Solves: β = (X'X + αI)^(-1) X'y
    
    Using np.linalg.solve for numerical stability instead of explicit inverse.
    
    Parameters
    ----------
    X : np.ndarray
        Feature matrix of shape (n_samples, n_features)
    y : np.ndarray
        Target values of shape (n_samples,)
    alpha : float
        Regularization strength (λ in the formula)
        
    Returns
    -------
    tuple
        (coefficients, intercept)
    """
    # Center the target for intercept
    y_mean = np.mean(y)
    y_centered = y - y_mean
    
    # Center features so the intercept is not penalized (it is recovered below)
    X_mean = np.mean(X, axis=0)
    X_centered = X - X_mean
    
    # Compute X'X + αI
    n_features = X.shape[1]
    XtX = X_centered.T @ X_centered
    regularized = XtX + alpha * np.eye(n_features)
    
    # Compute X'y
    Xty = X_centered.T @ y_centered
    
    # Solve the linear system directly (np.linalg.solve uses an LU factorization)
    # (X'X + αI)β = X'y
    coefficients = np.linalg.solve(regularized, Xty)
    
    # Compute intercept: y_mean - X_mean @ coefficients
    intercept = y_mean - X_mean @ coefficients
    
    return coefficients, intercept


def predict_ridge(X: np.ndarray, coefficients: np.ndarray, intercept: float) -> np.ndarray:
    """
    Make predictions using fitted Ridge coefficients.
    
    Parameters
    ----------
    X : np.ndarray
        Feature matrix
    coefficients : np.ndarray
        Fitted coefficients
    intercept : float
        Fitted intercept
        
    Returns
    -------
    np.ndarray
        Predictions
    """
    return X @ coefficients + intercept


# Test Ridge regression implementation
np.random.seed(42)
X_ridge_test = np.random.randn(100, 10)
true_coef = np.random.randn(10)
y_ridge_test = X_ridge_test @ true_coef + np.random.randn(100) * 0.1

coef, intercept = fit_ridge_regression(X_ridge_test, y_ridge_test, alpha=1.0)
y_pred = predict_ridge(X_ridge_test, coef, intercept)

print("Ridge Regression Test:")
print(f"  R² score: {1 - np.sum((y_ridge_test - y_pred)**2) / np.sum((y_ridge_test - np.mean(y_ridge_test))**2):.4f}")
print(f"  Mean Absolute Error: {np.mean(np.abs(y_ridge_test - y_pred)):.4f}")



### Complete ROCKET Regressor Pipeline

Now let's combine all components into a complete implementation:



In [None]:
class RocketRegressorFromScratch:
    """
    ROCKET Regressor implemented from scratch using NumPy.
    
    This is an educational implementation that demonstrates the core concepts
    of ROCKET for time series regression.
    """
    
    def __init__(self, n_kernels: int = 1000, alpha: float = 1.0, random_state: int = 42):
        """
        Initialize ROCKET Regressor.
        
        Parameters
        ----------
        n_kernels : int
            Number of random kernels to generate
        alpha : float
            Ridge regularization strength
        random_state : int
            Random seed for reproducibility
        """
        self.n_kernels = n_kernels
        self.alpha = alpha
        self.random_state = random_state
        self.kernels = None
        self.coefficients = None
        self.intercept = None
        
    def fit(self, X: np.ndarray, y: np.ndarray):
        """
        Fit the ROCKET regressor.
        
        Parameters
        ----------
        X : np.ndarray
            Training time series of shape (n_samples, series_length)
        y : np.ndarray
            Target values of shape (n_samples,)
        """
        series_length = X.shape[1]
        
        # Generate random kernels
        self.kernels = generate_kernels(
            self.n_kernels, 
            series_length, 
            self.random_state
        )
        
        # Extract features
        features = extract_features(X, self.kernels)
        
        # Fit Ridge regression
        self.coefficients, self.intercept = fit_ridge_regression(
            features, y, self.alpha
        )
        
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Make predictions on new time series.
        
        Parameters
        ----------
        X : np.ndarray
            Time series of shape (n_samples, series_length)
            
        Returns
        -------
        np.ndarray
            Predictions of shape (n_samples,)
        """
        features = extract_features(X, self.kernels)
        return predict_ridge(features, self.coefficients, self.intercept)
    
    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """
        Compute R² score.
        """
        y_pred = self.predict(X)
        ss_res = np.sum((y - y_pred) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        return 1 - ss_res / ss_tot


# Create synthetic regression data for demonstration
np.random.seed(42)
n_samples = 100
series_length = 50

# Generate time series with a pattern related to target
t = np.linspace(0, 4*np.pi, series_length)
X_synth = np.zeros((n_samples, series_length))
y_synth = np.zeros(n_samples)

for i in range(n_samples):
    freq = 0.5 + np.random.rand() * 2  # Random frequency
    amp = 0.5 + np.random.rand() * 2   # Random amplitude
    phase = np.random.rand() * 2 * np.pi
    X_synth[i] = amp * np.sin(freq * t + phase) + np.random.randn(series_length) * 0.1
    y_synth[i] = freq + amp  # Target is sum of frequency and amplitude

# Split data
split = int(0.8 * n_samples)
X_train_synth, X_test_synth = X_synth[:split], X_synth[split:]
y_train_synth, y_test_synth = y_synth[:split], y_synth[split:]

# Fit our custom implementation
rocket_scratch = RocketRegressorFromScratch(n_kernels=500, alpha=1.0)
rocket_scratch.fit(X_train_synth, y_train_synth)

y_pred_scratch = rocket_scratch.predict(X_test_synth)

print("Custom ROCKET Regressor Results:")
print(f"  R² Score: {rocket_scratch.score(X_test_synth, y_test_synth):.4f}")
print(f"  MAE: {np.mean(np.abs(y_test_synth - y_pred_scratch)):.4f}")



---

## Plotly Visualizations

Let's visualize the key components of ROCKET to build intuition about how it works.

### 1. Sample Kernels Visualization

**Intuition**: Each kernel has a unique shape determined by its random weights. The diversity of kernel shapes allows ROCKET to capture various patterns in time series - some kernels detect trends, others detect oscillations or spikes.



In [None]:
# Visualize sample kernels
fig_kernels = go.Figure()

# Select 6 diverse kernels to display
sample_indices = [0, 10, 25, 50, 75, 99]
colors = px.colors.qualitative.Set2

for idx, kernel_idx in enumerate(sample_indices):
    kernel = rocket_scratch.kernels[kernel_idx]
    weights = kernel['weights']
    
    fig_kernels.add_trace(go.Scatter(
        x=list(range(len(weights))),
        y=weights,
        mode='lines+markers',
        name=f"Kernel {kernel_idx} (len={kernel['length']}, dil={kernel['dilation']})",
        line=dict(color=colors[idx % len(colors)], width=2),
        marker=dict(size=8)
    ))

fig_kernels.update_layout(
    title="Sample ROCKET Kernels",
    xaxis_title="Position",
    yaxis_title="Weight Value",
    template="plotly_white",
    height=450,
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=-0.35,
        xanchor="center",
        x=0.5
    ),
    annotations=[
        dict(
            text="Each kernel has unique weights that detect different patterns in the time series",
            xref="paper", yref="paper",
            x=0.5, y=1.08,
            showarrow=False,
            font=dict(size=11, color="gray")
        )
    ]
)

fig_kernels.show()



### 2. Convolution Outputs for Different Kernels

**Intuition**: When a kernel slides across a time series, it produces high values where the local pattern matches the kernel shape. Different kernels respond to different parts of the series, creating a rich feature representation.
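
A hand-picked example makes the "pattern matching" idea visible. The kernel below is chosen by hand, not sampled as ROCKET would, and the setup assumes dilation 1, no padding, and zero bias. It responds most strongly where the series contains an isolated spike.

```python
import numpy as np

x = np.zeros(20)
x[10] = 1.0                          # a single spike at t = 10
w = np.array([-1.0, 2.0, -1.0])      # zero-mean "spike detector" kernel

# np.correlate slides w across x without flipping it
z = np.correlate(x, w, mode="valid")

peak = int(np.argmax(z))
print(peak, z.max())                 # 9 2.0 (window starting at 9 is centered on the spike)
print(np.mean(z > 0))                # PPV: 1 of 18 outputs is positive
```

Here Max records how well the best window matches the kernel, while the low PPV reflects that the pattern occurs only once.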



In [None]:
# Visualize convolution outputs for a single time series
from plotly.subplots import make_subplots

# Take one sample time series
sample_ts = X_train_synth[0]

# Apply 3 different kernels
kernel_indices = [0, 25, 75]
conv_outputs = []
for k_idx in kernel_indices:
    conv_out = apply_kernel_to_series(sample_ts, rocket_scratch.kernels[k_idx])
    conv_outputs.append(conv_out)

# Create subplot
fig_conv = make_subplots(
    rows=2, cols=1,
    subplot_titles=("Input Time Series", "Convolution Outputs from Different Kernels"),
    row_heights=[0.35, 0.65],
    vertical_spacing=0.12
)

# Plot original time series
fig_conv.add_trace(
    go.Scatter(x=list(range(len(sample_ts))), y=sample_ts, 
               mode='lines', name='Input Series',
               line=dict(color='black', width=2)),
    row=1, col=1
)

# Plot convolution outputs
colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
for idx, (k_idx, conv_out) in enumerate(zip(kernel_indices, conv_outputs)):
    kernel = rocket_scratch.kernels[k_idx]
    ppv = np.mean(conv_out > 0)
    max_val = np.max(conv_out)
    
    fig_conv.add_trace(
        go.Scatter(
            x=list(range(len(conv_out))), 
            y=conv_out,
            mode='lines',
            name=f"Kernel {k_idx} (PPV={ppv:.2f}, Max={max_val:.2f})",
            line=dict(color=colors[idx], width=1.5)
        ),
        row=2, col=1
    )

# Add zero line to show PPV threshold
fig_conv.add_hline(y=0, line_dash="dash", line_color="red", 
                   annotation_text="PPV threshold (y=0)", row=2, col=1)

fig_conv.update_layout(
    height=550,
    template="plotly_white",
    title="How Kernels Transform Time Series",
    showlegend=True,
    legend=dict(orientation="h", yanchor="bottom", y=-0.25, xanchor="center", x=0.5)
)

fig_conv.update_xaxes(title_text="Time", row=2, col=1)
fig_conv.update_yaxes(title_text="Value", row=1, col=1)
fig_conv.update_yaxes(title_text="Convolution Output", row=2, col=1)

fig_conv.show()



### 3. Feature Importance from Ridge Coefficients

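**Intuition**: Ridge regression assigns a coefficient to each feature. A large absolute coefficient indicates a feature that strongly influences predictions, with one caveat: the features here are not standardized (PPV lies in $[0, 1]$ while Max is unbounded), so magnitudes are only roughly comparable across the two feature types. We can still analyze which kernels, and which feature type (PPV vs Max), carry the most weight.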
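
Coefficient magnitudes depend on feature scale, which matters when comparing features that live on different ranges. The sketch below uses synthetic data (two made-up features, not ROCKET outputs) in which both features contribute equally to the target, yet their raw coefficients differ by about two orders of magnitude. Multiplying each coefficient by its feature's standard deviation is one simple way to put them on a common footing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)                 # unit-scale feature
b = 100.0 * rng.normal(size=n)         # same kind of signal on a 100x larger scale
y = a + b / 100.0 + 0.1 * rng.normal(size=n)   # both contribute equally

X = np.column_stack([a, b])
Xc, yc = X - X.mean(axis=0), y - y.mean()
coef = np.linalg.solve(Xc.T @ Xc + 1.0 * np.eye(2), Xc.T @ yc)

raw = np.abs(coef)                     # misleading: depends on feature scale
scaled = np.abs(coef) * X.std(axis=0)  # effect of a one-standard-deviation change

print("raw:   ", raw.round(3))
print("scaled:", scaled.round(3))
```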



In [None]:
# Analyze feature importance from Ridge coefficients
coefficients = rocket_scratch.coefficients

# Separate PPV and Max coefficients
ppv_coefs = coefficients[0::2]  # Even indices are PPV
max_coefs = coefficients[1::2]  # Odd indices are Max

# Create importance visualization
fig_importance = make_subplots(
    rows=1, cols=2,
    subplot_titles=("PPV Feature Coefficients", "Max Feature Coefficients"),
    horizontal_spacing=0.1
)

# Sort by absolute value for visualization
ppv_sorted_idx = np.argsort(np.abs(ppv_coefs))[::-1][:30]  # Top 30
max_sorted_idx = np.argsort(np.abs(max_coefs))[::-1][:30]

fig_importance.add_trace(
    go.Bar(
        x=[f"K{i}" for i in ppv_sorted_idx],
        y=ppv_coefs[ppv_sorted_idx],
        marker_color=['#2ecc71' if c > 0 else '#e74c3c' for c in ppv_coefs[ppv_sorted_idx]],
        name='PPV'
    ),
    row=1, col=1
)

fig_importance.add_trace(
    go.Bar(
        x=[f"K{i}" for i in max_sorted_idx],
        y=max_coefs[max_sorted_idx],
        marker_color=['#3498db' if c > 0 else '#e67e22' for c in max_coefs[max_sorted_idx]],
        name='Max'
    ),
    row=1, col=2
)

fig_importance.update_layout(
    height=400,
    template="plotly_white",
    title="Top 30 Most Important Features by Ridge Coefficient Magnitude",
    showlegend=False,
    annotations=[
        dict(
            text="Green/Blue = positive contribution, Red/Orange = negative contribution",
            xref="paper", yref="paper",
            x=0.5, y=-0.15,
            showarrow=False,
            font=dict(size=10, color="gray")
        )
    ]
)

fig_importance.update_xaxes(title_text="Kernel Index", tickangle=45)
fig_importance.update_yaxes(title_text="Coefficient", row=1, col=1)

fig_importance.show()

# Summary statistics
print("\nFeature Importance Summary:")
print(f"  Mean |PPV coefficient|: {np.mean(np.abs(ppv_coefs)):.4f}")
print(f"  Mean |Max coefficient|: {np.mean(np.abs(max_coefs)):.4f}")
print(f"  Max |PPV coefficient|: {np.max(np.abs(ppv_coefs)):.4f}")
print(f"  Max |Max coefficient|: {np.max(np.abs(max_coefs)):.4f}")



### 4. Predicted vs Actual Scatter Plot

**Intuition**: A perfect regressor would place every point on the diagonal line (y = x). Points close to the diagonal indicate accurate predictions, and the spread around the line shows the size of the prediction errors.
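
One subtlety when reading such a plot: the squared Pearson correlation between actual and predicted values is not the same quantity as the R² score (coefficient of determination). The toy numbers below are invented for illustration. The predictions are perfectly correlated with the targets but far from the diagonal, so the correlation looks perfect while R² is strongly negative.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = 2.0 * y_true + 1.0          # perfectly correlated, but biased

r = np.corrcoef(y_true, y_pred)[0, 1]
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"squared Pearson r: {r**2:.3f}")   # 1.000
print(f"R² score:          {r2:.3f}")     # -8.000
```

A fitted line through the scatter measures how linear the relationship is, while distance from the diagonal measures how accurate the predictions are.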



In [None]:
# Create predicted vs actual plot
fig_pred = go.Figure()

# Scatter plot of predictions
fig_pred.add_trace(go.Scatter(
    x=y_test_synth,
    y=y_pred_scratch,
    mode='markers',
    marker=dict(
        size=10,
        color=np.abs(y_test_synth - y_pred_scratch),  # Color by error
        colorscale='RdYlGn_r',
        colorbar=dict(title="Abs Error"),
        line=dict(width=1, color='white')
    ),
    name='Predictions',
    text=[f"Actual: {a:.2f}<br>Pred: {p:.2f}<br>Error: {a-p:.2f}" 
          for a, p in zip(y_test_synth, y_pred_scratch)],
    hoverinfo='text'
))

# Perfect prediction line
min_val = min(y_test_synth.min(), y_pred_scratch.min())
max_val = max(y_test_synth.max(), y_pred_scratch.max())
margin = (max_val - min_val) * 0.1

fig_pred.add_trace(go.Scatter(
    x=[min_val - margin, max_val + margin],
    y=[min_val - margin, max_val + margin],
    mode='lines',
    line=dict(color='black', dash='dash', width=2),
    name='Perfect Prediction (y=x)'
))

# Add regression line through predictions
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(y_test_synth, y_pred_scratch)
x_line = np.linspace(min_val - margin, max_val + margin, 100)
y_line = slope * x_line + intercept

fig_pred.add_trace(go.Scatter(
    x=x_line,
    y=y_line,
    mode='lines',
    line=dict(color='#3498db', width=2),
    name=f'Least-Squares Fit (Pearson r²={r_value**2:.3f})'
))

fig_pred.update_layout(
    title="Predicted vs Actual Values",
    xaxis_title="Actual Value",
    yaxis_title="Predicted Value",
    template="plotly_white",
    height=500,
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=-0.2,
        xanchor="center",
        x=0.5
    ),
    annotations=[
        dict(
            text=f"R² = {rocket_scratch.score(X_test_synth, y_test_synth):.4f} | MAE = {np.mean(np.abs(y_test_synth - y_pred_scratch)):.4f}",
            xref="paper", yref="paper",
            x=0.02, y=0.98,
            showarrow=False,
            font=dict(size=12),
            bgcolor="white",
            bordercolor="gray",
            borderwidth=1
        )
    ]
)

# Make axes equal
fig_pred.update_xaxes(range=[min_val - margin, max_val + margin])
fig_pred.update_yaxes(range=[min_val - margin, max_val + margin], scaleanchor="x", scaleratio=1)

fig_pred.show()



---

## Summary

### Key Takeaways

1. **ROCKET is computationally efficient**: Kernel weights are sampled, not learned, so training reduces to one pass of the transform plus a single linear fit
2. **Two features per kernel**: PPV captures "how often" the pattern matches, Max captures "how strongly" it matches
3. **Ridge regression provides regularization**: The L2 penalty limits overfitting even when features outnumber training samples
4. **Scalability**: The transform cost grows linearly with the number of kernels and with the time series length

### When to Use ROCKET Regressor

| Use Case | Recommendation |
|----------|----------------|
| Fast baseline for time series regression | ✅ Excellent choice |
| Limited training data | ✅ Good generalization with regularization |
| Interpretability required | ⚠️ Kernel coefficients provide some insight |
| Real-time predictions needed | ✅ Fast inference |
| Very long time series (>10,000 points) | ⚠️ Consider MiniRocket for efficiency |

### References

- Dempster, A., Petitjean, F., & Webb, G. I. (2020). ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. *Data Mining and Knowledge Discovery*, 34(5), 1454-1495.
- [sktime ROCKET Documentation](https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.transformations.panel.rocket.Rocket.html)

