# TimeSeriesForestRegressor

**Regression counterpart of Time Series Forest: interval-based features + ensemble of decision trees.**

The Time Series Forest Regressor (TSF-R) adapts the classification-focused Time Series Forest algorithm for regression tasks. It extracts summary statistics from random intervals of time series data and uses these as features for an ensemble of decision tree regressors.

## Key Concepts

- **Interval Sampling**: Randomly select subsequences from the time series
- **Feature Extraction**: Compute mean, standard deviation, and slope for each interval
- **Ensemble Learning**: Combine predictions from multiple trees for robust regression
- **Temporal Locality**: Intervals capture local patterns at different time scales

## 1. Mathematical Foundation

### Interval Sampling

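For a time series of length $T$, we sample $K$ random intervals. Each interval $[a_k, b_k]$ (1-indexed, both endpoints inclusive) is drawn in two stages, first the start and then the end:

$$[a_k, b_k] \subseteq [1, T], \quad \text{where } b_k - a_k + 1 \ge \ell_{\min}$$

The interval endpoints are sampled such that:
- $a_k \sim \text{Uniform}\{1, \dots, T - \ell_{\min} + 1\}$
- $b_k \sim \text{Uniform}\{a_k + \ell_{\min} - 1, \dots, T\}$

where $\ell_{\min}$ is the minimum interval length. The implementation in Section 2 uses the equivalent 0-indexed, half-open form `[start, end)` with `start = a_k - 1` and `end = b_k`.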
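
The two-stage draw can be sanity-checked with a short vectorized sketch. The helper name `draw_intervals` is hypothetical, and the code assumes 0-indexed, half-open `[start, end)` intervals (chosen to match Python slicing) rather than the 1-indexed notation of the formulas. Note that drawing the start first and the end second is not uniform over all valid intervals: a late start can only produce a short interval.

```python
import numpy as np


def draw_intervals(T, K, min_len, seed=0):
    """Hypothetical helper: draw K intervals as 0-indexed half-open [start, end)."""
    rng = np.random.default_rng(seed)
    # start in {0, ..., T - min_len} (the upper bound of integers() is exclusive)
    starts = rng.integers(0, T - min_len + 1, size=K)
    # end in {start + min_len, ..., T}, drawn per start via broadcasting
    ends = rng.integers(starts + min_len, T + 1)
    return np.column_stack([starts, ends])


iv = draw_intervals(T=50, K=1000, min_len=5)
lengths = iv[:, 1] - iv[:, 0]
print("shortest:", lengths.min(), "longest:", lengths.max())
```

Every sampled interval stays inside the series and respects the minimum length, which is the only property the rest of the algorithm relies on.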

### Interval Summary Statistics

For each interval $[a, b]$, we compute three summary statistics:

**Mean (location):**
$$\mu_{[a,b]} = \frac{1}{b-a+1} \sum_{t=a}^{b} x_t$$

**Standard Deviation (spread):**
$$\sigma_{[a,b]} = \sqrt{\frac{1}{b-a} \sum_{t=a}^{b} (x_t - \mu_{[a,b]})^2}$$

**Slope (trend):**
$$\beta_{[a,b]} = \frac{\sum_{t=a}^{b}(t - \bar{t})(x_t - \mu_{[a,b]})}{\sum_{t=a}^{b}(t - \bar{t})^2}$$

where $\bar{t} = \frac{a+b}{2}$ is the mean time index.
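
As a quick check of the three formulas, the sketch below evaluates them on a small made-up series and compares the results with NumPy's `np.std(..., ddof=1)` and `np.polyfit`. The series values and the interval $[2, 5]$ are arbitrary illustrations.

```python
import numpy as np

x = np.array([2.0, 4.0, 3.0, 7.0, 9.0, 8.0, 6.0])
a, b = 2, 5                 # 1-indexed inclusive interval, as in the formulas
seg = x[a - 1:b]            # x_2..x_5 -> [4, 3, 7, 9]
t = np.arange(a, b + 1)     # time indices 2, 3, 4, 5

mu = seg.sum() / (b - a + 1)
sigma = np.sqrt(((seg - mu) ** 2).sum() / (b - a))   # sample std (divides by n - 1)
t_bar = (a + b) / 2
beta = ((t - t_bar) * (seg - mu)).sum() / ((t - t_bar) ** 2).sum()

print(mu, round(sigma, 3), beta)   # mu = 5.75, sigma is about 2.754, beta is about 1.9

# Cross-check against NumPy's built-ins
assert np.isclose(sigma, np.std(seg, ddof=1))
assert np.isclose(beta, np.polyfit(t, seg, 1)[0])
```

The slope depends only on centered time indices, so shifting the interval along the time axis without changing its values leaves $\beta$ unchanged.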

### Feature Map

The complete feature representation concatenates all interval features:

$$\phi(x) = [\mu_1, \sigma_1, \beta_1, \mu_2, \sigma_2, \beta_2, ..., \mu_K, \sigma_K, \beta_K] \in \mathbb{R}^{3K}$$
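
A minimal sketch of the feature map for a single series, assuming 0-indexed half-open intervals. The helpers `interval_stats` and `feature_map` are hypothetical names used only for this illustration.

```python
import numpy as np


def interval_stats(seg):
    """Return (mean, sample std, least-squares slope) of one segment."""
    n = len(seg)
    t = np.arange(n) - (n - 1) / 2          # centered time index
    mean = seg.mean()
    slope = (t @ (seg - mean)) / (t @ t)
    return mean, seg.std(ddof=1), slope


def feature_map(x, intervals):
    """Concatenate [mean, std, slope] over all intervals: a vector of length 3K."""
    return np.array([s for (start, end) in intervals
                     for s in interval_stats(x[start:end])])


x = np.linspace(0.0, 1.0, 11)               # straight line with step 0.1
intervals = [(0, 4), (3, 9), (5, 11)]       # K = 3 arbitrary intervals
phi = feature_map(x, intervals)
print(phi.shape)                            # (9,)
print(phi[2::3])                            # slopes, all close to 0.1
```

Because the example series is a straight line, every interval reports the same slope while the means differ, which shows how the three statistics carry complementary information.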

### Ensemble Prediction

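Each tree $T_b$ is trained on its own random set of intervals, which gives it a tree-specific feature map $\phi_b$. The final prediction averages across the $B$ decision trees:

$$\hat{y} = \frac{1}{B}\sum_{b=1}^{B} T_b(\phi_b(x))$$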
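
One reason averaging helps can be checked numerically. For squared error, the ensemble's error equals the average per-tree error minus the average squared spread of the trees around the ensemble mean, so the ensemble is never worse than the average tree. The sketch below verifies this identity on synthetic stand-ins for tree outputs. No real trees are fitted, and the noise model is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 200, 25
y = rng.normal(size=n)

# Synthetic stand-ins for the per-tree outputs: truth plus independent per-tree noise
tree_preds = y[:, None] + rng.normal(scale=0.8, size=(n, B))

y_hat = tree_preds.mean(axis=1)                          # ensemble average
ensemble_mse = np.mean((y_hat - y) ** 2)
avg_tree_mse = np.mean((tree_preds - y[:, None]) ** 2)   # mean over samples and trees
spread = np.mean((tree_preds - y_hat[:, None]) ** 2)     # disagreement among trees

print(f"ensemble MSE      : {ensemble_mse:.4f}")
print(f"average tree MSE  : {avg_tree_mse:.4f}")
print(f"avg MSE - spread  : {avg_tree_mse - spread:.4f}")
```

The gain from averaging is exactly the spread term, which is why diverse trees (here obtained through different random intervals) matter.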

### MSE Split Criterion

Each tree uses the Mean Squared Error criterion for node splitting:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$$

The best split minimizes the weighted sum of child node MSEs:

$$\text{Split Score} = \frac{n_L}{n}\text{MSE}_L + \frac{n_R}{n}\text{MSE}_R$$
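
The split search can be sketched for a single feature as follows. This is an illustrative brute-force scan with hypothetical helper names (`node_mse`, `best_split`) and made-up data, not the optimized routine a tree library would use.

```python
import numpy as np


def node_mse(y):
    """MSE of a node that predicts the mean of its targets."""
    return float(np.mean((y - y.mean()) ** 2))


def best_split(feature, y):
    """Scan all thresholds on one feature and return (threshold, split score)."""
    order = np.argsort(feature)
    f, t = feature[order], y[order]
    n = len(t)
    best_thr, best_score = None, np.inf
    for i in range(1, n):
        if f[i] == f[i - 1]:
            continue                          # no threshold separates equal values
        score = (i / n) * node_mse(t[:i]) + ((n - i) / n) * node_mse(t[i:])
        if score < best_score:
            best_thr, best_score = (f[i - 1] + f[i]) / 2, score
    return best_thr, best_score


feature = np.array([0.1, 0.4, 0.5, 0.9, 1.2, 1.3])   # e.g. one interval's mean
y = np.array([1.0, 1.2, 0.9, 3.1, 2.9, 3.0])
thr, score = best_split(feature, y)
print(thr, score)   # threshold is about 0.7, between the two target clusters
```

The chosen threshold falls in the gap between the low-target and high-target samples, where both children become nearly pure.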

## 2. Low-Level NumPy Implementation

Let's build the Time Series Forest Regressor from scratch to understand each component.

In [None]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from typing import List, Optional, Tuple


def sample_intervals(series_length: int, n_intervals: int, min_length: int = 3,
                     random_state: Optional[int] = None) -> List[Tuple[int, int]]:
    """
    Sample random intervals from a time series.
    
    Parameters
    ----------
    series_length : int
        Length of the time series T
    n_intervals : int
        Number of intervals K to sample
    min_length : int
        Minimum interval length (default: 3)
    random_state : int
        Random seed for reproducibility
        
    Returns
    -------
    intervals : List[Tuple[int, int]]
        List of (start, end) tuples representing intervals
    """
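    # Guard against series that are too short to hold a single interval
    if min_length > series_length:
        raise ValueError("min_length must not exceed series_length")
    
    rng = np.random.default_rng(random_state)
    intervals = []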
    
    for _ in range(n_intervals):
        # Sample start position (leave room for min_length)
        max_start = series_length - min_length
        start = rng.integers(0, max_start + 1)
        
        # Sample end position (at least min_length from start)
        min_end = start + min_length
        end = rng.integers(min_end, series_length + 1)
        
        intervals.append((start, end))
    
    return intervals


def compute_interval_features(x: np.ndarray, start: int, end: int) -> Tuple[float, float, float]:
    """
    Compute summary statistics for a single interval.
    
    Parameters
    ----------
    x : np.ndarray
        Time series values (1D array)
    start : int
        Interval start index (inclusive)
    end : int
        Interval end index (exclusive)
        
    Returns
    -------
    mean : float
        Mean of values in interval
    std : float
        Standard deviation of values in interval
    slope : float
        Linear regression slope of values in interval
    """
    segment = x[start:end]
    n = len(segment)
    
    # Mean (location statistic)
    mean = np.mean(segment)
    
    # Standard deviation (spread statistic)
    std = np.std(segment, ddof=1) if n > 1 else 0.0
    
    # Slope via linear regression (trend statistic)
    # Using least squares: slope = Cov(t, x) / Var(t)
    t = np.arange(n)
    t_centered = t - t.mean()
    x_centered = segment - mean
    
    var_t = np.sum(t_centered ** 2)
    if var_t > 0:
        slope = np.sum(t_centered * x_centered) / var_t
    else:
        slope = 0.0
    
    return mean, std, slope


def build_regression_features(X: np.ndarray, intervals: List[Tuple[int, int]]) -> np.ndarray:
    """
    Build feature matrix from time series using interval features.
    
    Parameters
    ----------
    X : np.ndarray
        Time series data, shape (n_samples, series_length)
    intervals : List[Tuple[int, int]]
        List of (start, end) interval tuples
        
    Returns
    -------
    features : np.ndarray
        Feature matrix, shape (n_samples, 3 * n_intervals)
        Each interval contributes [mean, std, slope]
    """
    n_samples = X.shape[0]
    n_intervals = len(intervals)
    features = np.zeros((n_samples, 3 * n_intervals))
    
    for i in range(n_samples):
        for j, (start, end) in enumerate(intervals):
            mean, std, slope = compute_interval_features(X[i], start, end)
            features[i, 3*j] = mean
            features[i, 3*j + 1] = std
            features[i, 3*j + 2] = slope
    
    return features


# Demonstration with a simple example
print("Interval Sampling Example:")
print("-" * 40)

series_length = 50
n_intervals = 5
intervals = sample_intervals(series_length, n_intervals, min_length=5, random_state=42)

for i, (start, end) in enumerate(intervals):
    print(f"Interval {i+1}: [{start:2d}, {end:2d}) → length = {end - start}")

In [None]:
class SimpleTimeSeriesForestRegressor:
    """
    A simplified Time Series Forest Regressor implementation.
    
    This implementation demonstrates the core algorithm:
    1. Sample random intervals for each tree
    2. Extract interval features (mean, std, slope)
    3. Train decision tree regressors on interval features
    4. Average predictions from all trees
    """
    
    def __init__(self, n_estimators: int = 10, n_intervals: int = None,
                 min_interval_length: int = 3, max_depth: int = None,
                 random_state: int = None):
        self.n_estimators = n_estimators
        self.n_intervals = n_intervals
        self.min_interval_length = min_interval_length
        self.max_depth = max_depth
        self.random_state = random_state
        
        self.trees_ = []
        self.intervals_ = []
        
    def fit(self, X: np.ndarray, y: np.ndarray):
        """Fit the ensemble to training data."""
        n_samples, series_length = X.shape
        
        # Reset fitted state so repeated calls to fit() do not accumulate trees
        self.trees_ = []
        self.intervals_ = []
        
        # Default: sqrt(series_length) intervals per tree.
        # A local variable keeps the constructor argument untouched.
        if self.n_intervals is None:
            n_intervals = max(1, int(np.sqrt(series_length)))
        else:
            n_intervals = self.n_intervals
        
        rng = np.random.default_rng(self.random_state)
        
        for b in range(self.n_estimators):
            # Sample intervals for this tree
            seed = rng.integers(0, 2**31)
            intervals = sample_intervals(
                series_length, 
                n_intervals, 
                self.min_interval_length,
                random_state=seed
            )
            self.intervals_.append(intervals)
            
            # Build features and train tree
            features = build_regression_features(X, intervals)
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=seed)
            tree.fit(features, y)
            self.trees_.append(tree)
        
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """Predict using ensemble averaging."""
        # Ensemble average over the per-tree predictions
        return self.predict_individual(X).mean(axis=1)
    
    def predict_individual(self, X: np.ndarray) -> np.ndarray:
        """Get predictions from each individual tree."""
        # Size the output by the number of fitted trees
        predictions = np.zeros((X.shape[0], len(self.trees_)))
        
        for b, (tree, intervals) in enumerate(zip(self.trees_, self.intervals_)):
            features = build_regression_features(X, intervals)
            predictions[:, b] = tree.predict(features)
        
        return predictions


# Demonstrate feature extraction on a single series
print("Feature Extraction Example:")
print("-" * 40)

# Create a sample time series with trend and variation
np.random.seed(42)
sample_series = np.sin(np.linspace(0, 4*np.pi, 50)) + np.random.randn(50) * 0.2

print(f"Series length: {len(sample_series)}")
print(f"\nFeatures for each interval:")
for i, (start, end) in enumerate(intervals):
    mean, std, slope = compute_interval_features(sample_series, start, end)
    print(f"  Interval {i+1} [{start:2d}:{end:2d}]: μ={mean:+.3f}, σ={std:.3f}, β={slope:+.4f}")

## 3. Plotly Visualizations

### 3.1 Time Series with Interval Overlays

**Intuition**: Visualizing how random intervals cover different portions of the time series helps understand how TSF captures patterns at multiple scales and locations. Each colored region represents an interval from which features are extracted.

In [None]:
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Generate synthetic regression dataset
np.random.seed(42)
n_samples = 100
series_length = 50

# Create time series with different patterns that influence the target
X_synthetic = np.zeros((n_samples, series_length))
y_synthetic = np.zeros(n_samples)

for i in range(n_samples):
    # Random frequency and amplitude
    freq = np.random.uniform(0.5, 2.0)
    amp = np.random.uniform(0.5, 2.0)
    trend = np.random.uniform(-0.02, 0.02)
    noise = np.random.randn(series_length) * 0.2
    
    t = np.arange(series_length)
    X_synthetic[i] = amp * np.sin(2 * np.pi * freq * t / series_length) + trend * t + noise
    
    # Target is a function of the patterns
    y_synthetic[i] = 2 * amp + 5 * freq + 50 * trend + np.random.randn() * 0.5

# Visualize a sample series with interval overlays
sample_idx = 0
sample_series = X_synthetic[sample_idx]

# Sample intervals for visualization
vis_intervals = sample_intervals(series_length, 5, min_length=5, random_state=123)

# Create the plot
fig = go.Figure()

# Add the time series line
fig.add_trace(go.Scatter(
    x=list(range(series_length)),
    y=sample_series,
    mode='lines',
    name='Time Series',
    line=dict(color='black', width=2)
))

# Color palette for intervals
colors = px.colors.qualitative.Set2

# Add colored regions for each interval
for i, (start, end) in enumerate(vis_intervals):
    color = colors[i % len(colors)]
    
    # Add shaded region
    fig.add_vrect(
        x0=start, x1=end,
        fillcolor=color, opacity=0.3,
        layer="below", line_width=0,
    )
    
    # Add interval segment highlight
    fig.add_trace(go.Scatter(
        x=list(range(start, end)),
        y=sample_series[start:end],
        mode='lines',
        name=f'Interval {i+1}: [{start}, {end})',
        line=dict(color=color, width=3)
    ))

fig.update_layout(
    title="Time Series with Random Interval Overlays",
    xaxis_title="Time Index",
    yaxis_title="Value",
    template="plotly_white",
    height=400,
    showlegend=True,
    legend=dict(x=1.02, y=1)
)

fig

### 3.2 Feature Extraction Demonstration

**Intuition**: This visualization shows how the three summary statistics (mean, std, slope) are computed for each interval. The mean captures the central tendency, std measures local variability, and slope captures the local trend direction.

In [None]:
# Feature extraction visualization
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=[
        "Time Series with Intervals",
        "Mean (μ) per Interval",
        "Std Dev (σ) per Interval",
        "Slope (β) per Interval"
    ],
    vertical_spacing=0.12,
    horizontal_spacing=0.1
)

# Compute features for each interval
interval_names = []
means = []
stds = []
slopes = []

for i, (start, end) in enumerate(vis_intervals):
    mean, std, slope = compute_interval_features(sample_series, start, end)
    interval_names.append(f"[{start},{end})")
    means.append(mean)
    stds.append(std)
    slopes.append(slope)

# Plot 1: Time series with intervals
fig.add_trace(
    go.Scatter(x=list(range(series_length)), y=sample_series, 
               mode='lines', name='Series', line=dict(color='black')),
    row=1, col=1
)

for i, (start, end) in enumerate(vis_intervals):
    color = colors[i % len(colors)]
    # Add horizontal line for mean
    mean = means[i]
    fig.add_trace(
        go.Scatter(
            x=[start, end], y=[mean, mean],
            mode='lines', name=f'μ_{i+1}',
            line=dict(color=color, dash='dash', width=2),
            showlegend=False
        ),
        row=1, col=1
    )

# Plot 2: Mean bar chart
fig.add_trace(
    go.Bar(x=interval_names, y=means, marker_color=colors[:len(vis_intervals)], 
           name='Mean', showlegend=False),
    row=1, col=2
)

# Plot 3: Std bar chart
fig.add_trace(
    go.Bar(x=interval_names, y=stds, marker_color=colors[:len(vis_intervals)],
           name='Std', showlegend=False),
    row=2, col=1
)

# Plot 4: Slope bar chart with color coding
slope_colors = ['green' if s > 0 else 'red' for s in slopes]
fig.add_trace(
    go.Bar(x=interval_names, y=slopes, marker_color=slope_colors,
           name='Slope', showlegend=False),
    row=2, col=2
)

fig.update_layout(
    title="Feature Extraction from Intervals: Mean (μ), Std (σ), Slope (β)",
    template="plotly_white",
    height=500,
    showlegend=False
)

fig.update_xaxes(title_text="Time", row=1, col=1)
fig.update_xaxes(title_text="Interval", row=1, col=2)
fig.update_xaxes(title_text="Interval", row=2, col=1)
fig.update_xaxes(title_text="Interval", row=2, col=2)

fig.update_yaxes(title_text="Value", row=1, col=1)
fig.update_yaxes(title_text="μ", row=1, col=2)
fig.update_yaxes(title_text="σ", row=2, col=1)
fig.update_yaxes(title_text="β", row=2, col=2)

fig

### 3.3 Individual Tree Predictions

**Intuition**: Each tree in the ensemble makes its own prediction based on different random intervals. This visualization shows the spread of the individual tree predictions (small markers). The ensemble average (red diamonds) typically lies closer to the dashed perfect-prediction line than a single tree does, which illustrates the variance reduction gained by averaging randomized trees.

In [None]:
# Split data for training/testing
train_size = 70
X_train_synth = X_synthetic[:train_size]
y_train_synth = y_synthetic[:train_size]
X_test_synth = X_synthetic[train_size:]
y_test_synth = y_synthetic[train_size:]

# Train our custom TSF Regressor
n_trees = 20
tsf = SimpleTimeSeriesForestRegressor(
    n_estimators=n_trees, 
    n_intervals=10,
    max_depth=5,
    random_state=42
)
tsf.fit(X_train_synth, y_train_synth)

# Get individual tree predictions
individual_preds = tsf.predict_individual(X_test_synth)
ensemble_preds = tsf.predict(X_test_synth)

# Create scatter plot of individual predictions
fig = go.Figure()

# Add individual tree predictions (jittered for visibility)
for tree_idx in range(n_trees):
    fig.add_trace(go.Scatter(
        x=y_test_synth + np.random.randn(len(y_test_synth)) * 0.1,  # slight jitter
        y=individual_preds[:, tree_idx],
        mode='markers',
        name=f'Tree {tree_idx + 1}',
        marker=dict(size=5, opacity=0.5),
        showlegend=False
    ))

# Add ensemble predictions (larger markers)
fig.add_trace(go.Scatter(
    x=y_test_synth,
    y=ensemble_preds,
    mode='markers',
    name='Ensemble Average',
    marker=dict(size=12, color='red', symbol='diamond', line=dict(width=1, color='black'))
))

# Add perfect prediction line
min_val = min(y_test_synth.min(), individual_preds.min())
max_val = max(y_test_synth.max(), individual_preds.max())
fig.add_trace(go.Scatter(
    x=[min_val, max_val],
    y=[min_val, max_val],
    mode='lines',
    name='Perfect Prediction',
    line=dict(color='gray', dash='dash', width=2)
))

fig.update_layout(
    title=f"Individual Tree Predictions vs Ensemble ({n_trees} trees)",
    xaxis_title="Actual Value",
    yaxis_title="Predicted Value",
    template="plotly_white",
    height=500,
    legend=dict(x=0.02, y=0.98)
)

fig.show()

# Print statistics
tree_mses = np.mean((individual_preds - y_test_synth.reshape(-1, 1))**2, axis=0)
ensemble_mse = np.mean((ensemble_preds - y_test_synth)**2)
print(f"Individual Tree MSE: mean={tree_mses.mean():.3f}, std={tree_mses.std():.3f}")
print(f"Ensemble MSE: {ensemble_mse:.3f}")

### 3.4 Ensemble Prediction vs Actual with Error Bars

**Intuition**: The error bars represent the standard deviation of predictions across all trees in the ensemble. Narrow error bars indicate high agreement among trees, while wide bars indicate disagreement. This spread is a convenient uncertainty heuristic that ensembles provide at no extra cost, but it is not a calibrated confidence interval: trees can agree with each other and still be wrong.
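
How far the tree spread can be trusted is easy to probe on synthetic data. The sketch below uses a hypothetical helper, `empirical_coverage`, and made-up stand-ins for tree predictions (no model is fitted). It measures how often the true value falls within the ensemble mean ± $k$ standard deviations, once when tree errors are independent and once when all trees share a common error.

```python
import numpy as np


def empirical_coverage(y_true, tree_preds, k=1.0):
    """Fraction of samples whose true value lies within mean +/- k * std of the trees."""
    center = tree_preds.mean(axis=1)
    spread = tree_preds.std(axis=1)
    return float(np.mean(np.abs(y_true - center) <= k * spread))


rng = np.random.default_rng(0)
n, B = 500, 50
y = rng.normal(size=n)

# Case 1 (assumed): every tree makes its own independent error
independent = y[:, None] + rng.normal(scale=1.0, size=(n, B))

# Case 2 (assumed): all trees share one error per sample, plus small private noise
shared_error = rng.normal(scale=1.0, size=(n, 1))
correlated = y[:, None] + shared_error + rng.normal(scale=0.2, size=(n, B))

cov_independent = empirical_coverage(y, independent)
cov_correlated = empirical_coverage(y, correlated)
print(f"coverage, independent errors: {cov_independent:.2f}")
print(f"coverage, shared errors     : {cov_correlated:.2f}")
```

When the trees share an error, they agree with one another and the bars shrink even though the ensemble is off, so coverage drops sharply. The same check can be run on a held-out set with real per-tree predictions.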

In [None]:
# Calculate prediction uncertainty (std across trees)
pred_std = individual_preds.std(axis=1)

# Sort by actual value for cleaner visualization
sort_idx = np.argsort(y_test_synth)
y_sorted = y_test_synth[sort_idx]
pred_sorted = ensemble_preds[sort_idx]
std_sorted = pred_std[sort_idx]

# Create figure with error bars
fig = go.Figure()

# Add actual target values
fig.add_trace(go.Scatter(
    x=list(range(len(y_sorted))),
    y=y_sorted,
    mode='lines+markers',
    name='Actual',
    line=dict(color='blue', width=2),
    marker=dict(size=8)
))

# Add predictions with error bars
fig.add_trace(go.Scatter(
    x=list(range(len(y_sorted))),
    y=pred_sorted,
    mode='markers',
    name='Ensemble Prediction',
    marker=dict(size=10, color='red', symbol='diamond'),
    error_y=dict(
        type='data',
        array=std_sorted,
        visible=True,
        color='rgba(255, 0, 0, 0.5)',
        thickness=1.5,
        width=4
    )
))

fig.update_layout(
    title="Ensemble Predictions with Uncertainty Intervals (±1 std)",
    xaxis_title="Sample Index (sorted by actual value)",
    yaxis_title="Value",
    template="plotly_white",
    height=450,
    legend=dict(x=0.02, y=0.98)
)

fig.show()

# Additional: Prediction residuals
residuals = ensemble_preds - y_test_synth

fig2 = go.Figure()
fig2.add_trace(go.Scatter(
    x=y_test_synth,
    y=residuals,
    mode='markers',
    marker=dict(
        size=10,
        color=pred_std,
        colorscale='RdYlGn_r',
        colorbar=dict(title="Prediction<br>Uncertainty"),
        showscale=True
    ),
    text=[f"Uncertainty: {s:.2f}" for s in pred_std],
    hovertemplate="Actual: %{x:.2f}<br>Residual: %{y:.2f}<br>%{text}<extra></extra>"
))

fig2.add_hline(y=0, line_dash="dash", line_color="gray")

fig2.update_layout(
    title="Residual Plot (colored by prediction uncertainty)",
    xaxis_title="Actual Value",
    yaxis_title="Residual (Predicted - Actual)",
    template="plotly_white",
    height=400
)

fig2.show()

## 4. Sktime Implementation

Now let's use the production-ready `TimeSeriesForestRegressor` from sktime, which implements the same interval-feature idea as our low-level version.

In [None]:
import numpy as np

from sktime.datasets import load_unit_test

In [None]:
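# UnitTest is a small classification dataset; its class labels are cast to
# float here only so that the regressor API can be exercised end to end.
X_train, y_train = load_unit_test(split="train", return_X_y=True)
X_test, y_test = load_unit_test(split="test", return_X_y=True)
y_train = y_train.astype(float)
y_test = y_test.astype(float)
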
In [None]:
from sktime.regression.interval_based import TimeSeriesForestRegressor
from sklearn.metrics import mean_absolute_error

model = TimeSeriesForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, pred))


## 5. Key Takeaways

### Algorithm Summary

| Component | Description |
|-----------|-------------|
| **Interval Sampling** | Randomly select $K$ intervals $[a_k, b_k]$ per tree; the same intervals are applied to every series |
| **Feature Extraction** | Compute mean $\mu$, std $\sigma$, and slope $\beta$ for each interval |
| **Feature Vector** | $\phi_b(x) \in \mathbb{R}^{3K}$ concatenates all interval features for tree $b$ |
| **Ensemble** | $B$ decision trees, each trained on its own interval features |
| **Prediction** | Average of tree predictions: $\hat{y} = \frac{1}{B}\sum_{b=1}^{B} T_b(\phi_b(x))$ |

### Advantages

- ✅ **Interpretable features**: Mean, std, slope have clear meanings
- ✅ **Captures multi-scale patterns**: Random intervals span different lengths
- ✅ **Efficient**: Feature extraction is linear in the interval length, with no expensive distance computations
- ✅ **Uncertainty estimation**: The spread of tree predictions gives a heuristic (uncalibrated) uncertainty estimate
- ✅ **Robust**: Ensemble averaging reduces the variance of individual trees

### Limitations

- ⚠️ **Fixed statistics**: Only mean, std, slope may miss complex patterns
- ⚠️ **Random intervals**: May miss important specific subsequences
- ⚠️ **No explicit temporal ordering**: Features don't preserve interval order

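### Hyperparameters

Parameter names below follow `SimpleTimeSeriesForestRegressor` from Section 2; the sktime estimator may name or expose them differently.
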
| Parameter | Typical Range | Effect |
|-----------|--------------|--------|
| `n_estimators` | 100-500 | More trees → lower variance, higher compute |
| `n_intervals` | $\sqrt{T}$ to $T$ | More intervals → richer representation |
| `min_interval_length` | 3-10 | Minimum feature resolution |
| `max_depth` | None or 5-20 | Controls tree complexity |