# TimeSeriesForestClassifier

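Time Series Forest (TSF) is an ensemble of decision trees trained on **interval‑based features** (mean, standard deviation, slope) extracted from random sub‑intervals of each series. Because every feature maps back to a specific time interval, the model is comparatively easy to interpret, and it performs well on many benchmark datasets.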


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from sktime.datasets import load_basic_motions, load_unit_test



In [None]:
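# Note: BasicMotions is multivariate (6 channels per series). If your sktime
# version's TimeSeriesForestClassifier rejects multivariate input, swap in the
# univariate load_unit_test dataset imported above.
X_train, y_train = load_basic_motions(split="train", return_X_y=True)
X_test, y_test = load_basic_motions(split="test", return_X_y=True)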



## Fit model


In [None]:
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sklearn.metrics import classification_report

clf = TimeSeriesForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(classification_report(y_test, pred))


## Interpretation

Each tree splits on features that are tied to specific time intervals, so the forest implicitly selects informative intervals. Inspecting feature importances shows which parts of the series drive classification.


---

# Mathematical Foundation

## Core Intuition

Time Series Forest (TSF) transforms time series classification into a **tabular classification problem** by extracting statistical features from random sub-intervals. Instead of treating time series as sequences, TSF:
1. Samples random intervals from each time series
2. Computes simple statistics (mean, std, slope) for each interval
3. Trains a random forest on these interval features

This approach is both **interpretable** (we know which time intervals matter) and **robust** (ensemble reduces variance).

## Random Interval Sampling

Given a time series of length $T$, we sample $K$ random intervals. Each interval $k$ is defined by:

$$[a_k, b_k] \subset [1, T] \quad \text{where} \quad 1 \leq a_k < b_k \leq T$$

The start $a_k$ and end $b_k$ are drawn uniformly at random, subject to a minimum interval length so that the standard deviation and slope are well defined.

## Interval Feature Extraction

For each interval $[a, b]$, we extract three summary statistics from the time series values $\{x_a, x_{a+1}, \ldots, x_b\}$:

### 1. Mean (Location Feature)
$$\bar{x}_{[a,b]} = \frac{1}{b-a+1}\sum_{t=a}^{b} x_t$$

Captures the **average level** of the signal within the interval.

### 2. Standard Deviation (Spread Feature)
$$\sigma_{[a,b]} = \sqrt{\frac{1}{b-a}\sum_{t=a}^{b} (x_t - \bar{x}_{[a,b]})^2}$$

Captures the **variability/volatility** of the signal within the interval.

### 3. Slope (Trend Feature)
Computed via ordinary least squares regression of $x_t$ on $t$:

$$\beta_{[a,b]} = \frac{\sum_{t=a}^{b}(t - \bar{t})(x_t - \bar{x})}{\sum_{t=a}^{b}(t - \bar{t})^2}$$

Captures the **linear trend** (increasing/decreasing) within the interval.
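
As a quick sanity check, the closed-form slope above can be compared with NumPy's least-squares fit. This is a minimal sketch on a made-up series: the helper name `interval_slope` and the sample values are illustrative assumptions, and the code uses 0-based inclusive indices rather than the 1-based indices of the formulas.

```python
import numpy as np

def interval_slope(x, a, b):
    """OLS slope of x[a..b] (inclusive, 0-based) regressed on the time index."""
    t = np.arange(a, b + 1, dtype=float)
    seg = np.asarray(x[a:b + 1], dtype=float)
    t_c = t - t.mean()          # t - t_bar
    x_c = seg - seg.mean()      # x_t - x_bar
    return float(np.sum(t_c * x_c) / np.sum(t_c ** 2))

# Hypothetical series: a line with slope 0.5 plus a small fixed wiggle
wiggle = np.array([0.1, -0.1, 0.0, 0.2, -0.2, 0.1, 0.0, -0.1, 0.1, 0.0])
x = 0.5 * np.arange(10) + wiggle

slope_closed_form = interval_slope(x, 2, 7)
slope_polyfit = np.polyfit(np.arange(2, 8), x[2:8], deg=1)[0]  # highest-degree coefficient first
print(slope_closed_form, slope_polyfit)
```

Both numbers should agree up to floating-point error, since `np.polyfit` with `deg=1` solves the same least-squares problem.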

## Feature Matrix Construction

For $N$ time series and $K$ intervals, we construct a feature matrix $\Phi \in \mathbb{R}^{N \times 3K}$:

$$\Phi = \begin{bmatrix} 
\bar{x}_1^{(1)} & \sigma_1^{(1)} & \beta_1^{(1)} & \cdots & \bar{x}_K^{(1)} & \sigma_K^{(1)} & \beta_K^{(1)} \\
\bar{x}_1^{(2)} & \sigma_1^{(2)} & \beta_1^{(2)} & \cdots & \bar{x}_K^{(2)} & \sigma_K^{(2)} & \beta_K^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
\bar{x}_1^{(N)} & \sigma_1^{(N)} & \beta_1^{(N)} & \cdots & \bar{x}_K^{(N)} & \sigma_K^{(N)} & \beta_K^{(N)}
\end{bmatrix}$$

## Decision Tree Splitting (Gini Impurity)

Each tree in the forest uses **Gini impurity** to find optimal splits:

$$G(S) = 1 - \sum_{c=1}^{C} p_c^2$$

where $p_c$ is the proportion of samples belonging to class $c$ in node $S$.

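The quality of a split that divides node $S$ into a left child $S_L$ and a right child $S_R$ is measured by the **impurity decrease** (often loosely called *information gain*):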

$$\Delta G = G(S) - \frac{|S_L|}{|S|}G(S_L) - \frac{|S_R|}{|S|}G(S_R)$$
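
To make the arithmetic concrete, here is a small worked example on a hypothetical node. The class counts and the helper name `gini_from_counts` are made up for illustration.

```python
import numpy as np

def gini_from_counts(counts):
    """Gini impurity computed from per-class sample counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

# Hypothetical node with 5 samples of class 0 and 5 of class 1
parent = [5, 5]
left = [4, 1]    # left child after a candidate split
right = [1, 4]   # right child after the same split

n = sum(parent)
delta_g = (
    gini_from_counts(parent)
    - sum(left) / n * gini_from_counts(left)
    - sum(right) / n * gini_from_counts(right)
)
print(round(gini_from_counts(parent), 3), round(gini_from_counts(left), 3), round(delta_g, 3))
```

Here the parent impurity is 0.5, each child has impurity 0.32, and the split therefore reduces impurity by 0.18. A perfect split into two pure children would achieve the full 0.5.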

## Ensemble Prediction

The final prediction is the **majority vote** across all $B$ trees:

$$\hat{y} = \text{mode}\{T_b(\phi(x))\}_{b=1}^{B}$$

where $\phi(x)$ is the feature vector for time series $x$, and $T_b$ is the $b$-th decision tree.
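
Hard majority voting is one way to combine trees. Some forest implementations, including scikit-learn's `RandomForestClassifier`, instead average the per-tree class probabilities and predict the class with the highest mean. The sketch below contrasts the two rules on made-up probabilities for a single sample; the numbers are chosen so that the rules disagree, which is uncommon in practice.

```python
import numpy as np

# Hypothetical per-tree class probabilities for ONE sample:
# rows = trees (B = 3), columns = classes (C = 2)
tree_proba = np.array([
    [0.55, 0.45],   # tree 1 leans slightly towards class 0
    [0.60, 0.40],   # tree 2 leans slightly towards class 0
    [0.05, 0.95],   # tree 3 is very confident in class 1
])

# Hard voting: each tree casts one vote for its most likely class
votes = tree_proba.argmax(axis=1)
hard_vote = int(np.bincount(votes, minlength=tree_proba.shape[1]).argmax())

# Soft voting: average the probabilities, then take the most likely class
mean_proba = tree_proba.mean(axis=0)
soft_vote = int(mean_proba.argmax())

print(hard_vote, soft_vote)
```

Two lukewarm votes outnumber one confident vote under hard voting (class 0), while averaging lets the confident tree tip the result (class 1).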

---

# Plotly Visualizations

## Understanding Interval Sampling

Let's visualize how Time Series Forest samples random intervals from a time series and extracts features from each interval.

In [None]:
# Generate a sample time series for visualization
np.random.seed(42)
T = 100  # Length of time series

# Create a time series with different regimes
t = np.arange(T)
ts_example = (
    0.5 * np.sin(2 * np.pi * t / 20) +  # Slow oscillation
    0.3 * np.sin(2 * np.pi * t / 5) +   # Fast oscillation
    0.1 * t / T +                        # Slight upward trend
    0.2 * np.random.randn(T)             # Noise
)

# Sample random intervals
def sample_random_intervals(T, n_intervals, min_length=5, seed=42):
    """
    Sample random intervals from a time series of length T.
    
    Parameters:
    -----------
    T : int
        Length of the time series
    n_intervals : int
        Number of intervals to sample
    min_length : int
        Minimum length of each interval
    seed : int
        Random seed for reproducibility
    
    Returns:
    --------
    intervals : list of tuples
        List of (start, end) indices for each interval
    """
    np.random.seed(seed)
    intervals = []
    for _ in range(n_intervals):
        # Sample inclusive [start, end] so that the length (end - start + 1) >= min_length
        start = np.random.randint(0, T - min_length + 1)
        end = np.random.randint(start + min_length - 1, T)
        intervals.append((start, end))
    return intervals

# Sample 5 intervals for visualization
intervals = sample_random_intervals(T, n_intervals=5)
print("Sampled intervals (start, end):")
for i, (a, b) in enumerate(intervals):
    print(f"  Interval {i+1}: [{a}, {b}] (length = {b-a+1})")

In [None]:
# Visualization 1: Time series with highlighted random intervals
# This shows how TSF randomly samples different segments of the time series

fig = go.Figure()

# Plot the full time series
fig.add_trace(go.Scatter(
    x=t, y=ts_example,
    mode='lines',
    name='Original Time Series',
    line=dict(color='rgba(100,100,100,0.5)', width=2)
))

# Define colors for each interval
colors = px.colors.qualitative.Set2[:5]

# Highlight each interval
for i, ((a, b), color) in enumerate(zip(intervals, colors)):
    fig.add_trace(go.Scatter(
        x=t[a:b+1], y=ts_example[a:b+1],
        mode='lines',
        name=f'Interval {i+1}: [{a}, {b}]',
        line=dict(color=color, width=3)
    ))
    
    # Add shaded region
    fig.add_vrect(
        x0=a, x1=b,
        fillcolor=color, opacity=0.1,
        layer="below", line_width=0
    )

fig.update_layout(
    title="<b>Random Interval Sampling in Time Series Forest</b><br><sup>Each colored segment represents a randomly sampled interval</sup>",
    xaxis_title="Time Index (t)",
    yaxis_title="Value",
    template="plotly_white",
    legend=dict(orientation="h", yanchor="bottom", y=1.02),
    height=450
)
fig

### Intuition: Random Interval Sampling

The visualization above shows how TSF samples random segments from the time series:
- **Diversity**: Each interval captures a different temporal region, providing complementary views
- **Variable lengths**: Intervals can be short (capturing local patterns) or long (capturing global trends)
- **Redundancy through randomness**: By sampling many intervals across many trees, important patterns are likely to be captured

In [None]:
# Visualization 2: Extracted features (mean, std, slope) per interval
# This shows what information TSF extracts from each interval

def extract_interval_features(x, intervals):
    """
    Extract mean, standard deviation, and slope from each interval.
    
    Parameters:
    -----------
    x : np.ndarray
        Time series of shape (T,)
    intervals : list of tuples
        List of (start, end) indices
    
    Returns:
    --------
    features : np.ndarray
        Feature matrix of shape (n_intervals, 3) with [mean, std, slope]
    """
    n_intervals = len(intervals)
    features = np.zeros((n_intervals, 3))
    
    for i, (a, b) in enumerate(intervals):
        segment = x[a:b+1]
        t_segment = np.arange(len(segment))
        
        # Mean
        features[i, 0] = np.mean(segment)
        
        # Standard deviation
        features[i, 1] = np.std(segment, ddof=1) if len(segment) > 1 else 0
        
        # Slope via linear regression: beta = Cov(t,x) / Var(t)
        if len(segment) > 1:
            t_centered = t_segment - np.mean(t_segment)
            x_centered = segment - np.mean(segment)
            features[i, 2] = np.sum(t_centered * x_centered) / np.sum(t_centered ** 2)
        else:
            features[i, 2] = 0
    
    return features

# Extract features from our example intervals
features = extract_interval_features(ts_example, intervals)

# Create a dataframe for visualization
feature_df = pd.DataFrame({
    'Interval': [f'[{a}, {b}]' for a, b in intervals],
    'Mean': features[:, 0],
    'Std Dev': features[:, 1],
    'Slope': features[:, 2]
})

print("Extracted Features per Interval:")
print(feature_df.round(4).to_string(index=False))

In [None]:
# Visualization: Bar chart comparing features across intervals
from plotly.subplots import make_subplots

fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=["Mean (Location)", "Std Dev (Spread)", "Slope (Trend)"],
    shared_yaxes=False
)

interval_labels = [f'Int {i+1}' for i in range(len(intervals))]

# Mean bars
fig.add_trace(
    go.Bar(x=interval_labels, y=features[:, 0], marker_color=colors, name='Mean'),
    row=1, col=1
)

# Std Dev bars
fig.add_trace(
    go.Bar(x=interval_labels, y=features[:, 1], marker_color=colors, name='Std Dev'),
    row=1, col=2
)

# Slope bars (with sign coloring)
slope_colors = ['green' if s >= 0 else 'red' for s in features[:, 2]]
fig.add_trace(
    go.Bar(x=interval_labels, y=features[:, 2], marker_color=slope_colors, name='Slope'),
    row=1, col=3
)

fig.update_layout(
    title="<b>Extracted Interval Features</b><br><sup>Each interval yields 3 features that capture different aspects</sup>",
    template="plotly_white",
    showlegend=False,
    height=350
)
fig

### Intuition: Interval Features

The three features capture complementary information:
- **Mean** → Where is the signal level? High vs. low regions
- **Std Dev** → How variable is the signal? Stable vs. volatile regions  
- **Slope** → Is there a trend? Increasing, decreasing, or flat regions

Together, these simple statistics can differentiate time series that have distinct temporal patterns at different locations.
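
A small illustration of why the location of an interval matters: the two hypothetical series below contain the same values in a different order, so their global statistics are identical, yet a single interval mean separates them. The series and the chosen interval are made up for this sketch.

```python
import numpy as np

# Two hypothetical series of length 40:
# series_a has its bump early, series_b has it late
series_a = np.concatenate([np.ones(10), np.zeros(30)])
series_b = np.concatenate([np.zeros(30), np.ones(10)])

# Global statistics cannot tell the two series apart
print(series_a.mean(), series_b.mean())
print(series_a.std(ddof=1), series_b.std(ddof=1))

# The mean over the inclusive interval [0, 9] separates them immediately
a, b = 0, 9
mean_a = series_a[a:b + 1].mean()
mean_b = series_b[a:b + 1].mean()
print(mean_a, mean_b)
```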

In [None]:
# Visualization 3: Reading a feature-importance chart (simulated values)
# This shows how interval-feature importances are typically presented.
# NOTE: the importances below are simulated for demonstration only;
# they are NOT extracted from the fitted `clf`.

np.random.seed(42)
n_sim_intervals = 16  # Simulated number of intervals
feature_names = []
for i in range(n_sim_intervals):
    feature_names.extend([f'Int_{i+1}_mean', f'Int_{i+1}_std', f'Int_{i+1}_slope'])
n_features = len(feature_names)  # 48 = 16 intervals x 3 statistics

# Simulate a skewed importance distribution (a few features dominate)
simulated_importance = np.random.exponential(0.5, n_features)
simulated_importance = simulated_importance / simulated_importance.sum()

# Sort by importance
sorted_idx = np.argsort(simulated_importance)[::-1][:15]  # Top 15

fig = go.Figure()
fig.add_trace(go.Bar(
    x=simulated_importance[sorted_idx],
    y=[feature_names[i] for i in sorted_idx],
    orientation='h',
    marker_color=['#636EFA' if 'mean' in feature_names[i]
                  else '#EF553B' if 'std' in feature_names[i]
                  else '#00CC96' for i in sorted_idx]
))

fig.update_layout(
    title="<b>Simulated Feature Importance (Top 15 Interval Features)</b><br><sup>Blue=Mean, Red=StdDev, Green=Slope</sup>",
    xaxis_title="Relative Importance",
    yaxis_title="Feature",
    template="plotly_white",
    height=450,
    yaxis=dict(autorange="reversed")
)
fig.show()

### Intuition: Feature Importance

Feature importance reveals **which time intervals and which statistics** drive classification:
- High importance on `Int_k_mean` → The average level at time region k distinguishes classes
- High importance on `Int_k_std` → Variability at time region k distinguishes classes
- High importance on `Int_k_slope` → Trend direction at time region k distinguishes classes

This provides valuable interpretability compared to black-box models.
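
Per-feature importances can also be rolled up, either per interval (which time region matters) or per statistic (which kind of feature matters). The sketch below assumes the features are ordered as mean, std, slope for each interval in turn; the importance values and interval boundaries are made up for illustration.

```python
import numpy as np

# Hypothetical importances for K = 3 intervals, ordered as
# [int0_mean, int0_std, int0_slope, int1_mean, ...]
importances = np.array([0.05, 0.02, 0.03, 0.30, 0.10, 0.20, 0.10, 0.05, 0.15])
intervals = [(0, 9), (10, 24), (25, 39)]  # illustrative (start, end) pairs

per_feature = importances.reshape(len(intervals), 3)  # rows: intervals, cols: mean/std/slope
per_interval = per_feature.sum(axis=1)    # total importance of each time region
per_statistic = per_feature.sum(axis=0)   # total importance of each statistic

best = int(per_interval.argmax())
print(f"Most important interval: {intervals[best]} (total importance {per_interval[best]:.2f})")
for stat_name, value in zip(["mean", "std", "slope"], per_statistic):
    print(f"  {stat_name}: {value:.2f}")
```

Summing over the three statistics answers "where in the series does the signal live?", which is often the more useful question when intervals overlap.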

In [None]:
# Visualization 4: Individual tree predictions vs ensemble
# Demonstrates how ensemble voting reduces variance

np.random.seed(123)
n_samples = 20
n_trees = 10
n_classes = 4
class_names = ['Class A', 'Class B', 'Class C', 'Class D']

# Simulate individual tree predictions (some disagreement)
tree_predictions = np.zeros((n_samples, n_trees), dtype=int)
true_labels = np.random.randint(0, n_classes, n_samples)

for i in range(n_samples):
    true_class = true_labels[i]
    for j in range(n_trees):
        # Each tree has 70% chance of predicting correctly
        if np.random.rand() < 0.7:
            tree_predictions[i, j] = true_class
        else:
            tree_predictions[i, j] = np.random.randint(0, n_classes)

# Ensemble prediction (majority vote)
from scipy.stats import mode
ensemble_predictions = mode(tree_predictions, axis=1, keepdims=False)[0]

# Create heatmap of tree predictions
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=["Individual Tree Predictions", "Ensemble vs True Labels"],
    column_widths=[0.7, 0.3]
)

# Heatmap of individual tree predictions
fig.add_trace(
    go.Heatmap(
        z=tree_predictions[:10],  # Show first 10 samples
        x=[f'Tree {i+1}' for i in range(n_trees)],
        y=[f'Sample {i+1}' for i in range(10)],
        colorscale=[[0, '#636EFA'], [0.33, '#EF553B'], [0.66, '#00CC96'], [1, '#AB63FA']],
        zmin=0, zmax=n_classes - 1,  # fix the color range so each class keeps the same color
        showscale=False,
        text=[[class_names[tree_predictions[i, j]] for j in range(n_trees)] for i in range(10)],
        texttemplate="%{text}",
        textfont={"size": 8}
    ),
    row=1, col=1
)

# Comparison: Ensemble vs True
comparison_data = np.column_stack([ensemble_predictions[:10], true_labels[:10]])
fig.add_trace(
    go.Heatmap(
        z=comparison_data,
        x=['Ensemble', 'True'],
        y=[f'Sample {i+1}' for i in range(10)],
        colorscale=[[0, '#636EFA'], [0.33, '#EF553B'], [0.66, '#00CC96'], [1, '#AB63FA']],
        zmin=0, zmax=n_classes - 1,  # fix the color range so each class keeps the same color
        showscale=False,
        text=[[class_names[comparison_data[i, j]] for j in range(2)] for i in range(10)],
        texttemplate="%{text}",
        textfont={"size": 10}
    ),
    row=1, col=2
)

fig.update_layout(
    title="<b>Ensemble Voting: Individual Trees vs Final Prediction</b><br><sup>Majority vote combines diverse tree predictions</sup>",
    template="plotly_white",
    height=450
)
fig.show()

# Calculate ensemble accuracy
ensemble_acc = np.mean(ensemble_predictions == true_labels)
avg_tree_acc = np.mean([np.mean(tree_predictions[:, j] == true_labels) for j in range(n_trees)])
print(f"\nAverage individual tree accuracy: {avg_tree_acc:.1%}")
print(f"Ensemble accuracy (majority vote): {ensemble_acc:.1%}")
print(f"Improvement from ensembling: {(ensemble_acc - avg_tree_acc):+.1%}")

### Intuition: Ensemble Power

The ensemble voting mechanism is key to TSF's success:
- **Diversity**: Each tree uses different random intervals → different "views" of the data
- **Error reduction**: Individual trees tend to make different mistakes, so a wrong vote is usually outnumbered by correct ones
- **Robustness**: Majority vote smooths out individual tree variance
- **Bias-variance tradeoff**: Trees can be deep (low bias), ensemble controls variance
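
The effect of voting can be quantified under an idealized model. The sketch below assumes a binary problem in which every tree is correct independently with the same probability `p`; the function name `majority_vote_accuracy` is illustrative. Real trees are correlated, so the actual gain is usually smaller.

```python
from math import comb

def majority_vote_accuracy(p, n_trees):
    """Probability that more than half of n_trees independent voters are correct.

    Assumes binary classification and an odd n_trees (no ties).
    """
    k_min = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p**k * (1 - p)**(n_trees - k)
        for k in range(k_min, n_trees + 1)
    )

for n_trees in (1, 5, 11, 51):
    print(n_trees, round(majority_vote_accuracy(0.7, n_trees), 4))
```

Under these assumptions, five trees that are each 70% accurate reach a majority-vote accuracy of about 84%, and the accuracy keeps rising as more trees are added.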

---

# Low-Level NumPy Implementation

This section provides a from-scratch implementation of the Time Series Forest feature pipeline in NumPy, with scikit-learn's `RandomForestClassifier` supplying the trees. Understanding the internals helps build intuition for how the algorithm works.

## Core Functions

In [None]:
# ============================================================
# LOW-LEVEL NUMPY IMPLEMENTATION OF TIME SERIES FOREST
# ============================================================

def sample_random_intervals(T: int, n_intervals: int, min_length: int = 3, seed: int = None) -> np.ndarray:
    """
    Sample random intervals from a time series of length T.
    
    Parameters:
    -----------
    T : int
        Length of the time series
    n_intervals : int
        Number of intervals to sample
    min_length : int
        Minimum length of each interval (default: 3)
    seed : int, optional
        Random seed for reproducibility
    
    Returns:
    --------
    intervals : np.ndarray of shape (n_intervals, 2)
        Array where each row is [start_idx, end_idx] (inclusive)
    
    Example:
    --------
    >>> intervals = sample_random_intervals(100, 5, min_length=5, seed=42)
    >>> print(intervals)
    [[51 97]
     [14 79]
     [ 4 88]
     [17 59]
     [14 64]]
    """
    if seed is not None:
        np.random.seed(seed)
    
    intervals = np.zeros((n_intervals, 2), dtype=int)
    
    for i in range(n_intervals):
        # Ensure we can fit at least min_length
        start = np.random.randint(0, T - min_length)
        end = np.random.randint(start + min_length, T + 1)  # +1 for inclusive
        intervals[i] = [start, end - 1]  # Store as inclusive indices
    
    return intervals


def extract_interval_features(x: np.ndarray, intervals: np.ndarray) -> np.ndarray:
    """
    Extract mean, standard deviation, and slope from each interval.
    
    Parameters:
    -----------
    x : np.ndarray of shape (T,)
        Single univariate time series
    intervals : np.ndarray of shape (n_intervals, 2)
        Array of [start, end] indices (inclusive)
    
    Returns:
    --------
    features : np.ndarray of shape (n_intervals, 3)
        Features for each interval: [mean, std, slope]
    
    Notes:
    ------
    - Mean captures the average level of the signal
    - Std captures variability/volatility
    - Slope captures linear trend via OLS regression
    """
    n_intervals = intervals.shape[0]
    features = np.zeros((n_intervals, 3))
    
    for i, (start, end) in enumerate(intervals):
        segment = x[start:end + 1]  # +1 because end is inclusive
        n = len(segment)
        
        # Feature 1: Mean
        mean = np.mean(segment)
        features[i, 0] = mean
        
        # Feature 2: Standard deviation (sample std with ddof=1)
        if n > 1:
            features[i, 1] = np.std(segment, ddof=1)
        else:
            features[i, 1] = 0.0
        
        # Feature 3: Slope via OLS
        # slope = Σ(t - t̄)(x - x̄) / Σ(t - t̄)²
        if n > 1:
            t = np.arange(n)
            t_centered = t - np.mean(t)
            x_centered = segment - mean
            
            denominator = np.sum(t_centered ** 2)
            if denominator > 0:
                features[i, 2] = np.sum(t_centered * x_centered) / denominator
            else:
                features[i, 2] = 0.0
        else:
            features[i, 2] = 0.0
    
    return features


# Test our functions
print("Testing sample_random_intervals:")
test_intervals = sample_random_intervals(T=100, n_intervals=5, min_length=5, seed=42)
print(f"  Intervals shape: {test_intervals.shape}")
print(f"  Intervals:\n{test_intervals}")

print("\nTesting extract_interval_features:")
test_ts = np.sin(np.linspace(0, 4 * np.pi, 100)) + 0.1 * np.arange(100)
test_features = extract_interval_features(test_ts, test_intervals)
print(f"  Features shape: {test_features.shape}")
print(f"  Features (mean, std, slope):")
for i, (interval, feat) in enumerate(zip(test_intervals, test_features)):
    print(f"    Interval [{interval[0]}, {interval[1]}]: mean={feat[0]:.3f}, std={feat[1]:.3f}, slope={feat[2]:.4f}")

In [None]:
def build_feature_matrix(X: np.ndarray, intervals: np.ndarray) -> np.ndarray:
    """
    Build feature matrix for all time series in dataset.
    
    Parameters:
    -----------
    X : np.ndarray of shape (n_samples, T)
        Dataset of time series (each row is one time series)
    intervals : np.ndarray of shape (n_intervals, 2)
        Array of [start, end] indices
    
    Returns:
    --------
    feature_matrix : np.ndarray of shape (n_samples, n_intervals * 3)
        Flattened feature matrix where each sample has 3 features per interval
    
    Notes:
    ------
    Feature ordering: [int0_mean, int0_std, int0_slope, int1_mean, int1_std, int1_slope, ...]
    """
    n_samples = X.shape[0]
    n_intervals = intervals.shape[0]
    
    # Each interval contributes 3 features
    feature_matrix = np.zeros((n_samples, n_intervals * 3))
    
    for i in range(n_samples):
        # Extract features for this time series
        features = extract_interval_features(X[i], intervals)
        # Flatten: (n_intervals, 3) -> (n_intervals * 3,)
        feature_matrix[i] = features.flatten()
    
    return feature_matrix


# Test build_feature_matrix with synthetic data
print("Testing build_feature_matrix:")
n_samples = 10
T = 50
X_synthetic = np.random.randn(n_samples, T)  # 10 random time series
intervals_for_test = sample_random_intervals(T, n_intervals=3, seed=42)

feature_matrix = build_feature_matrix(X_synthetic, intervals_for_test)
print(f"  Input shape: {X_synthetic.shape} (n_samples, T)")
print(f"  Intervals: {intervals_for_test.shape[0]}")
print(f"  Feature matrix shape: {feature_matrix.shape} (n_samples, n_intervals * 3)")
print(f"  Expected: ({n_samples}, {intervals_for_test.shape[0] * 3})")
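
The loop in `build_feature_matrix` processes one series at a time. A possible alternative, sketched here under the assumption that every interval contains at least two points, computes each interval's statistics for all series at once. The function name `build_feature_matrix_vectorized` and the demo data are illustrative.

```python
import numpy as np

def build_feature_matrix_vectorized(X, intervals):
    """Compute [mean, std, slope] per interval for all series at once.

    X: array of shape (n_samples, T)
    intervals: iterable of inclusive (start, end) index pairs, each with >= 2 points
    """
    X = np.asarray(X, dtype=float)
    blocks = []
    for start, end in intervals:
        seg = X[:, start:end + 1]                      # shape (n_samples, length)
        length = seg.shape[1]
        t_c = np.arange(length) - (length - 1) / 2.0   # centered time index
        mean = seg.mean(axis=1)
        std = seg.std(axis=1, ddof=1)
        slope = (seg - mean[:, None]) @ t_c / np.sum(t_c ** 2)
        blocks.append(np.column_stack([mean, std, slope]))
    return np.hstack(blocks)                           # shape (n_samples, 3 * n_intervals)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 30))
demo_intervals = [(0, 9), (5, 19), (12, 29)]
Phi = build_feature_matrix_vectorized(X_demo, demo_intervals)
print(Phi.shape)
```

The column order matches the loop version (mean, std, slope for interval 0, then interval 1, and so on). The Python loop now runs over intervals instead of samples, which tends to help when there are many more series than intervals.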

## Complete Time Series Forest Classifier

Now we implement the full classifier using sklearn's RandomForestClassifier on our interval features:

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

class TimeSeriesForestFromScratch:
    """
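    Simplified Time Series Forest classifier built on our interval features.
    
    This implementation follows the core TSF recipe:
    - Sample random intervals (one set, shared by all trees)
    - Extract mean, std, slope features from each interval
    - Train a Random Forest on the extracted features
    
    Note: sktime's TimeSeriesForestClassifier draws a fresh set of intervals
    for every tree; sharing one set keeps this version short.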
    
    Parameters:
    -----------
    n_estimators : int, default=100
        Number of trees in the forest
    n_intervals : int, default=None
        Number of intervals to sample. If None, uses sqrt(T)
    min_interval_length : int, default=3
        Minimum length of each interval
    random_state : int, default=None
        Random seed for reproducibility
    
    Attributes:
    -----------
    intervals_ : np.ndarray
        The sampled intervals used for feature extraction
    rf_ : RandomForestClassifier
        The underlying random forest trained on interval features
    T_ : int
        Length of time series seen during fit
    """
    
    def __init__(
        self, 
        n_estimators: int = 100, 
        n_intervals: int = None,
        min_interval_length: int = 3,
        random_state: int = None
    ):
        self.n_estimators = n_estimators
        self.n_intervals = n_intervals
        self.min_interval_length = min_interval_length
        self.random_state = random_state
        
    def fit(self, X: np.ndarray, y: np.ndarray):
        """
        Fit the Time Series Forest classifier.
        
        Parameters:
        -----------
        X : np.ndarray of shape (n_samples, T)
            Training time series
        y : np.ndarray of shape (n_samples,)
            Target labels
        
        Returns:
        --------
        self : TimeSeriesForestFromScratch
            Fitted classifier
        """
        n_samples, T = X.shape
        self.T_ = T
        
        # Determine number of intervals (default: sqrt(T))
        if self.n_intervals is None:
            self.n_intervals_ = max(1, int(np.sqrt(T)))
        else:
            self.n_intervals_ = self.n_intervals
        
        # Sample random intervals
        self.intervals_ = sample_random_intervals(
            T=T, 
            n_intervals=self.n_intervals_,
            min_length=self.min_interval_length,
            seed=self.random_state
        )
        
        # Build feature matrix
        feature_matrix = build_feature_matrix(X, self.intervals_)
        
        # Train random forest on features
        self.rf_ = RandomForestClassifier(
            n_estimators=self.n_estimators,
            random_state=self.random_state,
            n_jobs=-1
        )
        self.rf_.fit(feature_matrix, y)
        
        return self
    
    def predict(self, X: np.ndarray) -> np.ndarray:
        """
        Predict class labels for time series.
        
        Parameters:
        -----------
        X : np.ndarray of shape (n_samples, T)
            Time series to classify
        
        Returns:
        --------
        predictions : np.ndarray of shape (n_samples,)
            Predicted class labels
        """
        # Extract features using same intervals as training
        feature_matrix = build_feature_matrix(X, self.intervals_)
        return self.rf_.predict(feature_matrix)
    
    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """
        Predict class probabilities for time series.
        
        Parameters:
        -----------
        X : np.ndarray of shape (n_samples, T)
            Time series to classify
        
        Returns:
        --------
        probabilities : np.ndarray of shape (n_samples, n_classes)
            Class probabilities
        """
        feature_matrix = build_feature_matrix(X, self.intervals_)
        return self.rf_.predict_proba(feature_matrix)
    
    def get_feature_names(self) -> list:
        """
        Get descriptive names for all features.
        
        Returns:
        --------
        names : list of str
            Feature names in format 'int_k_stat' where k is interval index
            and stat is mean/std/slope
        """
        names = []
        for i, (start, end) in enumerate(self.intervals_):
            names.extend([
                f'int_{i}_[{start},{end}]_mean',
                f'int_{i}_[{start},{end}]_std',
                f'int_{i}_[{start},{end}]_slope'
            ])
        return names
    
    @property
    def feature_importances_(self) -> np.ndarray:
        """Return feature importances from the underlying random forest."""
        return self.rf_.feature_importances_


print("TimeSeriesForestFromScratch class defined successfully!")

## Testing Our Implementation

Let's test our from-scratch implementation on synthetic data and compare with sklearn:

In [None]:
# Create synthetic classification dataset
# Two classes: Class 0 has a stronger early peak, Class 1 has a stronger late trough

np.random.seed(42)
n_train, n_test = 100, 50
T_synth = 50

def generate_synthetic_ts(n_samples, T, class_label, noise=0.3):
    """Generate time series for a specific class."""
    X = np.zeros((n_samples, T))
    t = np.linspace(0, 1, T)
    
    for i in range(n_samples):
        if class_label == 0:
            # Class 0: envelope centered at t=0.25 emphasizes the early peak
            X[i] = np.sin(2 * np.pi * t) * np.exp(-2 * (t - 0.25)**2)
        else:
            # Class 1: envelope centered at t=0.75 emphasizes the late trough
            X[i] = np.sin(2 * np.pi * t) * np.exp(-2 * (t - 0.75)**2)
        
        X[i] += noise * np.random.randn(T)
    
    return X

# Generate training data
X_train_0 = generate_synthetic_ts(n_train // 2, T_synth, 0)
X_train_1 = generate_synthetic_ts(n_train // 2, T_synth, 1)
X_train_synth = np.vstack([X_train_0, X_train_1])
y_train_synth = np.array([0] * (n_train // 2) + [1] * (n_train // 2))

# Generate test data
X_test_0 = generate_synthetic_ts(n_test // 2, T_synth, 0)
X_test_1 = generate_synthetic_ts(n_test // 2, T_synth, 1)
X_test_synth = np.vstack([X_test_0, X_test_1])
y_test_synth = np.array([0] * (n_test // 2) + [1] * (n_test // 2))

# Shuffle training data
shuffle_idx = np.random.permutation(n_train)
X_train_synth = X_train_synth[shuffle_idx]
y_train_synth = y_train_synth[shuffle_idx]

print(f"Training set: {X_train_synth.shape}, Test set: {X_test_synth.shape}")
print(f"Class distribution (train): {np.bincount(y_train_synth)}")
print(f"Class distribution (test): {np.bincount(y_test_synth)}")

In [None]:
# Visualize the synthetic data
fig = go.Figure()

# Plot some examples from each class
for cls, color, name in [(0, '#636EFA', 'Class 0 (Strong Early Peak)'), (1, '#EF553B', 'Class 1 (Strong Late Trough)')]:
    mask = y_train_synth == cls
    for i, idx in enumerate(np.where(mask)[0][:5]):
        fig.add_trace(go.Scatter(
            x=np.arange(T_synth),
            y=X_train_synth[idx],
            mode='lines',
            line=dict(color=color, width=1),
            opacity=0.5,
            name=name if i == 0 else None,
            showlegend=(i == 0),
            legendgroup=name
        ))

fig.update_layout(
    title="<b>Synthetic Time Series Classification Dataset</b><br><sup>Class 0 has a stronger early peak, Class 1 a stronger late trough</sup>",
    xaxis_title="Time Index",
    yaxis_title="Value",
    template="plotly_white",
    height=400
)
fig

In [None]:
# Train and evaluate our from-scratch implementation
from sklearn.metrics import accuracy_score, classification_report

# Fit our implementation
tsf_scratch = TimeSeriesForestFromScratch(
    n_estimators=100,
    n_intervals=10,
    random_state=42
)
tsf_scratch.fit(X_train_synth, y_train_synth)

# Make predictions
y_pred_scratch = tsf_scratch.predict(X_test_synth)

# Evaluate
accuracy = accuracy_score(y_test_synth, y_pred_scratch)
print("=" * 50)
print("Time Series Forest FROM SCRATCH - Results")
print("=" * 50)
print(f"\nTest Accuracy: {accuracy:.2%}")
print(f"\nNumber of intervals used: {tsf_scratch.n_intervals_}")
print(f"Total features: {tsf_scratch.n_intervals_ * 3}")
print("\nClassification Report:")
print(classification_report(y_test_synth, y_pred_scratch, target_names=['Class 0', 'Class 1']))

In [None]:
# Visualize feature importances from our implementation
feature_names = tsf_scratch.get_feature_names()
importances = tsf_scratch.feature_importances_

# Sort by importance
sorted_idx = np.argsort(importances)[::-1][:15]

fig = go.Figure()
fig.add_trace(go.Bar(
    x=importances[sorted_idx],
    y=[feature_names[i] for i in sorted_idx],
    orientation='h',
    marker_color=['#636EFA' if 'mean' in feature_names[i] 
                  else '#EF553B' if 'std' in feature_names[i]
                  else '#00CC96' for i in sorted_idx]
))

fig.update_layout(
    title="<b>Feature Importance from Our TSF Implementation</b><br><sup>Blue=Mean, Red=StdDev, Green=Slope</sup>",
    xaxis_title="Importance",
    yaxis_title="Feature",
    template="plotly_white",
    height=450,
    yaxis=dict(autorange="reversed")
)
fig.show()

# Print top features
print("\nTop 5 Most Important Features:")
for i, idx in enumerate(sorted_idx[:5]):
    print(f"  {i+1}. {feature_names[idx]}: {importances[idx]:.4f}")

## Understanding Gini Impurity (Decision Tree Splitting)

Let's implement and visualize how Gini impurity guides tree splits:

In [None]:
def gini_impurity(y: np.ndarray) -> float:
    """
    Calculate Gini impurity of a node.
    
    Gini = 1 - Σ(p_c)² where p_c is the proportion of class c
    
    Parameters:
    -----------
    y : np.ndarray
        Class labels in the node
    
    Returns:
    --------
    gini : float
        Gini impurity (0 = pure, max depends on n_classes)
    """
    if len(y) == 0:
        return 0.0
    
    classes, counts = np.unique(y, return_counts=True)
    proportions = counts / len(y)
    return 1 - np.sum(proportions ** 2)


def information_gain(y_parent: np.ndarray, y_left: np.ndarray, y_right: np.ndarray) -> float:
    """
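    Calculate the impurity decrease of a split, measured with Gini impurity.

    Strictly, "information gain" names the entropy-based criterion; the
    Gini-based quantity computed here is often called Gini gain.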
    
    ΔG = G(parent) - (|left|/|parent|)*G(left) - (|right|/|parent|)*G(right)
    
    Parameters:
    -----------
    y_parent : np.ndarray
        Labels before split
    y_left : np.ndarray
        Labels in left child
    y_right : np.ndarray
        Labels in right child
    
    Returns:
    --------
    gain : float
        Information gain (higher = better split)
    """
    n = len(y_parent)
    n_left, n_right = len(y_left), len(y_right)
    
    if n_left == 0 or n_right == 0:
        return 0.0
    
    gini_parent = gini_impurity(y_parent)
    gini_left = gini_impurity(y_left)
    gini_right = gini_impurity(y_right)
    
    weighted_child_gini = (n_left / n) * gini_left + (n_right / n) * gini_right
    
    return gini_parent - weighted_child_gini


# Visualize Gini impurity for binary classification
p = np.linspace(0, 1, 100)
gini = 1 - p**2 - (1-p)**2  # For binary: Gini = 2p(1-p)

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=p, y=gini,
    mode='lines',
    line=dict(color='#636EFA', width=3),
    name='Gini Impurity'
))

# Mark key points
fig.add_trace(go.Scatter(
    x=[0, 0.5, 1],
    y=[0, 0.5, 0],
    mode='markers+text',
    marker=dict(size=12, color=['green', 'red', 'green']),
    text=['Pure (Class 0)', 'Maximum Impurity', 'Pure (Class 1)'],
    textposition=['top right', 'top center', 'top left'],
    name='Key Points'
))

fig.update_layout(
    title="<b>Gini Impurity for Binary Classification</b><br><sup>G(p) = 2p(1-p) where p = proportion of Class 1</sup>",
    xaxis_title="Proportion of Class 1 (p)",
    yaxis_title="Gini Impurity",
    template="plotly_white",
    height=400
)
fig.show()

# Example: Calculate information gain for a split
print("\nExample: Information Gain Calculation")
print("=" * 45)
y_example = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # Balanced
y_left_good = np.array([0, 0, 0, 0, 0])  # Pure Class 0
y_right_good = np.array([1, 1, 1, 1, 1])  # Pure Class 1
y_left_bad = np.array([0, 0, 0, 1, 1])  # Mixed
y_right_bad = np.array([0, 0, 1, 1, 1])  # Mixed

print(f"Parent Gini: {gini_impurity(y_example):.3f}")
print("\nGood split (perfectly separates classes):")
print(f"  Left Gini: {gini_impurity(y_left_good):.3f}, Right Gini: {gini_impurity(y_right_good):.3f}")
print(f"  Information Gain: {information_gain(y_example, y_left_good, y_right_good):.3f}")

print("\nBad split (classes still mixed):")
print(f"  Left Gini: {gini_impurity(y_left_bad):.3f}, Right Gini: {gini_impurity(y_right_bad):.3f}")
print(f"  Information Gain: {information_gain(y_example, y_left_bad, y_right_bad):.3f}")
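
The same impurity decrease is what a tree uses to choose a threshold on a single interval feature. The sketch below is a minimal illustration, not how sktime or scikit-learn implement split search: it scans the midpoints between sorted feature values and keeps the threshold with the largest Gini decrease. The helper names `node_gini` and `best_threshold` and the toy data are hypothetical.

```python
import numpy as np


def node_gini(y: np.ndarray) -> float:
    """Gini impurity 1 - sum(p_c^2); an empty node counts as pure."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return float(1.0 - np.sum(p ** 2))


def best_threshold(feature: np.ndarray, y: np.ndarray):
    """Hypothetical helper: try every midpoint between sorted unique
    feature values and return (threshold, gain) with the largest
    Gini decrease. Returns (None, 0.0) if no split improves purity."""
    values = np.unique(feature)  # sorted unique values
    candidates = (values[:-1] + values[1:]) / 2.0
    parent = node_gini(y)
    best_t, best_gain = None, 0.0
    for t in candidates:
        left, right = y[feature <= t], y[feature > t]
        weighted = (len(left) * node_gini(left)
                    + len(right) * node_gini(right)) / len(y)
        gain = parent - weighted
        if gain > best_gain:
            best_t, best_gain = float(t), float(gain)
    return best_t, best_gain


# Toy interval feature (e.g. the mean over one interval) for 8 series
interval_mean = np.array([0.1, 0.3, 0.2, 0.4, 1.1, 0.9, 1.3, 1.2])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

threshold, gain = best_threshold(interval_mean, labels)
print(f"Best threshold: {threshold:.2f}, Gini gain: {gain:.3f}")
```

On this toy data the two classes are separated by the gap between 0.4 and 0.9, so the search settles on the midpoint of that gap and recovers the full parent impurity of 0.5 as gain.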

---

# Summary

## Key Takeaways

| Concept | Description |
|---------|-------------|
| **Interval Sampling** | Random sub-intervals $[a_k, b_k] \subset [1, T]$ capture different temporal regions |
| **Feature Extraction** | Mean, std, slope summarize each interval's location, spread, and trend |
| **Feature Matrix** | Transforms time series into tabular form: $N \times (3K)$ where $N$ = series and $K$ = intervals |
| **Gini Impurity** | $G = 1 - \sum p_c^2$ guides decision tree splits toward pure nodes |
| **Ensemble** | Majority vote $\hat{y} = \text{mode}\{T_b(\phi(x))\}$ reduces variance |
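
The majority vote in the last row can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the prediction matrix `tree_preds` is made up, `majority_vote` is a hypothetical helper, and library implementations may average class probabilities across trees rather than count hard votes.

```python
import numpy as np

# Hypothetical hard predictions from B = 5 trees for 4 series
# (rows = trees, columns = series)
tree_preds = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
])


def majority_vote(preds: np.ndarray) -> np.ndarray:
    """Most frequent label per column; ties go to the smallest label."""
    n_classes = int(preds.max()) + 1
    # counts[c, j] = number of trees voting class c for series j
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


y_hat = majority_vote(tree_preds)
print(y_hat)  # [0 1 1 0]
```

Each individual tree misclassifies at most one series here, yet the vote agrees with the majority on every column, which is the variance-reduction effect the table refers to.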

## Advantages of Time Series Forest

1. **Interpretable**: Feature importances reveal which time intervals and statistics matter
2. **Fast**: $O(N \cdot K \cdot T)$ worst-case feature extraction for $N$ series, followed by standard tree training on $3K$ features
3. **Robust**: Ensemble reduces sensitivity to random interval selection
4. **Simple**: Only 3 features per interval (mean, std, slope)

## Limitations

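1. **Ignores ordering**: Mean and std are order-invariant and slope captures only a linear trend, so finer sequential patterns within an interval are lost
2. **Fixed intervals**: Same intervals used for all samples (within a tree), so patterns that shift in time between series can be missed
3. **Univariate bias**: Original TSF was designed for univariate series; multivariate data needs extensions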
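
The first limitation is easy to check numerically. The sketch below (toy data; `interval_features` is a hypothetical helper, not part of sktime) shuffles the values inside one interval: mean and std are unchanged, and only the slope reacts, and only to the loss of the linear trend.

```python
import numpy as np

rng = np.random.default_rng(0)

# One interval's values: a clean upward ramp
segment = np.linspace(0.0, 1.0, 20)
shuffled = rng.permutation(segment)  # same values, different order


def interval_features(x: np.ndarray) -> tuple:
    """Mean, std, and least-squares slope of x against index 0..len(x)-1."""
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]
    return float(np.mean(x)), float(np.std(x)), float(slope)


mean_a, std_a, slope_a = interval_features(segment)
mean_b, std_b, slope_b = interval_features(shuffled)

print(f"ramp:     mean={mean_a:.3f} std={std_a:.3f} slope={slope_a:.4f}")
print(f"shuffled: mean={mean_b:.3f} std={std_b:.3f} slope={slope_b:.4f}")
```

Two intervals with very different shapes can therefore map to nearly identical feature vectors, which is why shape-sensitive patterns inside an interval are invisible to the forest.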

## References

- Deng et al. (2013). "A Time Series Forest for Classification and Feature Extraction". Information Sciences, 239, 142-153.
- sktime documentation: https://www.sktime.org/