# Ensemble Time Series Classification

Ensembles combine **diverse base classifiers** (distance, interval, dictionary, shapelet,
deep learning, etc.) and aggregate their predictions. The goal is to reduce variance,
stabilize performance across datasets, and benefit from complementary inductive biases.


## Core idea (weighted voting)
Let base classifiers produce class probabilities $p_k(y \mid x)$.
A weighted ensemble computes:
$$s(y \mid x) = \sum_{k=1}^K w_k \, p_k(y \mid x), \quad w_k \ge 0, \quad \sum_k w_k = 1$$
Then predict $\hat{y} = \arg\max_y s(y \mid x)$.
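
As a small sketch of the rule above applied to several inputs at once, the snippet below stacks the base probabilities into one array and contracts the classifier axis. The probabilities, the weights, and the names `probas`, `scores`, and `y_hat` are made up for illustration.

```python
import numpy as np

# Hypothetical probabilities from K=3 base classifiers for n=2 series and C=3 classes.
# Shape (K, n, C); every row along the last axis sums to 1.
probas = np.array([
    [[0.60, 0.30, 0.10], [0.20, 0.50, 0.30]],
    [[0.20, 0.55, 0.25], [0.10, 0.30, 0.60]],
    [[0.35, 0.25, 0.40], [0.30, 0.30, 0.40]],
])
weights = np.array([0.5, 0.3, 0.2])  # illustrative weights: non-negative, sum to 1

# s[i, c] = sum_k w_k * p_k(c | x_i): contract the classifier axis.
scores = np.tensordot(weights, probas, axes=1)  # shape (n, C)

# Predicted class index per series: argmax over the class axis.
y_hat = scores.argmax(axis=1)
print(scores.round(3))
print(y_hat)
```

Because the weights sum to 1 and each base row is a probability vector, every row of `scores` is again a probability vector.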


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

classes = ["A", "B", "C"]
base = pd.DataFrame(
    {
        "Distance": [0.60, 0.30, 0.10],
        "Interval": [0.20, 0.55, 0.25],
        "Dictionary": [0.35, 0.25, 0.40],
        "Shapelet": [0.25, 0.50, 0.25],
    },
    index=classes,
)
weights = np.array([0.35, 0.25, 0.20, 0.20])
ensemble = (base.values @ weights).round(3)
base["Ensemble"] = ensemble

fig = px.bar(
    base.reset_index().melt(id_vars="index"),
    x="index",
    y="value",
    color="variable",
    barmode="group",
    title="Base probabilities vs ensemble vote",
)
fig.update_layout(xaxis_title="Class", yaxis_title="Probability")
fig

## Why ensembles help
If base learners are accurate **and** make *different mistakes*, the average prediction
is more stable. A simple variance model for an average of $K$ estimators with pairwise
correlation $\rho$ is:
$$\mathrm{Var}(\bar{f}) = \frac{1}{K}\sigma^2 + \frac{K-1}{K}\rho\sigma^2$$
Lower correlation means stronger variance reduction.
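
The formula can be checked numerically. The sketch below assumes Gaussian estimators built from one shared noise term plus private noise, which gives every pair the same correlation $\rho$; this construction and the function names are illustrative choices, not part of any library.

```python
import numpy as np


def formula_variance_of_mean(K: int, rho: float, sigma2: float = 1.0) -> float:
    """Variance of the average of K equicorrelated estimators (formula above)."""
    return sigma2 / K + (K - 1) / K * rho * sigma2


def simulated_variance_of_mean(K: int, rho: float, sigma2: float = 1.0,
                               n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate using f_k = sqrt(rho) * shared + sqrt(1 - rho) * private_k."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_draws, 1))    # common to all K estimators
    private = rng.standard_normal((n_draws, K))   # independent per estimator
    f = np.sqrt(sigma2) * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * private)
    return float(f.mean(axis=1).var())


for rho_value in (0.0, 0.3, 0.9):
    sim = simulated_variance_of_mean(K=10, rho=rho_value)
    exact = formula_variance_of_mean(K=10, rho=rho_value)
    print(f"rho={rho_value:.1f}  simulated={sim:.3f}  formula={exact:.3f}")
```

Rearranging the formula as $\rho\sigma^2 + (1-\rho)\sigma^2/K$ shows that adding more estimators only shrinks the second term, so $\rho\sigma^2$ is a floor on the ensemble variance.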


In [None]:
import numpy as np
import plotly.express as px

K = 10
rho = np.linspace(0, 1, 51)
sigma2 = 1.0
var_avg = (1 / K) * sigma2 + ((K - 1) / K) * rho * sigma2

fig = px.line(x=rho, y=var_avg, title="Ensemble variance vs correlation")
fig.update_layout(xaxis_title="Correlation between base learners (rho)", yaxis_title="Var(average prediction)")
fig

## sktime inventory for ensemble classifiers
sktime exposes ensembles via the registry. The exact list depends on your installed version
and optional dependencies. Use the filter below to surface ensemble-style estimators.


In [None]:
try:
    import pandas as pd
    from sktime.registry import all_estimators

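    ests = all_estimators(estimator_types="classifier", as_dataframe=True)
    if "module" not in ests.columns:
        # Assumption: the registry table keeps the estimator classes in an "object"
        # column ("estimator" in older releases); derive the module path from them.
        cls_col = next(c for c in ("object", "estimator") if c in ests.columns)
        ests["module"] = ests[cls_col].map(lambda cls: cls.__module__)
    mask = (
        ests["name"].str.contains("Ensemble|HIVE|Proximity|COTE", case=False, na=False)
        | ests["module"].str.contains("ensemble|hive|cote|proximity", case=False, na=False)
    )
    print(ests.loc[mask, ["name", "module"]].sort_values("name").to_string(index=False))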
except Exception as exc:
    print("sktime is not installed or registry lookup failed:", exc)


## When to use
- Datasets are heterogeneous and no single model family wins everywhere.
- You need robust accuracy and are willing to trade extra compute for stability.
- You can afford a validation loop to tune ensemble weights or meta-learners.


---

## Low-Level NumPy Implementation

Below we implement core ensemble techniques from scratch using NumPy.
This helps understand the mechanics behind **bagging**, **weighted voting**, and **stacking**.

### 1. Bagging (Bootstrap Aggregating)

Bagging creates diversity by training each base classifier on a **bootstrap sample**
(sampling with replacement). The final prediction is obtained by **majority voting**.

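For a dataset with $n$ samples, each bootstrap sample draws $n$ examples with replacement.
A given example is left out with probability $(1 - 1/n)^n \approx e^{-1}$, so for large $n$ each
bootstrap sample contains on average about $1 - (1 - 1/n)^n \approx 63.2\%$ of the distinct training examples.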
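
A quick simulation can confirm the 63.2% figure. The helper name `unique_fraction` and the choice of $n = 1000$ with 200 repeats are arbitrary and only for illustration.

```python
import numpy as np


def unique_fraction(n_samples: int, n_repeats: int = 200, seed: int = 0) -> float:
    """Average share of distinct indices in a bootstrap sample of size n_samples."""
    rng = np.random.default_rng(seed)
    fractions = [
        np.unique(rng.integers(0, n_samples, size=n_samples)).size / n_samples
        for _ in range(n_repeats)
    ]
    return float(np.mean(fractions))


n = 1000
empirical = unique_fraction(n)
expected = 1 - (1 - 1 / n) ** n
print(f"empirical: {empirical:.3f}  closed form: {expected:.3f}")
```

The examples that a given bootstrap sample misses (roughly 36.8%) are often called out-of-bag and can serve as a built-in validation set for that base classifier.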

In [None]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def bootstrap_sample(X: np.ndarray, y: np.ndarray, random_state: int = None) -> tuple:
    """
    Generate a bootstrap sample (sampling with replacement).
    
    Parameters
    ----------
    X : np.ndarray of shape (n_samples, n_features)
    y : np.ndarray of shape (n_samples,)
    random_state : int, optional
    
    Returns
    -------
    X_boot, y_boot : bootstrap samples
    """
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    indices = rng.integers(0, n_samples, size=n_samples)
    return X[indices], y[indices]


def train_base_classifiers(X: np.ndarray, y: np.ndarray, n_estimators: int = 10,
                            random_state: int = 42) -> list:
    """
    Train multiple base classifiers on bootstrap samples.
    
    Parameters
    ----------
    X : np.ndarray of shape (n_samples, n_features)
    y : np.ndarray of shape (n_samples,)
    n_estimators : int, number of base classifiers
    random_state : int
    
    Returns
    -------
    classifiers : list of fitted classifiers
    """
    classifiers = []
    rng = np.random.default_rng(random_state)
    
    for i in range(n_estimators):
        X_boot, y_boot = bootstrap_sample(X, y, random_state=rng.integers(0, 10000))
        clf = DecisionTreeClassifier(max_depth=5, random_state=rng.integers(0, 10000))
        clf.fit(X_boot, y_boot)
        classifiers.append(clf)
    
    return classifiers


def bagging_predict(classifiers: list, X_test: np.ndarray) -> np.ndarray:
    """
    Aggregate predictions using majority voting.
    
    Parameters
    ----------
    classifiers : list of fitted classifiers
    X_test : np.ndarray of shape (n_samples, n_features)
    
    Returns
    -------
    predictions : np.ndarray of shape (n_samples,)
    """
    # Collect predictions from all classifiers: shape (n_estimators, n_samples)
    all_preds = np.array([clf.predict(X_test) for clf in classifiers])
    
    # Majority vote for each sample
    predictions = []
    for i in range(X_test.shape[0]):
        sample_preds = all_preds[:, i]
        unique, counts = np.unique(sample_preds, return_counts=True)
        predictions.append(unique[np.argmax(counts)])
    
    return np.array(predictions)


# Demo: Bagging on synthetic data
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

classifiers = train_base_classifiers(X_train, y_train, n_estimators=15)
y_pred = bagging_predict(classifiers, X_test)

print(f"Bagging Ensemble Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Single Tree Accuracy:      {accuracy_score(y_test, classifiers[0].predict(X_test)):.3f}")

### 2. Weighted Voting

Instead of giving every classifier an equal vote, we can assign each classifier a **weight**
based on its validation performance. With hard (label) votes, the ensemble prediction becomes:

$$\hat{y} = \arg\max_c \sum_{k=1}^K w_k \cdot \mathbb{1}[\hat{y}_k = c]$$

where $w_k \ge 0$ and $\sum_k w_k = 1$. Optimal weights can be found by minimizing
cross-entropy or maximizing accuracy on a validation set.
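
When the base classifiers output probabilities, the cross-entropy route can be sketched with plain gradient descent. The example below is an illustration under stated assumptions, not a reference implementation: the synthetic validation probabilities, the step size, the iteration count, and the function names are all made up. Weights are parameterized as a softmax of free parameters so they stay non-negative and sum to 1.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def mixture_log_loss(weights: np.ndarray, probas: np.ndarray, y: np.ndarray) -> float:
    """Mean negative log-likelihood of the weighted mixture; probas has shape (K, n, C)."""
    p_true = probas[:, np.arange(y.shape[0]), y]   # (K, n): prob. each model gives the true class
    return float(-np.log(np.clip(weights @ p_true, 1e-12, None)).mean())


def fit_weights_cross_entropy(probas: np.ndarray, y: np.ndarray,
                              n_steps: int = 500, lr: float = 0.5) -> np.ndarray:
    """Gradient descent on the mixture log-loss with softmax-parameterized weights."""
    K, n, _ = probas.shape
    p_true = probas[:, np.arange(n), y]
    theta = np.zeros(K)                            # softmax(0) gives equal weights
    for _ in range(n_steps):
        w = softmax(theta)
        mix = np.clip(w @ p_true, 1e-12, None)     # mixture probability of the true class
        grad_w = -(p_true / mix).mean(axis=1)      # dL/dw_k
        theta -= lr * w * (grad_w - w @ grad_w)    # chain rule through the softmax
    return softmax(theta)


# Synthetic validation set: three hypothetical classifiers with decreasing signal.
rng = np.random.default_rng(0)
n_val, n_cls = 400, 3
y_val_demo = rng.integers(0, n_cls, size=n_val)
onehot = np.eye(n_cls)[y_val_demo]


def noisy_probas(signal: float) -> np.ndarray:
    """Softmax of (signal * one-hot label + Gaussian noise); signal=0 is pure noise."""
    logits = signal * onehot + rng.standard_normal((n_val, n_cls))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


probas_val = np.stack([noisy_probas(s) for s in (3.0, 1.0, 0.0)])
fitted = fit_weights_cross_entropy(probas_val, y_val_demo)
equal = np.full(3, 1 / 3)
print("fitted weights:", fitted.round(3))
print("log-loss equal :", round(mixture_log_loss(equal, probas_val, y_val_demo), 3))
print("log-loss fitted:", round(mixture_log_loss(fitted, probas_val, y_val_demo), 3))
```

Unlike accuracy, the log-loss is smooth in the weights, which is why a gradient-based search is usable here.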

In [None]:
from scipy.optimize import minimize


def weighted_vote(predictions: np.ndarray, weights: np.ndarray, n_classes: int) -> np.ndarray:
    """
    Compute weighted voting across multiple classifier predictions.
    
    Parameters
    ----------
    predictions : np.ndarray of shape (n_estimators, n_samples)
        Predictions from each classifier
    weights : np.ndarray of shape (n_estimators,)
        Weight for each classifier (must sum to 1)
    n_classes : int
        Number of classes
    
    Returns
    -------
    final_preds : np.ndarray of shape (n_samples,)
    """
    n_estimators, n_samples = predictions.shape
    final_preds = np.zeros(n_samples, dtype=int)
    
    for i in range(n_samples):
        # Accumulate weighted votes for each class
        class_scores = np.zeros(n_classes)
        for k in range(n_estimators):
            class_scores[predictions[k, i]] += weights[k]
        final_preds[i] = np.argmax(class_scores)
    
    return final_preds


def optimize_weights(predictions: np.ndarray, y_true: np.ndarray, n_classes: int) -> np.ndarray:
    """
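    Find weights by maximizing accuracy on validation data.
    A softmax parameterization keeps the weights non-negative and summing to 1,
    so the Nelder-Mead search itself is unconstrained. Accuracy is piecewise
    constant in the weights, so the search may stop close to its starting point.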
    
    Parameters
    ----------
    predictions : np.ndarray of shape (n_estimators, n_samples)
    y_true : np.ndarray of shape (n_samples,)
    n_classes : int
    
    Returns
    -------
    optimal_weights : np.ndarray of shape (n_estimators,)
    """
    n_estimators = predictions.shape[0]
    
    def neg_accuracy(log_weights):
        # Softmax to ensure weights sum to 1
        weights = np.exp(log_weights) / np.sum(np.exp(log_weights))
        preds = weighted_vote(predictions, weights, n_classes)
        return -accuracy_score(y_true, preds)
    
    # Initial equal weights
    x0 = np.zeros(n_estimators)
    result = minimize(neg_accuracy, x0, method='Nelder-Mead')
    
    optimal_log_weights = result.x
    optimal_weights = np.exp(optimal_log_weights) / np.sum(np.exp(optimal_log_weights))
    
    return optimal_weights


# Demo: Weighted Voting
# Tune the weights on a validation split carved out of the training data,
# so the test set is used only for the final comparison.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=42)
classifiers = train_base_classifiers(X_fit, y_fit, n_estimators=15)
n_classes = len(np.unique(y))

# Optimize weights on validation predictions
val_predictions = np.array([clf.predict(X_val) for clf in classifiers])
optimal_weights = optimize_weights(val_predictions, y_val, n_classes)

# Compare equal vs optimal weights on the untouched test set
all_predictions = np.array([clf.predict(X_test) for clf in classifiers])
equal_weights = np.ones(len(classifiers)) / len(classifiers)
equal_preds = weighted_vote(all_predictions, equal_weights, n_classes)
optimal_preds = weighted_vote(all_predictions, optimal_weights, n_classes)

print(f"Equal Weights Accuracy:   {accuracy_score(y_test, equal_preds):.3f}")
print(f"Optimal Weights Accuracy: {accuracy_score(y_test, optimal_preds):.3f}")
print(f"\nOptimal weights: {np.round(optimal_weights, 3)}")

In [None]:
# Visualize weight distribution
import plotly.express as px

weight_df = pd.DataFrame({
    'Classifier': [f'Tree_{i+1}' for i in range(len(classifiers))],
    'Optimal Weight': optimal_weights,
    'Equal Weight': equal_weights
}).melt(id_vars='Classifier', var_name='Weight Type', value_name='Weight')

fig = px.bar(weight_df, x='Classifier', y='Weight', color='Weight Type',
             barmode='group', title='Classifier Weight Distribution')
fig.update_layout(xaxis_tickangle=-45)
fig

### 3. Stacking Meta-Learner

Stacking uses a **meta-learner** (level-1 model) to combine base classifier outputs.
Instead of simple voting, we train a model to learn optimal combination weights.

The process:
1. Generate **meta-features**: predictions (or probabilities) from base classifiers
2. Train a meta-learner (e.g., logistic regression) on these meta-features
3. At inference, pass base predictions through the meta-learner

$$\hat{y}_{stack} = f_{meta}\big([\hat{p}_1(x), \hat{p}_2(x), \ldots, \hat{p}_K(x)]\big)$$
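
One common way to build the meta-features in step 1 without leakage is out-of-fold prediction: every training example is scored by base models that did not see it during fitting. The sketch below uses scikit-learn's `cross_val_predict` for this. The two base models, the synthetic data, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem (illustrative only).
X_demo, y_demo = make_classification(n_samples=400, n_features=20, n_informative=10,
                                     n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          random_state=0, stratify=y_demo)

base_models = [
    DecisionTreeClassifier(max_depth=5, random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

# Step 1: out-of-fold probabilities; cross_val_predict fits clones, one per fold.
oof_features = np.hstack([
    cross_val_predict(model, X_tr, y_tr, cv=5, method="predict_proba")
    for model in base_models
])

# Step 2: the meta-learner only ever sees predictions made on unseen folds.
meta_model = LogisticRegression(max_iter=1000).fit(oof_features, y_tr)

# Step 3: refit the base models on all training data, then stack their test outputs.
fitted_models = [model.fit(X_tr, y_tr) for model in base_models]
test_features = np.hstack([model.predict_proba(X_te) for model in fitted_models])
stack_pred = meta_model.predict(test_features)

print("meta-feature shape:", oof_features.shape)
print("stacked accuracy:", round(float((stack_pred == y_te).mean()), 3))
```

Compared with a single hold-out split, this lets both the base models and the meta-learner use every training example, at the cost of fitting each base model once per fold plus once more on the full training set.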

In [None]:
from sklearn.linear_model import LogisticRegression


def generate_meta_features(classifiers: list, X: np.ndarray, 
                           use_probabilities: bool = True) -> np.ndarray:
    """
    Generate meta-features from base classifier predictions.
    
    Parameters
    ----------
    classifiers : list of fitted classifiers
    X : np.ndarray of shape (n_samples, n_features)
    use_probabilities : bool
        If True, use predicted probabilities; otherwise use class predictions
    
    Returns
    -------
    meta_features : np.ndarray of shape (n_samples, n_meta_features)
    """
    if use_probabilities:
        # Stack probability predictions: shape (n_samples, n_estimators * n_classes)
        proba_list = [clf.predict_proba(X) for clf in classifiers]
        meta_features = np.hstack(proba_list)
    else:
        # Stack class predictions: shape (n_samples, n_estimators)
        pred_list = [clf.predict(X).reshape(-1, 1) for clf in classifiers]
        meta_features = np.hstack(pred_list)
    
    return meta_features


def train_meta_learner(meta_features: np.ndarray, y: np.ndarray) -> LogisticRegression:
    """
    Train a logistic regression meta-learner on meta-features.
    
    Parameters
    ----------
    meta_features : np.ndarray of shape (n_samples, n_meta_features)
    y : np.ndarray of shape (n_samples,)
    
    Returns
    -------
    meta_learner : fitted LogisticRegression
    """
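    # With the default lbfgs solver, multiclass targets are fit with a multinomial loss;
    # the deprecated `multi_class` argument is therefore omitted.
    meta_learner = LogisticRegression(max_iter=1000, random_state=42)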
    meta_learner.fit(meta_features, y)
    return meta_learner


def stacking_predict(base_classifiers: list, meta_learner, X: np.ndarray,
                     use_probabilities: bool = True) -> np.ndarray:
    """
    Make predictions using stacking ensemble.
    
    Parameters
    ----------
    base_classifiers : list of fitted classifiers
    meta_learner : fitted meta-learner model
    X : np.ndarray of shape (n_samples, n_features)
    use_probabilities : bool
    
    Returns
    -------
    predictions : np.ndarray of shape (n_samples,)
    """
    meta_features = generate_meta_features(base_classifiers, X, use_probabilities)
    return meta_learner.predict(meta_features)


# Demo: Stacking with a held-out split for the meta-features
# To avoid leakage, the meta-learner is trained on predictions for samples
# that the base classifiers never saw during fitting.

# Re-split data: one half trains the base classifiers, the other half the meta-learner
X_base, X_meta, y_base, y_meta = train_test_split(X_train, y_train, test_size=0.5, random_state=42)

# Train base classifiers on X_base
base_classifiers = train_base_classifiers(X_base, y_base, n_estimators=10)

# Generate meta-features using X_meta (held-out validation data)
meta_features_train = generate_meta_features(base_classifiers, X_meta, use_probabilities=True)

# Train meta-learner on held-out predictions
meta_learner = train_meta_learner(meta_features_train, y_meta)

# Predict on test set
stacking_preds = stacking_predict(base_classifiers, meta_learner, X_test, use_probabilities=True)

print(f"Stacking Ensemble Accuracy: {accuracy_score(y_test, stacking_preds):.3f}")
print(f"Bagging (majority vote):    {accuracy_score(y_test, bagging_predict(base_classifiers, X_test)):.3f}")

In [None]:
# Visualize meta-learner coefficients (importance of each base classifier's predictions)
coef_matrix = meta_learner.coef_  # shape: (n_classes, n_meta_features)
n_classes_viz = coef_matrix.shape[0]
n_classifiers = len(base_classifiers)

# Average absolute coefficient per base classifier (across all classes and their probability outputs)
avg_importance = np.abs(coef_matrix).reshape(n_classes_viz, n_classifiers, n_classes_viz).mean(axis=(0, 2))

fig = px.bar(
    x=[f'Base_{i+1}' for i in range(n_classifiers)],
    y=avg_importance,
    title='Meta-Learner: Average Importance per Base Classifier',
    labels={'x': 'Base Classifier', 'y': 'Average |Coefficient|'}
)
fig

In [None]:
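# Compare all methods on the test set.
# Note: the weighted-voting result comes from the section 2 trees, which are a different
# set of trees than `base_classifiers`, so this is indicative rather than a controlled comparison.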
methods = ['Single Tree', 'Bagging (Majority)', 'Weighted Voting', 'Stacking']
accuracies = [
    accuracy_score(y_test, base_classifiers[0].predict(X_test)),
    accuracy_score(y_test, bagging_predict(base_classifiers, X_test)),
    accuracy_score(y_test, optimal_preds),
    accuracy_score(y_test, stacking_preds)
]

fig = px.bar(
    x=methods, y=accuracies,
    title='Ensemble Methods Comparison',
    labels={'x': 'Method', 'y': 'Accuracy'},
    color=accuracies,
    color_continuous_scale='Viridis'
)
fig.update_layout(showlegend=False)
fig