# Ensemble Time Series Classification

Ensembles combine **diverse base classifiers** (distance, interval, dictionary, shapelet,
deep learning, etc.) and aggregate their predictions. The goal is to reduce variance,
stabilize performance across datasets, and benefit from complementary inductive biases.


## Core idea (weighted voting)
Let each of $K$ base classifiers produce class probabilities $p_k(y \mid x)$, $k = 1, \dots, K$.
A weighted ensemble computes:
\[s(y \mid x) = \sum_{k=1}^K w_k \, p_k(y \mid x), \quad w_k \ge 0, \quad \sum_{k=1}^K w_k = 1\]
Then predict $\hat{y} = \arg\max_y s(y \mid x)$.


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

classes = ["A", "B", "C"]
base = pd.DataFrame(
    {
        "Distance": [0.60, 0.30, 0.10],
        "Interval": [0.20, 0.55, 0.25],
        "Dictionary": [0.35, 0.25, 0.40],
        "Shapelet": [0.25, 0.50, 0.25],
    },
    index=classes,
)
weights = np.array([0.35, 0.25, 0.20, 0.20])
ensemble = (base.values @ weights).round(3)
base["Ensemble"] = ensemble

fig = px.bar(
    base.reset_index().melt(id_vars="index"),
    x="index",
    y="value",
    color="variable",
    barmode="group",
    title="Base probabilities vs ensemble vote",
)
fig.update_layout(xaxis_title="Class", yaxis_title="Probability")
fig.show()
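
The chart shows the blended probabilities but stops short of the final prediction. A minimal sketch of the remaining $\arg\max$ step, reusing the illustrative numbers from the cell above (the helper name `weighted_vote` is hypothetical, not an sktime function):

```python
import numpy as np

classes = np.array(["A", "B", "C"])
# Rows are classes, columns are base classifiers
# (Distance, Interval, Dictionary, Shapelet), as in the cell above.
probs = np.array(
    [
        [0.60, 0.20, 0.35, 0.25],
        [0.30, 0.55, 0.25, 0.50],
        [0.10, 0.25, 0.40, 0.25],
    ]
)
weights = np.array([0.35, 0.25, 0.20, 0.20])


def weighted_vote(probs, weights):
    """Return s(y | x) = sum_k w_k * p_k(y | x) for a single sample."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return probs @ weights


scores = weighted_vote(probs, weights)
y_hat = classes[np.argmax(scores)]
print(scores.round(4), y_hat)
```

With these numbers the ensemble scores are 0.38, 0.3925 and 0.2275, so the prediction is B even though the highest-weighted Distance classifier prefers A.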


## Why ensembles help
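If base learners are accurate **and** make *different mistakes*, the average prediction
is more stable. A simple variance model for the average $\bar{f}$ of $K$ estimators, each
with variance $\sigma^2$ and a common pairwise correlation $\rho$, is:
\[\mathrm{Var}(\bar{f}) = \frac{1}{K}\sigma^2 + \frac{K-1}{K}\rho\sigma^2\]
Lower correlation means stronger variance reduction: at $\rho = 0$ the variance falls as
$\sigma^2 / K$, while for $\rho > 0$ it cannot drop below $\rho\sigma^2$ no matter how many
learners are added.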
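
For reference, a short derivation of this expression, assuming every base estimator $f_k$ has the same variance $\sigma^2$ and every pair $(f_k, f_j)$, $k \ne j$, the same correlation $\rho$ (an idealization rather than a property of real classifiers):
\[\mathrm{Var}(\bar{f}) = \mathrm{Var}\!\left(\frac{1}{K}\sum_{k=1}^K f_k\right) = \frac{1}{K^2}\left[\sum_{k=1}^K \mathrm{Var}(f_k) + \sum_{k \ne j} \mathrm{Cov}(f_k, f_j)\right] = \frac{1}{K^2}\left[K\sigma^2 + K(K-1)\rho\sigma^2\right] = \frac{1}{K}\sigma^2 + \frac{K-1}{K}\rho\sigma^2\]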


In [None]:
import numpy as np
import plotly.express as px

K = 10
rho = np.linspace(0, 1, 51)
sigma2 = 1.0
var_avg = (1 / K) * sigma2 + ((K - 1) / K) * rho * sigma2

fig = px.line(x=rho, y=var_avg, title="Ensemble variance vs correlation")
fig.update_layout(xaxis_title="Correlation between base learners (rho)", yaxis_title="Var(average prediction)")
fig.show()
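
The closed-form curve can be sanity-checked by simulation. A minimal sketch, assuming jointly Gaussian base-learner outputs with an equicorrelated covariance matrix (a modelling assumption made only for this check; the values of `rho` and `n_draws` are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
K, sigma2, n_draws = 10, 1.0, 100_000

results = []
for rho in (0.0, 0.3, 0.8):
    # Equicorrelated covariance: sigma2 on the diagonal, rho * sigma2 elsewhere.
    cov = sigma2 * ((1 - rho) * np.eye(K) + rho * np.ones((K, K)))
    draws = rng.multivariate_normal(np.zeros(K), cov, size=n_draws)
    # Variance of the K-learner average across simulated draws.
    empirical = draws.mean(axis=1).var()
    theory = sigma2 / K + (K - 1) / K * rho * sigma2
    results.append((rho, empirical, theory))
    print(f"rho={rho:.1f}  empirical={empirical:.3f}  theory={theory:.3f}")
```

The empirical and theoretical columns should agree up to Monte Carlo noise, which shrinks as `n_draws` grows.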


## sktime inventory for ensemble classifiers
sktime exposes ensembles via the registry. The exact list depends on your installed version
and optional dependencies. Use the filter below to surface ensemble-style estimators.


In [None]:
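try:
    from sktime.registry import all_estimators

    ests = all_estimators(estimator_types="classifier", as_dataframe=True)
    # The registry table holds estimator names and classes, so the module path
    # is derived from each class. The class column is assumed to be "object"
    # in recent sktime releases and "estimator" in older ones.
    cls_col = "object" if "object" in ests.columns else "estimator"
    ests["module"] = ests[cls_col].map(lambda cls: cls.__module__)
    mask = (
        ests["name"].str.contains("Ensemble|HIVE|Proximity|COTE", case=False, na=False)
        | ests["module"].str.contains("ensemble|hive|cote|proximity", case=False, na=False)
    )
    print(ests.loc[mask, ["name", "module"]].sort_values("name").to_string(index=False))
except Exception as exc:
    print("sktime is not installed or registry lookup failed:", exc)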


## When to use
- Datasets are heterogeneous and no single model family wins everywhere.
- You need robust accuracy and are willing to trade extra compute for stability.
- You can afford a validation loop to tune ensemble weights or meta-learners.
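
One simple way to set ensemble weights from a validation split is to make them proportional to each base classifier's validation accuracy raised to a power. The sketch below uses simulated probabilities in place of real fitted models; the signal strengths, the exponent `alpha`, and all variable names are illustrative assumptions rather than sktime defaults:

```python
import numpy as np

rng = np.random.default_rng(42)
n_val, n_classes, K = 300, 3, 4
y_val = rng.integers(0, n_classes, size=n_val)

# Simulated validation probabilities: each hypothetical base classifier gets a
# different boost on the true-class logit before a softmax.
signal = np.array([2.0, 1.0, 0.5, 0.0])
logits = rng.normal(size=(K, n_val, n_classes))
logits[:, np.arange(n_val), y_val] += signal[:, None]
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Per-classifier validation accuracy, shape (K,).
acc = (probs.argmax(axis=-1) == y_val).mean(axis=1)

# Weights proportional to accuracy ** alpha, normalized to sum to 1.
alpha = 4  # illustrative; larger values favour the best models more sharply
weights = acc**alpha / (acc**alpha).sum()

# Weighted soft vote over the K classifiers, shape (n_val, n_classes).
ensemble = np.einsum("k,knc->nc", weights, probs)
ens_acc = (ensemble.argmax(axis=1) == y_val).mean()

print("validation accuracy per model:", acc.round(3))
print("weights:", weights.round(3))
print("ensemble validation accuracy:", round(float(ens_acc), 3))
```

Because the weights are fitted on the validation split, the ensemble should be scored on a separate held-out test set to get an unbiased estimate.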
