# Bake off redux: a review and experimental evaluation of recent time series classification algorithms

This page and the accompanying repository package support the paper "Bake off redux: a review and experimental evaluation of recent time series classification algorithms", submitted to Springer Machine Learning (ML).

Our results files are stored [here](https://github.com/time-series-machine-learning/tsml-eval/tree/main/tsml_eval/publications/y2023/tsc_bakeoff/results).

## Datasets

The 112 UCR archive datasets are available at [timeseriesclassification.com](http://www.timeseriesclassification.com/dataset.php).

The 30 new datasets will be uploaded to the [timeseriesclassification.com](http://www.timeseriesclassification.com) website in due course. For now, we provide the following link:

<https://drive.google.com/file/d/1T7-A8XQYISLg-Ne-9glAKXxGy4H8FrQV/view?usp=sharing>

## Install

To install the latest version of the package with up-to-date algorithms, run:

    pip install tsml-eval

To install the package at the time of publication, run:

    pip install tsml-eval==0.1.0

Not all estimator dependencies are installed by default. You can install these individually as required or use the following dependency groups when installing:

    pip install tsml-eval[all_extras,deep_learning]

To install the dependency versions used at the time of publication, use the publication requirements file:

    pip install -r tsml_eval/publications/y2023/tsc_bakeoff/static_publication_reqs.txt

## Usage

### Command Line

Run [run_experiments.py](https://github.com/time-series-machine-learning/tsml-eval/blob/main/tsml_eval/publications/y2023/tsc_bakeoff/run_experiments.py) with the following arguments:

1. Path to the data directory

2. Path to the results directory

3. The name of the model to run (see [set_bakeoff_classifier.py](https://github.com/time-series-machine-learning/tsml-eval/blob/main/tsml_eval/publications/y2023/tsc_bakeoff/set_bakeoff_classifier.py), e.g. R-STSF, HC2, InceptionTime)

4. The name of the problem to run

5. The resample number to run (0 is base train/test split)

For example, to run ItalyPowerDemand using HIVE-COTE V2 on the base train/test split:

    python tsml_eval/publications/y2023/tsc_bakeoff/run_experiments.py data/ results/ HC2 ItalyPowerDemand 0
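
To run several resamples of the same problem, the positional arguments listed above can be assembled programmatically. The following is a minimal sketch, not part of the package: `build_commands` is a hypothetical helper, and the script path is taken from the link above and assumes you are running from the repository root.

```python
import sys

# Script path as linked in this README; adjust it to your checkout.
SCRIPT = "tsml_eval/publications/y2023/tsc_bakeoff/run_experiments.py"


def build_commands(data_dir, results_dir, classifier, problem, resamples):
    """Return one argument list per resample (hypothetical helper).

    Arguments follow the positional order listed above:
    data dir, results dir, classifier name, problem name, resample number.
    """
    return [
        [sys.executable, SCRIPT, data_dir, results_dir, classifier, problem, str(r)]
        for r in resamples
    ]


commands = build_commands("data/", "results/", "HC2", "ItalyPowerDemand", range(3))
for cmd in commands:
    # Print the command without the interpreter path.
    print(" ".join(cmd[1:]))
    # To actually launch a run: subprocess.run(cmd, check=True)
```

Building argument lists rather than shell strings avoids quoting issues if a path contains spaces.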

### Exactly Reproducing Results

To allow comparison with past results and publications, our results on the 112 UCR datasets use the randomly generated resamples from the Java [tsml](https://github.com/time-series-machine-learning/tsml-java) package. To use these resamples with our code, a flag must be toggled in the main method of the experiments file, and an individual file for each resample must be present in the data directory. These resamples are available for download in .ts file format here:

https://mega.nz/file/ViMDgCJT#Q70StCshEWFzT8CEN5y-TrB9W-W3tApfPqWWx-qbuUg - 112 UCR datasets using Java tsml resamples

The 30 new datasets used in our experiments use the resampling available by default in our experiments file. An exception to this is ProximityForest, which is implemented in Java and uses the Java resampling as a result.

### Java Classifier Implementations

Three of the classifiers used in our comparison were implemented in Java due to a lack of Python implementations which function reliably and are capable of accurately reproducing published results. These classifiers are the ElasticEnsemble, ProximityForest and TS-CHIEF. We use the implementations from the Java [tsml](https://github.com/time-series-machine-learning/tsml-java) package from revisions where they are available. We make two jar files available for download which contain the implementations of these classifiers:

https://drive.google.com/file/d/1oXxpSa5PT9sBuVAbt57TLMANv4TMEejI/view?usp=sharing - TS-CHIEF and ProximityForest

https://drive.google.com/file/d/1Vmgg5u7SE2jmsakHVlxPxvT_AfaZ151e/view?usp=sharing - ElasticEnsemble

These jar files can be run from the command line with commands similar to those used for the Python classifiers above:

    java -jar tsml-ee.jar -dp=data/ -rp=results/ -cn="FastEE" -dn="ItalyPowerDemand" -f=0

or

    java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="ProximityForest" -dn="ItalyPowerDemand" -f=0

or

    java -jar tsml-forest.jar -dp=data/ -rp=results/ -cn="TS-CHIEF" -dn="ItalyPowerDemand" -f=0
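
The three commands above differ only in the jar file and the classifier name, so they can be generated from a small lookup table. This is an illustrative sketch: `java_command` is a hypothetical helper, the jar names and flags are copied from the commands above, and the `-f` value is assumed to be the resample number.

```python
# Jar file per classifier name, as used in the commands above.
JAR_FOR_CLASSIFIER = {
    "FastEE": "tsml-ee.jar",
    "ProximityForest": "tsml-forest.jar",
    "TS-CHIEF": "tsml-forest.jar",
}


def java_command(classifier, problem, resample, data_dir="data/", results_dir="results/"):
    """Build the argument list for one Java run (hypothetical helper)."""
    jar = JAR_FOR_CLASSIFIER[classifier]
    return [
        "java", "-jar", jar,
        f"-dp={data_dir}",      # data directory
        f"-rp={results_dir}",   # results directory
        f"-cn={classifier}",    # classifier name
        f"-dn={problem}",       # dataset name
        f"-f={resample}",       # assumed: resample number
    ]


print(" ".join(java_command("TS-CHIEF", "ItalyPowerDemand", 0)))
# To actually launch a run: subprocess.run(java_command(...), check=True)
```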

### Using Classifiers

Most of our classifiers are available in the `aeon` Python package.

The classifiers used in our experiments follow the `scikit-learn` estimator interface and can be used in the same way as `scikit-learn` estimators:

    import warnings

    warnings.filterwarnings("ignore")

    from aeon.classification.interval_based import TimeSeriesForestClassifier
    from sklearn.metrics import accuracy_score
    from tsml.datasets import load_minimal_chinatown

    from tsml_eval.estimators import SklearnToTsmlClassifier
    from tsml_eval.publications.y2023.tsc_bakeoff import _set_bakeoff_classifier
    from tsml_eval.utils.validation import is_sklearn_classifier

Data can be loaded using whichever method is most convenient, but should be formatted as either a 3D numpy array of shape (n_samples, n_channels, n_timesteps) or a list of length (n_samples) containing 2D numpy arrays of shape (n_channels, n_timesteps).
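
The two accepted layouts can be illustrated with synthetic data. This is a minimal sketch using only `numpy`; the random values are placeholders, and the use of the list form for series of differing lengths is an assumption rather than something stated here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layout 1: a single 3D array of shape (n_samples, n_channels, n_timesteps).
X_equal = rng.normal(size=(10, 1, 24))

# Layout 2: a list of n_samples 2D arrays, each (n_channels, n_timesteps).
# Assumption: this form is what you would use when series lengths differ,
# since a single 3D array cannot hold them.
X_list = [rng.normal(size=(1, length)) for length in (24, 30, 18)]

# Univariate data stored as a 2D array (n_samples, n_timesteps) can be
# given the missing channel axis like this.
X_2d = rng.normal(size=(10, 24))
X_3d = X_2d[:, np.newaxis, :]

print(X_equal.shape, X_3d.shape)
print([x.shape for x in X_list])
```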

A function is available for loading from .ts files.

    # load example classification dataset
    X_train, y_train = load_minimal_chinatown("TRAIN")
    X_test, y_test = load_minimal_chinatown("TEST")

    # data can be loaded from .ts files using the following function
    # from tsml.datasets import load_from_ts_file
    # X, y = load_from_ts_file("data/data.ts")

    print(type(X_train), type(y_train))
    print(X_train.shape, y_train.shape)
    print(X_test.shape, y_test.shape)
    X_train[:5]

Output:

    <class 'numpy.ndarray'> <class 'numpy.ndarray'>
    (20, 1, 24) (20,)
    (20, 1, 24) (20,)

    array([[[ 573.,  375.,  301.,  212.,   55.,   34.,   25.,   33.,  113.,
              143.,  303.,  615., 1226., 1281., 1221., 1081.,  866., 1096.,
             1039.,  975.,  746.,  581.,  409.,  182.]],

           [[ 394.,  264.,  140.,  144.,  104.,   28.,   28.,   25.,   70.,
              153.,  401.,  649., 1216., 1399., 1249., 1240., 1109., 1137.,
             1290., 1137.,  791.,  638.,  597.,  316.]],

           [[ 603.,  348.,  176.,  177.,   47.,   30.,   40.,   42.,  101.,
              180.,  401.,  777., 1344., 1573., 1408., 1243., 1141., 1178.,
             1256., 1114.,  814.,  635.,  304.,  168.]],

           [[ 428.,  309.,  199.,  117.,   82.,   43.,   24.,   64.,  152.,
              183.,  408.,  797., 1288., 1491., 1523., 1460., 1365., 1520.,
             1700., 1797., 1596., 1139.,  910.,  640.]],

           [[ 372.,  310.,  203.,  133.,   65.,   39.,   27.,   36.,  107.,
              139.,  329.,  651.,  990., 1027., 1041.,  971., 1104.,  844.,
             1023., 1019.,  862.,  643.,  591.,  452.]]])

Classifiers can be built using the `fit` method and predictions can be made using `predict`.

    # build a TSF classifier and make predictions
    tsf = TimeSeriesForestClassifier(n_estimators=100, random_state=0)
    tsf.fit(X_train, y_train)
    tsf.predict(X_test)

Output:

    array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 1., 2., 2., 1., 2.,
           2., 2., 2.])

`predict_proba` can be used to get class probabilities.

    tsf.predict_proba(X_test)

Output:

    array([[0.86, 0.14],
           [0.76, 0.24],
           [0.72, 0.28],
           [0.98, 0.02],
           [0.78, 0.22],
           [0.85, 0.15],
           [0.94, 0.06],
           [0.85, 0.15],
           [0.85, 0.15],
           [0.79, 0.21],
           [0.16, 0.84],
           [0.12, 0.88],
           [0.59, 0.41],
           [0.19, 0.81],
           [0.13, 0.87],
           [0.97, 0.03],
           [0.16, 0.84],
           [0.03, 0.97],
           [0.  , 1.  ],
           [0.37, 0.63]])
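
The probabilities relate to the predicted labels in the usual way: each prediction is the class with the highest probability in its row. The sketch below reproduces this for four rows copied from the `predict_proba` output above, assuming the columns follow the sorted class labels 1 and 2 (as with `classes_` in `scikit-learn`).

```python
import numpy as np

# Assumption: column 0 corresponds to class 1 and column 1 to class 2.
classes = np.array([1.0, 2.0])

# Rows 0, 10, 12 and 19 of the predict_proba output above.
proba = np.array(
    [
        [0.86, 0.14],
        [0.16, 0.84],
        [0.59, 0.41],
        [0.37, 0.63],
    ]
)

# Pick the most probable column in each row and map it back to a label.
labels = classes[np.argmax(proba, axis=1)]
print(labels)  # [1. 2. 1. 2.]
```

These labels match the entries at the same positions in the `predict` output shown earlier.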

Here we run some of the classifiers from the publication and find the accuracy for them on our example dataset.

    classifiers = [
        "RDST",
        "R-STSF",
        "WEASEL-D",
        "Hydra-MultiROCKET",
    ]

    accuracies = []
    for classifier_name in classifiers:
        # Select a classifier by name, see set_bakeoff_classifier.py for options
        classifier = _set_bakeoff_classifier(classifier_name, random_state=0)

        # if it is a sklearn classifier, wrap it to work with time series data
        if is_sklearn_classifier(classifier):
            classifier = SklearnToTsmlClassifier(
                classifier=classifier, concatenate_channels=True, random_state=0
            )

        # fit and predict
        classifier.fit(X_train, y_train)
        y_pred = classifier.predict(X_test)
        accuracies.append(accuracy_score(y_test, y_pred))

    accuracies

Output:

    [0.85, 0.9, 0.9, 0.85]