# User Guide Tutorial 09: Benchmarks

This tutorial demonstrates the benchmarking tools that TemporAI provides in `tempor.benchmarks`.

*Uncomment the lines below to install the library:*

```python
# !pip install temporai

# Or from the repo, for the latest version:
# !pip install git+https://github.com/vanderschaarlab/temporai.git
```

## Using `tempor.benchmarks.benchmark_models`

The `tempor.benchmarks.benchmark_models` function provides a quick way to benchmark several models (plugins) on a particular task.

It takes a `task_type`, a list of `(name, model)` tuples (each model may be a plugin instance or a `Pipeline`), and a dataset. It then performs cross-validation with `n_splits` folds and reports the mean and standard deviation of each metric across the folds.
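Conceptually, the cross-validation step can be sketched in plain Python. This is a minimal sketch of K-fold cross-validation under stated assumptions, not TemporAI's implementation: the unshuffled contiguous folds, the toy majority-class "model", the accuracy metric, and the use of the population standard deviation (`statistics.pstdev`) are all choices made for illustration only.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n_samples: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into n_splits contiguous, unshuffled folds (an assumption)."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        splits.append((train, test))
        start += size
    return splits


def majority_class_fit_predict(y_train: Sequence[int], n_test: int) -> List[int]:
    """Toy stand-in for a classifier: always predict the training majority class (ties -> 1).
    Features are omitted because this toy model ignores them."""
    label = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return [label] * n_test


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(
    fit_predict: Callable[[Sequence[int], int], List[int]],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    y: Sequence[int],
    n_splits: int,
) -> Tuple[float, float]:
    """Score the model on each held-out fold, then aggregate to (mean, population stddev)."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(y), n_splits):
        preds = fit_predict([y[i] for i in train_idx], len(test_idx))
        scores.append(metric([y[i] for i in test_idx], preds))
    return statistics.mean(scores), statistics.pstdev(scores)


labels = [0, 0, 1, 1, 1, 1, 1, 1, 0]
mean, std = cross_validate(majority_class_fit_predict, accuracy, labels, n_splits=3)
print(f"accuracy: {mean:.3f} +/- {std:.3f}")
```

Here the three folds score 1/3, 1 and 2/3, so the reported accuracy is their mean with a fairly large spread; `benchmark_models` summarizes each metric in the same mean-and-spread form, one row per metric.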

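`benchmark_models` returns a tuple `(results_readable, results)`: `results_readable` is a single table with one `mean +/- stddev` column per model, and `results` is a dictionary mapping each model name to a table of per-metric `mean` and `stddev` values.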

```python
from tempor.benchmarks import benchmark_models
from tempor.utils.dataloaders import SineDataLoader
from tempor.plugins import plugin_loader

from IPython.display import display

# Small synthetic sine-wave dataset with 25 samples.
dataset = SineDataLoader(random_state=42, no=25).load()

# Each test is a (name, model) tuple; the models are compared with 3-fold cross-validation.
results_readable, results = benchmark_models(
    task_type="classification",
    tests=[
        ("model_1", plugin_loader.get("prediction.one_off.classification.nn_classifier", n_iter=50)),
        ("model_2", plugin_loader.get("prediction.one_off.classification.ode_classifier", n_iter=50)),
    ],
    data=dataset,
    n_splits=3,
)

print("Results in easily-readable format:")
display(results_readable)

print("Full results:\n")
for model, value in results.items():
    print(f"{model}:")
    display(value)
```

Results in easily-readable format:

| metric | model_1 | model_2 |
|---|---|---|
| aucroc | 0.333 +/- 0.109 | 0.422 +/- 0.11 |
| aucprc | 0.453 +/- 0.099 | 0.456 +/- 0.107 |
| accuracy | 0.528 +/- 0.137 | 0.519 +/- 0.105 |
| f1_score_micro | 0.528 +/- 0.137 | 0.519 +/- 0.105 |
| f1_score_macro | 0.484 +/- 0.112 | 0.338 +/- 0.048 |
| f1_score_weighted | 0.514 +/- 0.128 | 0.361 +/- 0.116 |
| kappa | 0.001 +/- 0.232 | 0.0 +/- 0.0 |
| kappa_quadratic | 0.001 +/- 0.232 | 0.0 +/- 0.0 |
| precision_micro | 0.528 +/- 0.137 | 0.519 +/- 0.105 |
| precision_macro | 0.497 +/- 0.122 | 0.259 +/- 0.053 |


Full results:

model_1:

| metric | mean | stddev |
|---|---|---|
| aucroc | 0.333333 | 0.108866 |
| aucprc | 0.452844 | 0.099425 |
| accuracy | 0.527778 | 0.137493 |
| f1_score_micro | 0.527778 | 0.137493 |
| f1_score_macro | 0.484091 | 0.112494 |
| f1_score_weighted | 0.514141 | 0.127851 |
| kappa | 0.001 | 0.232335 |
| kappa_quadratic | 0.001 | 0.232335 |
| precision_micro | 0.527778 | 0.137493 |
| precision_macro | 0.497222 | 0.12178 |


model_2:

| metric | mean | stddev |
|---|---|---|
| aucroc | 0.422222 | 0.109994 |
| aucprc | 0.456349 | 0.106738 |
| accuracy | 0.518519 | 0.105369 |
| f1_score_micro | 0.518519 | 0.105369 |
| f1_score_macro | 0.338162 | 0.047609 |
| f1_score_weighted | 0.360713 | 0.115623 |
| kappa | 0.0 | 0.0 |
| kappa_quadratic | 0.0 | 0.0 |
| precision_micro | 0.518519 | 0.105369 |
| precision_macro | 0.259259 | 0.052684 |

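To compare models programmatically rather than by eye, the sketch below mirrors a subset of the `results` dict in plain Python. The plain-dict layout `{metric: (mean, stddev)}` stands in for the per-model tables and is an assumption made to keep the example self-contained; the values are copied from the output above. The `readable` helper is a hypothetical reconstruction of the `results_readable` strings (rounding to three decimal places reproduces the values shown), not TemporAI's code, and `best_model` assumes higher is better, which holds for the metrics used here.

```python
from typing import Dict, Tuple

# Subset of the "Full results" output above, mirrored as metric -> (mean, stddev).
results: Dict[str, Dict[str, Tuple[float, float]]] = {
    "model_1": {
        "aucroc": (0.333333, 0.108866),
        "accuracy": (0.527778, 0.137493),
        "f1_score_macro": (0.484091, 0.112494),
    },
    "model_2": {
        "aucroc": (0.422222, 0.109994),
        "accuracy": (0.518519, 0.105369),
        "f1_score_macro": (0.338162, 0.047609),
    },
}


def readable(mean: float, std: float, ndigits: int = 3) -> str:
    """Hypothetical helper: format a (mean, stddev) pair like the readable table."""
    return f"{round(mean, ndigits)} +/- {round(std, ndigits)}"


def best_model(results: Dict[str, Dict[str, Tuple[float, float]]], metric: str) -> str:
    """Return the model name with the highest mean for `metric` (assumes higher is better)."""
    return max(results, key=lambda name: results[name][metric][0])


for metric in ["aucroc", "accuracy", "f1_score_macro"]:
    cells = ", ".join(f"{name}: {readable(*results[name][metric])}" for name in results)
    print(f"{metric}: {cells} -> best: {best_model(results, metric)}")
```

With only 25 samples and 3 folds, the standard deviations are large relative to the differences between the means, so the "best" model in this toy run should not be over-interpreted.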

## Supported tasks

> ⚠️ Not all task types are supported by `benchmark_models` yet.

Supported values of `task_type`, and the plugin category each expects:
* `task_type="classification"`: `prediction.one_off.classification` models.
* `task_type="regression"`: `prediction.one_off.regression` models.
* `task_type="time_to_event"`: `time_to_event` models.
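Since each `task_type` expects models from one plugin category, a pre-flight check can catch a mismatch before a long benchmark run starts. The helper below is hypothetical (it is not part of TemporAI); it simply encodes the list above as a prefix lookup on the plugin names you would pass to `plugin_loader.get`.

```python
from typing import Dict

# Mapping copied from the "Supported tasks" list above.
SUPPORTED_TASKS: Dict[str, str] = {
    "classification": "prediction.one_off.classification",
    "regression": "prediction.one_off.regression",
    "time_to_event": "time_to_event",
}


def plugin_matches_task(task_type: str, plugin_name: str) -> bool:
    """Hypothetical check: does `plugin_name` belong to the category expected for `task_type`?"""
    if task_type not in SUPPORTED_TASKS:
        raise ValueError(
            f"task_type {task_type!r} is not supported; choose one of {sorted(SUPPORTED_TASKS)}"
        )
    prefix = SUPPORTED_TASKS[task_type]
    # Match whole dot-separated components, so "time_to_event_x" does not match "time_to_event".
    return plugin_name == prefix or plugin_name.startswith(prefix + ".")


print(plugin_matches_task("classification", "prediction.one_off.classification.nn_classifier"))
print(plugin_matches_task("regression", "prediction.one_off.classification.nn_classifier"))
```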
