[![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vanderschaarlab/temporai/blob/main/tutorials/usage/tutorial09_automl.ipynb)

# User Guide Tutorial 09: AutoML

TemporAI provides AutoML tools in `tempor.automl` for finding the best model for your use case. This tutorial demonstrates them.

## AutoML in TemporAI Overview

TemporAI provides two AutoML approaches ("seekers") in the `tempor.automl.seeker` module.

1. `MethodSeeker`: Search the hyperparameter space of a particular predictive method.
2. `PipelineSeeker`: Search the hyperparameter space of a pipeline like `preprocessing steps -> predictive step`.

The optimization strategies are implemented with [`optuna`](https://optuna.readthedocs.io/). The currently supported strategies, with the corresponding `tuner_type` value, are:
* [Bayesian, specifically Tree-structured Parzen estimator](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.TPESampler.html) (`"bayesian"`),
* [Random](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.RandomSampler.html) (`"random"`),
* [CMA-ES](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.CmaEsSampler.html) (`"cmaes"`),
* [QMC](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.QMCSampler.html) (`"qmc"`),
* [Grid](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.GridSampler.html) (`"grid"`).
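For the `"grid"` strategy, covering the space exhaustively takes one trial per combination of values, so the number of trials is the product of the number of values each hyperparameter can take. A minimal sketch in plain Python, assuming integer ranges are enumerated with a step of 1 (an assumption about how a grid over `IntegerParams` is built) and using the `nn_classifier` space from the `override_hp_space` example in this tutorial:

```python
import itertools

# Candidate values per hyperparameter.
# Integer ranges are enumerated with step 1 here, which is an assumption.
space = {
    "n_iter": list(range(2, 3)),               # low=2, high=2 -> [2]
    "n_units_hidden": list(range(5, 20 + 1)),  # low=5, high=20 -> 16 values
    "lr": [1e-2, 1e-3, 1e-4],                  # categorical choices
}

# Each grid point is one combination of values, one per hyperparameter.
grid = [dict(zip(space, values)) for values in itertools.product(*space.values())]

print(len(grid))  # 48 = 1 * 16 * 3
print(grid[0])    # {'n_iter': 2, 'n_units_hidden': 5, 'lr': 0.01}
```

With a small `num_iter`, a grid search would visit only a fraction of these points, which is one reason sampling-based strategies such as `"bayesian"` or `"random"` are often preferred for larger spaces.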

## Using `MethodSeeker`

Use a `MethodSeeker` to search for the best algorithm and hyperparameters for a particular task.
No preprocessing (data transformation) steps are carried out in this approach, so preprocess the data using
`tempor.methods.preprocessing` first, as needed.

A `MethodSeeker` can be initialized as follows.

In [None]:
from tempor import plugin_loader
from tempor.automl.seeker import MethodSeeker

# Load your dataset.
dataset = plugin_loader.get("prediction.one_off.sine", plugin_type="datasource").load()

seeker = MethodSeeker(
    # Name your AutoML study:
    study_name="my_automl_study",
    # Select the type of task:
    task_type="prediction.one_off.classification",
    # Choose which predictive methods to use in the search:
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    # Choose a metric. Metric maximization/minimization will be determined automatically.
    metric="aucroc",
    # Pass in your dataset.
    dataset=dataset,
    # How many best models to return:
    return_top_k=3,
    # Number of AutoML iterations:
    num_iter=100,
    # Type of AutoML tuner to use:
    tuner_type="bayesian",
    # You can also provide some other options like early stopping patience, number of cross-validation folds etc.
)

2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_set_up_tuners:365 | Setting up estimators and tuners for study my_automl_study.
2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator cde_classifier.
2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator ode_classifier.
2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator nn_classifier.
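Conceptually, `return_top_k` keeps the best-scoring candidates, and "best" depends on whether the metric is maximized (as for AUC-ROC) or minimized. The plain-Python sketch below illustrates that selection only; the `top_k` helper, the `METRIC_DIRECTION` lookup, the `log_loss` entry, and the scores are all hypothetical and are not TemporAI's internals.

```python
import heapq
from typing import List, Tuple

# Hypothetical direction lookup for illustration only;
# TemporAI infers the direction for its own metrics automatically.
METRIC_DIRECTION = {"aucroc": "maximize", "log_loss": "minimize"}


def top_k(results: List[Tuple[str, float]], metric: str, k: int) -> List[Tuple[str, float]]:
    """Return the k best (name, score) pairs, respecting the metric's direction."""
    if METRIC_DIRECTION[metric] == "maximize":
        return heapq.nlargest(k, results, key=lambda r: r[1])
    return heapq.nsmallest(k, results, key=lambda r: r[1])


# Made-up scores, for illustration only.
results = [("cde_classifier", 0.71), ("ode_classifier", 0.64), ("nn_classifier", 0.80)]
print(top_k(results, "aucroc", 2))    # [('nn_classifier', 0.8), ('cde_classifier', 0.71)]
print(top_k(results, "log_loss", 1))  # [('ode_classifier', 0.64)]
```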


You can then run the AutoML search as below.

The example below also shows how to provide a custom hyperparameter space, which *overrides* the default hyperparameter
space for a model.

In [None]:
from tempor.methods.core.params import IntegerParams, CategoricalParams

# Provide a custom hyperparameter space to search for each type of model.
# NOTE: For the sake of speed of this example, we limit epochs to 2.
hp_space = {
    "cde_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_temporal_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ode_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "nn_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `MethodSeeker` and provide `override_hp_space`.
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
)

2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_set_up_tuners:365 | Setting up estimators and tuners for study my_automl_study.
2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator cde_classifier.
2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator ode_classifier.
2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator nn_classifier.
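To make the role of `IntegerParams` and `CategoricalParams` more concrete, the sketch below mimics what the `"random"` strategy does with a space like the one above: each trial draws an integer uniformly from the inclusive range `[low, high]` and picks one of the categorical choices. The `IntRange` and `Choice` classes and the `sample` function are hypothetical stand-ins written for illustration; they are not the `tempor.methods.core.params` implementation.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Union


@dataclass
class IntRange:  # hypothetical stand-in for IntegerParams
    name: str
    low: int
    high: int


@dataclass
class Choice:  # hypothetical stand-in for CategoricalParams
    name: str
    choices: Sequence[Any]


Param = Union[IntRange, Choice]


def sample(space: List[Param], rng: random.Random) -> Dict[str, Any]:
    """Draw one hyperparameter configuration uniformly at random."""
    config: Dict[str, Any] = {}
    for p in space:
        if isinstance(p, IntRange):
            config[p.name] = rng.randint(p.low, p.high)  # both bounds inclusive
        else:
            config[p.name] = rng.choice(list(p.choices))
    return config


space = [
    IntRange("n_iter", 2, 2),
    IntRange("n_units_hidden", 5, 20),
    Choice("lr", [1e-2, 1e-3, 1e-4]),
]
rng = random.Random(0)
for _ in range(3):  # three trials, analogous to num_iter=3
    print(sample(space, rng))
```

The Bayesian (TPE) strategy differs in that, after some initial trials, it biases later draws toward regions of the space that scored well in earlier trials.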


In [None]:
# Execute the search.

best_methods, best_scores = seeker.search()

2023-10-10 16:57:26 | INFO     | tempor.automl.seeker:search:424 | Running  search for estimator 'cde_classifier' 1/3.
2023-10-10 16:57:26 | INFO     | tempor.automl.tuner:tune:205 | Baseline score computation skipped
2023-10-10 16:57:26 | INFO     | tempor.automl.tuner:objective:227 | 
Hyperparameters sampled from CDEClassifier:
{'n_iter': 2, 'n_temporal_units_hidden': 13, 'lr': 0.01}
2023-10-10 16:57:31 | INFO     | tempor.automl.tuner:objective:227 | 
Hyperparameters sampled from CDEClassifier:
{'n_iter': 2, 'n_temporal_units_hidden': 11, 'lr': 0.0001}
2023-10-10 16:57:34 | INFO     | tempor.automl.tuner:objective:227 | 
Hyperparameters sampled from CDEClassifier:
{'n_iter': 2, 'n_temporal_units_hidden': 20, 'lr': 0.001}
2023-10-10 16:57:38 | INFO     | tempor.automl.seeker:search:424 | Running  search for estimator 'ode_classifier' 2/3.
2023-10-10 16:57:38 | INFO     | tempor.automl.tuner:tune:205 | Baseline score computation skipped
2023-10-10 16:57:38 | INFO     | tempor.automl.t

In [None]:
# The best methods are returned, and can be used by calling .predict() and so on.

import rich.pretty  # For pretty printing only.

for method in best_methods:
    rich.pretty.pprint(method, indent_guides=False)

## Using `PipelineSeeker`

Use a `PipelineSeeker` to search for the best *pipeline* (`preprocessing steps -> prediction step`) for a particular task.

This seeker creates pipelines composed of:
- A static imputer (if at least one candidate is provided in `static_imputers`),
- A static scaler (if at least one candidate is provided in `static_scalers`),
- A temporal imputer (if at least one candidate is provided in `temporal_imputers`),
- A temporal scaler (if at least one candidate is provided in `temporal_scalers`),
- The final predictor, chosen from the `estimator_names` options.

For each preprocessing step, the choice among the imputer/scaler candidates is sampled as a categorical hyperparameter.
The hyperparameters of the selected preprocessing plugins, and of the final predictor, are then sampled as well.

A `PipelineSeeker` uses a very similar interface to `MethodSeeker`, and can be initialized as follows.

In [None]:
from tempor.automl.seeker import PipelineSeeker

seeker = PipelineSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    # The estimators here will be the final step of the pipeline:
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=100,
    tuner_type="bayesian",
    # The following arguments specify the candidates of the different preprocessing steps, e.g.:
    static_imputers=["static_tabular_imputer"],
    static_scalers=[],
    temporal_imputers=["ffill", "bfill"],
    temporal_scalers=["ts_minmax_scaler"],
)

2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_set_up_tuners:365 | Setting up estimators and tuners for study my_automl_study.
2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_init_estimator:753 | Creating estimator <Pipeline with cde_classifier>.
2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_init_estimator:753 | Creating estimator <Pipeline with ode_classifier>.
2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_init_estimator:753 | Creating estimator <Pipeline with nn_classifier>.


If you do not specify a preprocessing argument, the following default candidates are used for that step:

In [None]:
from tempor.automl.seeker import (
    DEFAULT_STATIC_IMPUTERS,
    DEFAULT_STATIC_SCALERS,
    DEFAULT_TEMPORAL_IMPUTERS,
    DEFAULT_TEMPORAL_SCALERS,
)

print("Static imputer candidates:", DEFAULT_STATIC_IMPUTERS)
print("Static scaler candidates:", DEFAULT_STATIC_SCALERS)
print("Temporal imputer candidates:", DEFAULT_TEMPORAL_IMPUTERS)
print("Temporal scaler candidates:", DEFAULT_TEMPORAL_SCALERS)

Static imputer candidates: ['static_tabular_imputer']
Static scaler candidates: ['static_minmax_scaler', 'static_standard_scaler']
Temporal imputer candidates: ['ffill', 'ts_tabular_imputer', 'bfill']
Temporal scaler candidates: ['ts_minmax_scaler', 'ts_standard_scaler']
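Because each step's candidate is a categorical choice, and a step is dropped when its candidate list is empty, the number of distinct preprocessing chains per predictor is the product of the lengths of the non-empty lists. The sketch below enumerates them in plain Python for illustration only. It assumes the `->`-joined plugin naming seen in the search logs in this tutorial; the `preprocessing.imputation.temporal` prefix does not appear in those logs and is assumed by analogy. The `candidate_chains` helper is hypothetical.

```python
import itertools
from typing import Dict, List

# Candidate lists mirror the defaults printed above. Step prefixes follow the plugin
# names in the search logs; "preprocessing.imputation.temporal" is an assumption.
DEFAULTS: Dict[str, List[str]] = {
    "preprocessing.imputation.static": ["static_tabular_imputer"],
    "preprocessing.scaling.static": ["static_minmax_scaler", "static_standard_scaler"],
    "preprocessing.imputation.temporal": ["ffill", "ts_tabular_imputer", "bfill"],
    "preprocessing.scaling.temporal": ["ts_minmax_scaler", "ts_standard_scaler"],
}
PREDICTOR = "prediction.one_off.classification.cde_classifier"


def candidate_chains(steps: Dict[str, List[str]], predictor: str) -> List[str]:
    """Enumerate preprocessing chains, skipping any step whose candidate list is empty."""
    active = [[f"{prefix}.{name}" for name in names] for prefix, names in steps.items() if names]
    return ["->".join([*chain, predictor]) for chain in itertools.product(*active)]


print(len(candidate_chains(DEFAULTS, PREDICTOR)))  # 12 = 1 * 2 * 3 * 2

# An empty list drops the step entirely (here: no temporal imputer).
no_temporal_imputer = {**DEFAULTS, "preprocessing.imputation.temporal": []}
print(len(candidate_chains(no_temporal_imputer, PREDICTOR)))  # 4 = 1 * 2 * 2
```

Each chain's plugins also contribute their own hyperparameters, so the full search space is larger than the chain count alone.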


You can execute the search as follows.

In [None]:
from tempor.methods.core.params import IntegerParams, CategoricalParams

# Provide a custom hyperparameter space to search for each type of model.
# These can be provided for the final (predictive) step of the pipeline.
# The default hyperparameter spaces will be sampled for the preprocessing steps.
# NOTE: For the sake of speed of this example, we limit epochs to 2.
hp_space = {
    "cde_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_temporal_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "ode_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
    "nn_classifier": [
        IntegerParams(name="n_iter", low=2, high=2),
        IntegerParams(name="n_units_hidden", low=5, high=20),
        CategoricalParams(name="lr", choices=[1e-2, 1e-3, 1e-4]),
    ],
}

# Initialize a `PipelineSeeker` and provide `override_hp_space`.
seeker = PipelineSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    return_top_k=3,
    num_iter=3,  # For the sake of speed of this example, only 3 AutoML iterations.
    tuner_type="bayesian",
    # Override hyperparameter space:
    override_hp_space=hp_space,
    # Specify preprocessing candidates:
    static_imputers=["static_tabular_imputer"],
    static_scalers=["static_minmax_scaler", "static_standard_scaler"],
    temporal_imputers=[],
    temporal_scalers=["ts_minmax_scaler", "ts_standard_scaler"],
)

2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_set_up_tuners:365 | Setting up estimators and tuners for study my_automl_study.
2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_init_estimator:753 | Creating estimator <Pipeline with cde_classifier>.
2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_init_estimator:753 | Creating estimator <Pipeline with ode_classifier>.
2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:_init_estimator:753 | Creating estimator <Pipeline with nn_classifier>.


In [None]:
best_pipelines, best_scores = seeker.search()

2023-10-10 16:57:56 | INFO     | tempor.automl.seeker:search:424 | Running  search for estimator '<Pipeline with cde_classifier>' 1/3.
2023-10-10 16:57:56 | INFO     | tempor.automl.tuner:tune:205 | Baseline score computation skipped
2023-10-10 16:57:56 | INFO     | tempor.automl.tuner:objective:227 | 
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_standard_scaler->prediction.one_off.classification.cde_classifier:
{'plugin_params': {'static_tabular_imputer': {'imputer': 'softimpute'}, 'static_minmax_scaler': {'clip': True}, 'ts_standard_scaler': {}, 'cde_classifier': {'n_iter': 2, 'n_temporal_units_hidden': 17, 'lr': 0.001}}}
2023-10-10 16:58:14 | INFO     | tempor.automl.tuner:objective:227 | 
Hyperparameters sampled from preprocessing.imputation.static.static_tabular_imputer->preprocessing.scaling.static.static_minmax_scaler->preprocessing.scaling.temporal.ts_min

In [None]:
# The best performing pipelines are returned, and can be used by calling .predict() and so on.

for pipeline in best_pipelines:
    rich.pretty.pprint(pipeline, indent_guides=False)

## Advanced customization

You can further customize the AutoML tuning behavior by specifying the sampler and pruner, if desired.

See the example below.

In [None]:
# 1. Import a Tuner:
from tempor.automl.tuner import OptunaTuner

# 2. Customize this as needed:
import optuna

custom_tuner = OptunaTuner(
    study_name="my_automl_study",
    direction="maximize",
    # Customized sampler:
    study_sampler=optuna.samplers.TPESampler(seed=12345, n_startup_trials=3),
    # Customized pruner:
    study_pruner=optuna.pruners.MedianPruner(interval_steps=2),
    # Using the default optuna storage here, but you may provide a custom one, e.g. Redis.
    study_storage=None,
)

# 3. Pass the Tuner to the {Method/Pipeline}Seeker:
seeker = MethodSeeker(
    study_name="my_automl_study",
    task_type="prediction.one_off.classification",
    estimator_names=[
        "cde_classifier",
        "ode_classifier",
        "nn_classifier",
    ],
    metric="aucroc",
    dataset=dataset,
    # Like so:
    custom_tuner=custom_tuner,
)

# 4. Execute search:
# results = seeker.search() ...

2023-10-10 16:59:22 | INFO     | tempor.automl.seeker:_set_up_tuners:365 | Setting up estimators and tuners for study my_automl_study.
2023-10-10 16:59:22 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator cde_classifier.
2023-10-10 16:59:22 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator ode_classifier.
2023-10-10 16:59:22 | INFO     | tempor.automl.seeker:_init_estimator:579 | Creating estimator nn_classifier.
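Optuna's `MedianPruner` stops a trial early when an intermediate value it reports is worse than the median of the values that earlier trials reported at the same step. The plain-Python sketch below illustrates that rule for a maximized metric, checking only every `interval_steps` steps. It is a simplification written for this tutorial, not Optuna's implementation: it ignores settings such as `n_startup_trials` and `n_warmup_steps`, and the `should_prune` helper and the values are hypothetical.

```python
import statistics
from typing import List


def should_prune(step: int, value: float, history: List[List[float]], interval_steps: int = 2) -> bool:
    """Simplified median-pruning rule for a metric that is maximized.

    `history` holds the intermediate values reported by previously completed
    trials, one list per trial, indexed by step.
    """
    if step % interval_steps != 0:  # only check every `interval_steps` steps
        return False
    previous = [trial[step] for trial in history if len(trial) > step]
    if not previous:  # nothing to compare against yet
        return False
    return value < statistics.median(previous)


# Made-up intermediate AUC-ROC values from three finished trials.
history = [[0.60, 0.65, 0.70], [0.55, 0.62, 0.68], [0.58, 0.66, 0.72]]
print(should_prune(2, 0.65, history))  # True: below the step-2 median of 0.70
print(should_prune(2, 0.71, history))  # False
print(should_prune(1, 0.50, history))  # False: step 1 is not a check step
```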


## Supported tasks

> ⚠️ AutoML supports the same tasks as benchmarking. See the benchmarking tutorial.
