[![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vanderschaarlab/temporai/blob/main/tutorials/usage/tutorial07_pipeline.ipynb)

# User Guide Tutorial 07: Pipeline

This tutorial shows how to use TemporAI `Pipeline`s.

## TemporAI `Pipeline`

A TemporAI `Pipeline` allows you to combine multiple plugins into one. It is
inspired by the [scikit-learn pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html).

* All but the final plugin in the pipeline must be data transformers (the `preprocessing` plugin category).
* The final plugin must be a predictive plugin (one of the `prediction`, `time_to_event`, or `treatments` plugin categories).
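
The two rules above can be checked from the plugin names alone. The following is a minimal sketch, not TemporAI's own validation logic: `check_pipeline_definition` is a hypothetical helper, and it assumes that the plugin category is the leading dot-separated part of the fully qualified plugin name, as in the names used in this tutorial.

```python
from typing import Sequence

# Category prefixes taken from the two rules above.
PREPROCESSING_PREFIX = "preprocessing."
PREDICTIVE_PREFIXES = ("prediction.", "time_to_event.", "treatments.")


def check_pipeline_definition(plugin_names: Sequence[str]) -> None:
    """Hypothetical helper: raise ValueError if the stage order breaks the rules."""
    if not plugin_names:
        raise ValueError("A pipeline needs at least a final predictive plugin.")
    # Split the definition into the leading stages and the final stage.
    *transformers, final = plugin_names
    for name in transformers:
        if not name.startswith(PREPROCESSING_PREFIX):
            raise ValueError(f"Non-final stage must be a preprocessing plugin: {name!r}")
    if not final.startswith(PREDICTIVE_PREFIXES):
        raise ValueError(f"Final stage must be a predictive plugin: {final!r}")


# A valid definition passes silently.
check_pipeline_definition(
    [
        "preprocessing.imputation.temporal.bfill",
        "prediction.one_off.classification.nn_classifier",
    ]
)
```

A definition that ends with a preprocessing plugin, or that places a predictive plugin before the final stage, raises `ValueError` in this sketch.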

When fitting, all the stages will be fitted, and the data will be sequentially transformed by all the preprocessing
steps before fitting the final predictive method plugin.

When predicting, the data is transformed again by the (already fitted) preprocessing steps, and the prediction is
carried out by the final predictive method plugin.
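
The fit and predict flow described above can be illustrated without TemporAI. The sketch below is a toy re-implementation of the idea in plain Python, not the library's code: `SubtractMin`, `ThresholdClassifier`, and `ToyPipeline` are hypothetical names, and plain lists of floats stand in for a dataset.

```python
from typing import List


class SubtractMin:
    """Toy transformer (hypothetical): learns the minimum at fit time and subtracts it."""

    def fit(self, data: List[float]) -> "SubtractMin":
        self.min_ = min(data)
        return self

    def transform(self, data: List[float]) -> List[float]:
        return [x - self.min_ for x in data]


class ThresholdClassifier:
    """Toy predictor (hypothetical): predicts 1 for values above the training mean."""

    def fit(self, data: List[float]) -> "ThresholdClassifier":
        self.mean_ = sum(data) / len(data)
        return self

    def predict(self, data: List[float]) -> List[int]:
        return [int(x > self.mean_) for x in data]


class ToyPipeline:
    def __init__(self, transformers: list, predictor: ThresholdClassifier) -> None:
        self.transformers = transformers
        self.predictor = predictor

    def fit(self, data: List[float]) -> "ToyPipeline":
        # Fit each transformer on the output of the previous one, then fit the predictor.
        for transformer in self.transformers:
            data = transformer.fit(data).transform(data)
        self.predictor.fit(data)
        return self

    def predict(self, data: List[float]) -> List[int]:
        # Reuse the fitted transformers (no refitting), then predict.
        for transformer in self.transformers:
            data = transformer.transform(data)
        return self.predictor.predict(data)


toy = ToyPipeline([SubtractMin()], ThresholdClassifier()).fit([1.0, 2.0, 3.0])
print(toy.predict([1.5, 3.0]))  # [0, 1]
```

Note that `predict` only calls `transform` on the preprocessing stages: the state learned during `fit` (here, the minimum) is reused rather than re-estimated on the new data.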

**Note:**

All pipelines follow the `PipelineBase` interface; see the API reference for details.

## Example

Below is an example of a pipeline ending with `prediction.one_off.classification.nn_classifier`.

Setting up and using a `Pipeline` takes three steps:

1. Use the `pipeline()` function to create a *pipeline class* from a list of strings denoting its steps.
1. Instantiate the pipeline class. The initialization arguments to each component plugin can be passed as a dictionary at this step.
1. Use the pipeline like any other TemporAI estimator (call `.fit(...)`, `.predict(...)` and so on).
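
Steps 1 and 2 are separate because `pipeline()` returns a class rather than an instance; the `<locals>` in the class name printed in this tutorial's output suggests the class is created inside the function. The following is a minimal sketch of that class-factory pattern in plain Python. It is an illustration only, not TemporAI's implementation: `make_pipeline_class` and its attributes are hypothetical names.

```python
from typing import Any, Dict, Optional, Sequence, Type


def make_pipeline_class(step_names: Sequence[str]) -> Type:
    """Hypothetical stand-in for a pipeline factory: returns a class, not an instance."""
    steps = tuple(step_names)

    class Pipeline:
        # The stage definition is fixed for the class.
        step_names = steps

        def __init__(self, plugin_params: Optional[Dict[str, Dict[str, Any]]] = None) -> None:
            # Per-plugin initialization arguments are supplied for each instance.
            self.plugin_params = dict(plugin_params or {})

    return Pipeline


# Step 1: build the class from the stage names.
ToyPipelineClass = make_pipeline_class(
    [
        "preprocessing.scaling.temporal.ts_minmax_scaler",
        "prediction.one_off.classification.nn_classifier",
    ]
)
# Step 2: instantiate it, optionally with per-plugin arguments.
toy_pipe = ToyPipelineClass({"nn_classifier": {"n_iter": 100}})
print(ToyPipelineClass.__qualname__)
```

With this design, one pipeline definition can be instantiated several times with different plugin arguments.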

In [1]:
from rich.pretty import pprint  # For fancy printing only.

In [2]:
from tempor.methods.pipeline import pipeline

# 1. Create a pipeline class based on your desired definition of the pipeline.
PipelineClass = pipeline(
    # Provide plugin names for the pipeline, in order.
    [
        # Preprocessing (data transformer) plugins:
        "preprocessing.imputation.temporal.bfill",
        "preprocessing.imputation.static.static_tabular_imputer",
        "preprocessing.imputation.temporal.ts_tabular_imputer",
        "preprocessing.scaling.temporal.ts_minmax_scaler",
        # Prediction plugin:
        "prediction.one_off.classification.nn_classifier",
    ],
)
print("Pipeline class:")
print(PipelineClass)

print("\nPipeline base classes (note `PipelineBase`):")
pprint(PipelineClass.mro())

# 2. Instantiate the pipeline class.
pipe = PipelineClass(
    # Optionally, provide initialization arguments for individual plugins in the pipeline as a dictionary:
    {
        "static_imputer": {"static_imputer": "ice", "random_state": 42},
        "nn_classifier": {"n_iter": 100},
    }
)

print("Pipeline instance:")
pprint(pipe)

Pipeline class:
<class 'tempor.methods.pipeline.pipeline.<locals>.Pipeline'>

Pipeline base classes (note `PipelineBase`):


2023-10-09 18:03:05 | INFO     | hyperimpute.logger:log_and_print:65 | Iteration imputation: select_model_by_column: True, select_model_by_iteration: True
2023-10-09 18:03:05 | INFO     | hyperimpute.logger:log_and_print:65 | Iteration imputation: select_model_by_column: True, select_model_by_iteration: True


Pipeline instance:


Using the `Pipeline`:

In [3]:
from tempor.data.datasources import SineDataSource

dataset = SineDataSource(random_state=42).load()

# Fit:
pipe.fit(dataset)

# Predict:
pipe.predict(dataset)  # This will transform the data and then predict.

2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > HyperImpute using inner optimization
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 0
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 1
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 2
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 3
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 4
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 5
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > Imputation iter 6
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |      >>>> Early stopping on objective diff iteration
2023-10-09 18:03:06 | INFO     | hyperimpute.logger:log_and_print:65 |   > HyperImpute using inner optimization

| sample_idx | feat_0 |
| --- | --- |
| 0 | 1.0 |
| 1 | 1.0 |
| 2 | 1.0 |
| 3 | 1.0 |
| 4 | 0.0 |
| ... | ... |
| 95 | 1.0 |
| 96 | 1.0 |
| 97 | 1.0 |
| 98 | 1.0 |
