# Pipelines


**Adjutorium** provides **pipelines**, a way to chain multiple plugins together and sample from their joint hyperparameter space.

### Pipelines 101

Every **Adjutorium pipeline** consists of an arbitrary number of **Adjutorium plugins**.

A plugin can be included in a pipeline **at most once**.
There can be only one **prediction plugin**, and it must be the last layer of the pipeline.
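
The two rules above can be sketched as a small check on a list of plugin names. This is only an illustration: `validate_pipeline_spec` is a hypothetical helper, not part of Adjutorium, and it assumes that prediction plugins are the ones whose names start with `prediction.`, as in the examples later in this notebook.

```python
def validate_pipeline_spec(plugin_names):
    """Check a list of plugin names against the two pipeline rules.

    Hypothetical helper for illustration; not part of the Adjutorium API.
    Assumes prediction plugins are named "prediction.<...>".
    """
    if not plugin_names:
        raise ValueError("a pipeline needs at least one plugin")

    # Rule 1: every plugin appears at most once.
    if len(set(plugin_names)) != len(plugin_names):
        raise ValueError("each plugin may appear at most once")

    # Rule 2: a prediction plugin is only allowed as the last layer,
    # which also means there can be at most one of them.
    last = len(plugin_names) - 1
    for position, name in enumerate(plugin_names):
        if name.startswith("prediction.") and position != last:
            raise ValueError(f"{name} must be the last layer of the pipeline")

    return True


print(validate_pipeline_spec([
    "imputer.default.mean",
    "preprocessor.feature_scaling.minmax_scaler",
    "prediction.classifier.xgboost",
]))  # True
```

A list that repeats a plugin, or that places a `prediction.` plugin before the last position, raises `ValueError` in this sketch.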

### Setup

In [None]:
# Standard library
import json
import random
import sys
import warnings

# Third-party
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

from adjutorium.plugins.utils.simulate import simulate_nan

# Notebook display helpers
from IPython.display import HTML, display
import tabulate

if not sys.warnoptions:
    warnings.simplefilter("ignore")

### Loading the plugins

Make sure that you have installed Adjutorium in your workspace.

You can do that by running `pip install .` in the root of the repository.

In [None]:
from adjutorium.plugins import Plugins

plugins = Plugins()

### List the plugins

In [None]:
print(json.dumps(plugins.list(), indent=2))

### Creating a pipeline

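The `Pipeline` constructor expects a list of strings in the format `<plugin_type>.<plugin_name>`, for example `preprocessor.dimensionality_reduction.pca`. As that example shows, the plugin type can itself contain dots.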
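
To make the naming format concrete, here is a minimal sketch that splits such a string at its last dot. `split_plugin_path` is a hypothetical helper written for this tutorial, not an Adjutorium function, and it assumes that everything before the last dot is the plugin type.

```python
def split_plugin_path(path):
    """Split "<plugin_type>.<plugin_name>" at the last dot.

    Hypothetical helper for illustration; not part of the Adjutorium API.
    """
    plugin_type, sep, plugin_name = path.rpartition(".")
    if not sep or not plugin_type or not plugin_name:
        raise ValueError(f"expected '<plugin_type>.<plugin_name>', got {path!r}")
    return plugin_type, plugin_name


print(split_plugin_path("preprocessor.dimensionality_reduction.pca"))
# ('preprocessor.dimensionality_reduction', 'pca')
```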


In [None]:
from adjutorium.plugins.pipeline import Pipeline

# Pipeline(...) returns a pipeline type; calling it, e.g. pipeling_t(), creates an instance.
pipeling_t = Pipeline(
    ["preprocessor.dimensionality_reduction.pca", "prediction.classifier.neural_nets"]
)

pipeling_t.name()

## Testing the pipelines

Testing parameters:
- __Dataset__: Breast Cancer Wisconsin dataset.
- __Amputation__: MAR (missing at random) with 20% missingness.
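
Under MAR, the probability that a value is missing depends only on values that remain observed. The sketch below illustrates that idea in plain Python; it is not the `simulate_nan` implementation, and `ampute_mar` is a hypothetical name. It assumes that column 0 stays fully observed and drives the missingness of all other columns through a logistic model, whose intercept is tuned so that about `p_miss` of the remaining cells go missing.

```python
import math
import random


def ampute_mar(rows, p_miss, seed=0):
    """Return a copy of `rows` with MAR missingness in every column but the first.

    Illustrative sketch only; column 0 is assumed to stay fully observed.
    """
    rng = random.Random(seed)

    # Standardize the driver column so the logistic slope is scale-free.
    driver = [row[0] for row in rows]
    mean = sum(driver) / len(driver)
    std = math.sqrt(sum((v - mean) ** 2 for v in driver) / len(driver)) or 1.0
    z = [(v - mean) / std for v in driver]

    def prob(zi, intercept):
        return 1 / (1 + math.exp(-(zi + intercept)))

    # Bisection on the intercept so the average missing probability is p_miss.
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if sum(prob(zi, mid) for zi in z) / len(z) < p_miss:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2

    amputed = []
    for row, zi in zip(rows, z):
        p = prob(zi, intercept)
        # Each non-driver cell is dropped independently with probability p.
        rest = [float("nan") if rng.random() < p else value for value in row[1:]]
        amputed.append([row[0]] + rest)
    return amputed


# Synthetic data: 2000 rows, 4 Gaussian features.
data_rng = random.Random(1)
data = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(2000)]

amputed = ampute_mar(data, p_miss=0.2)
n_missing = sum(math.isnan(v) for row in amputed for v in row[1:])
fraction_missing = n_missing / (len(amputed) * 3)
print(round(fraction_missing, 3))  # close to 0.2, up to sampling noise
```

In this sketch, rows with a large value in column 0 lose more of their other values, which is what distinguishes MAR from missingness that is completely at random.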

In [None]:
def get_metrics(pipeline, X_train, y_train, X_test, y_test):
    """Fit the pipeline and return (accuracy, AUROC, AUPRC), rounded to 4 decimals."""
    pipeline.fit(X_train, y_train)

    # Predicted class labels; the curves below are computed from these labels.
    y_pred = pipeline.predict(X_test)

    score = metrics.accuracy_score(y_test, y_pred)

    # Area under the ROC curve.
    fpr, tpr, _ = metrics.roc_curve(y_test, y_pred)
    auroc = metrics.auc(fpr, tpr)

    # Area under the precision-recall curve.
    prec, recall, _ = metrics.precision_recall_curve(y_test, y_pred)
    aurpc = metrics.auc(recall, prec)

    return round(score, 4), round(auroc, 4), round(aurpc, 4)

def ampute_dataset(x, mechanism, p_miss):
    """Return (original data, data with simulated missing values, missingness mask)."""
    x_simulated = simulate_nan(x, p_miss, mechanism)

    mask = x_simulated["mask"]
    x_miss = x_simulated["X_incomp"]

    return x, x_miss, mask

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Only the training set is amputed; the test set stays complete.
_, X_train, _ = ampute_dataset(X_train, "MAR", 0.2)
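
A note on reading the AUROC column: `get_metrics` passes the output of `predict` to `roc_curve`. When those predictions are hard 0/1 labels, the ROC curve has a single interior point, and the area under it reduces to (TPR + TNR) / 2, the balanced accuracy. The sketch below checks that identity on a toy example in plain Python; `hard_label_auroc` is a hypothetical helper, and the labels are made up for illustration.

```python
def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve given by points (xs[i], ys[i])."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def hard_label_auroc(y_true, y_pred):
    """AUROC when the 'scores' are hard 0/1 predictions (illustrative helper)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # The ROC curve passes through (0, 0), (FPR, TPR) and (1, 1).
    return trapezoid_area([0.0, fpr, 1.0], [0.0, tpr, 1.0]), tpr, 1 - fpr


# Made-up labels: 4 positives, 6 negatives.
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

auroc_toy, tpr_toy, tnr_toy = hard_label_auroc(y_true_toy, y_pred_toy)
balanced_accuracy_toy = (tpr_toy + tnr_toy) / 2
print(abs(auroc_toy - balanced_accuracy_toy) < 1e-12)  # True
```

If a pipeline exposes predicted probabilities, passing those scores instead would give the usual threshold-free AUROC; whether and how that is available is not covered in this notebook.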

In [None]:
metrics_headers = ["Pipeline", "Accuracy", "AUROC", "AUPRC"]

plugin_subset = {
  "imputer.default": [
    "most_frequent",
    "median",
    "mean",
  ],
  "prediction.classifier": [
    "adaboost",
    "xgboost",
    "decision_trees",
    "gradient_boosting",
    "logistic_regression",
  ],
  "preprocessor.feature_scaling": [
    "maxabs_scaler",
    "minmax_scaler"
  ]
}

test_score = []

# Run 20 experiments, each with a randomly sampled pipeline:
# one imputer, then one feature scaler, then one classifier.
for experiment in range(20):
    plugin_sample = []
    for cat in ["imputer.default", "preprocessor.feature_scaling", "prediction.classifier"]:
        plugin = random.choice(plugin_subset[cat])
        plugin_sample.append(cat + "." + plugin)

    # Build the pipeline type for this combination, then instantiate it.
    pipeling_t = Pipeline(plugin_sample)
    pipeline = pipeling_t()

    score, auroc, aurpc = get_metrics(pipeline, X_train, y_train, X_test, y_test)

    test_score.append([pipeling_t.name(), score, auroc, aurpc])
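
The loop above draws each pipeline at random, so some combinations can repeat and others can be missed. As a rough sketch of the size of the space being sampled, the snippet below enumerates every combination of the same plugin subset with `itertools.product`. It only builds the name lists and does not call Adjutorium; the variable names are chosen for illustration.

```python
import itertools
import random

plugin_subset = {
    "imputer.default": ["most_frequent", "median", "mean"],
    "prediction.classifier": [
        "adaboost",
        "xgboost",
        "decision_trees",
        "gradient_boosting",
        "logistic_regression",
    ],
    "preprocessor.feature_scaling": ["maxabs_scaler", "minmax_scaler"],
}
categories = ["imputer.default", "preprocessor.feature_scaling", "prediction.classifier"]

# One entry per combination: imputer, then scaler, then classifier.
all_pipelines = [
    [f"{cat}.{name}" for cat, name in zip(categories, combo)]
    for combo in itertools.product(*(plugin_subset[cat] for cat in categories))
]
print(len(all_pipelines))  # 3 * 2 * 5 = 30

# 20 random draws with replacement cover at most 20 of those combinations.
rng = random.Random(0)
distinct_draws = {tuple(rng.choice(all_pipelines)) for _ in range(20)}
print(len(distinct_draws) <= 20)  # True
```

With only 30 combinations, iterating over `all_pipelines` directly would be an alternative to random sampling if every combination should be evaluated exactly once.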


In [None]:
display(
    HTML(tabulate.tabulate(test_score, headers=metrics_headers, tablefmt="html"))
)

# Congratulations!

Congratulations on completing this notebook tutorial! If you enjoyed it and would like to join the movement towards machine learning and AI for medicine, you can do so in the following ways.

### Star Adjutorium on GitHub

The easiest way to help our community is to star the repos. This helps raise awareness of the tools we're building.

- [Star Adjutorium](https://github.com/vanderschaarlab/adjutorium)
- [Star Clairvoyance](https://github.com/vanderschaarlab/clairvoyance)
