# Example: Multilabel classification
--------------------------------

This example shows how to use ATOM to solve a multilabel classification problem.

The data used is a synthetic dataset created using sklearn's [make_multilabel_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_multilabel_classification.html) function.

## Load the data

In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
from sklearn.datasets import make_multilabel_classification

In [2]:
# Create data
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
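
To make the task concrete, the sketch below inspects the generated target using scikit-learn only. The names `X_demo` and `y_demo` are illustrative, chosen so they don't overwrite the notebook's `X` and `y`.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

# Same call as above, stored under illustrative names
X_demo, y_demo = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)

# The target is a binary indicator matrix: one column per label,
# and a row may have any number of labels set to 1
print(X_demo.shape)  # (300, 20), since n_features defaults to 20
print(y_demo.shape)  # (300, 3)
print(y_demo[:5])
```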

## Run the pipeline

In [3]:
# Note that for multioutput tasks, you must specify the `y` keyword
atom = ATOMClassifier(X, y=y, verbose=2, random_state=1)

In [4]:
# Show the models that natively support multilabel tasks
atom.available_models(native_multilabel=True)

In [5]:
# Fit both models and score them with weighted recall
atom.run(models=["LDA", "RF"], metric="recall_weighted")

In [6]:
# Note that non-native multioutput models use a meta-estimator wrapper
print(f"Estimator for LDA is: {atom.lda.estimator}")
print(f"Estimator for RF is: {atom.rf.estimator}")
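
As a rough illustration of what such a wrapper does, the sketch below uses scikit-learn directly. It assumes `MultiOutputClassifier` as the meta-estimator purely for demonstration; the wrapper ATOM actually applies is the one printed by the cell above and may differ.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

X_demo, y_demo = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)

# LDA only accepts a 1d target, so it's wrapped in a meta-estimator
# that fits one independent copy of the model per target column
wrapped_lda = MultiOutputClassifier(LinearDiscriminantAnalysis()).fit(X_demo, y_demo)

# Random forest accepts a 2d indicator target directly, no wrapper needed
native_rf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_demo, y_demo)

lda_pred = wrapped_lda.predict(X_demo)
rf_pred = native_rf.predict(X_demo)
print(f"Fitted LDA copies: {len(wrapped_lda.estimators_)}")
print(f"Prediction shapes: {lda_pred.shape}, {rf_pred.shape}")
```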

## Add custom multilabel models

To use your own meta-estimator with custom parameters, add it as a [custom model](https://tvdboom.github.io/ATOM/latest/user_guide/models/#custom-models).
The hyperparameters of this custom meta-estimator can be tuned as well. Parameters of the wrapped estimator are addressed with the `base_estimator__` prefix.

In [7]:
from atom import ATOMModel
from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from optuna.distributions import CategoricalDistribution, IntDistribution

custom_model = ATOMModel(
    estimator=ClassifierChain(LogisticRegression(), cv=3),
    name="chain",
    needs_scaling=True,
    native_multilabel=True,
)

atom.run(
    models=custom_model,
    n_trials=5,
    ht_params={
        "distributions": {
            "order": CategoricalDistribution([[0, 1, 2], [2, 1, 0], [1, 2, 0]]),
            "base_estimator__max_iter": IntDistribution(100, 200, step=10),
            "base_estimator__solver": CategoricalDistribution(["lbfgs", "newton-cg"]),
        }
    }
)
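
The `order` candidates in the search space are permutations of the three target columns. As a minimal sketch of what one such candidate means, the block below fits a scikit-learn `ClassifierChain` on its own, outside ATOM. The fixed order `[2, 1, 0]` and `max_iter=200` are arbitrary picks from the ranges above, and the manual scaling step is an assumption standing in for what `needs_scaling=True` requests.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
X_scaled = StandardScaler().fit_transform(X_demo)

# Column 2 is predicted first; its prediction becomes an extra feature
# for column 1, and both predictions feed the model for column 0
chain = ClassifierChain(LogisticRegression(max_iter=200), order=[2, 1, 0], cv=3)
chain.fit(X_scaled, y_demo)

chain_pred = chain.predict(X_scaled)
print(f"Chain order: {chain.order_}")
print(f"Prediction shape: {chain_pred.shape}")
```

The predictions are returned in the original column order, regardless of the order used inside the chain.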

## Analyze the results

In [8]:
# Get the best decision threshold for each target column
thresholds = atom.rf.get_best_threshold()
print(f"Best threshold per target column: {thresholds}")

In [9]:
# Evaluate the model using those per-column thresholds
atom.rf.evaluate(threshold=thresholds)
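
Conceptually, a per-column threshold replaces the default 0.5 cut-off on each target's predicted probability. The sketch below shows the idea with plain scikit-learn; the threshold values are made-up placeholders, not the output of `get_best_threshold`, and the names `proba`, `demo_thresholds` and `custom_pred` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
rf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_demo, y_demo)

# For a multilabel target, predict_proba returns one (n_samples, 2) array
# per target column; keep the probability of the positive class
proba = np.column_stack([p[:, 1] for p in rf.predict_proba(X_demo)])

# Hypothetical thresholds, one per target column
demo_thresholds = np.array([0.4, 0.5, 0.6])

# Broadcasting compares every column against its own threshold
custom_pred = (proba >= demo_thresholds).astype(int)
print(f"Prediction shape: {custom_pred.shape}")
```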

In [10]:
# Use the target parameter in plots to specify which target column to use
atom.plot_roc(target=2)

In [11]:
# When the target parameter also specifies the class, use format (column, class)
atom.plot_probabilities(models="chain", target=(2, 1))

In [12]:
# Combine the calibration plots of target columns 0 and 1 in one figure
with atom.canvas(figsize=(900, 600)):
    atom.plot_calibration(target=0)
    atom.plot_calibration(target=1)