# Example: Multilabel classification
--------------------------------

This example shows how to use ATOM to solve a multilabel classification problem.

The data used is a synthetic dataset created using sklearn's [make_multilabel_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_multilabel_classification.html) function.

## Load the data

In [1]:
# Import packages
import pandas as pd
from atom import ATOMClassifier
from sklearn.datasets import make_multilabel_classification

In [2]:
# Create data
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
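A quick look at the generated arrays helps to see what "multilabel" means here. The sketch below uses only scikit-learn and repeats the call from the cell above; the shape checks rely on the function's default of 20 features.

```python
from sklearn.datasets import make_multilabel_classification

# Same call as in the cell above
X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)

print(X.shape)  # (300, 20): n_features defaults to 20
print(y.shape)  # (300, 3): one binary indicator column per label

# Each row of y can have zero, one, or several labels set at once
labels_per_sample = y.sum(axis=1)
print(labels_per_sample[:10])
```

The 20 feature columns plus the 3 target columns account for the 23 columns ATOM reports for the dataset.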

## Run the pipeline

In [3]:
# Note that for multioutput tasks, you must specify the `y` keyword
atom = ATOMClassifier(X, y=y, verbose=2, random_state=1)

Algorithm task: multilabel classification.

Shape: (300, 23)
Train set size: 240
Test set size: 60
-------------------------------------
Memory: 51.73 kB
Scaled: False
Outlier values: 29 (0.5%)



In [4]:
# Show the models that natively support multilabel tasks
atom.available_models()[["acronym", "model", "native_multilabel"]]

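|   | acronym | model         | native_multilabel |
| - | ------- | ------------- | ----------------- |
| 0 | AdaB    | AdaBoost      | False             |
| 1 | Bag     | Bagging       | False             |
| 2 | BNB     | BernoulliNB   | False             |
| 3 | CatB    | CatBoost      | False             |
| 4 | CatNB   | CategoricalNB | False             |
| 5 | CNB     | ComplementNB  | False             |
| 6 | Tree    | DecisionTree  | True              |
| 7 | Dummy   | Dummy         | False             |
| 8 | ETree   | ExtraTree     | True              |
| 9 | ET      | ExtraTrees    | True              |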


In [5]:
atom.run(models=["LDA", "RF"], metric="recall_weighted")


Models: LDA, RF
Metric: recall_weighted


Results for LinearDiscriminantAnalysis:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 0.9124
Test evaluation --> recall_weighted: 0.8351
Time elapsed: 0.022s
-------------------------------------------------
Total time: 0.022s


Results for RandomForest:
Fit ---------------------------------------------
Train evaluation --> recall_weighted: 1.0
Test evaluation --> recall_weighted: 0.8763
Time elapsed: 0.154s
-------------------------------------------------
Total time: 0.154s


Total time: 0.178s
-------------------------------------
LinearDiscriminantAnalysis --> recall_weighted: 0.8351
RandomForest               --> recall_weighted: 0.8763 !
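For reference, `recall_weighted` on a multilabel target can be reproduced with scikit-learn alone. The sketch below is an illustration rather than ATOM's internal code: it uses its own `train_test_split`, so the scores will not match the output above, and it assumes the metric corresponds to `recall_score(..., average="weighted")`.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
# Illustrative split; ATOM makes its own split, so numbers will differ
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

clf = RandomForestClassifier(random_state=1).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Recall per label column, then averaged with each label's support as weight
score = recall_score(y_test, y_pred, average="weighted")
per_label = recall_score(y_test, y_pred, average=None)
support = y_test.sum(axis=0)  # number of positive samples per label
manual = np.average(per_label, weights=support)

print(f"weighted recall: {score:.4f}")
print(f"recall per label: {per_label}")
```

The weighted average gives frequent labels more influence on the final score than rare ones.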


In [6]:
# Models without native multilabel support are wrapped in a meta-estimator
print(f"Estimator for LDA is: {atom.lda.estimator}")
print(f"Estimator for RF is: {atom.rf.estimator}")

Estimator for LDA is: ClassifierChain(base_estimator=LinearDiscriminantAnalysis(), random_state=1)
Estimator for RF is: RandomForestClassifier(n_jobs=1, random_state=1)
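The difference between the two estimators can be seen with plain scikit-learn. This minimal sketch shows that `LinearDiscriminantAnalysis` rejects a 2-D label matrix on its own, that wrapping it in a `ClassifierChain` makes it work, and that `RandomForestClassifier` needs no wrapper. It does not reproduce how ATOM decides when to wrap a model.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain

X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)

# LDA alone expects a 1-D target, so the (300, 3) label matrix is rejected
try:
    LinearDiscriminantAnalysis().fit(X, y)
    lda_native = True
except ValueError:
    lda_native = False

# The chain fits one LDA per label; later links also see earlier labels as features
chain = ClassifierChain(LinearDiscriminantAnalysis(), random_state=1).fit(X, y)
chain_pred = chain.predict(X)

# Random forests accept the label matrix directly
rf_pred = RandomForestClassifier(random_state=1).fit(X, y).predict(X)

print(lda_native, len(chain.estimators_), chain_pred.shape, rf_pred.shape)
```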


### Add custom multilabel models

To use your own meta-estimator with custom parameters, add it as a [custom model](https://tvdboom.github.io/ATOM/latest/user_guide/models/#custom-models).
It's also possible to tune the hyperparameters of this custom meta-estimator.

In [8]:
from atom import ATOMModel
from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from optuna.distributions import CategoricalDistribution, IntDistribution

custom_model = ATOMModel(
    estimator=ClassifierChain(LogisticRegression(), cv=3),
    name="chain",
    needs_scaling=True,
    native_multilabel=True,
)

atom.run(
    models=custom_model,
    n_trials=5,
    ht_params={
        "distributions": {
            "order": CategoricalDistribution([[0, 1, 2], [2, 1, 0], [1, 2, 0]]),
            "base_estimator__max_iter": IntDistribution(100, 200, step=10),
            "base_estimator__solver": CategoricalDistribution(["lbfgs", "newton-cg"]),
        }
    },
)

Trial 0 failed with parameters: {'order': [3, 2, 1],
 'base_estimator__max_iter': 130,
 'base_estimator__solver': 'lbfgs'} because of the following error: TypeError("_BaseChain.__init__() got an unexpected keyword argument 'base_estimator__C'").
Traceback (most recent call last):
  File "C:\Users\Mavs\Documents\Python\ATOM\venv310\lib\site-packages\optuna\study\_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\Mavs\Documents\Python\ATOM\atom\basemodel.py", line 919, in objective
    estimator = self._get_est(
  File "C:\Users\Mavs\Documents\Python\ATOM\atom\models.py", line 199, in _get_est
    return super()._get_est(**{**self._params, **params})
  File "C:\Users\Mavs\Documents\Python\ATOM\atom\basemodel.py", line 420, in _get_est
    estimator = self._inherit(self._est_class(**params))
TypeError: _BaseChain.__init__() got an unexpected keyword argument 'base_estimator__C'
Trial 0 failed with value None.



Models: chain
Metric: recall_weighted


Running hyperparameter tuning for ClassifierChain...
| trial |     order | base_estimator__max_iter | base_estimator__solver | recall_weighted | best_recall_weighted | time_trial | time_ht |    state |
| ----- | --------- | ------------------------ | ---------------------- | --------------- | -------------------- | ---------- | ------- | -------- |

Exception encountered while running the chain model.
TypeError: _BaseChain.__init__() got an unexpected keyword argument 'base_estimator__C'


RuntimeError: All models failed to run. Use the logger to investigate the exceptions.
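The double-underscore names used in the distributions follow scikit-learn's nested-parameter convention: `<outer parameter>__<inner parameter>` sets a parameter on the wrapped estimator. The sketch below shows that convention on a bare `ClassifierChain`, outside of ATOM. It looks up the name of the wrapped-estimator parameter at runtime, on the assumption that newer scikit-learn releases call it `estimator` instead of the `base_estimator` shown in the output earlier.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

chain = ClassifierChain(LogisticRegression(), cv=3)

# Assumption: the wrapped estimator is exposed as "estimator" in newer
# scikit-learn releases and as "base_estimator" in older ones
params = chain.get_params(deep=False)
prefix = "estimator" if "estimator" in params else "base_estimator"

# One value from each search space, applied the way a single trial would be
chain.set_params(
    **{
        "order": [2, 1, 0],
        f"{prefix}__max_iter": 150,
        f"{prefix}__solver": "newton-cg",
    }
)

inner = chain.get_params(deep=False)[prefix]
print(chain.order, inner.max_iter, inner.solver)
```

Checking the names returned by `get_params()` before writing the distributions is a cheap way to catch a mismatch between a search space and the installed scikit-learn version.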

## Analyze the results

In [None]:
thresholds = atom.rf.get_best_threshold()
print(f"Best threshold per target column: {thresholds}")
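To make "one threshold per target column" concrete, here is a minimal scikit-learn sketch. The selection criterion is an assumption: it picks, for each label, the ROC threshold that maximizes Youden's J statistic (`tpr - fpr`) on the training set. The criterion ATOM applies is not shown in this example, so treat the code as an illustration of the idea only.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

clf = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# For multilabel targets, predict_proba returns one (n_samples, 2) array per label
probas = clf.predict_proba(X_train)

thresholds = []
for col, proba in enumerate(probas):
    fpr, tpr, thr = roc_curve(y_train[:, col], proba[:, 1])
    # Assumed criterion: threshold with the largest tpr - fpr (Youden's J)
    thresholds.append(float(thr[np.argmax(tpr - fpr)]))

print(thresholds)
```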

In [None]:
atom.rf.evaluate(threshold=thresholds)

In [None]:
# Use the target parameter in plots to specify which target column to use
atom.plot_roc(target=2)

In [None]:
# To also select a class within the target column, use the format (column, class)
atom.plot_probabilities(models="RF", target=(2, 1))
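The `(column, class)` format maps naturally onto what a multilabel classifier returns. The sketch below uses scikit-learn only and assumes that `target=(2, 1)` means "class 1 of the third target column"; how ATOM resolves the tuple internally is not shown here.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_multilabel_classification(n_samples=300, n_classes=3, random_state=1)
clf = RandomForestClassifier(random_state=1).fit(X, y)

# One probability array per target column, each with one column per class
probas = clf.predict_proba(X)

column, cls = 2, 1  # third target column, class 1
class_idx = list(clf.classes_[column]).index(cls)
selected = probas[column][:, class_idx]

print(selected.shape)  # (300,): probability of class 1 for the third label
```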

In [None]:
with atom.canvas(figsize=(900, 600)):
    atom.plot_calibration(target=0)
    atom.plot_calibration(target=1)