# Example: Metadata
-------------------

This example shows how to add metadata like `groups` and `sample_weight` to atom.

Import the wine dataset from [sklearn.datasets](https://scikit-learn.org/stable/datasets/index.html#breast-cancer-wisconsin-diagnostic-dataset). This is a small, easy-to-train dataset whose goal is to classify wines into three groups (the cultivator each wine comes from) using features derived from chemical analysis.

## Load the data

In [1]:
# Import packages
import numpy as np
from sklearn.datasets import load_wine
from atom import ATOMClassifier

In [2]:
# Load data
X, y = load_wine(return_X_y=True, as_frame=True)

# Let's have a look
X.head()

|   | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 |


In [3]:
# Create (dummy) groups and sample_weights for the rows
groups = np.random.randint(5, size=X.shape[0])
sample_weight = np.random.randint(5, size=X.shape[0])
groups

array([0, 4, 0, 3, 3, 3, 4, 4, 2, 3, 3, 4, 4, 3, 0, 1, 0, 2, 3, 4, 0, 1,
       2, 3, 3, 2, 1, 2, 2, 0, 4, 4, 2, 2, 4, 4, 3, 1, 2, 2, 1, 0, 1, 4,
       2, 2, 4, 3, 4, 4, 3, 4, 3, 3, 1, 3, 3, 0, 1, 4, 0, 0, 2, 0, 1, 3,
       4, 3, 3, 4, 4, 1, 0, 2, 3, 1, 1, 3, 0, 1, 1, 3, 4, 4, 4, 0, 1, 4,
       2, 3, 3, 1, 4, 4, 3, 4, 4, 4, 4, 4, 1, 2, 2, 1, 1, 3, 3, 1, 1, 0,
       4, 0, 0, 2, 4, 4, 2, 3, 1, 3, 1, 4, 1, 3, 0, 4, 1, 0, 2, 1, 0, 4,
       1, 2, 3, 3, 2, 3, 2, 3, 3, 0, 3, 1, 3, 1, 3, 4, 3, 2, 4, 4, 4, 3,
       0, 3, 3, 3, 1, 2, 1, 2, 0, 4, 1, 0, 0, 1, 1, 4, 0, 3, 2, 1, 1, 1,
       2, 2])
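
The dummy metadata above is drawn without a seed, so the values differ on every run. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator is acceptable here; the seed value, the hard-coded row count and the choice to start the weights at 1 are illustrative assumptions, not part of the original example:

```python
import numpy as np

n_rows = 178  # number of rows in the wine dataset (X.shape[0] in the cell above)

# Seeded generator, so the dummy metadata is identical on every run
rng = np.random.default_rng(seed=1)

# Five possible group labels: 0..4
groups = rng.integers(5, size=n_rows)

# Integer weights in 1..4; starting at 1 avoids rows with zero weight
# (an illustrative choice, the original cell draws from 0..4)
sample_weight = rng.integers(1, 5, size=n_rows)

print(groups.shape, sample_weight.shape)
```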

## Run the pipeline

Add the metadata to the constructor. We set `index=True` so we can verify that the group functionality works.  
When groups are specified, `test_size` specifies the number of groups in the test set.

In [4]:
atom = ATOMClassifier(
    X,
    y=y,
    index=True,
    metadata={"groups": groups, "sample_weight": sample_weight},
    test_size=1,
    verbose=2,
    random_state=1,
)


Algorithm task: Multiclass classification.

Shape: (178, 14)
Train set size: 149
Test set size: 29
-------------------------------------
Memory: 24.82 kB
Scaled: False
Outlier values: 9 (0.4%)
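
The convention that an integer `test_size` counts groups rather than rows also exists in scikit-learn's `GroupShuffleSplit`. The sketch below illustrates the idea on dummy data; whether atom uses this splitter internally is an assumption not confirmed by this example, and `X_dummy` is a hypothetical placeholder:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(1)
groups = rng.integers(5, size=178)  # dummy group labels in 0..4
X_dummy = np.zeros((178, 1))  # placeholder features; only the row count matters

# An integer test_size is interpreted as the number of groups in the test split
splitter = GroupShuffleSplit(n_splits=1, test_size=1, random_state=1)
train_idx, test_idx = next(splitter.split(X_dummy, groups=groups))

test_groups = np.unique(groups[test_idx])
train_groups = np.unique(groups[train_idx])

print("groups in test set: ", test_groups)
print("groups in train set:", train_groups)
```

Because whole groups are assigned to one side of the split, the test set size in rows depends on how many rows the selected group happens to contain.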



In [5]:
# Show that all rows in the test set belong to the same group
atom.metadata["groups"].loc[atom.test.index]

138    2
172    2
88     2
25     2
133    2
116    2
128    2
38     2
102    2
22     2
159    2
62     2
39     2
113    2
101    2
44     2
45     2
177    2
32     2
8      2
33     2
27     2
161    2
176    2
17     2
28     2
149    2
73     2
136    2
Name: target, dtype: int64
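
Instead of reading through the full listing, the same property can be checked programmatically. A minimal sketch on hypothetical stand-in data: `groups`, `train_index` and `test_index` below are made-up substitutes for `atom.metadata["groups"]` and the indices of the train and test sets, not values from this run:

```python
import pandas as pd

# Hypothetical stand-ins for the group metadata and the train/test indices
groups = pd.Series([0, 2, 1, 2, 0, 1, 2], index=[10, 11, 12, 13, 14, 15, 16])
train_index = pd.Index([10, 12, 14, 15])
test_index = pd.Index([11, 13, 16])

# Collect the distinct group labels on each side of the split
test_groups = set(groups.loc[test_index])
train_groups = set(groups.loc[train_index])

# A valid group split keeps every group entirely on one side
is_valid_split = test_groups.isdisjoint(train_groups)
print("number of test groups:", len(test_groups))
print("no overlap with train groups:", is_valid_split)
```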

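## Analyze the results

Note that no models were trained before the cells below were executed: `run` was never called, so `atom.results` is empty and `evaluate` raises a `NotFittedError`. The plotting cells at the end refer to the `rf` and `lda` models, which only exist after a corresponding call to `run`.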

In [6]:
atom.results

None


In [7]:
# Show the score for some different metrics
atom.evaluate(["precision_macro", "recall_macro", "jaccard_weighted"])

NotFittedError: This ATOMClassifier instance is not yet fitted. Call run with appropriate arguments before using this object.
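
To illustrate why `sample_weight` matters for metrics like the ones requested above, here is a minimal sketch using scikit-learn's metric functions directly. The labels and weights are made up for illustration, and whether atom's `evaluate` applies the stored `sample_weight` in the same way is an assumption not confirmed by this example:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up labels and predictions for a three-class problem
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# The second row (a misclassified sample) counts three times as much as the others
weights = np.array([1, 3, 1, 1, 1, 1])

acc_plain = accuracy_score(y_true, y_pred)
acc_weighted = accuracy_score(y_true, y_pred, sample_weight=weights)

recall_plain = recall_score(y_true, y_pred, average="macro")
recall_weighted = recall_score(y_true, y_pred, average="macro", sample_weight=weights)

print(f"accuracy:     {acc_plain:.3f} (plain) vs {acc_weighted:.3f} (weighted)")
print(f"recall_macro: {recall_plain:.3f} (plain) vs {recall_weighted:.3f} (weighted)")
```

Giving more weight to a misclassified row lowers both scores, which is the effect sample weights are meant to have on evaluation.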

In [None]:
# Some plots allow you to choose the target class to look at
atom.rf.plot_probabilities(rows="train", target=0)

In [None]:
atom.lda.plot_shap_heatmap(target=2, show=7)