# Overview

**GENERAL THOUGHTS:**  
Use AutoML (AutoGluon.Tabular) as a general way to investigate which algorithms, pre-processing steps, and feature engineering options are well suited for the given task, and to estimate the achievable performance across a large variety of configurations of those options.
The notebook includes multiple scenarios of using AutoML:
- including and excluding custom data pre-processing (see below)
- including auto pre-processing by AutoGluon.Tabular
- including auto feature engineering by AutoGluon.Tabular (see https://auto.gluon.ai/stable/tutorials/tabular/tabular-feature-engineering.html)
- including multiple classifiers by using:
  - multiple ML algorithms
  - "standard" HPO for each algorithm defined by AutoGluon.Tabular
  - ensembles of algorithms (bagging and stacking with possibly multiple layers)

**DATA PREPROCESSING:**  
Imbalanced data:
- Oversampling of rare classes (RandomOverSampler).
- cost-sensitive learning for imbalanced data.
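
As a rough illustration of cost-sensitive learning, the sketch below computes inverse-frequency class weights with the heuristic `n_samples / (n_classes * class_count)`, which scikit-learn documents for `class_weight="balanced"`. Whether AutoGluon's `sample_weight="balance_weight"` uses exactly this formula is not confirmed here; the function name and the toy labels are assumptions made for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels (made up for illustration): one frequent and one rare class
labels = ["Tube"] * 8 + ["Wooden box"] * 2
weights = balanced_class_weights(labels)
print(weights)

# With these weights every class contributes the same total weight to the loss
total_per_class = {cls: weights[cls] * labels.count(cls) for cls in weights}
print(total_per_class)
```

Rare classes receive a larger weight per sample, so errors on them cost more during training without duplicating any rows.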

Continuous data:
- Impute missing data: SimpleImputer(strategy='median').
- Transform and standardize data: PowerTransformer() in this notebook (its default `standardize=True` scales the output to zero mean and unit variance).

Categorical data:
- Impute missing data: SimpleImputer(strategy='most_frequent').
- Ordinal & nominal data encoding: OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1).
- Encoding of unknown values and reordering of ordinal encodings: custom encoder "OrdinalEncoderExtensionUnknowns()".

Target data:
- Target encoding: OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)

**AUTOML MULTI-CLASS CLASSIFIERS:**
- Overview of models to be considered using AutoML (AutoGluon.Tabular):  
  - [X] RandomForest
  - [X] ExtraTrees
  - [X] XGBoost
  - [X] LightGBM
  - [X] KNeighbors
  - [X] CatBoost
  - [X] Multiple Neural Nets

**FINAL MODEL PERFORMANCE:**  
- Evaluation of the best model from AutoML, including experiment checkpointing.
- Loading the final model from the checkpoint and predicting on the test set; the evaluation is based on the classification report.
- Tracking of the best model with MLflow for performance benchmarking against other approaches (Baseline, PyCaret, PyTorch, ...) within the repository.

**Configurations for running the notebook**  
Set the following configurations before running the notebook under the section [OVERVIEW](#overview):
- Infrastructure to run the notebook on; set ```compute``` to: local (None), colab, or cloud (azure)
  - project_directory
  - data_directory (source data)
  - autogluon_exp_storage_directory
- General settings for experiments (SEED, time_limit, data_samples)

In [None]:
# NOTE: Configure the compute target. The notebook handles related configs. Options supported in this notebook:
#       Run on local machine, set "compute" to: None
#       Run in Google Colab, set "compute" to: "colab"
#       Run on an Azure compute instance within the ML service, set "compute" to: "azure"
from typing import Literal

compute: Literal[None, "colab", "azure"] = None  # Default: None

In [None]:
if compute == "colab":
    # Import the library to mount Google Drive
    from google.colab import drive
    # Mount the Google Drive at /content/drive
    drive.mount('/content/drive')
    # Verify by listing the files in the drive
    # !ls /content/drive/My\ Drive/
    # current dir in colab
    !pwd

In [None]:
if compute == "colab":
    !pip install --upgrade optuna==3.5.0
    # !pip install --upgrade optuna.integration
    !pip install --upgrade mlflow
    !pip install --upgrade PyCaret

In [None]:
import os
import sys
import yaml
import datetime

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import classification_report
from imblearn.over_sampling import RandomOverSampler

from autogluon.tabular import TabularDataset, TabularPredictor

import mlflow

# ignore warnings
import warnings

warnings.filterwarnings("ignore")

In [None]:
# get config
if compute == "colab":
    env_file = "./env_vars_colab.yml"
elif compute == "azure":
    env_file = "../env_vars_azureml_compute.yml"
else:
    env_file = "../env_vars.yml"
# NOTE: if used in Google Colab, upload env_vars_colab.yml to the current Google Colab directory.
try:
    with open(env_file, "r") as file:
        config = yaml.safe_load(file)
except FileNotFoundError:
    print(f"Error: {env_file} not found.")
    if compute == "colab":
        print("Please upload it to the current Google Colab directory.")
    raise  # stop here: all following cells need `config`

# custom imports of local modules
sys.path.append(config["project_directory"])

# from src import utils, nb_utils
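
The later cells read several keys from `config` (`project_directory`, `data_directory`, `autogluon_exp_storage_directory`, `mlflow_benchmark_directory`). A small check right after loading can surface a missing key early instead of hours into a run. This is a minimal sketch; the helper name `missing_config_keys` and the example dictionary are hypothetical and stand in for the parsed YAML file.

```python
def missing_config_keys(config, required_keys):
    """Return the required keys that are absent or empty in the config dict."""
    return [key for key in required_keys if not config.get(key)]


# Keys that the notebook cells look up in `config`
REQUIRED_KEYS = [
    "project_directory",
    "data_directory",
    "autogluon_exp_storage_directory",
    "mlflow_benchmark_directory",
]

# Hypothetical config standing in for the parsed YAML file
example_config = {
    "project_directory": "/path/to/project",
    "data_directory": "/path/to/data",
}

missing = missing_config_keys(example_config, REQUIRED_KEYS)
if missing:
    print(f"Missing config keys: {missing}")
```

In the notebook, the same call could be made on `config` directly, raising an error when the returned list is not empty.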

In [None]:
# General settings within the data science workflow

pd.set_option("display.max_columns", None)

SEED = 42  # Ensure same data split as in other notebooks

# NOTE: for dev only
subsample = False
subsample_size = 1000  # subsample subset of data for faster development

# time limit for AutoGluon training
experiment_time_limit = 8 * 60 * 60  # 3*60*60

# get current date and time
now = datetime.datetime.now()
formatted_date_time = now.strftime("%Y-%m-%d_%H:%M:%S")
print(formatted_date_time)

2024-11-14_21:50:31


# Load and prepare data

In [None]:
df = pd.read_csv(f"{config['data_directory']}/output/df_ml.csv", sep="\t")

df["material_number"] = df["material_number"].astype("object")

df_sub = df[
    [
        "material_number",
        "brand",
        "product_area",
        "core_segment",
        "component",
        "manufactoring_location",
        "characteristic_value",
        "material_weight",
        "packaging_code",
        "packaging_category",
    ]
]

In [None]:
# Draw a random subsample
if subsample is True:
    df_sub = df_sub.sample(n=subsample_size, random_state=SEED)

# AutoML: without custom pre-processing; restricted selection of models including HPO and model ensembling

## Split data into train and test

In [None]:
# Define features and target
X = df_sub.iloc[:, :-1]
y = df_sub.iloc[:, -1]  # the last column is the target

# Generate train/test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED
)

## Transform to AutoML data format

In [None]:
df_train = pd.concat([X_train, y_train], axis=1)

train_data = TabularDataset(df_train)

## AutoML training pipeline

In [None]:
label = "packaging_category"
automl_predictor = TabularPredictor(
    label=label, problem_type="multiclass", eval_metric="f1_macro", sample_weight="balance_weight"
).fit(
    train_data=train_data,
    tuning_data=None,  # If tuning_data = None, fit() will automatically hold out some random validation examples from train_data.
    holdout_frac=0.2,  # Default value (if None) is selected based on the number of rows in the training data.
    time_limit=experiment_time_limit,  # 3*60*60
    presets=["high_quality"],  # ['high_quality'] # default = ['medium_quality'], any user-specified arguments in fit() will override the values used by presets.
    # auto_stack=False, # Whether AutoGluon should automatically utilize bagging and multi-layer stack ensembling to boost predictive accuracy.
    # included_model_types=['LR', 'KNN', 'RF', 'XT', 'GBM', 'XGB', 'CAT', 'NN'],
    # excluded_model_types=['FASTAI', 'AG_AUTOMM'],
    hyperparameter_tune_kwargs={  # HPO is not performed unless hyperparameter_tune_kwargs is specified. Searchspaces are provided for some models, but not for all. Where no searchspace is provided, a fixed set of hyper-parameters is defined. (see /searchspace under each model: https://github.com/autogluon/autogluon/tree/master/tabular/src/autogluon/tabular/models).
        # 'num_trials': 15, # try at most n different hyperparameter configurations for each type of model
        "scheduler": "local",
        "searcher": "auto",  # ‘auto’: Perform bayesian optimization search on NN_TORCH and FASTAI models. Perform random search on other models.
    },  # Refer to TabularPredictor.fit docstring for all valid values
)

2024-11-15 05:26:58,413	INFO timeout.py:54 -- Reached timeout of 87.04428333044052 seconds. Stopping all trials.
2024-11-15 05:26:58,510	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/mnt/batch/tasks/shared/LS_root/mounts/clusters/packaginge4dsv5/code/Users/david.tiefenthaler/ml_packaging_classification/notebooks/AutogluonModels/ag-20241114_215127/models/NeuralNetTorch_r89_BAG_L2' in 0.0954s.
- c52bacfb: FileNotFoundError('Could not fetch metrics for c52bacfb: both result.json and progress.csv were not found at /mnt/batch/tasks/shared/LS_root/mounts/clusters/packaginge4dsv5/code/Users/david.tiefenthaler/ml_packaging_classification/notebooks/AutogluonModels/ag-20241114_215127/models/NeuralNetTorch_r89_BAG_L2/c52bacfb')
- ba75539e: FileNotFoundError('Could not fetch metrics for ba75539e: both result.json and progress.csv were not found at /mnt/batch/tasks/shared/LS_root/mounts/clusters/packaginge4dsv5/code/Users/david.tiefenthaler/ml_packaging

In [13]:
# Leaderboard of all trained models; score_val is measured on the validation data held out from the training data
automl_predictor.leaderboard()

Unnamed: 0,model,score_val,eval_metric,pred_time_val,fit_time,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,WeightedEnsemble_L3,0.879455,f1_macro,1261.411261,4437.336939,0.032782,23.833099,3,False,113
1,RandomForestGini_BAG_L2,0.852922,f1_macro,1247.370432,3819.342807,12.017075,98.319932,2,False,64
2,ExtraTrees_r126_BAG_L2,0.842777,f1_macro,1283.196565,3824.214319,47.843208,103.191444,2,False,107
3,ExtraTreesGini_BAG_L2,0.837848,f1_macro,1282.713961,3829.324050,47.360603,108.301176,2,False,66
4,ExtraTrees_r49_BAG_L2,0.837848,f1_macro,1287.946344,3840.135314,52.592986,119.112439,2,False,90
...,...,...,...,...,...,...,...,...,...,...
221,CatBoost_r137_BAG_L1_FULL,,f1_macro,,6.826723,,6.826723,1,True,127
222,CatBoost_r12_BAG_L2_FULL,,f1_macro,,779.294848,,27.160381,2,True,219
223,CatBoost_r12_BAG_L1_FULL,,f1_macro,,14.765523,,14.765523,1,True,169
224,CatBoost_r128_BAG_L2_FULL,,f1_macro,,778.728704,,26.594238,2,True,206


## Evaluate AutoML experiment and best model

In [14]:
# Evaluation of models on test data
df_test = pd.concat([X_test, y_test], axis=1)
test_data = TabularDataset(df_test)

automl_std_leaderboard_testdata = automl_predictor.leaderboard(test_data)
automl_std_leaderboard_testdata.head(10)

Unnamed: 0,model,score_test,score_val,eval_metric,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,WeightedEnsemble_L3_FULL,0.821938,,f1_macro,244.44007,,997.536045,0.036715,,23.833099,3,True,226
1,ExtraTrees_r126_BAG_L2_FULL,0.796722,,f1_macro,244.419985,,803.335521,4.862669,47.843208,51.201054,2,True,220
2,ExtraTreesGini_BAG_L2_FULL,0.796112,,f1_macro,247.921698,,806.8713,8.364383,47.360603,54.736833,2,True,179
3,ExtraTrees_r49_BAG_L2_FULL,0.796112,,f1_macro,248.116858,,812.897786,8.559542,52.592986,60.763319,2,True,203
4,RandomForestGini_BAG_L2_FULL,0.787727,,f1_macro,240.776236,,837.266125,1.21892,12.017075,85.131658,2,True,177
5,XGBoost_r194_BAG_L2_FULL,0.786827,,f1_macro,240.148214,,775.415514,0.590898,,23.281047,2,True,193
6,RandomForest_r166_BAG_L2_FULL,0.780531,,f1_macro,247.999114,,830.917141,8.441798,48.200726,78.782675,2,True,211
7,WeightedEnsemble_L2_FULL,0.774823,,f1_macro,13.754647,,44.847857,0.030144,,23.637924,2,True,176
8,XGBoost_r95_BAG_L2_FULL,0.769648,,f1_macro,240.980194,,774.021567,1.422878,,21.8871,2,True,223
9,LightGBMLarge_BAG_L2_FULL,0.768676,,f1_macro,239.986061,,775.715333,0.428746,,23.580867,2,True,181


In [None]:
# For a single specified model: make predictions and perform detailed evaluation on hold out test data
# i = -1  # index of model to use
# model_to_use = automl_predictor.model_names()[i]
model_to_use = automl_std_leaderboard_testdata.iloc[0, 0]  # use best model from leaderboard
print(f"Model to be evaluated: {model_to_use}")
preds_y_test = automl_predictor.predict(X_test, model=model_to_use)
print("Predictions:  ", list(preds_y_test)[:5])

print(classification_report(y_test, preds_y_test))

Model to be evaluated: WeightedEnsemble_L3_FULL
Predictions:   ['Blister and Insert Card', 'Corrugated carton', 'Plastic bag with header', 'Tube', 'Shrink film and insert o']
                            precision    recall  f1-score   support

   Blister and Insert Card       0.92      0.92      0.92      1749
  Blister and sealed blist       0.92      0.92      0.92      1582
            Book packaging       0.00      0.00      0.00         2
Cardb. Sleeve w - w/o Shr.       0.82      0.73      0.77       135
  Cardboard hanger w/o bag       1.00      0.85      0.92        80
    Carton cover (Lid box)       0.79      0.68      0.73       130
   Carton tube with or w/o       1.00      1.00      1.00         9
                      Case       0.66      0.94      0.78        97
         Corrugated carton       0.85      0.79      0.82       774
        Countertop display       1.00      1.00      1.00        30
                  Envelope       0.93      0.97      0.95        59
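
The report above shows why `f1_macro` was chosen as the evaluation metric: a class such as "Book packaging" with only 2 test samples and an F1 of 0.00 weighs as much in the macro average as a class with more than a thousand samples. The following dependency-free sketch, using made-up toy labels, contrasts macro F1 with accuracy; it uses the per-class formula `2*TP / (2*TP + FP + FN)` and sets F1 to 0 when the denominator is 0.

```python
def per_class_f1(y_true, y_pred):
    """Compute F1 = 2*TP / (2*TP + FP + FN) for every class."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data: 9 samples of a frequent class, 1 sample of a rare class.
# The model predicts the frequent class for everything.
y_true = ["A"] * 9 + ["B"]
y_pred = ["A"] * 10

scores = per_class_f1(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)  # unweighted mean over classes
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy={accuracy:.3f}, macro_f1={macro_f1:.3f}")
```

Accuracy stays high although the rare class is never predicted, while macro F1 drops to roughly half, which is the behaviour wanted for a strongly imbalanced target.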
        

# AutoML: custom pre-processing; restricted selection of models including HPO and model ensembling

## Define features and target, perform oversampling, split data into train and test

In [None]:
# Define features and target
X = df_sub.iloc[:, :-1]
y = df_sub.iloc[:, -1]  # the last column is the target

# Oversampling
distribution_classes = y.value_counts()
print("Class distribution before oversampling")
print(distribution_classes.to_dict())
# NOTE: Oversampling so each class has at least 100 samples; to properly apply CV and evaluation
dict_oversmapling = {
    "Metal Cassette": 100,
    "Carton tube with or w/o": 100,
    "Wooden box": 100,
    "Fabric packaging": 100,
    "Book packaging": 100,
}
# define oversampling strategy
oversampler = RandomOverSampler(sampling_strategy=dict_oversmapling, random_state=SEED)
# fit and apply the transform
X_oversample, y_oversample = oversampler.fit_resample(X, y)
print("Class distribution after oversampling")
print(y_oversample.value_counts().to_dict())

# Generate train/test sets
# NOTE: Because oversampling happens before the split, duplicates of the same rare-class row can
#       end up in both train and test. Test scores for the oversampled classes are therefore
#       optimistic and not directly comparable to the run without oversampling.
X_train, X_test, y_train, y_test = train_test_split(
    X_oversample, y_oversample, test_size=0.2, stratify=y_oversample, random_state=SEED
)

Class distribution before oversampling
{'Hanger/ Clip': 13543, 'Tube': 11687, 'Blister and Insert Card': 8744, 'TightPack': 8296, 'Folding carton': 8219, 'Blister and sealed blist': 7912, 'Corrugated carton': 3872, 'Paperboard pouch': 3478, 'Trap Folding Card': 2188, 'Plastic Pouch': 1904, 'Plastic bag with header': 1850, 'Plastic Cassette': 1708, 'Shrink film and insert o': 1499, 'Plastic Box': 1491, 'Unpacked': 1415, 'Skincard': 1143, 'Trap Card': 804, 'Cardb. Sleeve w - w/o Shr.': 676, 'Carton cover (Lid box)': 652, 'Case': 485, 'Tray Packer': 431, 'Cardboard hanger w/o bag': 400, 'Envelope': 295, 'Countertop display': 150, 'Metal Cassette': 50, 'Carton tube with or w/o': 44, 'Wooden box': 16, 'Fabric packaging': 15, 'Book packaging': 10}
Class distribution after oversampling
{'Hanger/ Clip': 13543, 'Tube': 11687, 'Blister and Insert Card': 8744, 'TightPack': 8296, 'Folding carton': 8219, 'Blister and sealed blist': 7912, 'Corrugated carton': 3872, 'Paperboard pouch': 3478, 'Trap Fo
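
Oversampling the full data set before splitting lets copies of one rare-class row land on both sides of the split. A common alternative is to split first and oversample only the training part. The sketch below shows that order of operations with the standard library only; the helper `oversample_to_minimum`, the toy row ids, and the simple "every 5th row" hold-out are assumptions for illustration. In the notebook the same idea would presumably be expressed as `train_test_split` followed by `RandomOverSampler.fit_resample` on the training split.

```python
import random
from collections import Counter, defaultdict


def oversample_to_minimum(rows, labels, min_count, seed):
    """Randomly duplicate rows of classes that have fewer than min_count samples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    out_rows, out_labels = list(rows), list(labels)
    for label, members in by_class.items():
        for _ in range(max(0, min_count - len(members))):
            out_rows.append(rng.choice(members))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: row ids 0..19, the last five rows belong to a rare class
rows = list(range(20))
labels = ["common"] * 15 + ["rare"] * 5

# 1) Split first (every 5th row is held out, which is stratified for this toy data)
test_rows = [r for r in rows if r % 5 == 0]
train_rows = [r for r in rows if r % 5 != 0]
test_labels = [labels[r] for r in test_rows]
train_labels = [labels[r] for r in train_rows]

# 2) Oversample the training part only
train_rows_os, train_labels_os = oversample_to_minimum(
    train_rows, train_labels, min_count=8, seed=42
)

print(Counter(train_labels_os), Counter(test_labels))
print("train/test overlap:", set(train_rows_os) & set(test_rows))
```

The test set keeps the original class distribution and shares no rows with the training set, so the scores reported on it describe unseen data.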

In [None]:
# DEFINE & EXECUTE PIPELINE
# Define pipeline
numerical_features = X_train.select_dtypes(include="number").columns.tolist()
numeric_feature_pipeline = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="median")),
        # PowerTransformer defaults to Yeo-Johnson (not a plain log) and standardizes the output
        ("log_transform", PowerTransformer()),
        # ('scale', MinMaxScaler())
    ]
)
categorical_features = X_train.select_dtypes(exclude="number").columns.tolist()
categorical_feature_pipeline = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("ordinal", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ]
)
preprocess_pipeline = ColumnTransformer(
    transformers=[
        ("number", numeric_feature_pipeline, numerical_features),
        ("category", categorical_feature_pipeline, categorical_features),
    ],
    verbose_feature_names_out=False,
).set_output(transform="pandas")
# transform data
X_train_transformed = preprocess_pipeline.fit_transform(X_train)

# encode target variable
label_encoder = OrdinalEncoder(
    handle_unknown="use_encoded_value", unknown_value=-1, encoded_missing_value=-1
)
y_train_transformed = label_encoder.fit_transform(y_train.to_frame())
y_train_transformed = pd.DataFrame(
    data=y_train_transformed, index=y_train.index, columns=[y_train.name]
)
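
To make the effect of `handle_unknown="use_encoded_value", unknown_value=-1` concrete, here is a dependency-free sketch of the behaviour: categories seen during fitting get integer codes in sorted order, and any category that appears only at transform time is mapped to -1. The helper names and the brand values are hypothetical; this mimics the idea and is not scikit-learn's implementation.

```python
def fit_ordinal_mapping(values):
    """Map each distinct category (in sorted order) to an integer code starting at 0."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping, unknown_value=-1):
    """Encode values with the fitted mapping; unseen categories get unknown_value."""
    return [mapping.get(value, unknown_value) for value in values]


# Hypothetical categorical column, split into train and test
train_brands = ["brand_b", "brand_a", "brand_c", "brand_a"]
test_brands = ["brand_a", "brand_x"]  # "brand_x" never occurs in training

mapping = fit_ordinal_mapping(train_brands)
train_codes = encode(train_brands, mapping)
test_codes = encode(test_brands, mapping)

print(mapping)
print(train_codes, test_codes)
```

For the target column an encoded value of -1 would indicate a class that was not present in the training split, which the stratified split is meant to prevent.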

## Transform to AutoML data format

In [18]:
df_train = pd.concat([X_train_transformed, y_train_transformed], axis=1)

In [None]:
train_data = TabularDataset(df_train)

## AutoML training pipeline

In [None]:
label = "packaging_category"
automl_predictor = TabularPredictor(
    label=label, problem_type="multiclass", eval_metric="f1_macro", sample_weight="balance_weight"
).fit(
    train_data=train_data,
    tuning_data=None,  # If tuning_data = None, fit() will automatically hold out some random validation examples from train_data.
    holdout_frac=0.2,  # Default value (if None) is selected based on the number of rows in the training data.
    time_limit=experiment_time_limit,  # 3*60*60
    presets=["high_quality"],  # ['high_quality'] # default = ['medium_quality'], any user-specified arguments in fit() will override the values used by presets.
    # auto_stack=False, # Whether AutoGluon should automatically utilize bagging and multi-layer stack ensembling to boost predictive accuracy.
    # included_model_types=['LR', 'KNN', 'RF', 'XT', 'GBM', 'XGB', 'CAT', 'NN'],
    # excluded_model_types=['FASTAI', 'AG_AUTOMM'],
    hyperparameter_tune_kwargs={  # HPO is not performed unless hyperparameter_tune_kwargs is specified. Searchspaces are provided for some models, but not for all. Where no searchspace is provided, a fixed set of hyper-parameters is defined. (see /searchspace under each model: https://github.com/autogluon/autogluon/tree/master/tabular/src/autogluon/tabular/models).
        # 'num_trials': 15, # try at most n different hyperparameter configurations for each type of model
        "scheduler": "local",
        "searcher": "auto",  # ‘auto’: Perform bayesian optimization search on NN_TORCH and FASTAI models. Perform random search on other models.
    },  # Refer to TabularPredictor.fit docstring for all valid values
)

2024-11-15 16:30:21,831	INFO timeout.py:54 -- Reached timeout of 50.07890300949415 seconds. Stopping all trials.
2024-11-15 16:30:21,916	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/mnt/batch/tasks/shared/LS_root/mounts/clusters/packaginge4dsv5/code/Users/david.tiefenthaler/ml_packaging_classification/notebooks/AutogluonModels/ag-20241115_073223/models/NeuralNetTorch_r89_BAG_L2' in 0.0819s.
- 086e0292: FileNotFoundError('Could not fetch metrics for 086e0292: both result.json and progress.csv were not found at /mnt/batch/tasks/shared/LS_root/mounts/clusters/packaginge4dsv5/code/Users/david.tiefenthaler/ml_packaging_classification/notebooks/AutogluonModels/ag-20241115_073223/models/NeuralNetTorch_r89_BAG_L2/086e0292')
- 340dff9b: FileNotFoundError('Could not fetch metrics for 340dff9b: both result.json and progress.csv were not found at /mnt/batch/tasks/shared/LS_root/mounts/clusters/packaginge4dsv5/code/Users/david.tiefenthaler/ml_packaging

In [22]:
# Leaderboard of all trained models; score_val is measured on the validation data held out from the training data
automl_predictor.leaderboard()

Unnamed: 0,model,score_val,eval_metric,pred_time_val,fit_time,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,WeightedEnsemble_L3,0.824392,f1_macro,771.757639,3815.627307,0.033280,25.249252,3,False,111
1,ExtraTreesGini_BAG_L2,0.821937,f1_macro,748.384896,2842.372934,31.539009,83.596222,2,False,64
2,ExtraTrees_r49_BAG_L2,0.821862,f1_macro,745.445762,2834.502270,28.599875,75.725558,2,False,88
3,ExtraTreesEntr_BAG_L2,0.815051,f1_macro,749.449926,2836.742735,32.604039,77.966023,2,False,65
4,ExtraTrees_r126_BAG_L2,0.814788,f1_macro,756.840240,2847.845663,39.994352,89.068951,2,False,105
...,...,...,...,...,...,...,...,...,...,...
217,CatBoost_r137_BAG_L1_FULL,,f1_macro,,9.027504,,9.027504,1,True,125
218,CatBoost_r12_BAG_L2_FULL,,f1_macro,,657.698145,,24.702838,2,True,215
219,CatBoost_r12_BAG_L1_FULL,,f1_macro,,9.576006,,9.576006,1,True,167
220,CatBoost_r128_BAG_L2_FULL,,f1_macro,,659.308987,,26.313679,2,True,202


## Evaluate AutoML experiment and best model

In [None]:
# Evaluation of models on test data

# NOTE: Reuse the predictor fitted above, or load a TabularPredictor previously saved by fit().
try:
    # NOTE: set the directory of a saved model to load it instead of the in-session predictor
    specific_path = None  # Default: None ; Path format: 'AutogluonModels/ag-20241113_002120'
    if specific_path:
        automl_predictor = TabularPredictor.load(
            f"{config['autogluon_exp_storage_directory']}/{specific_path}"
        )
    print(f"Model loaded from: {automl_predictor.path}")
except Exception as e:
    print(f"Model could not be loaded. An error occurred: {e}")

# process X_test for evaluation and predictions
X_test_transformed = preprocess_pipeline.transform(X_test)

# evaluate models on test data
y_test_transformed = label_encoder.transform(y_test.to_frame())
y_test_transformed = pd.DataFrame(
    data=y_test_transformed, index=y_test.index, columns=[y_test.name]
)
df_test = pd.concat([X_test_transformed, y_test_transformed], axis=1)
test_data = TabularDataset(df_test)

automl_custom_leaderboard_testdata = automl_predictor.leaderboard(test_data)
automl_custom_leaderboard_testdata.head(10)

Model loaded from: AutogluonModels/ag-20241115_073223


Unnamed: 0,model,score_test,score_val,eval_metric,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,ExtraTreesGini_BAG_L2_FULL,0.786872,,f1_macro,218.199805,,679.799691,6.36683,31.539009,46.804384,2,True,175
1,ExtraTrees_r49_BAG_L2_FULL,0.786106,,f1_macro,217.565829,,676.014511,5.732855,28.599875,43.019204,2,True,199
2,ExtraTrees_r126_BAG_L2_FULL,0.785957,,f1_macro,217.883931,,678.098134,6.050957,39.994352,45.102827,2,True,216
3,ExtraTreesEntr_BAG_L2_FULL,0.781232,,f1_macro,218.401219,,673.400998,6.568244,32.604039,40.40569,2,True,176
4,WeightedEnsemble_L3_FULL,0.778957,,f1_macro,225.464116,,906.921374,0.039502,,25.249252,3,True,222
5,WeightedEnsemble_L2_FULL,0.775156,,f1_macro,54.031088,,119.503682,0.048011,,25.609248,2,True,174
6,RandomForest_r166_BAG_L2_FULL,0.769835,,f1_macro,213.99526,,678.514629,2.162286,11.719561,45.519322,2,True,207
7,ExtraTrees_r197_BAG_L1_FULL,0.761419,,f1_macro,13.178643,3.091014,8.78458,13.178643,3.091014,8.78458,1,True,165
8,ExtraTrees_r197_BAG_L1,0.761419,0.781275,f1_macro,13.381466,3.091014,22.797157,13.381466,3.091014,22.797157,1,True,54
9,ExtraTrees_r42_BAG_L1_FULL,0.757995,,f1_macro,14.510466,3.319732,7.316921,14.510466,3.319732,7.316921,1,True,124


In [24]:
automl_custom_leaderboard_testdata.model.unique()

array(['ExtraTreesGini_BAG_L2_FULL', 'ExtraTrees_r49_BAG_L2_FULL',
       'ExtraTrees_r126_BAG_L2_FULL', 'ExtraTreesEntr_BAG_L2_FULL',
       'WeightedEnsemble_L3_FULL', 'WeightedEnsemble_L2_FULL',
       'RandomForest_r166_BAG_L2_FULL', 'ExtraTrees_r197_BAG_L1_FULL',
       'ExtraTrees_r197_BAG_L1', 'ExtraTrees_r42_BAG_L1_FULL',
       'ExtraTrees_r42_BAG_L1', 'LightGBM_r130_BAG_L1_FULL',
       'ExtraTreesGini_BAG_L1_FULL', 'ExtraTreesGini_BAG_L1',
       'ExtraTrees_r49_BAG_L1_FULL', 'ExtraTrees_r49_BAG_L1',
       'ExtraTreesEntr_BAG_L1_FULL', 'ExtraTreesEntr_BAG_L1',
       'RandomForestEntr_BAG_L1_FULL', 'RandomForestEntr_BAG_L1',
       'RandomForestGini_BAG_L1_FULL', 'RandomForest_r166_BAG_L1_FULL',
       'RandomForest_r166_BAG_L1', 'RandomForestGini_BAG_L1',
       'RandomForest_r195_BAG_L1_FULL', 'RandomForest_r195_BAG_L1',
       'RandomForest_r39_BAG_L1_FULL', 'RandomForest_r39_BAG_L1',
       'LightGBMLarge_BAG_L1_FULL', 'RandomForest_r16_BAG_L1_FULL',
       'RandomFores

In [None]:
automl_custom_leaderboard_testdata[
    automl_custom_leaderboard_testdata["model"].str.contains("ExtraTreesGini")
]

Unnamed: 0,model,score_test,score_val,eval_metric,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,ExtraTreesGini_BAG_L2_FULL,0.786872,,f1_macro,218.199805,,679.799691,6.36683,31.539009,46.804384,2,True,175
12,ExtraTreesGini_BAG_L1_FULL,0.74954,,f1_macro,14.247414,3.345275,6.09411,14.247414,3.345275,6.09411,1,True,116
13,ExtraTreesGini_BAG_L1,0.74954,0.776159,f1_macro,15.086132,3.345275,21.938596,15.086132,3.345275,21.938596,1,True,5
132,ExtraTreesGini_BAG_L2,,0.821937,f1_macro,,748.384896,2842.372934,,31.539009,83.596222,2,False,64


In [None]:
# For a single specified model: make predictions and perform detailed evaluation on hold out test data
# i = -1  # index of model to use
# model_to_use = automl_predictor.model_names()[i]
# model_to_use = automl_custom_leaderboard_testdata.iloc[0, 0] # use best model from leaderboard
model_to_use = automl_predictor.model_best
print(f"Model to be evaluated: {model_to_use}")
preds_y_test = automl_predictor.predict(X_test_transformed, model=model_to_use)
print("Predictions:  ", list(preds_y_test)[:5])

preds_y_test_inverse = label_encoder.inverse_transform(preds_y_test.to_frame())

# print classification report for holdout test data
print(classification_report(y_test, preds_y_test_inverse))
report = classification_report(y_test, preds_y_test_inverse, output_dict=True)
f1_score = report["accuracy"]  # NOTE: this is accuracy, which equals micro-averaged F1 for single-label multiclass
f1_macro = report["macro avg"]["f1-score"]

# get best model parameters for mlflow tracking
trainer = automl_predictor._trainer
best_model = trainer.load_model(trainer.model_best)

Model to be evaluated: WeightedEnsemble_L3_FULL


Predictions:   [24.0, 7.0, 1.0, 26.0, 26.0]
                            precision    recall  f1-score   support

   Blister and Insert Card       0.79      0.84      0.82      1749
  Blister and sealed blist       0.79      0.80      0.79      1582
            Book packaging       1.00      1.00      1.00        20
Cardb. Sleeve w - w/o Shr.       0.73      0.60      0.66       135
  Cardboard hanger w/o bag       0.73      0.28      0.40        80
    Carton cover (Lid box)       0.69      0.52      0.59       130
   Carton tube with or w/o       0.94      0.85      0.89        20
                      Case       0.35      0.75      0.47        97
         Corrugated carton       0.81      0.77      0.79       774
        Countertop display       0.84      0.87      0.85        30
                  Envelope       0.93      0.90      0.91        59
          Fabric packaging       1.00      1.00      1.00        20
            Folding carton       0.81      0.78      0.80      1644
   

## Track performance using MLflow

In [None]:
# NOTE: Change to a meaningful name
EXPERIMENT_NAME = "AutoPackagingCategories"
RUN_NAME = "run_AutoML_AutoGluonTabular"

# with open('../env_vars_azureml_compute.yml', 'r') as file:
#     env_vars = yaml.safe_load(file)

mlflow_dir = config["mlflow_benchmark_directory"]
os.makedirs(mlflow_dir, exist_ok=True)
mlflow.set_tracking_uri("file://" + mlflow_dir)

try:
    experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
    EXPERIMENT_ID = experiment.experiment_id
except AttributeError:
    EXPERIMENT_ID = mlflow.create_experiment(
        EXPERIMENT_NAME,
        # mlflow.set_artifact_uri("file://" + project_dir + "/artifacts/")
    )

current_time = datetime.datetime.now()
time_stamp = str(current_time)
# NOTE: Change to a meaningful name for the single trial
# exp_run_name = f"run_MeaningfulTrialName_{time_stamp}"
exp_run_name = f"{RUN_NAME}_{time_stamp}"

# Start MLflow
with mlflow.start_run(experiment_id=EXPERIMENT_ID, run_name=exp_run_name) as run:

    # Retrieve run id
    RUN_ID = run.info.run_id

    # Track parameters
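    # track pipeline configs: preprocessing_pipeline (stored as strings so the dict is JSON-serializable)
    mlflow.log_dict(
        {
            "oversampler": type(oversampler).__name__,
            "label_encoder": type(label_encoder).__name__,
        }
        | {name: str(step) for name, step in preprocess_pipeline.named_transformers_.items()},
        "preprocessing_pipeline.json",
    )

    # model-specific parameters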
    mlflow.log_param("model", f"{type(best_model)}: {best_model.base_model_names}")
    mlflow.log_param("model_configs", best_model.get_trained_params())

    # Track metrics
    mlflow.log_dict(report, "classification_report.json")
    mlflow.log_metric("Report_Test_f1_score", f1_score)
    mlflow.log_metric("Report_Test_f1_macro", f1_macro)

    # Track model
    # mlflow.sklearn.log_model(clf, "classifier")
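
Values such as classes or fitted transformer objects are generally not serializable with the standard `json` module, so a dictionary that is meant to be stored as a `.json` artifact may need to be converted first. The sketch below shows one possible conversion; the helper `to_jsonable` and the `DummyTransformer` class are hypothetical stand-ins and not part of MLflow or scikit-learn.

```python
import json


def to_jsonable(obj):
    """Recursively convert an object into JSON-compatible built-in types."""
    if isinstance(obj, dict):
        return {str(key): to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, type):
        return obj.__name__  # classes are stored by name
    return repr(obj)  # any other object is stored as its repr string


class DummyTransformer:
    """Stand-in for a fitted preprocessing step (illustration only)."""

    def __repr__(self):
        return "DummyTransformer()"


payload = {"oversampler": DummyTransformer, "steps": [DummyTransformer(), 1.5], "seed": 42}
safe_payload = to_jsonable(payload)
print(json.dumps(safe_payload, indent=2))
```

The converted dictionary contains only strings, numbers, lists, and dictionaries, so it can be written to a JSON file regardless of which objects were in the original.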