# Multi-label tabular with AutoGluon (binary one-vs-rest)

This notebook trains two binary `TabularPredictor` models, one per label column, as a one-vs-rest workaround because `MultiLabelPredictor` is unavailable on this Python version.

In [1]:
from pathlib import Path

import pandas as pd
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import train_test_split

# Training CSV and the two 0/1 indicator columns, each modeled by its own predictor.
DATA_PATH = Path("Assignment 6/AutoGluon/extra-credit/multilabel/data/train_multi.csv")
LABEL_COLS = ["class_>50K", "class_<=50K"]
TIME_LIMIT = 30  # seconds of training budget per predictor

In [2]:
df = pd.read_csv(DATA_PATH)
# Hold out 20% for validation, stratified on the original string label.
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42, stratify=df["class"])
print("Train", train_df.shape, "Val", val_df.shape)
train_df.head()

Train (31258, 17) Val (7815, 17)
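The printed shapes can be reproduced by hand. A small sketch of the arithmetic, assuming scikit-learn rounds the test share up and gives the remaining rows to training when only `test_size` is passed:

```python
import math

n_rows = 31258 + 7815  # total rows implied by the printed shapes
test_size = 0.2

# Test rows are rounded up; training receives whatever is left.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test
print(n_train, n_test)
```

Both numbers match the `Train` and `Val` row counts printed by the cell.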


Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,class,class_>50K,class_<=50K
30271,21,Private,138768,Some-college,10,Never-married,Sales,Other-relative,White,Male,0,0,40,United-States,<=50K,0,1
36316,59,Private,296253,HS-grad,9,Divorced,Exec-managerial,Not-in-family,White,Female,0,0,40,United-States,<=50K,0,1
2694,50,Private,158948,7th-8th,4,Married-civ-spouse,Craft-repair,Husband,White,Male,3411,0,40,United-States,<=50K,0,1
39056,39,Private,185084,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,55,United-States,<=50K,0,1
30189,46,Private,295334,Doctorate,16,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,60,United-States,>50K,1,0
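In the five rows shown, `class_>50K` and `class_<=50K` look like complementary one-hot indicators of `class`. A minimal sketch of how that could be checked, using only the label values from the preview (whether it holds for the full file is not shown here):

```python
import pandas as pd

# Label values copied from the five preview rows above.
preview = pd.DataFrame(
    {
        "class": ["<=50K", "<=50K", "<=50K", "<=50K", ">50K"],
        "class_>50K": [0, 0, 0, 0, 1],
        "class_<=50K": [1, 1, 1, 1, 0],
    },
    index=[30271, 36316, 2694, 39056, 30189],
)
label_cols = ["class_>50K", "class_<=50K"]

# One-hot indicators activate exactly one column per row.
one_hot = bool((preview[label_cols].sum(axis=1) == 1).all())

# The positive indicator should agree with the string label.
expected = (preview["class"] == ">50K").astype(int)
matches_class = bool((preview["class_>50K"] == expected).all())

print(one_hot, matches_class)
```

If both checks also hold on `df`, the two labels carry the same information, and the task is effectively a single binary problem expressed twice.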


In [3]:
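results = {}
pred_frames = []
for label in LABEL_COLS:
    # One binary predictor per label, saved in its own subdirectory.
    path = Path("Assignment 6/AutoGluon/extra-credit/multilabel/ag_multilabel_models") / label
    predictor = TabularPredictor(label=label, path=path, problem_type="binary", eval_metric="f1")
    # The frames are passed unchanged, so every non-label column, including
    # `class` and the other indicator column, is available as a feature.
    predictor.fit(train_df, tuning_data=val_df, time_limit=TIME_LIMIT, presets="medium_quality_faster_train")
    # val_df doubles as tuning data, so these metrics are not from unseen data.
    metrics = predictor.evaluate(val_df)
    results[label] = metrics
    preds = predictor.predict(val_df)
    prob = predictor.predict_proba(val_df)[1]  # probability of the positive class (1)
    pred_frames.append(pd.DataFrame({f"pred_{label}": preds, f"prob_{label}": prob}))
# Side-by-side predictions and probabilities for both labels, indexed like val_df.
combined = pd.concat(pred_frames, axis=1)
combined.head()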

Preset alias specified: 'medium_quality_faster_train' maps to 'medium_quality'.


Verbosity: 2 (Standard Logging)


AutoGluon Version:  1.1.1
Python Version:     3.11.14
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 25.1.0: Mon Oct 20 19:32:41 PDT 2025; root:xnu-12377.41.6~2/RELEASE_ARM64_T6000
CPU Count:          8
Memory Avail:       3.75 GB / 16.00 GB (23.4%)
Disk Space Avail:   268.95 GB / 460.43 GB (58.4%)


Presets specified: ['medium_quality_faster_train']


Beginning AutoGluon training ... Time limit = 30s


AutoGluon will save models to "Assignment 6/AutoGluon/extra-credit/multilabel/ag_multilabel_models/class_>50K"


Train Data Rows:    31258


Train Data Columns: 16


Tuning Data Rows:    7815


Tuning Data Columns: 16


Label Column:       class_>50K


Problem Type:       binary


Preprocessing data ...


Selected class <--> label mapping:  class 1 = 1, class 0 = 0


Using Feature Generators to preprocess the data ...


Fitting AutoMLPipelineFeatureGenerator...


	Available Memory:                    3859.00 MB


	Train Data (Original)  Memory Usage: 24.50 MB (0.6% of available memory)


	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.


	Stage 1 Generators:


		Fitting AsTypeFeatureGenerator...


			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.


	Stage 2 Generators:


		Fitting FillNaFeatureGenerator...


	Stage 3 Generators:


		Fitting IdentityFeatureGenerator...


		Fitting CategoryFeatureGenerator...


			Fitting CategoryMemoryMinimizeFeatureGenerator...


	Stage 4 Generators:


		Fitting DropUniqueFeatureGenerator...


	Stage 5 Generators:


		Fitting DropDuplicatesFeatureGenerator...


	Types of features in original data (raw dtype, special dtypes):


		('int', [])    : 7 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('object', []) : 9 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


	Types of features in processed data (raw dtype, special dtypes):


		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('int', ['bool']) : 3 | ['sex', 'class', 'class_<=50K']


	0.1s = Fit runtime


	16 features in original data used to generate 16 features in processed data.


	Train Data (Processed) Memory Usage: 2.16 MB (0.1% of available memory)


Data preprocessing and feature engineering runtime = 0.15s ...


AutoGluon will gauge predictive performance using evaluation metric: 'f1'


	To change this, specify the eval_metric parameter of Predictor()


User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}


Fitting 13 L1 models ...


Fitting model: KNeighborsUnif ... Training model for up to 29.85s of the 29.85s of remaining time.


	0.4021	 = Validation score   (f1)


	0.74s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: KNeighborsDist ... Training model for up to 29.06s of the 29.06s of remaining time.


	0.4434	 = Validation score   (f1)


	0.02s	 = Training   runtime


	0.01s	 = Validation runtime


Fitting model: LightGBMXT ... Training model for up to 29.02s of the 29.02s of remaining time.


	1.0	 = Validation score   (f1)


	0.58s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: LightGBM ... Training model for up to 28.43s of the 28.43s of remaining time.


	1.0	 = Validation score   (f1)


	0.4s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: RandomForestGini ... Training model for up to 28.02s of the 28.02s of remaining time.


	1.0	 = Validation score   (f1)


	0.55s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: RandomForestEntr ... Training model for up to 27.42s of the 27.42s of remaining time.


	1.0	 = Validation score   (f1)


	0.41s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: CatBoost ... Training model for up to 26.96s of the 26.96s of remaining time.


	1.0	 = Validation score   (f1)


	4.16s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: ExtraTreesGini ... Training model for up to 22.79s of the 22.79s of remaining time.


	1.0	 = Validation score   (f1)


	0.35s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: ExtraTreesEntr ... Training model for up to 22.39s of the 22.39s of remaining time.


	1.0	 = Validation score   (f1)


	0.35s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: NeuralNetFastAI ... Training model for up to 22.0s of the 21.99s of remaining time.




		Import fastai failed. A quick tip is to install via `pip install autogluon.tabular[fastai]==1.1.1`. 


Fitting model: XGBoost ... Training model for up to 21.93s of the 21.93s of remaining time.


	1.0	 = Validation score   (f1)


	1.33s	 = Training   runtime


	0.01s	 = Validation runtime


Fitting model: NeuralNetTorch ... Training model for up to 20.59s of the 20.59s of remaining time.


	1.0	 = Validation score   (f1)


	7.43s	 = Training   runtime


	0.02s	 = Validation runtime


Fitting model: LightGBMLarge ... Training model for up to 13.13s of the 13.13s of remaining time.


	1.0	 = Validation score   (f1)


	0.45s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.85s of the 12.67s of remaining time.


	Ensemble Weights: {'ExtraTreesGini': 1.0}


	1.0	 = Validation score   (f1)


	0.48s	 = Training   runtime


	0.0s	 = Validation runtime


AutoGluon training complete, total runtime = 17.82s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 185921.3 rows/s (7815 batch size)


TabularPredictor saved. To load, use: predictor = TabularPredictor.load("Assignment 6/AutoGluon/extra-credit/multilabel/ag_multilabel_models/class_>50K")


Preset alias specified: 'medium_quality_faster_train' maps to 'medium_quality'.


Verbosity: 2 (Standard Logging)


AutoGluon Version:  1.1.1
Python Version:     3.11.14
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 25.1.0: Mon Oct 20 19:32:41 PDT 2025; root:xnu-12377.41.6~2/RELEASE_ARM64_T6000
CPU Count:          8
Memory Avail:       3.31 GB / 16.00 GB (20.7%)
Disk Space Avail:   268.93 GB / 460.43 GB (58.4%)


Presets specified: ['medium_quality_faster_train']


Beginning AutoGluon training ... Time limit = 30s


AutoGluon will save models to "Assignment 6/AutoGluon/extra-credit/multilabel/ag_multilabel_models/class_<=50K"


Train Data Rows:    31258


Train Data Columns: 16


Tuning Data Rows:    7815


Tuning Data Columns: 16


Label Column:       class_<=50K


Problem Type:       binary


Preprocessing data ...


Selected class <--> label mapping:  class 1 = 1, class 0 = 0


Using Feature Generators to preprocess the data ...


Fitting AutoMLPipelineFeatureGenerator...


	Available Memory:                    3413.36 MB


	Train Data (Original)  Memory Usage: 24.50 MB (0.7% of available memory)


	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.


	Stage 1 Generators:


		Fitting AsTypeFeatureGenerator...


			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.


	Stage 2 Generators:


		Fitting FillNaFeatureGenerator...


	Stage 3 Generators:


		Fitting IdentityFeatureGenerator...


		Fitting CategoryFeatureGenerator...


			Fitting CategoryMemoryMinimizeFeatureGenerator...


	Stage 4 Generators:


		Fitting DropUniqueFeatureGenerator...


	Stage 5 Generators:


		Fitting DropDuplicatesFeatureGenerator...


	Unused Original Features (Count: 1): ['class_>50K']


		These features were not used to generate any of the output features. Add a feature generator compatible with these features to utilize them.


		Features can also be unused if they carry very little information, such as being categorical but having almost entirely unique values or being duplicates of other features.


		These features do not need to be present at inference time.


		('int', []) : 1 | ['class_>50K']


	Types of features in original data (raw dtype, special dtypes):


		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('object', []) : 9 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


	Types of features in processed data (raw dtype, special dtypes):


		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]


		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]


		('int', ['bool']) : 2 | ['sex', 'class']


	0.2s = Fit runtime


	15 features in original data used to generate 15 features in processed data.


	Train Data (Processed) Memory Usage: 2.13 MB (0.1% of available memory)


Data preprocessing and feature engineering runtime = 0.19s ...


AutoGluon will gauge predictive performance using evaluation metric: 'f1'


	To change this, specify the eval_metric parameter of Predictor()


User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}


Fitting 13 L1 models ...


Fitting model: KNeighborsUnif ... Training model for up to 29.81s of the 29.81s of remaining time.


	0.8551	 = Validation score   (f1)


	0.02s	 = Training   runtime


	0.02s	 = Validation runtime


Fitting model: KNeighborsDist ... Training model for up to 29.77s of the 29.76s of remaining time.


	0.8508	 = Validation score   (f1)


	0.02s	 = Training   runtime


	0.01s	 = Validation runtime


Fitting model: LightGBMXT ... Training model for up to 29.73s of the 29.73s of remaining time.


	1.0	 = Validation score   (f1)


	0.44s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: LightGBM ... Training model for up to 29.28s of the 29.28s of remaining time.


	1.0	 = Validation score   (f1)


	0.52s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: RandomForestGini ... Training model for up to 28.75s of the 28.75s of remaining time.


	1.0	 = Validation score   (f1)


	0.46s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: RandomForestEntr ... Training model for up to 28.23s of the 28.23s of remaining time.


	1.0	 = Validation score   (f1)


	0.48s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: CatBoost ... Training model for up to 27.71s of the 27.71s of remaining time.


	1.0	 = Validation score   (f1)


	5.14s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: ExtraTreesGini ... Training model for up to 22.56s of the 22.56s of remaining time.


	1.0	 = Validation score   (f1)


	0.37s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: ExtraTreesEntr ... Training model for up to 22.15s of the 22.15s of remaining time.


	1.0	 = Validation score   (f1)


	0.36s	 = Training   runtime


	0.04s	 = Validation runtime


Fitting model: NeuralNetFastAI ... Training model for up to 21.73s of the 21.73s of remaining time.




		Import fastai failed. A quick tip is to install via `pip install autogluon.tabular[fastai]==1.1.1`. 


Fitting model: XGBoost ... Training model for up to 21.66s of the 21.66s of remaining time.


	1.0	 = Validation score   (f1)


	1.32s	 = Training   runtime


	0.01s	 = Validation runtime


Fitting model: NeuralNetTorch ... Training model for up to 20.32s of the 20.32s of remaining time.


	1.0	 = Validation score   (f1)


	7.19s	 = Training   runtime


	0.02s	 = Validation runtime


Fitting model: LightGBMLarge ... Training model for up to 13.11s of the 13.11s of remaining time.


	1.0	 = Validation score   (f1)


	0.94s	 = Training   runtime


	0.0s	 = Validation runtime


Fitting model: WeightedEnsemble_L2 ... Training model for up to 29.81s of the 12.16s of remaining time.


	Ensemble Weights: {'ExtraTreesGini': 1.0}


	1.0	 = Validation score   (f1)


	0.48s	 = Training   runtime


	0.0s	 = Validation runtime


AutoGluon training complete, total runtime = 18.33s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 192862.3 rows/s (7815 batch size)


TabularPredictor saved. To load, use: predictor = TabularPredictor.load("Assignment 6/AutoGluon/extra-credit/multilabel/ag_multilabel_models/class_<=50K")


Unnamed: 0,pred_class_>50K,prob_class_>50K,pred_class_<=50K,prob_class_<=50K
29579,0,0.0,1,1.0
26121,0,0.0,1,1.0
25570,0,0.0,1,1.0
16496,0,0.0,1,1.0
33535,0,0.0,1,1.0
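The feature summary in the log lists `class` and `class_<=50K` among the processed features for the `class_>50K` model, and `class` again for the `class_<=50K` model. Those columns encode the answer directly, which would explain the validation F1 of 1.0 for every tree and neural model. A minimal sketch of removing them before fitting; `frame_for_label` and `SOURCE_COL` are hypothetical names introduced here, not part of the notebook or of AutoGluon:

```python
import pandas as pd

LABEL_COLS = ["class_>50K", "class_<=50K"]
SOURCE_COL = "class"  # assumed: the string column the indicators are derived from


def frame_for_label(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Return df without the columns that directly encode `label`'s answer."""
    leaky = [c for c in LABEL_COLS if c != label] + [SOURCE_COL]
    return df.drop(columns=leaky, errors="ignore")


# Toy frame with the same label layout as the notebook's data.
toy = pd.DataFrame(
    {
        "age": [21, 46],
        "class": ["<=50K", ">50K"],
        "class_>50K": [0, 1],
        "class_<=50K": [1, 0],
    }
)
cleaned = frame_for_label(toy, "class_>50K")
print(list(cleaned.columns))
```

In the training loop, `frame_for_label(train_df, label)` and `frame_for_label(val_df, label)` could then replace the raw frames in the `fit`, `evaluate`, `predict`, and `predict_proba` calls. Scores would be expected to fall below 1.0 once the leak is removed.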


In [4]:
results

{'class_>50K': {'f1': 1.0,
  'accuracy': 1.0,
  'balanced_accuracy': 1.0,
  'mcc': 1.0,
  'roc_auc': 1.0,
  'precision': 1.0,
  'recall': 1.0},
 'class_<=50K': {'f1': 1.0,
  'accuracy': 1.0,
  'balanced_accuracy': 1.0,
  'mcc': 1.0,
  'roc_auc': 1.0,
  'precision': 1.0,
  'recall': 1.0}}
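The `results` dictionary scores each label on its own. For a joint view across labels, two common multi-label summaries are the exact-match ratio (all labels correct for a row) and the Hamming loss (fraction of individual label cells that are wrong). A small sketch on toy data; `multilabel_summary` is a hypothetical helper, not an AutoGluon function:

```python
import pandas as pd


def multilabel_summary(y_true: pd.DataFrame, y_pred: pd.DataFrame) -> dict:
    """Exact-match ratio and Hamming loss for 0/1 label frames with the same columns."""
    # Align prediction columns to the truth columns, then compare cell by cell.
    hits = y_true.to_numpy() == y_pred[y_true.columns].to_numpy()
    return {
        "exact_match": float(hits.all(axis=1).mean()),
        "hamming_loss": float(1.0 - hits.mean()),
    }


# Toy data: one wrong cell out of eight, in the third row.
y_true = pd.DataFrame({"a": [0, 1, 1, 0], "b": [1, 0, 0, 1]})
y_pred = pd.DataFrame({"a": [0, 1, 0, 0], "b": [1, 0, 0, 1]})
summary = multilabel_summary(y_true, y_pred)
print(summary)
```

With the notebook's objects, the predictions could be taken from `combined.filter(like="pred_").rename(columns=lambda c: c.removeprefix("pred_"))` and the truth from `val_df[LABEL_COLS]`, assuming both frames share the same row order.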