# AutoGluon Tabular on GPU (Extra Credit 1.5)

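A short run of the AutoGluon tabular GPU tutorial on a 20% sample of the Adult Income dataset. Training requests GPUs with `num_gpus="auto"` and falls back to CPU when CUDA is unavailable, which is what happened in the run recorded here.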

In [1]:
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import train_test_split
import pandas as pd
from pathlib import Path
import torch

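# __file__ is undefined inside a notebook, so fall back to the working directory.
try:
    HERE = Path(__file__).resolve().parent
except NameError:
    HERE = Path('.').resolve()
DATA_PATH = (HERE / "data" / "train.csv").resolve()
TIME_LIMIT = 60  # seconds, passed to fit(time_limit=...)
SAMPLE_FRAC = 0.2  # fraction of rows kept before the train/test split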

In [2]:
print("CUDA available:", torch.cuda.is_available())

CUDA available: False
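
The CPU fallback can also be made explicit instead of being left to `num_gpus="auto"`. The sketch below is one possible way to do that, not what this notebook ran: `detect_num_gpus` is a hypothetical helper, and it treats a missing `torch` install the same as a machine without CUDA.

```python
def detect_num_gpus() -> int:
    """Return the number of CUDA devices visible to PyTorch, or 0 if none."""
    try:
        import torch  # optional dependency: a missing install counts as "no GPU"
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.device_count()


num_gpus = detect_num_gpus()
print("Requesting GPUs:", num_gpus)
# The value could then be passed on as predictor.fit(..., num_gpus=num_gpus).
```

On a machine where `torch.cuda.is_available()` is `False`, as in the output above, the helper returns 0 and training stays on CPU.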


In [3]:
df = pd.read_csv(DATA_PATH)
df_sample = df.sample(frac=SAMPLE_FRAC, random_state=42)
train_df, test_df = train_test_split(df_sample, test_size=0.2, stratify=df_sample["class"], random_state=42)
print("Train", train_df.shape, "Test", test_df.shape)
train_df.head()

Train (6252, 15) Test (1563, 15)


,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,class
4335,25,Private,135603,HS-grad,9,Married-civ-spouse,Transport-moving,Husband,White,Male,0,0,40,United-States,>50K
29019,61,Private,241013,7th-8th,4,Widowed,Farming-fishing,Not-in-family,Black,Male,0,0,40,United-States,<=50K
1704,21,Private,83141,HS-grad,9,Never-married,Farming-fishing,Not-in-family,White,Male,0,0,53,United-States,<=50K
36853,31,Private,271933,Some-college,10,Married-civ-spouse,Machine-op-inspct,Wife,White,Female,0,0,40,United-States,<=50K
2097,20,Private,122166,Some-college,10,Never-married,Adm-clerical,Not-in-family,White,Female,0,0,40,United-States,<=50K


In [4]:
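# NOTE: tuning_data is the same frame that the leaderboard and evaluate() cells score below,
# so the reported test scores equal the validation scores and are not a held-out estimate.
predictor = TabularPredictor(label="class", path="Assignment 6/AutoGluon/extra-credit/gpu/ag_gpu_models", problem_type="binary", eval_metric="roc_auc")
predictor.fit(train_df, tuning_data=test_df, presets="medium_quality_faster_train", time_limit=TIME_LIMIT, num_gpus="auto")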

Preset alias specified: 'medium_quality_faster_train' maps to 'medium_quality'.
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.1.1
Python Version:     3.11.14
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 25.1.0: Mon Oct 20 19:32:41 PDT 2025; root:xnu-12377.41.6~2/RELEASE_ARM64_T6000
CPU Count:          8
Memory Avail:       3.15 GB / 16.00 GB (19.7%)
Disk Space Avail:   268.82 GB / 460.43 GB (58.4%)
Presets specified: ['medium_quality_faster_train']
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to "Assignment 6/AutoGluon/extra-credit/gpu/ag_gpu_models"
Train Data Rows:    6252
Train Data Columns: 14
Tuning Data Rows:    1563
Tuning Data Columns: 14
Label Column:       class
Problem Type:       binary
Preprocessing data ...
Selected class <--> label mapping:  class 1 =  >50K, class 0 =  <=50K
	Note: For your binary classification, AutoGluon arbitrarily selected which label-value represents positive ( >50K) vs negative ( <=50K) class.
	To explicitly set the positive_class, either rename classes to 1 and 0, or specify positive_class in Predictor init.


Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    3230.10 MB
	Train Data (Original)  Memory Usage: 4.37 MB (0.1% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 1 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('int', [])    : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('object', []) : 8 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])  : 7 | ['workclass', 'education', 'marital-status', 'occupation', 'relationship', ...]
		('int', [])       : 6 | ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', ...]
		('int', ['bool']) : 1 | ['sex']
	0.0s = Fit runtime
	14 features in original data used to generate 14 features in processed data.
	Train Data (Processed) Memory Usage: 0.42 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.06s ...
AutoGluon will gauge predictive performance using evaluation metric: 'roc_auc'
	This metric expects predicted probabilities rather than predicted class labels, so you'll need to use predict_proba() instead of predict()
	To change this, specify the eval_metric parameter of Predictor()


User-specified model hyperparameters to be fit:
{
	'NN_TORCH': {},
	'GBM': [{'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, {}, 'GBMLarge'],
	'CAT': {},
	'XGB': {},
	'FASTAI': {},
	'RF': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'XT': [{'criterion': 'gini', 'ag_args': {'name_suffix': 'Gini', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'entropy', 'ag_args': {'name_suffix': 'Entr', 'problem_types': ['binary', 'multiclass']}}, {'criterion': 'squared_error', 'ag_args': {'name_suffix': 'MSE', 'problem_types': ['regression', 'quantile']}}],
	'KNN': [{'weights': 'uniform', 'ag_args': {'name_suffix': 'Unif'}}, {'weights': 'distance', 'ag_args': {'name_suffix': 'Dist'}}],
}


Fitting 13 L1 models ...
Fitting model: KNeighborsUnif ... Training model for up to 59.94s of the 59.94s of remaining time.
	0.6288	 = Validation score   (roc_auc)
	0.01s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: KNeighborsDist ... Training model for up to 59.9s of the 59.9s of remaining time.
	0.6364	 = Validation score   (roc_auc)
	0.0s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMXT ... Training model for up to 59.88s of the 59.88s of remaining time.
	0.9068	 = Validation score   (roc_auc)
	0.77s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: LightGBM ... Training model for up to 59.1s of the 59.1s of remaining time.
	0.9097	 = Validation score   (roc_auc)
	0.48s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForestGini ... Training model for up to 58.62s of the 58.61s of remaining time.
	0.8946	 = Validation score   (roc_auc)
	0.43s	 = Training   runtime
	0.03s	 = Validation runtime
Fitting model: RandomForestEntr ... Training model for up to 58.13s of the 58.13s of remaining time.
	0.8939	 = Validation score   (roc_auc)
	0.34s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: CatBoost ... Training model for up to 57.72s of the 57.72s of remaining time.
	0.9144	 = Validation score   (roc_auc)
	6.0s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesGini ... Training model for up to 51.71s of the 51.71s of remaining time.
	0.8932	 = Validation score   (roc_auc)
	0.33s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: ExtraTreesEntr ... Training model for up to 51.3s of the 51.3s of remaining time.
	0.8944	 = Validation score   (roc_auc)
	0.29s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: NeuralNetFastAI ... Training model for up to 50.94s of the 50.94s of remaining time.
		Import fastai failed. A quick tip is to install via `pip install autogluon.tabular[fastai]==1.1.1`.
Fitting model: XGBoost ... Training model for up to 50.88s of the 50.88s of remaining time.
	0.9111	 = Validation score   (roc_auc)
	0.7s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: NeuralNetTorch ... Training model for up to 50.17s of the 50.17s of remaining time.
	0.8908	 = Validation score   (roc_auc)
	4.44s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMLarge ... Training model for up to 45.72s of the 45.71s of remaining time.
	0.9025	 = Validation score   (roc_auc)
	2.49s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.94s of the 43.2s of remaining time.
	Ensemble Weights: {'CatBoost': 0.52, 'XGBoost': 0.32, 'ExtraTreesEntr': 0.08, 'KNeighborsUnif': 0.04, 'NeuralNetTorch': 0.04}
	0.9157	 = Validation score   (roc_auc)
	0.05s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 16.86s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 18512.0 rows/s (1563 batch size)
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("Assignment 6/AutoGluon/extra-credit/gpu/ag_gpu_models")


<autogluon.tabular.predictor.predictor.TabularPredictor at 0x13c145b50>
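
The log reports `WeightedEnsemble_L2` as the best model, along with the weight it gives each base model. The sketch below illustrates what a weighted average of predicted probabilities with those weights looks like. It is a simplified illustration, not AutoGluon's internal code: `blend` is a hypothetical helper and the per-model probabilities are made-up placeholders.

```python
# Ensemble weights copied from the fit log above.
weights = {
    "CatBoost": 0.52,
    "XGBoost": 0.32,
    "ExtraTreesEntr": 0.08,
    "KNeighborsUnif": 0.04,
    "NeuralNetTorch": 0.04,
}


def blend(probs_by_model, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights.values())
    return sum(weights[name] * probs_by_model[name] for name in weights) / total


# Placeholder probabilities for a single row (illustrative, not from the run).
example_probs = {
    "CatBoost": 0.80,
    "XGBoost": 0.70,
    "ExtraTreesEntr": 0.60,
    "KNeighborsUnif": 0.50,
    "NeuralNetTorch": 0.90,
}

blended = blend(example_probs, weights)
print(round(blended, 4))
```

Because CatBoost and XGBoost carry 84% of the weight between them, the blended value stays close to their predictions, which fits the small gain of the ensemble (0.9157) over CatBoost alone (0.9144).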

In [5]:
lb = predictor.leaderboard(test_df, silent=True)
lb.head()

,model,score_test,score_val,eval_metric,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,WeightedEnsemble_L2,0.915695,0.915695,roc_auc,0.11588,0.084432,11.482621,0.001575,0.000266,0.050801,2,True,13
1,CatBoost,0.914429,0.914429,roc_auc,0.008857,0.003739,5.99675,0.008857,0.003739,5.99675,1,True,7
2,XGBoost,0.91109,0.91109,roc_auc,0.010275,0.005284,0.696352,0.010275,0.005284,0.696352,1,True,10
3,LightGBM,0.909682,0.909682,roc_auc,0.005929,0.004073,0.480061,0.005929,0.004073,0.480061,1,True,4
4,LightGBMXT,0.906838,0.906838,roc_auc,0.006783,0.004826,0.765619,0.006783,0.004826,0.765619,1,True,3
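
In this leaderboard `score_test` equals `score_val` for every model, because `test_df` was also passed to `fit` as `tuning_data`. For an independent test estimate, the sample could be split three ways so that the test rows never reach `fit`. The sketch below shows a stratified train/validation/test split over row positions using only the standard library. `stratified_three_way_split` and the 60/20/20 fractions are illustrative assumptions; in the notebook itself the same effect could be had by calling `train_test_split` twice.

```python
import random
from collections import defaultdict


def stratified_three_way_split(labels, val_frac=0.2, test_frac=0.2, seed=42):
    """Split row positions into train/val/test, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for position, label in enumerate(labels):
        by_class[label].append(position)

    train, val, test = [], [], []
    for positions in by_class.values():
        rng.shuffle(positions)
        n_test = round(len(positions) * test_frac)
        n_val = round(len(positions) * val_frac)
        test.extend(positions[:n_test])
        val.extend(positions[n_test:n_test + n_val])
        train.extend(positions[n_test + n_val:])
    return sorted(train), sorted(val), sorted(test)


# Imbalanced toy labels standing in for the "class" column.
toy_labels = ["<=50K"] * 76 + [">50K"] * 24
train_idx, val_idx, test_idx = stratified_three_way_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With such a split, the validation rows would go to `tuning_data` and the test rows would be reserved for `leaderboard` and `evaluate`.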


In [6]:
metrics = predictor.evaluate(test_df)
preds = predictor.predict(test_df.head(5))
probs = predictor.predict_proba(test_df.head(5))
metrics

{'roc_auc': 0.9156945257442581,
 'accuracy': 0.8611644273832374,
 'balanced_accuracy': 0.7663116873539484,
 'mcc': 0.5837500593402173,
 'f1': 0.6625194401244168,
 'precision': 0.7553191489361702,
 'recall': 0.590027700831025}
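
The threshold-based metrics above are mutually consistent and can be reproduced from a single confusion matrix. The counts in the sketch below are not printed by the notebook. They are back-solved from the reported precision, recall, and accuracy on the 1563-row test set, taking `>50K` as the positive class, so treat them as an inference rather than as output. `roc_auc` depends on the full ranking of predicted probabilities and cannot be recovered this way.

```python
import math

# Confusion-matrix counts inferred from the reported metrics (not notebook output).
tp, fp, fn, tn = 213, 69, 148, 1133

precision = tp / (tp + fp)            # share of predicted >50K that are correct
recall = tp / (tp + fn)               # share of actual >50K that are found
specificity = tn / (tn + fp)          # share of actual <=50K that are found
accuracy = (tp + tn) / (tp + fp + fn + tn)
balanced_accuracy = (recall + specificity) / 2
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [
    ("accuracy", accuracy),
    ("balanced_accuracy", balanced_accuracy),
    ("mcc", mcc),
    ("f1", f1),
    ("precision", precision),
    ("recall", recall),
]:
    print(f"{name}: {value:.6f}")
```

Under these counts the model misses 148 of the 361 positive rows. That explains why recall (0.59) and balanced accuracy (0.77) sit well below plain accuracy (0.86) on this imbalanced label.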