# AutoML Library Showcase: H2O, AutoGluon, and FLAML

This notebook provides a comparative demonstration of three popular open-source AutoML libraries: **H2O AutoML**, **AutoGluon**, and **FLAML**. We walk through a standard machine learning workflow for a binary classification task, showing how each library automates key steps such as data preprocessing, model selection, training, and evaluation. It is intended as a practical guide for students and practitioners looking for low-code alternatives to PyCaret.

## 1. Setup and Data Loading

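First, we check the Python version, install the libraries, and load the dataset. For this demonstration, we use scikit-learn's Breast Cancer Wisconsin (Diagnostic) dataset: 569 samples, 30 numeric features, and a binary target (0 = malignant, 1 = benign). We hold out 20% of the rows as a test set, stratified on the target so that both splits keep the same class proportions.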

In [1]:
!python --version

Python 3.11.13


In [2]:
!uv pip install h2o autogluon flaml scikit-learn pandas numpy xgboost ipywidgets

Using Python 3.11.13 environment at: /Users/tarekatwan/Repos/MyWork/Teach/repos/adv_ml_ds/dev3
Audited 8 packages in 29ms


In [3]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the dataset
bc = load_breast_cancer()
X = pd.DataFrame(bc.data, columns=bc.feature_names)
y = pd.Series(bc.target, name='target')

# Create a single dataframe
data = pd.concat([X, y], axis=1)

# Split the data
train, test = train_test_split(data, test_size=0.2, random_state=42, stratify=data['target'])

print("Training data shape:", train.shape)
print("Test data shape:", test.shape)
train.head()

Training data shape: (455, 31)
Test data shape: (114, 31)


| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 546 | 10.32 | 16.35 | 65.31 | 324.9 | 0.09434 | 0.04994 | 0.01012 | 0.005495 | 0.1885 | 0.06201 | ... | 21.77 | 71.12 | 384.9 | 0.1285 | 0.08842 | 0.04384 | 0.02381 | 0.2681 | 0.07399 | 1 |
| 432 | 20.18 | 19.54 | 133.8 | 1250.0 | 0.1133 | 0.1489 | 0.2133 | 0.1259 | 0.1724 | 0.06053 | ... | 25.07 | 146.0 | 1479.0 | 0.1665 | 0.2942 | 0.5308 | 0.2173 | 0.3032 | 0.08075 | 0 |
| 174 | 10.66 | 15.15 | 67.49 | 349.6 | 0.08792 | 0.04302 | 0.0 | 0.0 | 0.1928 | 0.05975 | ... | 19.2 | 73.2 | 408.3 | 0.1076 | 0.06791 | 0.0 | 0.0 | 0.271 | 0.06164 | 1 |
| 221 | 13.56 | 13.9 | 88.59 | 561.3 | 0.1051 | 0.1192 | 0.0786 | 0.04451 | 0.1962 | 0.06303 | ... | 17.13 | 101.1 | 686.6 | 0.1376 | 0.2698 | 0.2577 | 0.0909 | 0.3065 | 0.08177 | 1 |
| 289 | 11.37 | 18.89 | 72.17 | 396.0 | 0.08713 | 0.05008 | 0.02399 | 0.02173 | 0.2013 | 0.05955 | ... | 26.14 | 79.29 | 459.3 | 0.1118 | 0.09708 | 0.07529 | 0.06203 | 0.3267 | 0.06994 | 1 |

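The `stratify=data['target']` argument in the cell above makes `train_test_split` keep the class proportions the same in the training and test sets. The pure-Python sketch below illustrates the idea under a simplifying assumption: each class is shuffled on its own and a rounded 20% of it goes to the test set. The function name `stratified_split` is hypothetical, and scikit-learn sizes the test set as a whole (114 rows here) before allocating it across classes, so its counts can differ from this per-class rounding by a row.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with per-class proportions preserved.

    Simplified sketch: each class is shuffled independently and a rounded
    `test_size` fraction of its rows is assigned to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(test_size * len(indices))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts of the breast cancer target: 212 malignant (0), 357 benign (1)
labels = [0] * 212 + [1] * 357
train_idx, test_idx = stratified_split(labels)

for name, idx in [("train", train_idx), ("test", test_idx)]:
    share_0 = sum(labels[i] == 0 for i in idx) / len(idx)
    print(f"{name}: {len(idx)} rows, share of class 0 = {share_0:.3f}")
```

Both splits end up with roughly 37% malignant rows, the same share as the full dataset (212 / 569).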

## 2. H2O AutoML

**H2O** is a popular open-source, distributed machine learning platform developed by H2O.ai. H2O AutoML is designed to be easy to use and automates the process of training and tuning a large number of models, returning a leaderboard of the best models.

### Key Features:
- Automatic model training and hyperparameter tuning
- Model stacking and ensembling
- Supports a range of algorithms: GLM, Distributed Random Forest (including Extremely Randomized Trees), GBM, XGBoost (where the platform supports it), and Deep Learning
- Scalable for large datasets
- Built-in model explainability
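H2O's stacked ensembles train a metalearner (by default a GLM) on the cross-validated predictions of the base models. The same idea can be sketched with scikit-learn's `StackingClassifier`; this is scikit-learn's implementation, not H2O's, and the choice of base models and the logistic-regression metalearner below are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_bc, y_bc, test_size=0.2, random_state=42, stratify=y_bc
)

# Base learners: a GLM-like model, a random forest, and a GBM
base_models = [
    ("glm", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("drf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("gbm", GradientBoostingClassifier(random_state=42)),
]

# The metalearner is fit on 5-fold out-of-fold predicted probabilities
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_tr, y_tr)
print(f"Stacked ensemble test accuracy: {stack.score(X_te, y_te):.4f}")
```

Using out-of-fold predictions matters: if the metalearner saw predictions the base models made on their own training rows, it would learn to trust overfit models.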

In [4]:
import h2o
from h2o.automl import H2OAutoML

# Initialize H2O cluster
h2o.init()

# Convert data to H2OFrame
h2o_train = h2o.H2OFrame(train)
h2o_test = h2o.H2OFrame(test)

# Convert target to factor (categorical) for classification
h2o_train['target'] = h2o_train['target'].asfactor()
h2o_test['target'] = h2o_test['target'].asfactor()

# Identify predictors and response
x = h2o_train.columns
y = 'target'
x.remove(y)

# Run AutoML: stop after 10 models (stacked ensembles not counted) or 120 seconds, whichever comes first
aml_h2o = H2OAutoML(max_models=10, seed=42, max_runtime_secs=120)
aml_h2o.train(x=x, y=y, training_frame=h2o_train)

# View the AutoML Leaderboard
lb_h2o = aml_h2o.leaderboard
print("\nH2O AutoML Leaderboard:")
lb_h2o.head(rows=lb_h2o.nrows)

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "24.0.1" 2025-04-15; OpenJDK Runtime Environment Homebrew (build 24.0.1); OpenJDK 64-Bit Server VM Homebrew (build 24.0.1, mixed mode, sharing)
  Starting server from /Users/tarekatwan/Repos/MyWork/Teach/repos/adv_ml_ds/dev3/lib/python3.11/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/48/j6k669vx63qd_68k2_502cl40000gn/T/tmppl_448xo
  JVM stdout: /var/folders/48/j6k669vx63qd_68k2_502cl40000gn/T/tmppl_448xo/h2o_tarekatwan_started_from_python.out
  JVM stderr: /var/folders/48/j6k669vx63qd_68k2_502cl40000gn/T/tmppl_448xo/h2o_tarekatwan_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


| Property | Value |
|---|---|
| H2O_cluster_uptime | 02 secs |
| H2O_cluster_timezone | Asia/Amman |
| H2O_data_parsing_timezone | UTC |
| H2O_cluster_version | 3.46.0.8 |
| H2O_cluster_version_age | 14 days, 19 hours and 20 minutes |
| H2O_cluster_name | H2O_from_python_tarekatwan_9q9iyt |
| H2O_cluster_total_nodes | 1 |
| H2O_cluster_free_memory | 7.980 Gb |
| H2O_cluster_total_cores | 10 |
| H2O_cluster_allowed_cores | 10 |


Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
AutoML progress: |
10:42:34.824: AutoML: XGBoost is not available; skipping it.

███████████████████████████████████████████████████████████████| (done) 100%

H2O AutoML Leaderboard:


| model_id | auc | logloss | aucpr | mean_per_class_error | rmse | mse |
|---|---|---|---|---|---|---|
| GBM_4_AutoML_1_20251023_104234 | 0.994448 | 0.0949278 | 0.996606 | 0.0411249 | 0.168567 | 0.0284147 |
| GBM_3_AutoML_1_20251023_104234 | 0.994221 | 0.0975407 | 0.996469 | 0.0352425 | 0.167118 | 0.0279283 |
| StackedEnsemble_AllModels_1_AutoML_1_20251023_104234 | 0.99418 | 0.0788495 | 0.996077 | 0.0258514 | 0.140149 | 0.0196417 |
| GLM_1_AutoML_1_20251023_104234 | 0.994097 | 0.0790274 | 0.9958 | 0.0164603 | 0.139666 | 0.0195067 |
| GBM_grid_1_AutoML_1_20251023_104234_model_1 | 0.994056 | 0.090784 | 0.996185 | 0.0323013 | 0.159784 | 0.025531 |
| GBM_2_AutoML_1_20251023_104234 | 0.99356 | 0.101833 | 0.995994 | 0.0375645 | 0.171217 | 0.0293153 |
| GBM_1_AutoML_1_20251023_104234 | 0.993333 | 0.0938893 | 0.995457 | 0.0311146 | 0.159312 | 0.0253803 |
| StackedEnsemble_BestOfFamily_1_AutoML_1_20251023_104234 | 0.993271 | 0.081063 | 0.995298 | 0.0229102 | 0.141139 | 0.0199201 |
| GBM_5_AutoML_1_20251023_104234 | 0.992446 | 0.11676 | 0.995338 | 0.048194 | 0.182521 | 0.0333138 |
| DRF_1_AutoML_1_20251023_104234 | 0.988132 | 0.183858 | 0.989522 | 0.0416409 | 0.180192 | 0.032469 |

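The leaderboard columns are standard binary-classification metrics computed from predicted probabilities (when no `leaderboard_frame` is passed, H2O AutoML ranks models by their cross-validation metrics). The pure-Python sketch below computes three of them on a small hypothetical example. The fixed 0.5 threshold used for `mean_per_class_error` is an assumption for illustration; H2O derives its own threshold from the data.

```python
import math


def log_loss(y_true, p_pos, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def rmse(y_true, p_pos):
    """Root mean squared error between 0/1 labels and probabilities."""
    mse = sum((y - p) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)
    return math.sqrt(mse)


def mean_per_class_error(y_true, p_pos, threshold=0.5):
    """Average of the error rates computed separately for each class."""
    preds = [1 if p >= threshold else 0 for p in p_pos]
    errors = []
    for cls in sorted(set(y_true)):
        rows = [i for i, y in enumerate(y_true) if y == cls]
        wrong = sum(preds[i] != cls for i in rows)
        errors.append(wrong / len(rows))
    return sum(errors) / len(errors)


# Hypothetical labels and predicted P(class = 1)
y_true = [1, 0, 1, 1]
p_pos = [0.9, 0.2, 0.6, 0.4]
print(f"logloss={log_loss(y_true, p_pos):.4f}")
print(f"rmse={rmse(y_true, p_pos):.4f}")
print(f"mean_per_class_error={mean_per_class_error(y_true, p_pos):.4f}")
```

Here the single class-0 row is correct and one of three class-1 rows is wrong, so `mean_per_class_error` is (0 + 1/3) / 2 ≈ 0.1667, while the plain error rate is 1/4. Averaging per class keeps the errors on a minority class from being diluted by the majority class.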

In [14]:
# Get the best model
best_model_h2o = aml_h2o.leader
print("\nBest Model:", best_model_h2o.model_id)

# Make predictions on test data
predictions_h2o = best_model_h2o.predict(h2o_test)
print("\nPredictions:")
display(predictions_h2o.head())

# Evaluate performance
performance_h2o = best_model_h2o.model_performance(h2o_test)
print("\nTest Set Performance:")
display(performance_h2o)


Best Model: GBM_4_AutoML_1_20251023_104234
gbm prediction progress: |███████████████████████████████████████████████████████| (done) 100%

Predictions:


| predict | p0 | p1 |
|---|---|---|
| 0 | 0.999773 | 0.000227151 |
| 1 | 0.000156023 | 0.999844 |
| 0 | 0.996515 | 0.0034847 |
| 0 | 0.961774 | 0.0382256 |
| 0 | 0.999715 | 0.000285086 |
| 1 | 0.00391418 | 0.996086 |
| 1 | 0.000391938 | 0.999608 |
| 0 | 0.999743 | 0.000256791 |
| 0 | 0.999741 | 0.000259156 |
| 0 | 0.999732 | 0.000267668 |



Test Set Performance:


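Confusion matrix (rows: actual class, columns: predicted class) at the max-F1 threshold of 0.4419:

| Actual / Predicted | 0 | 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 40 | 2 | 0.0476 | (2/42) |
| 1 | 1 | 71 | 0.0139 | (1/72) |
| Total | 41 | 73 | 0.0263 | (3/114) |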

| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.4418902 | 0.9793103 | 72 |
| max f2 | 0.0382256 | 0.9863014 | 76 |
| max f0point5 | 0.4418902 | 0.9752747 | 72 |
| max accuracy | 0.4418902 | 0.9736842 | 72 |
| max precision | 0.9998686 | 1.0 | 0 |
| max recall | 0.0382256 | 1.0 | 76 |
| max specificity | 0.9998686 | 1.0 | 0 |
| max absolute_mcc | 0.4418902 | 0.9433398 | 72 |
| max min_per_class_accuracy | 0.8705182 | 0.952381 | 70 |
| max mean_per_class_accuracy | 0.4418902 | 0.969246 | 72 |

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain | kolmogorov_smirnov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0175439 | 0.9998632 | 1.5833333 | 1.5833333 | 1.0 | 0.9998667 | 1.0 | 0.9998667 | 0.0277778 | 0.0277778 | 58.3333333 | 58.3333333 | 0.0277778 |
| 2 | 0.0263158 | 0.9998506 | 1.5833333 | 1.5833333 | 1.0 | 0.9998523 | 1.0 | 0.9998619 | 0.0138889 | 0.0416667 | 58.3333333 | 58.3333333 | 0.0416667 |
| 3 | 0.0350877 | 0.9998451 | 1.5833333 | 1.5833333 | 1.0 | 0.9998458 | 1.0 | 0.9998579 | 0.0138889 | 0.0555556 | 58.3333333 | 58.3333333 | 0.0555556 |
| 4 | 0.0438596 | 0.9998376 | 1.5833333 | 1.5833333 | 1.0 | 0.999844 | 1.0 | 0.9998551 | 0.0138889 | 0.0694444 | 58.3333333 | 58.3333333 | 0.0694444 |
| 5 | 0.0526316 | 0.9998145 | 1.5833333 | 1.5833333 | 1.0 | 0.9998316 | 1.0 | 0.9998512 | 0.0138889 | 0.0833333 | 58.3333333 | 58.3333333 | 0.0833333 |
| 6 | 0.1052632 | 0.9997793 | 1.5833333 | 1.5833333 | 1.0 | 0.9997934 | 1.0 | 0.9998223 | 0.0833333 | 0.1666667 | 58.3333333 | 58.3333333 | 0.1666667 |
| 7 | 0.1491228 | 0.9997424 | 1.5833333 | 1.5833333 | 1.0 | 0.9997583 | 1.0 | 0.9998035 | 0.0694444 | 0.2361111 | 58.3333333 | 58.3333333 | 0.2361111 |
| 8 | 0.2017544 | 0.9996694 | 1.5833333 | 1.5833333 | 1.0 | 0.9997063 | 1.0 | 0.9997781 | 0.0833333 | 0.3194444 | 58.3333333 | 58.3333333 | 0.3194444 |
| 9 | 0.2982456 | 0.9994252 | 1.5833333 | 1.5833333 | 1.0 | 0.9995652 | 1.0 | 0.9997092 | 0.1527778 | 0.4722222 | 58.3333333 | 58.3333333 | 0.4722222 |
| 10 | 0.4035088 | 0.9985792 | 1.5833333 | 1.5833333 | 1.0 | 0.9990404 | 1.0 | 0.9995347 | 0.1666667 | 0.6388889 | 58.3333333 | 58.3333333 | 0.6388889 |

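The headline numbers in this report can be reproduced from the confusion-matrix counts, with class 1 (benign) as the positive class. The check below is plain arithmetic on those counts, not an H2O API call. It recomputes accuracy, F1, MCC, and mean per-class accuracy, which match the `max accuracy`, `max f1`, `max absolute_mcc`, and `max mean_per_class_accuracy` rows at threshold 0.4419. It also recovers the lift and gain of the top groups: 72 of the 114 test rows are positive, so a group in which every row is positive has lift 114 / 72.

```python
import math

# Counts from the H2O test-set confusion matrix (class 1 = positive)
tn, fp = 40, 2  # actual class 0
fn, tp = 1, 71  # actual class 1
n = tn + fp + fn + tp

accuracy = (tp + tn) / n
f1 = 2 * tp / (2 * tp + fp + fn)
mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mean_per_class_accuracy = (tn / (tn + fp) + tp / (tp + fn)) / 2

# Lift compares a group's positive rate with the overall positive rate
base_rate = (tp + fn) / n        # 72 positives out of 114 rows
max_lift = 1.0 / base_rate       # lift of a group in which every row is positive
max_gain = (max_lift - 1) * 100  # the table's gain column equals (lift - 1) * 100

for name, value in [
    ("accuracy", accuracy),
    ("f1", f1),
    ("absolute_mcc", abs(mcc)),
    ("mean_per_class_accuracy", mean_per_class_accuracy),
    ("lift (all-positive group)", max_lift),
    ("gain (all-positive group)", max_gain),
]:
    print(f"{name:>26}: {value:.7f}")
```

Because the confusion matrix and the "max" rows use thresholds chosen on the test frame itself, these figures are somewhat optimistic; a threshold picked on training or validation data gives a fairer estimate.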

## 3. AutoGluon

**AutoGluon**, developed by Amazon Web Services (AWS), is an AutoML toolkit that simplifies machine learning for tabular, text, and image data. It is known for its high performance and ease of use, often achieving state-of-the-art results with just a few lines of code.

### Key Features:
- Multi-layered model ensembling
- Automated hyperparameter tuning
- Deep learning integration
- Supports multimodal data (tabular, text, images)
- Minimal user intervention required
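The weighted ensemble that AutoGluon fits on top of its base models (`WeightedEnsemble_L2` in a typical leaderboard) is built with greedy ensemble selection in the style of Caruana et al.: models are added one at a time, with replacement, choosing at each step the one that most improves the validation score of the averaged predictions. The pure-Python sketch below assumes binary labels, predicted probabilities on a shared validation set, and log loss as the objective (AutoGluon optimizes the predictor's `eval_metric`). The model names and probabilities are hypothetical.

```python
import math


def log_loss(y_true, p_pos, eps=1e-15):
    """Binary log loss with clipping to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def ensemble_selection(model_probs, y_true, n_iter=20):
    """Greedy forward selection with replacement; returns model -> weight."""
    counts = {name: 0 for name in model_probs}
    running_sum = [0.0] * len(y_true)
    for step in range(1, n_iter + 1):
        best_name, best_loss = None, math.inf
        for name, probs in model_probs.items():
            # Average of the models picked so far plus this candidate
            candidate = [(s + p) / step for s, p in zip(running_sum, probs)]
            loss = log_loss(y_true, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, model_probs[best_name])]
    return {name: c / n_iter for name, c in counts.items() if c > 0}


# Hypothetical validation predictions: each model is unsure on a different row
y_val = [1, 0]
model_probs = {
    "model_a": [0.9, 0.5],
    "model_b": [0.5, 0.1],
}
weights = ensemble_selection(model_probs, y_val)
print(weights)
```

Because the two models make their mistakes on different rows, averaging them beats either one alone, and the selection settles on equal weights. A model that is worse on every row never gets picked, so its weight is zero.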

In [6]:
from autogluon.tabular import TabularPredictor

# `label` names the target column in the training DataFrame;
# `path` is the directory where AutoGluon saves the trained models
predictor_ag = TabularPredictor(label='target', eval_metric='accuracy', path='./ag_models')

# Fit the models - set a time limit for the search
predictor_ag.fit(train_data=train, time_limit=120)

print("\nAutoGluon training complete!")

Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.4.0
Python Version:     3.11.13
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 24.6.0: Mon Aug 11 21:16:21 PDT 2025; root:xnu-11417.140.69.701.11~1/RELEASE_ARM64_T6000
CPU Count:          10
Memory Avail:       13.29 GB / 32.00 GB (41.5%)
Disk Space Avail:   372.17 GB / 926.35 GB (40.2%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme' : New in v1.4: Massively better than 'best' on datasets <30000 samples by using new models meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, and TabM. Absolute best accuracy. Requires a GPU. Recommended 64 GB CPU memory and 32+ GB GPU memory.
	presets='best'    : Maximize accuracy. Recommended for most users


AutoGluon training complete!


In [10]:
# View the leaderboard
leaderboard_ag = predictor_ag.leaderboard(test, silent=True)
print("\nAutoGluon Leaderboard:")
display(leaderboard_ag)

# Get predictions
predictions_ag = predictor_ag.predict(test.drop('target', axis=1))
print("\nPredictions:")
display(predictions_ag.head())

# Evaluate performance
performance_ag = predictor_ag.evaluate(test)
print("\nTest Set Performance:")
display(performance_ag)


AutoGluon Leaderboard:


| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NeuralNetFastAI | 0.982456 | 0.978022 | accuracy | 0.005486 | 0.002919 | 0.555502 | 0.005486 | 0.002919 | 0.555502 | 1 | True | 8 |
| 1 | LightGBM | 0.964912 | 0.967033 | accuracy | 0.009686 | 0.00058 | 1.345233 | 0.009686 | 0.00058 | 1.345233 | 1 | True | 2 |
| 2 | RandomForestEntr | 0.964912 | 0.945055 | accuracy | 0.031535 | 0.026531 | 0.234879 | 0.031535 | 0.026531 | 0.234879 | 1 | True | 4 |
| 3 | CatBoost | 0.95614 | 0.967033 | accuracy | 0.002056 | 0.000522 | 1.036938 | 0.002056 | 0.000522 | 1.036938 | 1 | True | 5 |
| 4 | LightGBMXT | 0.95614 | 0.978022 | accuracy | 0.003411 | 0.000644 | 2.241947 | 0.003411 | 0.000644 | 2.241947 | 1 | True | 1 |
| 5 | XGBoost | 0.95614 | 0.967033 | accuracy | 0.005471 | 0.001204 | 0.539935 | 0.005471 | 0.001204 | 0.539935 | 1 | True | 9 |
| 6 | ExtraTreesGini | 0.95614 | 0.945055 | accuracy | 0.029971 | 0.027042 | 0.252663 | 0.029971 | 0.027042 | 0.252663 | 1 | True | 6 |
| 7 | RandomForestGini | 0.95614 | 0.945055 | accuracy | 0.031822 | 0.025361 | 0.324225 | 0.031822 | 0.025361 | 0.324225 | 1 | True | 3 |
| 8 | ExtraTreesEntr | 0.947368 | 0.934066 | accuracy | 0.030237 | 0.025805 | 0.24924 | 0.030237 | 0.025805 | 0.24924 | 1 | True | 7 |
| 9 | LightGBMLarge | 0.938596 | 0.956044 | accuracy | 0.006558 | 0.001417 | 5.278504 | 0.006558 | 0.001417 | 5.278504 | 1 | True | 11 |



Predictions:


256    0
428    1
501    0
363    0
564    0
Name: target, dtype: int64


Test Set Performance:


{'accuracy': 0.9298245614035088,
 'balanced_accuracy': np.float64(0.9345238095238095),
 'mcc': 0.8544784126535336,
 'roc_auc': np.float64(0.9940476190476191),
 'f1': 0.9428571428571428,
 'precision': 0.9705882352941176,
 'recall': 0.9166666666666666}

## 4. FLAML (Fast and Lightweight AutoML)

**FLAML** is a lightweight and efficient AutoML library from Microsoft Research. It is designed to find accurate models with low computational cost, making it ideal for scenarios where speed and resource efficiency are important.

### Key Features:
- Cost-effective hyperparameter optimization
- Budget-aware optimization strategies
- Supports classification, regression, time series, NLP
- Integrated with scikit-learn, XGBoost, LightGBM
- Fast and resource-efficient
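FLAML's hyperparameter search is built around a cost-frugal local search called FLOW2 (used by its CFO and BlendSearch strategies): it starts from a cheap configuration, which is what the `low_cost_partial_config` hint in the training log refers to, and moves along random directions only when the loss improves. The sketch below is a heavily simplified randomized direct search in that spirit, minimizing a toy quadratic that stands in for validation loss. The function names, the starting point, and the step-halving rule are illustrative assumptions, not FLAML's implementation.

```python
import math
import random


def flow2_style_search(loss_fn, x0, step=1.0, min_step=1e-3, max_evals=200, seed=0):
    """Randomized direct search: try +/- a random unit direction, keep improvements."""
    rng = random.Random(seed)
    x, best = list(x0), loss_fn(x0)
    evals, failures = 1, 0
    while evals < max_evals and step > min_step:
        # Sample a direction uniformly on the unit sphere
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in u)) or 1.0
        u = [v / norm for v in u]

        moved = False
        for sign in (1.0, -1.0):
            candidate = [xi + sign * step * ui for xi, ui in zip(x, u)]
            value = loss_fn(candidate)
            evals += 1
            if value < best:
                x, best, moved = candidate, value, True
                break

        if moved:
            failures = 0
        else:
            failures += 1
            # Illustrative rule: halve the step after 2 * dim failed directions
            if failures >= 2 * len(x):
                step /= 2
                failures = 0
    return x, best, evals


def toy_loss(params):
    """Stand-in for validation loss; minimum 0 at (3, -1)."""
    a, b = params
    return (a - 3) ** 2 + (b + 1) ** 2


# Start from a hypothetical "low-cost" configuration at the origin
x_best, loss_best, n_evals = flow2_style_search(toy_loss, [0.0, 0.0])
print(f"best point {[round(v, 3) for v in x_best]}, loss {loss_best:.5f}, {n_evals} evaluations")
```

Starting from a cheap point matters when hyperparameters such as `n_estimators` or `num_leaves` drive training cost: the search only moves toward expensive regions when doing so actually lowers the loss.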

In [15]:
from flaml import AutoML
from sklearn.metrics import accuracy_score

# Initialize AutoML
automl_flaml = AutoML()

# Define settings for the AutoML run
settings = {
    "time_budget": 120,  # Total time in seconds
    "metric": "accuracy",
    "task": "classification",
    "log_file_name": "flaml.log",
    "verbose": 1,
}

# Train the models
automl_flaml.fit(X_train=train.drop("target", axis=1), y_train=train["target"], **settings)

print("\nFLAML training complete!")

INFO:flaml.tune.searcher.blendsearch:No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'. More info can be found at https://microsoft.github.io/FLAML/docs/FAQ#about-low_cost_partial_config-in-tune
INFO:flaml.tune.searcher.blendsearch:No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'. More info can be found at https://microsoft.github.io/FLAML/docs/FAQ#about-low_cost_partial_config-in-tune



FLAML training complete!


In [16]:
# Print the best model and its score
print("\nBest ML model:", automl_flaml.model.estimator)
print("\nBest hyperparameters:", automl_flaml.best_config)
print("\nBest accuracy on validation data: {0:.4f}".format(1 - automl_flaml.best_loss))
print("Training duration of best run: {0:.4f} s".format(automl_flaml.best_config_train_time))

# Evaluate on the test set
y_pred = automl_flaml.predict(test.drop("target", axis=1))
test_accuracy = accuracy_score(test["target"], y_pred)
print("\nAccuracy on test data: {0:.4f}".format(test_accuracy))


Best ML model: LGBMClassifier(colsample_bytree=np.float64(0.5599839457811349),
               learning_rate=np.float64(0.5674032121832172), max_bin=255,
               min_child_samples=3, n_estimators=37, n_jobs=-1, num_leaves=6,
               reg_alpha=0.0009765625,
               reg_lambda=np.float64(0.00591154936356641), verbose=-1)

Best hyperparameters: {'n_estimators': 37, 'num_leaves': 6, 'min_child_samples': 3, 'learning_rate': np.float64(0.5674032121832172), 'log_max_bin': 8, 'colsample_bytree': np.float64(0.5599839457811349), 'reg_alpha': 0.0009765625, 'reg_lambda': np.float64(0.00591154936356641)}

Best accuracy on validation data: 0.9802
Training duration of best run: 0.0507 s

Accuracy on test data: 0.9561
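Two values in this output are reported in FLAML's search space rather than the estimator's. The best configuration contains `'log_max_bin': 8` while the fitted `LGBMClassifier` shows `max_bin=255`, which is consistent with the mapping max_bin = 2^log_max_bin - 1. Because FLAML minimizes a loss, the validation accuracy printed above is computed as `1 - best_loss`. A small sketch of both conversions; the helper names are hypothetical.

```python
def max_bin_from_log(log_max_bin):
    """Map a log-scale search value to LightGBM's max_bin (2**k - 1)."""
    return (1 << log_max_bin) - 1


def accuracy_from_loss(loss):
    """With metric='accuracy', the minimized loss is 1 - accuracy."""
    return 1 - loss


# Subset of the best configuration printed above
best_config = {"log_max_bin": 8, "n_estimators": 37, "num_leaves": 6}
print("max_bin:", max_bin_from_log(best_config["log_max_bin"]))
```

Searching `max_bin` on a log scale lets the tuner move between coarse and fine histograms in a few steps instead of scanning hundreds of integer values.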


## 5. Comparison Summary

This notebook demonstrated three powerful and popular open-source AutoML libraries: **H2O AutoML**, **AutoGluon**, and **FLAML**. Each library offers a unique approach to automated machine learning, providing different levels of abstraction, performance, and customization.

### Key Takeaways:

**H2O AutoML** is a robust, scalable platform well suited to enterprise use and to anyone who wants a comprehensive AutoML solution with a browser-based interface (H2O Flow). It trains a broad range of model families and builds stacked ensembles automatically, which makes it suitable for production environments.

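**AutoGluon** aims for high accuracy with minimal configuration, and its support for multimodal data (tabular, text, and images) makes it a versatile choice for many machine learning tasks. Much of its accuracy comes from multi-layer stack ensembling, which is enabled by higher-quality presets such as `presets='best'`. This run used the default `'medium'` preset (every model in the leaderboard is at `stack_level` 1), and here AutoGluon's test accuracy (0.9298) was lower than FLAML's (0.9561), so the choice of preset and time budget matters.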

**FLAML** is a lightweight, efficient library suited to settings where computational resources are limited. Its cost-frugal, budget-aware search finds good models quickly (in this run, the best configuration was a LightGBM model that trained in about 0.05 seconds), which makes it a good tool for rapid prototyping and resource-constrained environments.

### When to Use Each Library:

- **Use H2O** when you need enterprise-grade scalability, distributed computing, or want a comprehensive platform with web UI support.
- **Use AutoGluon** when you want the best possible performance with minimal effort, or when working with multimodal data.
- **Use FLAML** when computational efficiency is critical, or when you need fast iteration during the prototyping phase.
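To put the three runs side by side, the test-set accuracies printed in this notebook can be ranked and converted back to correct-prediction counts on the 114 test rows. The sketch below uses those printed numbers. Note that H2O's 0.9737 comes from a confusion matrix at the max-F1 threshold chosen on the test set itself, so it is slightly optimistic compared with the other two, which use each library's default prediction rule.

```python
N_TEST = 114  # rows in the held-out test set

# Test-set accuracies as printed in the cells above
test_accuracy = {
    "H2O AutoML (GBM leader, max-F1 threshold)": 0.9736842,
    "AutoGluon (default 'medium' preset)": 0.9298245614035088,
    "FLAML (LightGBM)": 0.9561,
}

ranking = sorted(test_accuracy.items(), key=lambda item: item[1], reverse=True)
for name, acc in ranking:
    correct = round(acc * N_TEST)
    print(f"{name:<45} accuracy={acc:.4f}  correct={correct}/{N_TEST}")
```

With only 114 test rows, each misclassified row moves accuracy by about 0.0088 (1/114), so the gaps between the libraries amount to a handful of rows and could change with a different random split.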

By exploring these alternatives to PyCaret, students can gain a broader understanding of the AutoML landscape and choose the right tool for their specific needs and constraints.

In [17]:
# Cleanup (optional)
h2o.cluster().shutdown()

H2O session _sid_8609 closed.
