### Automated machine learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It lets data scientists, analysts, and developers build ML models at scale, efficiently and productively, while maintaining model quality.

### Traditional machine learning model development is resource-intensive and takes significant time, because dozens of models must be produced and compared. Automated machine learning shortens the time it takes to get to a production-ready ML model.

### Python has a growing ecosystem of open-source AutoML libraries, such as **H2O**.

# **H2O**
### H2O AutoML is part of H2O.ai's open-source machine learning library and automates the process of building and optimizing machine learning models. It supports supervised tasks: binary classification, multiclass classification, and regression. H2O AutoML automates steps such as basic data preprocessing, model selection, hyperparameter tuning, and the building of stacked ensembles.
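
### To make "model selection" concrete, here is a toy sketch in plain Python: fit several candidate models, score each on held-out data, and sort the results into a leaderboard. It is an illustration of the idea only, not H2O's implementation; `fit_mean`, `fit_linear`, and `toy_automl` are hypothetical names and the data is made up.

```python
import math

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def rmse(model, xs, ys):
    """Root mean squared error of a model on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def toy_automl(train, valid, candidates):
    """Fit every candidate on train, score it on valid, return rows sorted best-first."""
    board = []
    for name, fit in candidates.items():
        model = fit(*train)
        board.append((name, rmse(model, *valid)))
    return sorted(board, key=lambda row: row[1])  # lower RMSE is better

# Made-up data following y = 2x + 1
train = ([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
valid = ([5, 6], [11, 13])
leaderboard = toy_automl(train, valid, {"mean": fit_mean, "linear": fit_linear})
print(leaderboard)
```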

# Steps for using **H2O** AutoML:

### 1- Install the H2O library: Install it with pip using the following command. H2O runs on the JVM, so a Java runtime must also be available on the machine.

In [1]:
# pip install h2o

### 2- Initialize H2O: Before using H2O AutoML, you need to initialize the H2O environment.

In [2]:
# import h2o 
# h2o.init()

### 3- Load your dataset: H2O uses its own data structure, the H2OFrame. You can import a file such as a CSV directly, or convert an existing pandas DataFrame with `h2o.H2OFrame(df)`.

In [3]:
# data = h2o.import_file("your_data.csv")

### 4- Split the dataset: It's good practice to split the dataset into training and testing sets:

In [4]:
# train, test = data.split_frame(ratios=[0.8], seed=1234)  # roughly 80% train, 20% test
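
### Note that `split_frame` assigns rows to the splits probabilistically, so the resulting sizes are approximately, not exactly, 80/20. The sketch below shows the idea in plain Python; it is not H2O's code, and `probabilistic_split` is a hypothetical helper.

```python
import random

def probabilistic_split(rows, ratio=0.8, seed=1234):
    """Send each row to train with probability `ratio`, otherwise to test."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    train, test = [], []
    for row in rows:
        (train if rng.random() < ratio else test).append(row)
    return train, test

rows = list(range(1000))
train_rows, test_rows = probabilistic_split(rows)
# The sizes are close to 800 / 200 but usually not exact
print(len(train_rows), len(test_rows))
```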

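### 5- Define the target and features: Specify the target (dependent) variable and the feature (independent) variables. For classification, the target column must be categorical; a numeric label column can be converted with `asfactor()`.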

In [5]:
# target = 'target_column' # your target column 
# features = data.columns 
# features.remove(target)  # remove target from features 
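
### `list.remove` raises a `ValueError` if the target name is misspelled, and identifier columns stay in the feature list unless you drop them too. A small sketch of a helper that handles both cases on a plain list of column names; `feature_columns` and the column names are hypothetical.

```python
def feature_columns(columns, target, exclude=()):
    """Return the columns to train on, without mutating the input list."""
    if target not in columns:
        raise ValueError(f"target {target!r} is not a column")
    dropped = {target, *exclude}
    return [c for c in columns if c not in dropped]

# Made-up column names for illustration
columns = ["id", "age", "income", "target_column"]
features = feature_columns(columns, "target_column", exclude=["id"])
print(features)  # ['age', 'income']
```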

### 6- Run AutoML: The H2OAutoML class automates the entire machine learning pipeline. You can set parameters such as the maximum run time or the maximum number of models to build.

In [6]:
# from h2o.automl import H2OAutoML

# aml = H2OAutoML(max_runtime_secs = 600, # Run for 10 minutes
#                seed = 1234,
#                max_models = 10 # Set a limit for the number of models 
#                )
# aml.train(x = features , y = target , training_frame = train)
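
### When both `max_runtime_secs` and `max_models` are set, the run ends as soon as either limit is reached. The plain-Python sketch below illustrates that stopping rule; it is not H2O's scheduler, and `budgeted_search` and the dummy candidates are hypothetical.

```python
import time

def budgeted_search(candidates, max_models=None, max_runtime_secs=None,
                    clock=time.monotonic):
    """Train candidates in order until the model or time budget runs out."""
    start = clock()
    trained = []
    for name, train_fn in candidates:
        if max_models is not None and len(trained) >= max_models:
            break  # model budget exhausted
        if max_runtime_secs is not None and clock() - start >= max_runtime_secs:
            break  # time budget exhausted
        trained.append((name, train_fn()))
    return trained

# Dummy "training" functions that return instantly
candidates = [(f"model_{i}", lambda i=i: i * i) for i in range(20)]
by_count = budgeted_search(candidates, max_models=10, max_runtime_secs=600)
by_time = budgeted_search(candidates, max_models=10, max_runtime_secs=0)
print(len(by_count), len(by_time))  # 10 0
```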

### 7- View the leaderboard: After training, you can view the leaderboard of models ranked by performance metrics.

In [7]:
# lb = aml.leaderboard
# print(lb)
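
### The ranking direction depends on the metric: for AUC a higher value is better, while for error metrics such as RMSE a lower value is better. H2OAutoML takes a `sort_metric` argument for choosing the ranking metric. The sketch below is plain Python with made-up scores, not H2O's implementation; `rank_models` and `HIGHER_IS_BETTER` are hypothetical names.

```python
# Metrics where a larger value means a better model (illustrative subset)
HIGHER_IS_BETTER = {"auc", "r2"}

def rank_models(scores, metric):
    """Return (model, score) pairs sorted best-first for the given metric."""
    reverse = metric in HIGHER_IS_BETTER
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=reverse)

# Made-up scores, for illustration only
auc_scores = {"gbm": 0.91, "glm": 0.84, "drf": 0.88}
rmse_scores = {"gbm": 3.9, "glm": 4.7, "drf": 3.2}

print([name for name, _ in rank_models(auc_scores, "auc")])    # ['gbm', 'drf', 'glm']
print([name for name, _ in rank_models(rmse_scores, "rmse")])  # ['drf', 'gbm', 'glm']
```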

### 8- Make predictions: You can use the best model (leader) from the AutoML process to make predictions on new data.

In [8]:
# predictions = aml.leader.predict(test)
# print(predictions)

### 9- Evaluate the model: Measure the performance of the best model on the held-out test set.

In [9]:
# performance = aml.leader.model_performance(test)
# print(performance)
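
### For a multiclass problem, two commonly reported metrics are accuracy and mean per-class error. A minimal sketch of how they are computed, in plain Python with made-up labels; the function names are illustrative and not part of the H2O API.

```python
from collections import defaultdict

def accuracy(actual, predicted):
    """Fraction of rows where the prediction matches the label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def mean_per_class_error(actual, predicted):
    """Average of the per-class error rates, so every class counts equally."""
    totals, wrong = defaultdict(int), defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        wrong[a] += a != p  # adds 1 for a misclassified row, 0 otherwise
    return sum(wrong[c] / totals[c] for c in totals) / len(totals)

# Made-up labels: one versicolor row is misclassified as virginica
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

print(accuracy(y_true, y_pred))
print(mean_per_class_error(y_true, y_pred))
```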

### 10- Save the model: You can save the best model to use later.

In [10]:
# model_path = h2o.save_model(model = aml.leader , path = 'best_model' , force = True)

# Example for a **Classification** problem

In [11]:
# Step 1: Import and Initialize H2O
import h2o
h2o.init()

# Step 2: Load the dataset (Iris dataset in this case)
# The h2o package has no h2o.datasets module, so the data is imported from a file.
# This CSV from H2O's public test data is one commonly used source (assumes network access).
data = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# Step 3: Split the dataset into training and testing sets
train, test = data.split_frame(ratios=[0.8], seed=1234)  # roughly 80% training, 20% testing

# Step 4: Define the target column and features
target = 'class'  # The target column (species in the Iris dataset)
features = data.columns
features.remove(target)  # Remove the target column from the features

# Step 5: Run H2O AutoML for Classification
from h2o.automl import H2OAutoML
aml = H2OAutoML(max_runtime_secs=300,  # Limit the runtime to 5 minutes
                seed=1234,  # Set a seed for reproducibility
                max_models=10)  # Optional: limit the number of models

aml.train(x=features, y=target, training_frame=train)

# Step 6: View the leaderboard of the models
lb = aml.leaderboard
print(lb)

# Step 7: Make predictions on the test set using the best model (the leader)
predictions = aml.leader.predict(test)
print(predictions)

# Step 8: Evaluate the performance of the model on the test set
performance = aml.leader.model_performance(test)
print(performance)

# Step 9: Save the best model for future use
model_path = h2o.save_model(model=aml.leader, path="best_classification_model", force=True)
print(f"Model saved to: {model_path}")

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.24" 2024-07-16; OpenJDK Runtime Environment (build 11.0.24+8-post-Ubuntu-1ubuntu320.04); OpenJDK 64-Bit Server VM (build 11.0.24+8-post-Ubuntu-1ubuntu320.04, mixed mode, sharing)
  Starting server from /opt/conda/lib/python3.10/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpp_rcins1
  JVM stdout: /tmp/tmpp_rcins1/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpp_rcins1/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.

H2O_cluster_uptime:         03 secs
H2O_cluster_timezone:       Etc/UTC
H2O_data_parsing_timezone:  UTC
H2O_cluster_version:        3.46.0.5
H2O_cluster_version_age:    1 month and 16 days
H2O_cluster_name:           H2O_from_python_unknownUser_3xay5h
H2O_cluster_total_nodes:    1
H2O_cluster_free_memory:    7.500 Gb
H2O_cluster_total_cores:    4
H2O_cluster_allowed_cores:  4

# Example for a **Regression** problem

In [None]:
# Step 1: Import and Initialize H2O
import h2o
h2o.init()

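# Step 2: Load the Boston Housing dataset
# The h2o package has no h2o.datasets module, so the data is imported from a file.
# This CSV from H2O's public test data is one commonly used source (assumes network access).
data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")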

# Step 3: Split the dataset into training and testing sets (80% train, 20% test)
train, test = data.split_frame(ratios=[0.8], seed=1234)

# Step 4: Define the target and features
target = 'medv'  # 'medv' is the target column (median value of owner-occupied homes)
features = data.columns
features.remove(target)  # Remove target from features

# Step 5: Run H2O AutoML for Regression
from h2o.automl import H2OAutoML

aml = H2OAutoML(max_runtime_secs=300,  # Limit the training time to 5 minutes
                max_models=10,         # Limit the number of models to 10
                seed=1234)             # Set seed for reproducibility
aml.train(x=features, y=target, training_frame=train)

# Step 6: View the leaderboard of models
lb = aml.leaderboard
print(lb)

# Step 7: Make predictions on the test set using the best model (the leader)
predictions = aml.leader.predict(test)
print(predictions)

# Step 8: Evaluate the performance of the model on the test set
performance = aml.leader.model_performance(test)
print(performance)

# Step 9: Save the best model for future use
model_path = h2o.save_model(model=aml.leader, path="best_regression_model", force=True)
print(f"Model saved to: {model_path}")