Check azure-ai-ml

In [1]:
pip show azure-ai-ml

Name: azure-ai-ml
Version: 1.29.0
Summary: Microsoft Azure Machine Learning Client Library for Python
Home-page: https://github.com/Azure/azure-sdk-for-python
Author: Microsoft Corporation
Author-email: azuresdkengsysadmins@microsoft.com
License: MIT License
Location: /home/tshen/dp-100/mslearn-azure-ml/.venv/lib/python3.12/site-packages
Requires: azure-common, azure-core, azure-mgmt-core, azure-monitor-opentelemetry, azure-storage-blob, azure-storage-file-datalake, azure-storage-file-share, colorama, isodate, jsonschema, marshmallow, pydash, pyjwt, pyyaml, strictyaml, tqdm, typing-extensions
Required-by: 
Note: you may need to restart the kernel to use updated packages.
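
The same check can be scripted, which is convenient when a notebook should fail fast on a missing package. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is an illustrative assumption, not part of the lab.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if azure-ai-ml is installed, otherwise None.
print("azure-ai-ml:", installed_version("azure-ai-ml"))
```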


Connect to your workspace

In [2]:
# import required libraries
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# enter details of your Azure Machine Learning workspace
subscription_id = 'c6cdfd9c-3767-4be7-a343-0acd27ddefb9'
resource_group = 'rg-tshen-0917'
workspace = 'ml-ws-tshen-0917'

# connect to the workspace
ml_client = MLClient(DefaultAzureCredential(), subscription_id,
                     resource_group, workspace)
ml_client

MLClient(credential=<azure.identity._credentials.default.DefaultAzureCredential object at 0x7008d3f99700>,
         subscription_id=c6cdfd9c-3767-4be7-a343-0acd27ddefb9,
         resource_group_name=rg-tshen-0917,
         workspace_name=ml-ws-tshen-0917)
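
Hard-coding the subscription ID and resource names ties the notebook to one workspace. A minimal sketch of an alternative, assuming the details are supplied through environment variables; the variable names `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZURE_ML_WORKSPACE` and the helper `load_workspace_details` are hypothetical choices for illustration:

```python
import os


def load_workspace_details(env=None):
    """Collect workspace details from a mapping (defaults to os.environ).

    The variable names are hypothetical; use whatever your setup defines.
    """
    env = os.environ if env is None else env
    names = {
        "subscription_id": "AZURE_SUBSCRIPTION_ID",
        "resource_group_name": "AZURE_RESOURCE_GROUP",
        "workspace_name": "AZURE_ML_WORKSPACE",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Example with an explicit mapping instead of the real environment.
details = load_workspace_details({
    "AZURE_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
    "AZURE_RESOURCE_GROUP": "my-resource-group",
    "AZURE_ML_WORKSPACE": "my-workspace",
})
print(details)
```

The returned dictionary could then be unpacked into the client, for example `MLClient(DefaultAzureCredential(), **details)`, since the keys match the keyword names shown in the `MLClient` output above.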

Verify GPU

In [16]:
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

2025-09-20 13:47:17.408404: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


Custom tracking with MLflow

The training script below uses MLflow to log:

- The regularization rate as a parameter.
- The accuracy and AUC as metrics.
- The plotted ROC curve as an artifact.

In [4]:
import os
# create a folder for the script files
script_folder = 'src'
os.makedirs(script_folder, exist_ok=True)
print(script_folder, 'folder created')

src folder created


In [5]:
%%writefile $script_folder/train-model-mlflow.py
# import libraries
import mlflow
import argparse
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

def main(args):
    # read data
    df = get_data(args.training_data)

    # split data
    X_train, X_test, y_train, y_test = split_data(df)

    # train model
    model = train_model(args.reg_rate, X_train, X_test, y_train, y_test)

    # evaluate model
    eval_model(model, X_test, y_test)

# function that reads the data
def get_data(path):
    print("Reading data...")
    df = pd.read_csv(path)
    
    return df

# function that splits the data
def split_data(df):
    print("Splitting data...")
    X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness',
    'SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

    return X_train, X_test, y_train, y_test

# function that trains the model
def train_model(reg_rate, X_train, X_test, y_train, y_test):
    mlflow.log_param("Regularization rate", reg_rate)
    print("Training model...")
    model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train)

    return model

# function that evaluates the model
def eval_model(model, X_test, y_test):
    # calculate accuracy
    y_hat = model.predict(X_test)
    acc = np.average(y_hat == y_test)
    print('Accuracy:', acc)
    mlflow.log_metric("Accuracy", acc)

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test,y_scores[:,1])
    print('AUC: ' + str(auc))
    mlflow.log_metric("AUC", auc)

    # plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], 'k--')
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve')
    plt.savefig("ROC-Curve.png")
    mlflow.log_artifact("ROC-Curve.png")    

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--training_data", dest='training_data',
                        type=str)
    parser.add_argument("--reg_rate", dest='reg_rate',
                        type=float, default=0.01)

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":
    # add space in logs
    print("\n\n")
    print("*" * 60)

    # parse args
    args = parse_args()

    # run main function
    main(args)

    # add space in logs
    print("*" * 60)
    print("\n\n")


Writing src/train-model-mlflow.py
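
Because the script computes `C=1/reg_rate`, a `--reg_rate` of `0` would raise a `ZeroDivisionError`, and a negative value would give an invalid `C`. One possible guard, sketched here with a hypothetical `positive_float` type function for `argparse` (it is not part of the original script):

```python
import argparse


def positive_float(text):
    """argparse type: convert text to a float and require it to be > 0."""
    value = float(text)
    if value <= 0:
        raise argparse.ArgumentTypeError("must be greater than 0, got " + text)
    return value


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--training_data", dest="training_data", type=str)
    parser.add_argument("--reg_rate", dest="reg_rate",
                        type=positive_float, default=0.01)
    return parser.parse_args(argv)


# An explicit argument list is passed so the example does not read sys.argv.
args = parse_args(["--training_data", "diabetes.csv", "--reg_rate", "0.1"])
print(args.reg_rate, 1 / args.reg_rate)  # a smaller reg_rate gives a larger C
```

With this in place, `--reg_rate 0` is rejected by the parser with a usage error before any training starts.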


Now, you can submit the script as a command job.

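The cell below writes the job submission code to `src/submit-job.py`. Run that code in a session where `ml_client` is defined to train the model.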

In [6]:
%%writefile $script_folder/submit-job.py
from azure.ai.ml import command

# configure job

job = command(
    code="./src",
    command="python train-model-mlflow.py --training_data diabetes.csv",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="diabetes-train-mlflow",
    experiment_name="diabetes-training", 
    tags={"model_type": "LogisticRegression"}
    )

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

Writing src/submit-job.py


In the Studio, navigate to the **diabetes-train-mlflow** job to explore the overview of the command job you ran:

- Find the logged parameters in the **Overview** tab, under **Params**.
- Find the logged metrics in the **Metrics** tab.
- Find the logged artifacts in the **Images** tab (specifically for images), and in the **Outputs + logs** tab (all files).

Autologging with MLflow

In [19]:
%%writefile $script_folder/train-model-autolog.py
# import libraries
import tensorflow as tf
import mlflow
import argparse
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

def main(args):
    # enable autologging
    mlflow.autolog()

    # read data
    df = get_data(args.training_data)

    # split data
    X_train, X_test, y_train, y_test = split_data(df)

    # train model
    if args.gpu.lower() == "yes":
        model = train_model_gpu(args.reg_rate, X_train, X_test, y_train, y_test)
    else:   
        model = train_model(args.reg_rate, X_train, X_test, y_train, y_test)

    # evaluate model
    eval_model(model, X_test, y_test)

# function that reads the data
def get_data(path):
    print("Reading data...")
    df = pd.read_csv(path)

    return df

# function that splits the data
def split_data(df):
    print("Splitting data...")
    X, y = df[['Pregnancies','PlasmaGlucose','DiastolicBloodPressure','TricepsThickness',
    'SerumInsulin','BMI','DiabetesPedigree','Age']].values, df['Diabetic'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

    return X_train, X_test, y_train, y_test

# function that trains the model
def train_model(reg_rate, X_train, X_test, y_train, y_test):
    print("Training model...")
    model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train)

    return model

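# function that trains the model inside a TensorFlow GPU device scope
# (scikit-learn does not use the GPU, so the fit itself still runs on the CPU)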
def train_model_gpu(reg_rate, X_train, X_test, y_train, y_test):
    print("Training model...")
    
    with tf.device('/GPU:0'):
        model = LogisticRegression(C=1/reg_rate, solver="liblinear").fit(X_train, y_train)
        return model

# function that evaluates the model
def eval_model(model, X_test, y_test):
    # calculate accuracy
    y_hat = model.predict(X_test)
    acc = np.average(y_hat == y_test)
    print('Accuracy:', acc)

    # calculate AUC
    y_scores = model.predict_proba(X_test)
    auc = roc_auc_score(y_test,y_scores[:,1])
    print('AUC: ' + str(auc))

    # plot ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
    fig = plt.figure(figsize=(6, 4))
    # Plot the diagonal 50% line
    plt.plot([0, 1], [0, 1], 'k--')
    # Plot the FPR and TPR achieved by our model
    plt.plot(fpr, tpr)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve')
    plt.savefig("ROC-Curve.png") 

def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--training_data", dest='training_data', type=str)
    parser.add_argument("--reg_rate", dest='reg_rate', type=float, default=0.01)
    parser.add_argument("--gpu", dest='gpu', type=str, default="no")

    # parse args
    args = parser.parse_args()

    # return args
    return args

# run script
if __name__ == "__main__":
    # add space in logs
    print("\n\n")
    print("*" * 60)

    # parse args
    args = parse_args()

    # run main function
    main(args)

    # add space in logs
    print("*" * 60)
    print("\n\n")


Overwriting src/train-model-autolog.py


Now, you can submit the script as a command job.

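The cell below writes the job submission code to `src/submit-job-2.py`. Run that code in a session where `ml_client` is defined to train the model.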

In [11]:
%%writefile $script_folder/submit-job-2.py
from azure.ai.ml import command

# configure job

job = command(
    code="./src",
    command="python train-model-autolog.py --training_data diabetes.csv",
    environment="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu@latest",
    compute="aml-cluster",
    display_name="diabetes-train-autolog",
    experiment_name="diabetes-training"
    )

# submit job
returned_job = ml_client.create_or_update(job)
aml_url = returned_job.studio_url
print("Monitor your job at", aml_url)

Writing src/submit-job-2.py


Run train-model-autolog.py

In [34]:
!python src/train-model-autolog.py --training_data src/diabetes.csv --reg_rate 0.01 --gpu no

2025-09-20 17:57:46.177638: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.



************************************************************
2025/09/20 17:57:50 INFO mlflow.tracking.fluent: Autologging successfully enabled for keras.
2025/09/20 17:57:51 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2025/09/20 17:57:51 INFO mlflow.tracking.fluent: Autologging successfully enabled for tensorflow.
Reading data...
Splitting data...
Training model...
2025/09/20 17:57:51 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'bbf24de8781f4596ad0c755b1c1f8cc3', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
Accuracy: 0.774
AUC: 0.84

Use MLflow to view and search for experiments

In [27]:
import mlflow
experiments = mlflow.search_experiments()
for exp in experiments:
    print(exp.name)

Default


To retrieve a specific experiment, you can get it by its name:

In [29]:
# experiment_name = "diabetes-training"
experiment_name = "Default"
exp = mlflow.get_experiment_by_name(experiment_name)
print(exp)

<Experiment: artifact_location='file:///home/tshen/dp-100/mslearn-azure-ml/Labs/08/mlruns/0', creation_time=1758392893042, experiment_id='0', last_update_time=1758392893042, lifecycle_stage='active', name='Default', tags={}>
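
The `creation_time` and `last_update_time` fields are Unix timestamps in milliseconds. A small sketch of converting the value shown above to a readable UTC time, using only the standard library (the helper name `ms_to_utc` is illustrative):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(milliseconds):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=milliseconds)


# creation_time copied from the experiment output above
created = ms_to_utc(1758392893042)
print(created.isoformat())
```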


Using the experiment's ID, you can retrieve all runs of that experiment:

In [30]:
mlflow.search_runs(exp.experiment_id)

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.training_roc_auc,metrics.training_log_loss,metrics.training_precision_score,metrics.training_recall_score,...,params.warm_start,params.dual,tags.mlflow.source.type,tags.estimator_name,tags.mlflow.user,tags.mlflow.autologging,tags.mlflow.runName,tags.mlflow.source.git.commit,tags.estimator_class,tags.mlflow.source.name
0,8cae1a96afea4f6f8a9dedc66430fad9,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:06:59.770000+00:00,2025-09-20 19:07:03.525000+00:00,0.861918,0.434346,0.786392,0.791429,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,bittersweet-ray-169,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
1,0070940da63d4067bb13708b963a0e2e,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:05:38.999000+00:00,2025-09-20 19:05:42.734000+00:00,0.861918,0.434346,0.786392,0.791429,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,nervous-flea-83,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
2,937ac8fde54e4cfe9557de8daa8c806f,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:05:04.676000+00:00,2025-09-20 19:05:08.552000+00:00,0.861918,0.434346,0.786392,0.791429,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,mysterious-foal-715,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
3,e16e5e09b70645d2aa5fb5a377572de9,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:04:21.143000+00:00,2025-09-20 19:04:24.979000+00:00,0.861916,0.434346,0.786236,0.791286,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,wise-lark-607,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
4,727db86023374ae0a7b4f08f0786ce90,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:02:52.092000+00:00,2025-09-20 19:02:55.704000+00:00,0.861916,0.434346,0.786236,0.791286,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,receptive-gnu-202,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
5,e32f477154c646ca822981abf99815e9,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:01:49.421000+00:00,2025-09-20 19:01:53.105000+00:00,0.861916,0.434346,0.786236,0.791286,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,enchanting-midge-632,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
6,0c4fb6b3b87945608cd6c7fdb2031eed,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:01:17.684000+00:00,2025-09-20 19:01:24.051000+00:00,0.861916,0.434346,0.786236,0.791286,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,lyrical-grub-160,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
7,293c1743d28440069311fa9cf184e8b6,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 18:40:48.112000+00:00,2025-09-20 18:40:51.793000+00:00,0.861916,0.434346,0.786236,0.791286,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,nosy-asp-409,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
8,1c01a12f4a2d4806b4be02107c0d4883,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 18:38:12.530000+00:00,2025-09-20 18:38:16.481000+00:00,0.861916,0.434346,0.786236,0.791286,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,unruly-lynx-653,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
9,45aa6008df654fc99c8e458694d5ed7a,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 18:36:16.070000+00:00,2025-09-20 18:36:23.166000+00:00,0.861916,0.434346,0.786236,0.791286,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,selective-cow-729,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py


To compare runs and their outputs more easily, you can configure the search to order the results. For example, the following cell orders the results by `start_time` in descending order and shows at most `2` results:

In [31]:
mlflow.search_runs(exp.experiment_id, order_by=["start_time DESC"], max_results=2)

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.training_roc_auc,metrics.training_log_loss,metrics.training_precision_score,metrics.training_recall_score,...,params.warm_start,params.dual,tags.mlflow.source.type,tags.estimator_name,tags.mlflow.user,tags.mlflow.autologging,tags.mlflow.runName,tags.mlflow.source.git.commit,tags.estimator_class,tags.mlflow.source.name
0,8cae1a96afea4f6f8a9dedc66430fad9,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:06:59.770000+00:00,2025-09-20 19:07:03.525000+00:00,0.861918,0.434346,0.786392,0.791429,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,bittersweet-ray-169,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py
1,0070940da63d4067bb13708b963a0e2e,0,FINISHED,file:///home/tshen/dp-100/mslearn-azure-ml/Lab...,2025-09-20 19:05:38.999000+00:00,2025-09-20 19:05:42.734000+00:00,0.861918,0.434346,0.786392,0.791429,...,False,False,LOCAL,LogisticRegression,tshen,sklearn,nervous-flea-83,454125fc77067fba1afb9336be83f9786058e06e,sklearn.linear_model._logistic.LogisticRegression,src/train-model-autolog.py


You can even create a query to filter the runs. Filter query strings are written with a simplified version of the SQL `WHERE` clause. 

To filter, you can use two classes of comparators:

- Numeric comparators (metrics): =, !=, >, >=, <, and <=.
- String comparators (params, tags, and attributes): = and !=.

Learn more about [how to track experiments with MLflow](https://learn.microsoft.com/azure/machine-learning/how-to-track-experiments-mlflow).
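
As a rough illustration of how such a filter string is assembled, the sketch below joins individual conditions with `and`. The helper `build_filter` is hypothetical (it is not part of MLflow) and only handles simple keys without spaces or special characters:

```python
def build_filter(metric_conditions=(), tag_conditions=None):
    """Build an MLflow-style filter string.

    metric_conditions: iterable of (metric_name, comparator, number)
    tag_conditions: dict of tag name -> exact string value
    """
    numeric_comparators = {"=", "!=", ">", ">=", "<", "<="}
    clauses = []
    for name, comparator, value in metric_conditions:
        if comparator not in numeric_comparators:
            raise ValueError("Unsupported comparator: " + comparator)
        clauses.append(f"metrics.{name} {comparator} {value}")
    for name, value in (tag_conditions or {}).items():
        clauses.append(f"tags.{name} = '{value}'")
    return " and ".join(clauses)


query = build_filter(
    metric_conditions=[("AUC", ">", 0.8)],
    tag_conditions={"model_type": "LogisticRegression"},
)
print(query)
```

The resulting string can be passed as `filter_string` to `mlflow.search_runs`.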

In [32]:
query = "metrics.AUC > 0.8 and tags.model_type = 'LogisticRegression'"
mlflow.search_runs(exp.experiment_id, filter_string=query)

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time

The query returns no runs here, most likely because the local runs listed earlier were created by autologging, which records metrics such as `training_roc_auc` rather than `AUC`, and because the `model_type` tag is only set on the command job defined in `submit-job.py`.
