Workflow


1. Initialize Workspace & create workspace handle
2. Initialize
    - Compute Cluster
    - Environment
3. Fetch Input Data
4. Create a .py script to Train & Register Model
5. Configure & Submit Command Job

Step 1: Initializing Workspace and creating Workspace handle

In [1]:
from azureml.core import Workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Load workspace details from config.json (azureml.core is the SDK v1 package)
ws = Workspace.from_config()

# Get a handle to the workspace
credential = DefaultAzureCredential()  # authenticate
ml_client = MLClient( credential=credential,
                      subscription_id=ws.subscription_id,
                      resource_group_name=ws.resource_group,
                      workspace_name=ws.name,
                    )
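Workspace.from_config() reads the workspace details from a config.json file. A minimal sketch of a pre-flight check on that file is shown below; it assumes the usual three keys (subscription_id, resource_group, workspace_name), and the helper name load_workspace_config is hypothetical, not part of either SDK.

```python
import json
from pathlib import Path

# Keys that a workspace config.json is assumed to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Read a workspace config.json and check that the expected keys are set."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    # Return only the keys needed to build an MLClient
    return {key: config[key] for key in REQUIRED_KEYS}
```

The returned dictionary holds the same three values that the cell above passes to MLClient. SDK v2 also offers MLClient.from_config(credential=credential), which reads the same file and may let you drop the azureml.core import.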


Step 2: Initializing Compute Cluster & Environment

In [2]:
from azure.ai.ml.entities import AmlCompute

# Name assigned to the compute cluster
compute = "ML-Pipeline-Cluster"

try:
    # let's see if the compute target already exists
    cpu_cluster = ml_client.compute.get(compute)
    print(f"You already have a cluster named {compute}, we'll reuse it as is.")

except Exception:
    print("Creating a new cpu compute target...")
    cpu_cluster = AmlCompute(
        name=compute,
        type="amlcompute",
        size="STANDARD_DS3_V2",
        min_instances=0,
        max_instances=4,
        idle_time_before_scale_down=300,
        tier="Dedicated",
    )
    print(f"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}")
    
    # Submit the cluster definition; begin_create_or_update returns a poller,
    # so call .result() to wait for provisioning and get the compute object back
    cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()

You already have a cluster named ML-Pipeline-Cluster, we'll reuse it as is.


In [3]:
import os
from azure.ai.ml.entities import Environment

custom_env_name  = "ENV-SDKv2"
dependencies_dir = '../dependencies'
env = Environment( name=custom_env_name,
                   description="Environment for Python SDK v2 execution",
                   conda_file=os.path.join(dependencies_dir, "conda.yaml"),
                   image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
                 )
env = ml_client.environments.create_or_update(env)

# GET ENVIRONMENT
# use 'label' parameter to get latest environment for example label='latest'
# use 'version' parameter to get specific version environment, for example version=2
env = ml_client.environments.get(name=custom_env_name, label='latest') 

print(f"Environment with name {env.name} is registered to workspace, the environment version is {env.version}")

Environment with name ENV-SDKv2 is registered to workspace, the environment version is 4
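The environment above is built from a conda.yaml file in the dependencies folder, which this notebook does not show. The sketch below writes one such file; the environment name, channel, Python version, and package list are all assumptions for illustration and should be replaced with whatever your training script actually imports.

```python
import textwrap
from pathlib import Path

# Hypothetical dependency list; adjust to match your training script
CONDA_YAML = textwrap.dedent("""\
    name: sdkv2-env
    channels:
      - conda-forge
    dependencies:
      - python=3.8
      - pip
      - pip:
          - pandas
          - scikit-learn
          - mlflow
          - azureml-mlflow
    """)


def write_conda_file(dependencies_dir):
    """Create the dependencies folder if needed and write conda.yaml into it."""
    target_dir = Path(dependencies_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    conda_path = target_dir / "conda.yaml"
    conda_path.write_text(CONDA_YAML, encoding="utf-8")
    return conda_path
```

Calling write_conda_file('../dependencies') before the Environment cell would produce the file that conda_file points to.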


Step 3: Fetch Input Data

In [4]:
# Fetch Data
dataset_name = "pima-sdk-v2"
pima_data  = ml_client.data.get(name = dataset_name, label = "latest")



Step 4: Create .py Script for Training & Registering Model


Create a .py script that accepts command-line arguments for training and registering the model, and save it in the 'src' folder. In the next step, the command job runs this script and passes the arguments to it.
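The script itself is not reproduced in this notebook. As a rough sketch, its argument handling could look like the following; the flag names match the ones the command job in Step 5 passes, while the defaults and help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments the command job passes to the training script."""
    parser = argparse.ArgumentParser(description="Train and register a model")
    parser.add_argument("--input_data", type=str, required=True,
                        help="path to the input data")
    parser.add_argument("--train_test_ratio", type=float, default=0.3,
                        help="fraction of the data held out for testing")
    parser.add_argument("--registered_model_name", type=str, required=True,
                        help="name under which the model is registered")
    parser.add_argument("--model", type=str,
                        help="output folder for the trained model")
    # argv=None makes argparse read sys.argv, as it would inside the job
    return parser.parse_args(argv)
```

Inside the script, args = parse_args() would then give access to args.input_data, args.train_test_ratio, and so on.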

Step 5: Configure Command Job

**What is a command job?**

You'll create an Azure ML command job to train a diabetes prediction model on the Pima dataset. A command job runs a training script in a specified environment on a specified compute target. You've already created the compute cluster and the environment, and Step 4 covers the training script.

The training script handles data preparation, training, and registration of the trained model. In this tutorial, that script is train_SDKv2.py in the src folder.

Command jobs can be run from CLI, Python SDK, or studio interface. In this tutorial, you'll use the Azure ML Python SDK v2 to create and run the command job.

After running the training job, you'll be able to deploy the model and then use it to produce a prediction.

**Configure the command**

Now that you have a script that can perform the desired tasks, you'll use the general-purpose **command** function, which runs command-line actions. A command-line action can call system commands directly or run a script.

Here, you'll create input variables to specify the input data, the split ratio, and the registered model name. The command configuration will:

- Use the environment created earlier. This notebook passes the env object directly; you can also use the @latest notation to pick up the latest version of the environment when the command is run.
- Configure some metadata such as the display name and experiment name. An experiment is a container for all the iterations you do on a certain project. All the jobs submitted under the same experiment name are listed next to each other in Azure ML studio.
- Configure the command-line action itself, python train_SDKv2.py in this case. The inputs and outputs are accessible in the command via the ${{ ... }} notation.
- Read the input data from the pima-sdk-v2 data asset fetched in Step 3, referenced as azureml:<name>:<version>.
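To make the ${{ ... }} notation concrete, here is a local illustration of the substitution idea. It is not how the SDK or the service implements it: Azure ML resolves these placeholders when the job runs, replacing data inputs with a path on the compute node and literal inputs with their values. The function name and the example paths are hypothetical.

```python
import re

# Matches ${{inputs.name}} and ${{outputs.name}}, with optional inner spaces
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def render_command(template, inputs, outputs):
    """Replace each placeholder with the matching value from inputs/outputs."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        group, key = match.group(1), match.group(2)
        return str(values[group][key])

    return PLACEHOLDER.sub(substitute, template)


rendered = render_command(
    "python main.py --data ${{inputs.data}} --ratio ${{inputs.split_ratio}} "
    "--model ${{outputs.model}}",
    inputs={"data": "/mnt/data/pima.csv", "split_ratio": 0.3},  # made-up paths
    outputs={"model": "/mnt/outputs/model"},
)
print(rendered)
# python main.py --data /mnt/data/pima.csv --ratio 0.3 --model /mnt/outputs/model
```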


In [9]:
from azure.ai.ml import command
from azure.ai.ml import Input, Output


# Give Model name
model_name = "pima_model_SDKv2_01"

# configure job command
job = command(
    inputs=dict(
        data=Input(
            type=pima_data.type,
            path=f"azureml:{pima_data.name}:{pima_data.version}",
        ),
        split_ratio=0.3,
        model_name=model_name,
    ),
    outputs=dict(model=Output(type="uri_folder", mode="rw_mount")),
    code="../src/",  # location of source code
    command="python train_SDKv2.py \
             --input_data ${{inputs.data}}  \
             --train_test_ratio ${{inputs.split_ratio}} \
             --registered_model_name ${{inputs.model_name}} \
             --model ${{outputs.model}}",
    environment=env,
    experiment_name="Pima_Experiments_Training_SDK_v2",
    compute=compute,
    display_name="pima_diabetes_sdkv2_prediction",
)
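A note on the command string above: a backslash at the end of a line inside a string literal joins the lines, but the indentation of each continued line stays in the string. The shell ignores the extra spaces, so the job runs as written. If you prefer a tidier string, one possible alternative (a sketch, not a required change) is to build it from parts:

```python
# Each part is one argument pair; join them with single spaces
command_parts = [
    "python train_SDKv2.py",
    "--input_data ${{inputs.data}}",
    "--train_test_ratio ${{inputs.split_ratio}}",
    "--registered_model_name ${{inputs.model_name}}",
    "--model ${{outputs.model}}",
]
command_text = " ".join(command_parts)
print(command_text)
```

The resulting command_text can be passed as command=command_text in the job definition.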

Submit the job

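It's now time to submit the job to run in Azure ML. This time you'll call create_or_update on ml_client and pass it the job object; ml_client.jobs.create_or_update(job) is the equivalent call on the jobs operations.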

[stable vs experimental](https://learn.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py) classes

In [10]:
ml_client.create_or_update(job)

Uploading src (0.01 MBs): 100%|██████████| 10655/10655 [00:00<00:00, 178783.41it/s]



Experiment,Name,Type,Status,Details Page
Pima_Experiments_Training_SDK_v2,serene_chicken_v07pzg919l,command,Starting,Link to Azure Machine Learning studio
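The job is reported as Starting right after submission. One way to wait for it to finish is a small polling loop such as the sketch below. The helper wait_for_job is hypothetical, and the set of terminal statuses is an assumption based on the commonly reported values (Completed, Failed, Canceled).

```python
import time

# Assumed terminal statuses for an Azure ML job
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30, max_polls=120):
    """Call get_status() until it returns a terminal status, then return it."""
    status = None
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job still in status {status!r} after {max_polls} polls")
```

With the job from this run, usage could look like wait_for_job(lambda: ml_client.jobs.get("serene_chicken_v07pzg919l").status). The SDK also provides ml_client.jobs.stream(name), which streams the job logs until the job ends and may be simpler in a notebook.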
