# Exercise09 : ML Pipeline

With an Azure Machine Learning (AML) pipeline, you can create ML workflows for purposes such as the following.

- Build a retraining pipeline for MLOps integration.
- Build a batch-scoring pipeline, as an alternative to the real-time scoring in "[Exercise08 : Publish as a Web Service](./exercise08_publish_model.ipynb)".

An ML pipeline can be invoked in the following ways.

- Time-based schedule invocation
- On-demand invocation through the published endpoint (REST)
- Trigger-based invocation, such as a file change or other combined events (with Azure Event Grid, Azure Logic Apps, etc.)

In this exercise, we create a simple training pipeline that returns model metrics as a top-level (pipeline) output.

*back to [index](https://github.com/tsmatz/azureml-tutorial/)*

## 1. Get workspace settings

Before starting, you must read your configuration settings.<br>
When you use CI/CD tools such as GitHub Actions, you can also connect to the ML workspace without an interactive login. (See "[Exercise01 : Prepare Config Settings](./exercise01_prepare_config.ipynb)".)

In [1]:
from azureml.core import Workspace
import azureml.core

ws = Workspace.from_config()

## 2. Prepare resources

### Create compute

Create a new AML compute cluster for running the pipeline.

When the pipeline is invoked, the compute nodes are started. When the pipeline completes, the cluster automatically scales down to zero nodes.

In [2]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

try:
    compute_target = ComputeTarget(workspace=ws, name='mycluster01')
    print('found existing:', compute_target.name)
except ComputeTargetException:
    print('creating new.')
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='Standard_D2_v2',
        min_nodes=0,
        max_nodes=1)
    compute_target = ComputeTarget.create(ws, 'mycluster01', compute_config)
    compute_target.wait_for_completion(show_output=True)

found existing: mycluster01


### Prepare data

Get a dataset reference for the input data.<br>
Run "[Exercise02 : Prepare Data](./exercise02_prepare_data.ipynb)" beforehand.

In [3]:
from azureml.core import Dataset

dataset = Dataset.get_by_name(ws, 'mnist_dataset', version='latest')

### Create environment

In [4]:
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.environment import Environment

# create environment
env = Environment('test-pipeline-env')
env.python.conda_dependencies = CondaDependencies.create(
    python_version="3.8",
    pip_packages=['tensorflow==2.10.0'])
env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04'
# You can also use default CPU image (azureml.core.runconfig.DEFAULT_CPU_IMAGE)

# register environment to re-use later
env.register(workspace=ws)
## # speed up by using the existing environment
## env = Environment.get(ws, name='test-remote-gpu-env')

{
    "assetId": "azureml://locations/eastus/workspaces/9f284df9-d636-40ed-bae1-0303c21d4b4f/environments/test-pipeline-env/versions/1",
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "buildContext": null,
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "test-pipeline-env",
    "python"

### Create run config

In [5]:
from azureml.core.runconfig import RunConfiguration

run_config = RunConfiguration()
run_config.environment = env

## 3. Create Train Step

In this example, I create a pipeline for model training and evaluation.<br>
In this pipeline, the following steps are executed.

1. The model is trained.
2. The model accuracy is evaluated. The model metrics are set as the pipeline's output.

The source code for each step is saved as follows.

- training script ```./pipeline_script/train.py```
- evaluation script ```./pipeline_script/evaluate.py```

In [6]:
import os
script_folder = './pipeline_script'
os.makedirs(script_folder, exist_ok=True)

In [7]:
%%writefile pipeline_script/train.py
import os
import argparse
import tensorflow as tf

# parse arguments
parser = argparse.ArgumentParser()
parser.add_argument(
    "--data_folder",
    type=str,
    default="./data",
    help="Folder path for input data")
parser.add_argument(
    "--model_folder",
    type=str,
    default="./outputs",  # AML experiments outputs folder
    help="Folder path for model output")
parser.add_argument(
    "--learning_rate",
    type=float,
    default="0.001",
    help="Learning Rate")
parser.add_argument(
    "--first_layer",
    type=int,
    default="128",
    help="Neuron number for the first hidden layer")
parser.add_argument(
    "--second_layer",
    type=int,
    default="64",
    help="Neuron number for the second hidden layer")
parser.add_argument(
    "--epochs_num",
    type=int,
    default="6",
    help="Number of epochs")
FLAGS, unparsed = parser.parse_known_args()

# build model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(FLAGS.first_layer, activation="relu"),
    tf.keras.layers.Dense(FLAGS.second_layer, activation="relu"),
    tf.keras.layers.Dense(10)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(FLAGS.learning_rate),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# run training
train_data_path = os.path.join(FLAGS.data_folder, "train")
train_data = tf.data.experimental.load(train_data_path)
model.fit(
    train_data.shuffle(1000).batch(128).prefetch(tf.data.AUTOTUNE),
    epochs=FLAGS.epochs_num
)

# save model and variables
model_path = os.path.join(FLAGS.model_folder, "mnist_tf_model")
model.save(model_path)
print("current working directory : ", os.getcwd())
print("model folder : ", model_path)

Writing pipeline_script/train.py


Create the train step in the pipeline.

In [8]:
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import PipelineData

pdata_model_folder = PipelineData(
    "model_folder",
    datastore=ws.get_default_datastore(),
    is_directory=True
)

train_step = PythonScriptStep(
    name="Train Model",
    script_name='train.py',
    source_directory='./pipeline_script',
    compute_target=compute_target,
    outputs=[pdata_model_folder],
    arguments=[
        '--data_folder',
        dataset.as_mount(),
        '--model_folder',
        pdata_model_folder,
    ],
    runconfig=run_config,
    allow_reuse=True,
)
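
When the step runs, each entry in `arguments` reaches `train.py` as a plain command-line string; the dataset mount and the `PipelineData` object are replaced by folder paths on the compute node. The following sketch mimics that hand-off locally with a trimmed copy of the parser. The two paths and the extra flag are hypothetical placeholders, not values that AML produces.

```python
import argparse

# Trimmed copy of the parser in train.py (only three of its flags).
parser = argparse.ArgumentParser()
parser.add_argument("--data_folder", type=str, default="./data")
parser.add_argument("--model_folder", type=str, default="./outputs")
parser.add_argument("--learning_rate", type=float, default="0.001")

# Hypothetical paths standing in for what the step receives at run time.
argv = [
    "--data_folder", "/mnt/hypothetical/mnist",
    "--model_folder", "/mnt/hypothetical/model_folder",
    "--some_extra_flag", "ignored",  # unknown flags do not raise with parse_known_args
]
flags, unparsed = parser.parse_known_args(argv)

print(flags.data_folder)
print(flags.model_folder)
print(flags.learning_rate)  # the string default is converted by type=float
print(unparsed)             # flags the parser did not recognize
```

Using `parse_known_args` rather than `parse_args` means that additional arguments passed to the script do not stop it; they are simply collected in the second return value.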

## 4. Create Evaluation Step

Create an evaluation script ```pipeline_script/evaluate.py```.

In this step, the model is evaluated and the model metrics (accuracy and loss) are saved.

In [9]:
%%writefile pipeline_script/evaluate.py
import os
import json
import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument(
    '--data_folder',
    type=str,
    default='./data',
    help='Folder path for input data')
parser.add_argument(
    '--model_folder',
    type=str,
    default='./model',
    help='Folder path for model base dir')
parser.add_argument(
    '--output_info',
    type=str,
    default='./output_info',
    help='File path for model metrics output')
FLAGS, unparsed = parser.parse_known_args()

# load data
test_data_path = os.path.join(FLAGS.data_folder, "test")
test_data = tf.data.experimental.load(test_data_path)

# load model
model_folder_path = os.path.join(FLAGS.model_folder, "mnist_tf_model")
loaded_model = tf.keras.models.load_model(model_folder_path)

# evaluate
results = loaded_model.evaluate(test_data.batch(128))
print("Loss: {}, Accuracy: {}".format(results[0], results[1]))

# write result (metrics)
output_info = {
    "accuracy" : float(results[1]),
    "loss" : float(results[0])
}
# "with" closes the file even if the write fails
with open(FLAGS.output_info, "w") as f:
    json.dump(output_info, f)

Writing pipeline_script/evaluate.py
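
Because the metrics are written as plain JSON, any downstream consumer (for example, a CI/CD job that has downloaded the file) can parse them and decide whether the model is good enough. Here is a minimal sketch, assuming a hypothetical helper `passes_threshold` and an arbitrary 0.95 accuracy cut-off; neither is part of this exercise, and the sample values are made up.

```python
import json
import os
import tempfile

def passes_threshold(metrics_path, min_accuracy=0.95):
    """Return True if the accuracy stored in the metrics file meets the cut-off."""
    with open(metrics_path) as f:
        metrics = json.load(f)
    return metrics["accuracy"] >= min_accuracy

# Write a sample file in the same shape evaluate.py produces (values are made up).
sample = {"accuracy": 0.97, "loss": 0.09}
sample_path = os.path.join(tempfile.mkdtemp(), "model_info")
with open(sample_path, "w") as f:
    json.dump(sample, f)

print(passes_threshold(sample_path))
print(passes_threshold(sample_path, min_accuracy=0.99))
```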


Create the evaluation step in the pipeline.

In [10]:
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import PipelineData

pdata_model_info = PipelineData(
    "model_info",
    datastore=ws.get_default_datastore(),
    is_directory=False
)

eval_step = PythonScriptStep(
    name="Evaluate Model",
    script_name='evaluate.py',
    source_directory='./pipeline_script',
    compute_target=compute_target,
    inputs=[pdata_model_folder],
    outputs=[pdata_model_info],
    arguments=[
        '--data_folder',
        dataset.as_mount(),
        '--model_folder',
        pdata_model_folder,
        '--output_info',
        pdata_model_info
    ],
    runconfig=run_config,
    allow_reuse=False,
)

## 6. Create and publish ML pipeline

In [11]:
from azureml.pipeline.core import Pipeline
import uuid

train_pipeline = Pipeline(workspace=ws, steps=[train_step, eval_step])
train_pipeline.validate()
published_pipeline = train_pipeline.publish(
    name="training-pipeline01",
    description="Model training/evaluation",
    version=str(uuid.uuid4()),
)

Step Train Model is ready to be created [c35d4cca]
Step Evaluate Model is ready to be created [a193c662]

Created step Train Model [c35d4cca][337deae3-d7c2-4ec9-9ce2-d65e38ae68e9], (This step will run and generate new outputs)
Created step Evaluate Model [a193c662][73939d6f-d1cc-4b76-b72f-f3cd396c6fc7], (This step will run and generate new outputs)



## 7. Run ML pipeline

When integrating with CI/CD tools, you can submit a new run of this published pipeline on demand through its REST endpoint.

In [12]:
# See endpoint url
published_pipeline.endpoint

'https://eastus.api.azureml.ms/pipelines/v1.0/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/rg-AML/providers/Microsoft.MachineLearningServices/workspaces/ws01/PipelineRuns/PipelineSubmit/d4cffd52-48cb-4e4d-a0f8-e2a99721f6ad'
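
The endpoint accepts an authenticated HTTP POST. The sketch below only builds the request with the standard library and does not send it. It assumes you already hold an Azure AD bearer token (the `token` value here is a placeholder) and that the JSON body carries an `ExperimentName` field, as in the AML SDK v1 examples for published pipelines. The function name `build_submit_request` and the example URL are hypothetical.

```python
import json
import urllib.request

def build_submit_request(endpoint, token, experiment_name):
    """Build (but do not send) a POST request that submits a published pipeline."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )

# Placeholder values; in practice use published_pipeline.endpoint and a real token.
req = build_submit_request(
    "https://example.invalid/pipelines/v1.0/PipelineSubmit/placeholder-id",
    token="<AAD access token>",
    experiment_name="pipeline_experiment01",
)
print(req.get_method(), req.full_url)
# To actually submit the run: urllib.request.urlopen(req)
```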

Let's submit a new run using the AML Python SDK.

In [13]:
from azureml.core import Experiment

exp = Experiment(workspace=ws, name='pipeline_experiment01')
pipeline_run = exp.submit(published_pipeline)

Submitted PipelineRun 4157af11-73e3-4663-808c-70236f33bdde
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/4157af11-73e3-4663-808c-70236f33bdde?wsid=/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourcegroups/rg-AML/workspaces/ws01&tid=72f988bf-86f1-41af-91ab-2d7cd011db47


Go to the [AML studio UI](https://ml.azure.com/) to see the progress and results.

![Pipeline results](https://tsmatz.github.io/images/github/azure-ml-tensorflow-complete-sample/20220225_Experiment_Pipeline.jpg)
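
Besides on-demand runs, a published pipeline can also be invoked on a time-based schedule, the first method listed at the top of this exercise. Here is a minimal sketch, assuming the SDK v1 `Schedule` and `ScheduleRecurrence` classes from `azureml.pipeline.core.schedule`. The function name, the schedule name, and the daily interval are illustrative choices, not part of this exercise. The import sits inside the function so that defining it does not require the SDK.

```python
def create_daily_schedule(workspace, pipeline_id, experiment_name):
    """Attach a once-a-day schedule to an already published pipeline."""
    # Imported lazily; requires the azureml-pipeline-core package.
    from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

    recurrence = ScheduleRecurrence(frequency="Day", interval=1)
    return Schedule.create(
        workspace,
        name="daily-training-schedule",  # illustrative name
        pipeline_id=pipeline_id,
        experiment_name=experiment_name,
        recurrence=recurrence,
    )

# Usage in this notebook (not executed here):
# schedule = create_daily_schedule(ws, published_pipeline.id, 'pipeline_experiment01')
```

Create the schedule before removing the compute in the next section only if you intend to keep the cluster; a scheduled run needs its compute target to exist.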

## 8. Remove Compute

In [16]:
# Delete cluster (nodes) and remove from AML workspace
mycompute = AmlCompute(workspace=ws, name='mycluster01')
mycompute.delete()