# Exercise09 : ML Pipeline

With AML pipelines, you can create ML workflows for purposes such as the following.

- Building a retraining pipeline for MLOps integration.
- Building a batch-scoring pipeline, as an alternative to the real-time scoring in "[Exercise08 : Publish as a Web Service](./exercise08_publish_model.ipynb)".

An ML pipeline can be invoked in the following ways:

- Time-based schedule invocation
- On-demand invocation through the published endpoint (REST)
- Trigger-based invocation, such as on file changes or other combined events (with Azure Event Grid, Azure Logic Apps, etc.)

In this exercise, we create a training pipeline for MLOps integration. (See [here](https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/ai/mlops-python) for the reference architecture integrating with CI/CD tools.)

*back to [index](https://github.com/tsmatz/azureml-tutorial/)*

## 1. Get workspace settings

Before starting, you must read your configuration settings.<br>
When you work with CI/CD tools such as GitHub Actions, you can also connect to the ML workspace without the interactive login UI. (See "[Exercise01 : Prepare Config Settings](./exercise01_prepare_config.ipynb)".)

In [1]:
from azureml.core import Workspace
import azureml.core

ws = Workspace.from_config()

## 2. Prepare resources

### Create compute

Create a new AML compute cluster for running the pipeline.

When the pipeline is invoked, the cluster is scaled up to run it. Because `min_nodes=0`, the cluster automatically scales back down to zero nodes after the pipeline completes and the nodes become idle.

In [2]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

try:
    compute_target = ComputeTarget(workspace=ws, name='mycluster01')
    print('found existing:', compute_target.name)
except ComputeTargetException:
    print('creating new.')
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='Standard_D2_v2',
        min_nodes=0,
        max_nodes=1)
    compute_target = ComputeTarget.create(ws, 'mycluster01', compute_config)
    compute_target.wait_for_completion(show_output=True)

creating new.
InProgress.
Succeeded
Provisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


### Prepare data

Get a reference to the registered input dataset.<br>
Run "[Exercise02 : Prepare Data](./exercise02_prepare_data.ipynb)" beforehand to prepare this dataset.

In [3]:
from azureml.core import Dataset

dataset = Dataset.get_by_name(ws, 'mnist_tfrecords_dataset', version='latest')

### Create environment

In [4]:
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.environment import Environment

# create environment
env = Environment('test-pipeline-env')
env.python.conda_dependencies = CondaDependencies.create(
    python_version="3.6",
    conda_packages=['tensorflow==1.15'])
env.docker.base_image = DEFAULT_CPU_IMAGE

# register environment to re-use later
env.register(workspace=ws)
# To speed up, you can instead reuse the environment registered above:
# env = Environment.get(ws, name='test-pipeline-env')

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20220208.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "test-pipeline-env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-for

### Create run config

In [5]:
from azureml.core.runconfig import RunConfiguration

run_config = RunConfiguration()
run_config.environment = env

## 3. Create Train Step

Create a training script ```./pipeline_script/train.py```.<br>
In this script,

- The model directory name (subfolder name) is written to a model info file (JSON), which is passed to the next step.
- The model is exported not only for prediction, but also for evaluation.
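The JSON model info file tells the later steps which exported subfolder to use. The following stdlib-only sketch mimics that handoff locally. The function names, the folder name `1650000000`, and the `/mnt/model_folder` path are hypothetical stand-ins (in the real pipeline, AML supplies the paths through `PipelineData`), while the JSON keys `model_folder_name` and `evaluation` match the scripts in this exercise.

```python
import json
import os
import tempfile


def write_model_info(path, model_folder_name):
    # Train step: record only the name of the exported model subfolder
    with open(path, "w") as f:
        json.dump({"model_folder_name": model_folder_name}, f)


def forward_model_info(in_path, out_path):
    # Evaluate step (passing case): keep the folder name, add the evaluation flag
    with open(in_path) as f:
        info = json.load(f)
    info["evaluation"] = True
    with open(out_path, "w") as f:
        json.dump(info, f)


def resolve_model_path(info_path, model_folder):
    # Register step: build the full model path only if evaluation passed
    with open(info_path) as f:
        info = json.load(f)
    if info.get("evaluation"):
        return os.path.join(model_folder, info["model_folder_name"])
    return None


with tempfile.TemporaryDirectory() as tmp:
    info_for_eval = os.path.join(tmp, "model_info_for_eval")
    info_for_reg = os.path.join(tmp, "model_info_for_reg")
    write_model_info(info_for_eval, "1650000000")  # hypothetical folder name
    forward_model_info(info_for_eval, info_for_reg)
    model_path = resolve_model_path(info_for_reg, "/mnt/model_folder")

print(model_path)
```

Passing only the folder *name* (not a full path) keeps each step independent of where AML happens to mount the shared model folder on a given node.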

In [6]:
import os
script_folder = './pipeline_script'
os.makedirs(script_folder, exist_ok=True)

In [7]:
%%writefile pipeline_script/train.py
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys
import os
import shutil
import argparse
import math
import json

import tensorflow as tf

FLAGS = None
batch_size = 100

#
# Define functions for Estimator
#

def _my_input_fn(filepath, num_epochs):
    # image - 784 (=28 x 28) grey-scale pixel values, scaled to float in [0, 1]
    # label - digit (0, 1, ..., 9)
    data_queue = tf.train.string_input_producer(
        [filepath],
        num_epochs=num_epochs) # data is repeated num_epochs times, then OutOfRangeError is raised
    data_reader = tf.TFRecordReader()
    _, serialized_exam = data_reader.read(data_queue)
    data_exam = tf.parse_single_example(
        serialized_exam,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64)
        })
    data_image = tf.decode_raw(data_exam['image_raw'], tf.uint8)
    data_image.set_shape([784])
    data_image = tf.cast(data_image, tf.float32) * (1. / 255)
    data_label = tf.cast(data_exam['label'], tf.int32)
    data_batch_image, data_batch_label = tf.train.batch(
        [data_image, data_label],
        batch_size=batch_size)
    return {'inputs': data_batch_image}, data_batch_label

def _get_input_fn(filepath, num_epochs):
    return lambda: _my_input_fn(filepath, num_epochs)

def _my_model_fn(features, labels, mode):
    # with tf.device(...): # You can set device if using GPUs

    # define network and inference
    # (2 fully connected hidden layers: 784->128->64->10 with the default sizes)
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal(
                [784, FLAGS.first_layer],
                stddev=1.0 / math.sqrt(float(784))),
            name='weights')
        biases = tf.Variable(
            tf.zeros([FLAGS.first_layer]),
            name='biases')
        hidden1 = tf.nn.relu(tf.matmul(features['inputs'], weights) + biases)
    with tf.name_scope('hidden2'):
        weights = tf.Variable(
            tf.truncated_normal(
                [FLAGS.first_layer, FLAGS.second_layer],
                stddev=1.0 / math.sqrt(float(FLAGS.first_layer))),
            name='weights')
        biases = tf.Variable(
            tf.zeros([FLAGS.second_layer]),
            name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(
            tf.truncated_normal(
                [FLAGS.second_layer, 10],
                stddev=1.0 / math.sqrt(float(FLAGS.second_layer))),
            name='weights')
        biases = tf.Variable(
            tf.zeros([10]),
            name='biases')
        logits = tf.matmul(hidden2, weights) + biases
 
    # compute evaluation metrics
    predicted_indices = tf.argmax(input=logits, axis=1)
    if mode != tf.estimator.ModeKeys.PREDICT:
        label_indices = tf.cast(labels, tf.int32)
        accuracy = tf.metrics.accuracy(label_indices, predicted_indices)
        tf.summary.scalar('accuracy', accuracy[1]) # output to TensorBoard
 
        loss = tf.losses.sparse_softmax_cross_entropy(
            labels=labels,
            logits=logits)
 
    # define operations
    if mode == tf.estimator.ModeKeys.TRAIN:
        #global_step = tf.train.create_global_step()
        #global_step = tf.contrib.framework.get_or_create_global_step()
        global_step = tf.train.get_or_create_global_step()        
        optimizer = tf.train.GradientDescentOptimizer(
            learning_rate=FLAGS.learning_rate)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=global_step)
        return tf.estimator.EstimatorSpec(
            mode,
            loss=loss,
            train_op=train_op)
    if mode == tf.estimator.ModeKeys.EVAL:
        eval_metric_ops = {
            'accuracy': accuracy
        }
        return tf.estimator.EstimatorSpec(
            mode,
            loss=loss,
            eval_metric_ops=eval_metric_ops)
    if mode == tf.estimator.ModeKeys.PREDICT:
        probabilities = tf.nn.softmax(logits, name='softmax_tensor')
        predictions = {
            'classes': predicted_indices,
            'probabilities': probabilities
        }
        export_outputs = {
            'prediction': tf.estimator.export.PredictOutput(predictions)
        }
        return tf.estimator.EstimatorSpec(
            mode,
            predictions=predictions,
            export_outputs=export_outputs)

def _my_serving_input_fn():
    inputs = {'inputs': tf.placeholder(tf.float32, [None, 784])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

_my_evaluation_input_fn = (tf.estimator.experimental.build_raw_supervised_input_receiver_fn(
    {'inputs': tf.placeholder(dtype=tf.float32, shape=[None, 784])},
    tf.placeholder(dtype=tf.int32, shape=[None])))

#
# Main
#

parser = argparse.ArgumentParser()
parser.add_argument(
    '--data_folder',
    type=str,
    default='./data',
    help='Folder path for input data')
parser.add_argument(
    '--chkpoint_folder',
    type=str,
    default='./logs',  # AML experiments logs folder
    help='Folder path for checkpoint files')
parser.add_argument(
    '--model_temp',
    type=str,
    default='./outputs',
    help='Folder path for model temporary output')
parser.add_argument(
    '--model_folder',
    type=str,
    help='Folder path for model output')
parser.add_argument(
    '--model_info',
    type=str,
    help='JSON file path for saving model information')
parser.add_argument(
    '--learning_rate',
    type=float,
    default=0.07,
    help='Learning rate')
parser.add_argument(
    '--first_layer',
    type=int,
    default=128,
    help='Number of neurons in the first hidden layer')
parser.add_argument(
    '--second_layer',
    type=int,
    default=64,
    help='Number of neurons in the second hidden layer')
FLAGS, unparsed = parser.parse_known_args()

# Clean checkpoint and model folders if they exist
def _clear_folder(folder):
    if os.path.exists(folder):
        for file_name in os.listdir(folder):
            file_path = os.path.join(folder, file_name)
            if os.path.isfile(file_path):
                os.remove(file_path)
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)

_clear_folder(FLAGS.chkpoint_folder)
_clear_folder(FLAGS.model_temp)

# Read TF_CONFIG
run_config = tf.estimator.RunConfig()

# Create Estimator
mnist_fullyconnected_classifier = tf.estimator.Estimator(
    model_fn=_my_model_fn,
    model_dir=FLAGS.chkpoint_folder,
    config=run_config)
train_spec = tf.estimator.TrainSpec(
    input_fn=_get_input_fn(os.path.join(FLAGS.data_folder, 'train.tfrecords'), 2),
    max_steps=60000 * 2 // batch_size)  # 60,000 training images x 2 epochs
eval_spec = tf.estimator.EvalSpec(
    input_fn=_get_input_fn(os.path.join(FLAGS.data_folder, 'test.tfrecords'), 1),
    steps=10000 * 1 // batch_size,  # 10,000 test images x 1 epoch
    start_delay_secs=0)

# Run training !
tf.estimator.train_and_evaluate(
    mnist_fullyconnected_classifier,
    train_spec,
    eval_spec
)

# Save model and parameters
model_folder = mnist_fullyconnected_classifier.experimental_export_all_saved_models(
    export_dir_base=FLAGS.model_temp,
    input_receiver_fn_map={
        tf.estimator.ModeKeys.EVAL: _my_evaluation_input_fn,
        tf.estimator.ModeKeys.PREDICT: _my_serving_input_fn
    })
print('current working directory is ', os.getcwd())

# Copy model to model_folder
model_folder_path = model_folder.decode("utf-8")
model_folder_name = os.path.basename(model_folder_path)
dest_path = os.path.join(FLAGS.model_folder, model_folder_name)
shutil.move(model_folder_path, dest_path)
print('model is saved ', dest_path)

# Save model info
model_info_dict = {'model_folder_name' : model_folder_name}
with open(FLAGS.model_info, 'w') as f:
    json.dump(model_info_dict, f)

Writing pipeline_script/train.py
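`train.py` derives its step counts from the MNIST split sizes (60,000 training and 10,000 test images) and `batch_size = 100`. Each step consumes one batch, so 2 training epochs take 1,200 steps and one pass over the test set takes 100 steps. A quick check of that arithmetic (note that in Python 3, `/` yields a float such as `1200.0`, while `//` keeps an integer):

```python
# Sizes of the MNIST splits assumed by train.py
num_train_examples = 60000
num_test_examples = 10000
num_epochs = 2
batch_size = 100

# Each step consumes one batch: steps = examples * epochs / batch size
max_train_steps = num_train_examples * num_epochs // batch_size
eval_steps = num_test_examples * 1 // batch_size

print(max_train_steps, eval_steps)  # 1200 100
```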


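Create the training step in the pipeline. The `PipelineData` objects hold the intermediate outputs (the model folder and the model info file), which are passed to the subsequent steps.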

In [8]:
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import PipelineData

pdata_model_folder = PipelineData(
    "model_folder",
    datastore=ws.get_default_datastore(),
    is_directory=True
)
pdata_model_info_for_eval = PipelineData(
    "model_info_for_eval",
    datastore=ws.get_default_datastore(),
    is_directory=False
)

train_step = PythonScriptStep(
    name="Train Model",
    script_name='train.py',
    compute_target=compute_target,
    source_directory='./pipeline_script',
    outputs=[pdata_model_folder, pdata_model_info_for_eval],
    arguments=[
        '--data_folder',
        dataset.as_mount(),
        '--model_folder',
        pdata_model_folder,
        '--model_info',
        pdata_model_info_for_eval
    ],
    runconfig=run_config,
    allow_reuse=True,
)
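At run time, AML replaces `dataset.as_mount()` and each `PipelineData` object in `arguments` with a concrete path on the compute node, so `train.py` only sees ordinary command-line strings. The following sketch replays that parsing locally with a trimmed copy of the script's parser. The `/mnt/hypothetical/...` paths are made up (AML generates the real ones), and the `--learning_rate` override is added only to show how argparse converts a string value.

```python
import argparse

# Trimmed copy of the argument parser in train.py
parser = argparse.ArgumentParser()
parser.add_argument('--data_folder', type=str, default='./data')
parser.add_argument('--model_folder', type=str)
parser.add_argument('--model_info', type=str)
parser.add_argument('--learning_rate', type=float, default=0.07)
parser.add_argument('--first_layer', type=int, default=128)
parser.add_argument('--second_layer', type=int, default=64)

# Hypothetical values standing in for what AML passes at run time
argv = [
    '--data_folder', '/mnt/hypothetical/dataset',
    '--model_folder', '/mnt/hypothetical/model_folder',
    '--model_info', '/mnt/hypothetical/model_info_for_eval',
    '--learning_rate', '0.05',  # arrives as a string, converted by type=float
]
FLAGS, unparsed = parser.parse_known_args(argv)
print(FLAGS.model_folder, FLAGS.learning_rate, FLAGS.first_layer)
```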

## 4. Create Evaluation Step

Create an evaluation script ```pipeline_script/evaluate.py```.

In this step, the model accuracy is evaluated on the test data. If it is less than 92% (0.92), the script cancels the pipeline run, so the model is not registered.
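The pass/fail decision in `evaluate.py` reduces to a threshold check on the `metrics/accuracy` value returned by `evaluate()`. Here is a minimal sketch of that logic as a pure function, using hypothetical evaluation results; `gate_model` is an illustrative name, and the real script cancels the parent run instead of returning `None`.

```python
ACCURACY_THRESHOLD = 0.92  # same threshold as evaluate.py


def gate_model(eval_results, model_folder_name, threshold=ACCURACY_THRESHOLD):
    """Return the model info dict to forward, or None to reject the model."""
    if eval_results['metrics/accuracy'] >= threshold:
        return {'model_folder_name': model_folder_name, 'evaluation': True}
    # evaluate.py calls run.parent.cancel() at this point instead
    return None


# Hypothetical results, shaped like the dict that evaluate.py reads
passed = gate_model({'metrics/accuracy': 0.95, 'loss': 0.17}, '1650000000')
failed = gate_model({'metrics/accuracy': 0.90, 'loss': 0.35}, '1650000000')
print(passed)  # {'model_folder_name': '1650000000', 'evaluation': True}
print(failed)  # None
```

Keeping the check as a pure function like this makes the threshold easy to unit-test without a workspace.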

In [9]:
%%writefile pipeline_script/evaluate.py
import os
import argparse
import json

import tensorflow as tf

from azureml.core import Run

run = Run.get_context()

FLAGS = None
batch_size = 100

def _my_input_fn(filepath, num_epochs):
    # image - 784 (=28 x 28) grey-scale pixel values, scaled to float in [0, 1]
    # label - digit (0, 1, ..., 9)
    data_queue = tf.train.string_input_producer(
        [filepath],
        num_epochs=num_epochs) # data is repeated num_epochs times, then OutOfRangeError is raised
    data_reader = tf.TFRecordReader()
    _, serialized_exam = data_reader.read(data_queue)
    data_exam = tf.parse_single_example(
        serialized_exam,
        features={
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64)
        })
    data_image = tf.decode_raw(data_exam['image_raw'], tf.uint8)
    data_image.set_shape([784])
    data_image = tf.cast(data_image, tf.float32) * (1. / 255)
    data_label = tf.cast(data_exam['label'], tf.int32)
    data_batch_image, data_batch_label = tf.train.batch(
        [data_image, data_label],
        batch_size=batch_size)
    return {'inputs': data_batch_image}, data_batch_label

def _get_input_fn(filepath, num_epochs):
    return lambda: _my_input_fn(filepath, num_epochs)

parser = argparse.ArgumentParser()
parser.add_argument(
    '--data_folder',
    type=str,
    default='./data',
    help='Folder path for input data')
parser.add_argument(
    '--model_folder',
    type=str,
    default='./model',
    help='Folder path for model base dir')
parser.add_argument(
    '--model_input_info',
    type=str,
    default='./model_input_info',
    help='File path for model evaluation info')
parser.add_argument(
    '--model_output_info',
    type=str,
    default='./model_output_info',
    help='File path for model registration info')
FLAGS, unparsed = parser.parse_known_args()

# Get model folder
with open(FLAGS.model_input_info) as f:
    model_input_json = json.load(f)
model_folder_fullpath = os.path.join(FLAGS.model_folder, model_input_json['model_folder_name'])

# Load model
est = tf.contrib.estimator.SavedModelEstimator(model_folder_fullpath)

# Evaluate !
eval_results = est.evaluate(
    input_fn=_get_input_fn(os.path.join(FLAGS.data_folder, 'test.tfrecords'), 1),
    steps=10000 * 1 // batch_size)

# Result check
# (If it doesn't achieve the threshold, stop running)
if eval_results['metrics/accuracy'] >= 0.92:
    # Evaluation Success
    print(
        "Evaluation has passed. "
        "Accuracy: {}, Loss: {}".format(
            eval_results['metrics/accuracy'], eval_results['loss']
        )
    )
    # Save model info
    model_info_dict = {
        'model_folder_name' : model_input_json['model_folder_name'],
        'evaluation' : True
    }
    with open(FLAGS.model_output_info, 'w') as f:
        json.dump(model_info_dict, f)

else:
    # Evaluation Failure
    print(
        "Evaluation has failed. "
        "Accuracy: {}, Loss: {}".format(
            eval_results['metrics/accuracy'], eval_results['loss']
        )
    )
    # Cancel pipeline
    run.parent.cancel()

Writing pipeline_script/evaluate.py


Create the evaluation step in the pipeline.

In [10]:
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.core import PipelineData

pdata_model_info_for_reg = PipelineData(
    "model_info_for_reg",
    datastore=ws.get_default_datastore(),
    is_directory=False
)

eval_step = PythonScriptStep(
    name="Evaluate Model",
    script_name='evaluate.py',
    compute_target=compute_target,
    source_directory='./pipeline_script',
    inputs=[pdata_model_folder, pdata_model_info_for_eval],
    outputs=[pdata_model_info_for_reg],
    arguments=[
        '--data_folder',
        dataset.as_mount(),
        '--model_folder',
        pdata_model_folder,
        '--model_input_info',
        pdata_model_info_for_eval,
        '--model_output_info',
        pdata_model_info_for_reg
    ],
    runconfig=run_config,
    allow_reuse=False,
)

## 5. Create Model Registration Step

Create a model registration script ```pipeline_script/register_model.py```.

In this step, the model is registered in Azure Machine Learning model management, provided that it has passed the evaluation.

In [11]:
%%writefile pipeline_script/register_model.py
import os
import argparse
import json
from azureml.core import Run, Dataset
from azureml.core.model import Model

run = Run.get_context()

parser = argparse.ArgumentParser()
parser.add_argument(
    '--model_name',
    type=str,
    default='mlops-test-model',
    help='Model name for registration')
parser.add_argument(
    '--model_folder',
    type=str,
    default='./model',
    help='Folder path for model base dir')
parser.add_argument(
    '--model_input_info',
    type=str,
    default='./model_input_info',
    help='File path for model info')
FLAGS, unparsed = parser.parse_known_args()

# Check evaluation result
with open(FLAGS.model_input_info) as f:
    model_input_json = json.load(f)
if model_input_json['evaluation']:
    # Get model folder
    model_folder_fullpath = os.path.join(FLAGS.model_folder, model_input_json['model_folder_name'])

    # Register model !
    model = Model.register(
        workspace=run.experiment.workspace,
        model_name=FLAGS.model_name,
        model_path=model_folder_fullpath)

Writing pipeline_script/register_model.py


Create the model registration step in the pipeline.

In [12]:
reg_step = PythonScriptStep(
    name="Register Model",
    script_name='register_model.py',
    compute_target=compute_target,
    source_directory='./pipeline_script',
    inputs=[pdata_model_folder, pdata_model_info_for_reg],
    arguments=[
        '--model_name',
        'mlops-test-model',
        '--model_folder',
        pdata_model_folder,
        '--model_input_info',
        pdata_model_info_for_reg
    ],
    runconfig=run_config,
    allow_reuse=False,
)

## 6. Create and publish ML pipeline

In [13]:
from azureml.pipeline.core import Pipeline
import uuid

train_pipeline = Pipeline(workspace=ws, steps=[train_step, eval_step, reg_step])
train_pipeline.validate()
published_pipeline = train_pipeline.publish(
    name="training-pipeline01",
    description="Model training/retraining pipeline",
    version=str(uuid.uuid4()),
)

Step Train Model is ready to be created [1546ec1e]
Step Evaluate Model is ready to be created [bdd23be3]
Step Register Model is ready to be created [4359d555]
Created step Train Model [1546ec1e][31bf095e-ab3a-4ae2-abca-7b9cf5666cf0], (This step will run and generate new outputs)
Created step Evaluate Model [bdd23be3][2e8252a1-d565-4909-9063-8230a33ab55a], (This step will run and generate new outputs)
Created step Register Model [4359d555][fd887073-1d98-4326-b9ec-420f655c104b], (This step will run and generate new outputs)



Go to [AML studio UI](https://ml.azure.com/) to see the pipeline graph visually, as in the following screenshot.

![pipeline graph](https://tsmatz.files.wordpress.com/2021/10/20211027_ml_pipeline.jpg)

## 7. Run ML pipeline

When integrating with CI/CD tools, you can submit a new run of this published pipeline on demand through its REST endpoint.

In [14]:
# See endpoint url
published_pipeline.endpoint

'https://eastus.api.azureml.ms/pipelines/v1.0/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AzureML-Test-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/PipelineRuns/PipelineSubmit/3c8b4b5a-4930-472b-8b2c-c9d629050a3a'
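The published endpoint accepts an authenticated HTTP POST whose JSON body names the experiment to record the run under (`ExperimentName`). The following sketch only builds such a request with the standard library and does not send it; the endpoint URL and token are placeholders, and `build_pipeline_submit_request` is a hypothetical helper name. In practice the Azure AD token would come from a credential such as `InteractiveLoginAuthentication` or, for CI/CD, `ServicePrincipalAuthentication` in `azureml.core.authentication`; check the AML REST documentation for the full request schema.

```python
import json
import urllib.request


def build_pipeline_submit_request(endpoint, aad_token, experiment_name):
    """Build (but do not send) a POST request that submits a published pipeline run."""
    body = json.dumps({"ExperimentName": experiment_name}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": "Bearer " + aad_token,  # Azure AD access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: use published_pipeline.endpoint and a real token instead
req = build_pipeline_submit_request(
    "https://example.invalid/PipelineRuns/PipelineSubmit/00000000",
    "<aad-token>",
    "pipeline_experiment01",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Sending the request (for example with `urllib.request.urlopen(req)`) should return a JSON response that includes the new run's `Id`.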

Let's submit a new run using the Python AML SDK, and then see the progress and results in [AML studio UI](https://ml.azure.com/).

In [15]:
from azureml.core import Experiment

exp = Experiment(workspace=ws, name='pipeline_experiment01')
pipeline_run = exp.submit(published_pipeline)

Submitted PipelineRun 05f237e2-d4ad-4bb6-b868-d4e9495c486b
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/05f237e2-d4ad-4bb6-b868-d4e9495c486b?wsid=/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourcegroups/AzureML-Test-rg/workspaces/ws01&tid=72f988bf-86f1-41af-91ab-2d7cd011db47


Go to [AML studio UI](https://ml.azure.com/) and check that the model ```mlops-test-model``` is registered in your workspace.

## 8. Remove Compute

In [16]:
# Delete cluster (nodes) and remove from AML workspace
mycompute = AmlCompute(workspace=ws, name='mycluster01')
mycompute.delete()