# Build an Azure ML Pipeline with Azure ML Modules

In this tutorial you will learn how to work with Azure ML modules:

1. Set up the environment: install the module CLI and the module/pipeline SDK
2. Register a few sample modules into your Azure ML workspace using the CLI
3. Use the module/pipeline SDK to create a pipeline with the modules registered in step 2

## Prerequisites
* Install the Azure CLI by following [the Azure CLI installation instructions](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest).

## Setup environment
* Install the Azure CLI ML extension, which includes the _module_ command group
* Install the Azure ML SDK, including the APIs for working with _modules_ and _pipelines_

In [None]:
# Remove any existing azure-cli-ml extension (the `az ml` commands)
!az extension remove -n azure-cli-ml 

# Install a preview build of azure-cli-ml (which includes the `az ml module` commands)
!az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13766063/azure_cli_ml-0.1.0.13766063-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13766063 --yes 

In [None]:
# Verify the availability of `az ml module` commands
!az ml module -h

In [None]:
# Install the Azure ML SDK package with pipeline and module support (azureml-pipeline-wrapper)
# Important: restart the kernel after the installation succeeds

# Enable greedy tab completion in IPython
%config IPCompleter.greedy=True
!pip install azureml-pipeline-wrapper==0.1.0.14253903 --extra-index-url https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/14253903 --user --upgrade
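After restarting the kernel, one quick way to confirm which build of the package is active is to query the installed distribution metadata. This is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# After the kernel restart this should report the preview build installed above.
print(installed_version("azureml-pipeline-wrapper"))
```

If it prints `None`, the kernel is most likely still using an environment where the package was not installed.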

## Register azureml module

You can manage Azure ML modules through [azure-cli-ml](https://aka.ms/moduledoc) or [ml.azure.com](https://ml.azure.com/).

A module can be registered from:
- a local path
- a public GitHub URL
- Azure DevOps build artifacts

Azure ML modules support multiple module types:
- Basic Python module
- MPI module
- Parallel run module
- HDI module (pending backend support)

In [None]:
# Configure your workspace information here

subscription_id = '74eccef0-4b8d-4f83-b5f9-fa100d155b22'
workspace_name = 'lisal-amlservice'
resource_group = 'lisal-dev'

# Name of an existing AML compute target in the workspace
pipeline_compute = "always-on-ds2v2"
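The cell above hard-codes the workspace settings. If you would rather keep these IDs out of the notebook, one option is to read them from environment variables. This is a sketch; the variable names (`AML_SUBSCRIPTION_ID`, `AML_WORKSPACE_NAME`, `AML_RESOURCE_GROUP`, `AML_COMPUTE_NAME`) and the helper `read_workspace_settings` are hypothetical choices, not Azure ML conventions.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names; rename them to match your setup.
SETTING_TO_ENV_VAR = {
    "subscription_id": "AML_SUBSCRIPTION_ID",
    "workspace_name": "AML_WORKSPACE_NAME",
    "resource_group": "AML_RESOURCE_GROUP",
    "pipeline_compute": "AML_COMPUTE_NAME",
}


def read_workspace_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read workspace settings from the environment, failing if any are unset."""
    env = os.environ if env is None else env
    missing = [var for var in SETTING_TO_ENV_VAR.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {setting: env[var] for setting, var in SETTING_TO_ENV_VAR.items()}


# Usage, replacing the hard-coded assignments above:
# settings = read_workspace_settings()
# subscription_id = settings["subscription_id"]
# workspace_name = settings["workspace_name"]
# resource_group = settings["resource_group"]
# pipeline_compute = settings["pipeline_compute"]
```

Failing early on a missing variable gives a clearer error than a later `az` or `Workspace.get` call with an empty value.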

In [None]:
# Log in, select the subscription, and attach this folder to your workspace

!az login 
!az account set -s $subscription_id 
!az ml folder attach -w $workspace_name -g $resource_group 

In [None]:
# Register Azure ML modules from GitHub URLs

!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/mpi_train.yaml 
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/score.yaml 
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/eval.yaml 
!az ml module register --spec-file=https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/compare2.yaml 
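The four registration commands above differ only in the spec file name. If you prefer to drive them from Python (for example, in a script outside Jupyter, where `!` shell escapes are not available), a minimal sketch using the standard-library `subprocess` module could look like this. The helper names `register_commands` and `register_all` are hypothetical, and the sketch assumes the `az` CLI with the module extension is on your PATH and already logged in.

```python
import shutil
import subprocess
from typing import List, Sequence

# Base URL and spec file names copied from the registration cell above.
SPEC_BASE_URL = "https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval"
SPEC_FILES = ["mpi_train.yaml", "score.yaml", "eval.yaml", "compare2.yaml"]


def register_commands(base_url: str, spec_files: Sequence[str]) -> List[List[str]]:
    """Build one `az ml module register` argument list per spec file."""
    return [
        ["az", "ml", "module", "register", f"--spec-file={base_url}/{name}"]
        for name in spec_files
    ]


def register_all(base_url: str, spec_files: Sequence[str]) -> None:
    """Run each registration in order; check=True raises on the first failure."""
    # Resolve `az` to a full path so a missing install fails with a clear error.
    az_path = shutil.which("az")
    if az_path is None:
        raise FileNotFoundError("The Azure CLI (`az`) was not found on PATH")
    for cmd in register_commands(base_url, spec_files):
        subprocess.run([az_path, *cmd[1:]], check=True)


# Usage (requires a logged-in Azure CLI with the module extension installed):
# register_all(SPEC_BASE_URL, SPEC_FILES)
```

Separating command construction from execution makes the list of commands easy to inspect before anything is registered.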

In [None]:
# List the custom modules available in the workspace
!az ml module list -o table 

## Create pipeline
You can build a pipeline with the SDK, or by drag-and-drop in the [Designer](https://ml.azure.com/visualinterface?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/kubeflow-demo/workspaces/kubeflow_ws_1&flight=cm,nml,newGraphDetail,newGraphAuthoring,all&tid=72f988bf-86f1-41af-91ab-2d7cd011db47) in the workspace portal.

The new SDK:
* Simplifies the syntax to provide an experience consistent with drag-and-drop
* Supports IntelliSense and docstrings, so you no longer have to work with dicts all the time
* Supports creating a pipeline with unpublished modules


In [None]:
from azureml.core import Workspace, Run, Dataset
from azureml.pipeline.wrapper import Pipeline, Module, dsl

ws = Workspace.get(name=workspace_name, subscription_id=subscription_id, resource_group=resource_group)

# Load the registered modules by namespace and name
train_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='MPI Train')
score_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Score')
eval_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Evaluate')
compare_module_func = Module.load(ws, namespace='microsoft.com/aml/samples', name='Compare 2 Models')

# If you have an unpublished module locally or on GitHub, load_from_yaml lets you test it as an anonymous module
# compare_module_func = Module.load_from_yaml(ws, yaml_file='./CompareModdels/compare2.yaml')
# compare_module_func = Module.load_from_yaml(ws, yaml_file='https://github.com/lisagreenview/hello-aml-modules/blob/master/train-score-eval/compare2.yaml')

# Get the datasets, registering them first if they do not exist yet
training_data_name = 'aml_module_training_data'
test_data_name = 'aml_module_test_data'

if training_data_name not in ws.datasets:
    print('Registering a training dataset for the sample pipeline ...')
    train_data = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    # register() returns the registered dataset, so keep that reference
    train_data = train_data.register(workspace=ws,
                                     name=training_data_name,
                                     description='Training data (for illustrative purposes only)')
    print('Registered')
else:
    train_data = ws.datasets[training_data_name]
    print('Training dataset found in workspace')

if test_data_name not in ws.datasets:
    print('Registering a test dataset for the sample pipeline ...')
    test_data = Dataset.File.from_files(path=['https://dprepdata.blob.core.windows.net/demo/Titanic.csv'])
    test_data = test_data.register(workspace=ws,
                                   name=test_data_name,
                                   description='Test data (for illustrative purposes only)')
    print('Registered')
else:
    test_data = ws.datasets[test_data_name]
    print('Test dataset found in workspace')
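The training and test branches above repeat the same get-or-register logic. One way to factor it out is a small helper; this is a sketch, and the name `get_or_register_dataset` is hypothetical. It relies only on what the cell above already uses: a `ws.datasets` mapping and a dataset object whose `register(workspace=..., name=..., description=...)` returns the registered dataset. Passing a `create` callable keeps the `Dataset.File.from_files` call lazy, so nothing is built when the dataset already exists.

```python
from typing import Any, Callable


def get_or_register_dataset(ws: Any, name: str, create: Callable[[], Any], description: str) -> Any:
    """Return the dataset registered under `name`, registering it first if needed.

    `ws` must expose a `datasets` mapping, and `create()` must return an
    unregistered dataset whose `register(...)` returns the registered dataset.
    """
    if name in ws.datasets:
        print(f"Dataset '{name}' found in workspace")
        return ws.datasets[name]
    print(f"Registering dataset '{name}' ...")
    return create().register(workspace=ws, name=name, description=description)


# Usage with the names from the cell above:
# titanic_url = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
# train_data = get_or_register_dataset(
#     ws, training_data_name,
#     lambda: Dataset.File.from_files(path=[titanic_url]),
#     'Training data (for illustrative purposes only)')
# test_data = get_or_register_dataset(
#     ws, test_data_name,
#     lambda: Dataset.File.from_files(path=[titanic_url]),
#     'Test data (for illustrative purposes only)')
```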


In [None]:
# Define a sub-pipeline that trains, scores, and evaluates a model
@dsl.pipeline(name='A sub pipeline including train/score/eval',
              description='Train a model and evaluate model performance')
def training_pipeline(input_data, learning_rate):
    train = train_module_func(
        training_data=input_data,
        max_epochs=5,
        learning_rate=learning_rate)

    # Run MPI training with 2 processes per node on 2 nodes
    train.runsettings.configure(process_count_per_node=2, node_count=2)

    score = score_module_func(
        model_input=train.outputs.model_output,
        test_data=test_data)

    # Named `evaluation` rather than `eval` to avoid shadowing the Python built-in
    evaluation = eval_module_func(scoring_result=score.outputs.score_output)

    return {'eval_output': evaluation.outputs.eval_output, 'model_output': train.outputs.model_output}

In [None]:
# Define a pipeline that runs the sub-pipeline with two learning rates and compares the results
@dsl.pipeline(name='A dummy pipeline that trains multiple models and outputs the best one',
              description='Select the best model trained with different learning rates',
              default_compute_target=pipeline_compute)
def dummy_automl_pipeline():
    train_and_evaluate_model1 = training_pipeline(train_data, 0.01)
    train_and_evaluate_model2 = training_pipeline(train_data, 0.02)

    compare = compare_module_func(
        model1=train_and_evaluate_model1.outputs.model_output,
        eval_result1=train_and_evaluate_model1.outputs.eval_output,
        model2=train_and_evaluate_model2.outputs.model_output,
        eval_result2=train_and_evaluate_model2.outputs.eval_output
    )

    return {**compare.outputs}

# Create a pipeline instance
pipeline = dummy_automl_pipeline()
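As the cells above suggest, calling a module function such as `train_module_func(...)` inside a `@dsl.pipeline` function builds up pipeline steps rather than training locally; the actual run happens at `submit_run`. Passing one step's output (for example `train.outputs.model_output`) as another step's input is what connects the steps. The toy, pure-Python sketch below illustrates only that idea. It is not how `azureml.pipeline.wrapper` is implemented, and every name in it (`Node`, `Port`, `toy_module`, `toy_edges`, `TOY_GRAPH`, `toy_training_pipeline`) and the placeholder file names are hypothetical; the `toy_` prefixes avoid overwriting the notebook's real variables if you run it in the same kernel.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


@dataclass(eq=False)
class Node:
    """One module invocation recorded in the toy graph."""
    module: str
    inputs: Dict[str, Any]
    outputs: SimpleNamespace = field(default_factory=SimpleNamespace, repr=False)


@dataclass(eq=False)
class Port:
    """A named output of a node; using it as an input creates an edge."""
    node: Node
    name: str


TOY_GRAPH: List[Node] = []


def toy_module(name: str, output_names: List[str]):
    """Return a callable that records a Node instead of running any work."""
    def module_func(**inputs: Any) -> Node:
        node = Node(module=name, inputs=inputs)
        node.outputs = SimpleNamespace(**{out: Port(node, out) for out in output_names})
        TOY_GRAPH.append(node)
        return node
    return module_func


def toy_edges(graph: List[Node]) -> List[Tuple[str, str, str, str]]:
    """List (producer, output, consumer, input) for every Port wired into a node."""
    return [
        (value.node.module, value.name, node.module, input_name)
        for node in graph
        for input_name, value in node.inputs.items()
        if isinstance(value, Port)
    ]


toy_train = toy_module("Train", ["model_output"])
toy_score = toy_module("Score", ["score_output"])
toy_evaluate = toy_module("Evaluate", ["eval_output"])


def toy_training_pipeline(input_data: str, learning_rate: float) -> Dict[str, Port]:
    """Mirror of training_pipeline above, wiring toy modules together."""
    train = toy_train(training_data=input_data, learning_rate=learning_rate)
    score = toy_score(model_input=train.outputs.model_output, test_data="test.csv")
    evaluation = toy_evaluate(scoring_result=score.outputs.score_output)
    return {"eval_output": evaluation.outputs.eval_output,
            "model_output": train.outputs.model_output}


toy_training_pipeline("train.csv", 0.01)
for edge in toy_edges(TOY_GRAPH):
    print(edge)
```

Running it records three nodes and prints two edges: `('Train', 'model_output', 'Score', 'model_input')` and `('Score', 'score_output', 'Evaluate', 'scoring_result')`. Plain values such as the learning rate stay as parameters, while `Port` objects become the dependencies between steps.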

In [None]:
# Validate the pipeline and visualize the graph
pipeline.validate()

In [None]:
# Save the pipeline as a draft
pipeline.save(experiment_name='pipeline-with-azureml-module')

In [None]:
# Submit a pipeline run and wait for it to finish
run = pipeline.submit_run(experiment_name='pipeline-with-azureml-module')
run.wait_for_completion()