# Build an AML Pipeline with Azure ML Modules

In this tutorial you will learn how to work with Azure ML modules:

1. Set up the environment: install the module CLI and the module/pipeline SDK
2. Register a few sample modules into your AML workspace using the CLI
3. Use the module/pipeline SDK to create a pipeline with the modules registered in step 2

## Prerequisite
* Install the Azure CLI by following [the Azure CLI installation instructions](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest).

## Set up environment
* Install the Azure CLI AML extension, which includes the _module_ command group
* Install the Azure ML SDK, including the APIs to work with _module_ and _pipeline_

In [3]:
# Uninstall azure-cli-ml (the `az ml` commands)
!az extension remove -n azure-cli-ml 

# Install the azure-cli-ml build that includes the `az ml module` commands
!az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13766063/azure_cli_ml-0.1.0.13766063-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13766063 --yes 

The extension azure-cli-ml is not installed.

In [4]:
# Verify the availability of `az ml module` commands
!az ml module -h


Group
    az ml module : Commands to manage modules.
        Refer to https://aka.ms/moduledoc for details.

Commands:
    disable             : Disable a module.
    download            : Download a module to a specified directory.
    enable              : Enable a module.
    list                : List modules in a workspace.
    register            : Create or upgrade a module.
    set-default-version : Set default version of a module.
    show                : Show detail information of a module.
    validate-spec       : Validate module spec file.

For more specific examples, use: az find "az ml module"

Please let us know how we are doing: https://aka.ms/clihats

In [1]:
# Install azureml-sdk with Pipeline and Module support
# Important: restart the kernel after the installation succeeds

%config IPCompleter.greedy=True
!pip install --extra-index-url=https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/14455084  azureml-pipeline-wrapper==0.1.0.14455084 --user --upgrade

Collecting azureml-train-automl-client~=0.1.0.14455084
  Downloading https://azuremlsdktestpypi.blob.core.windows.net/repo/CLI-SDK-Runners-Validation/14455084/azureml_train_automl_client-0.1.0.14455084-py3-none-any.whl?sv=2018-03-28&sr=b&sig=jVMqCkLQwGv0lDtYCSw9Elr%2FJ10QKyEK0twmaWDXKUY%3D&st=2020-05-19T15%3A24%3A14Z&se=2021-05-19T15%3A24%3A14Z&sp=rl (79kB)
Collecting azureml-train-core~=0.1.0.14455084
  Downloading https://azuremlsdktestpypi.blob.core.windows.net/repo/CLI-SDK-Runners-Validation/14455084/azureml_train_core-0.1.0.14455084-py3-none-any.whl?sv=2018-03-28&sr=b&sig=AdNGMkk3st0MzMmSgqQUyeaYE53YiZc41obIUOZgdHk%3D&st=2020-05-19T15%3A24%3A14Z&se=2021-05-19T15%3A24%3A14Z&sp=rl (8.6MB)
Collecting azureml-automl-core~=0.1.0.14455084
  Downloading https://azuremlsdktestpypi.blob.core.windows.net/repo/CLI-SDK-Runners
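
After restarting the kernel, it can help to confirm that the package is visible to the interpreter before importing it. The following is a minimal sketch using only the standard library; the helper name `installed_version` is an assumption for illustration, and the package name is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package name taken from the pip command above.
version = installed_version("azureml-pipeline-wrapper")
if version is None:
    print("azureml-pipeline-wrapper is not installed in this environment")
else:
    print(f"azureml-pipeline-wrapper {version} is installed")
```

If the package is reported as missing right after a `--user` install, restarting the kernel is the first thing to try.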

## Register AML Module

You can manage AML modules through [azure-cli-ml](https://aka.ms/moduledoc) or [ml.azure.com](https://ml.azure.com/).

Modules can be registered from:
- a local path
- a public GitHub URL
- Azure DevOps build artifacts

Azure ML modules support multiple module types:
- Basic Python module
- MPI module
- Parallel run module
- HDI module (pending backend support)

In [2]:
# Configure your workspace information here

subscription_id = '74eccef0-4b8d-4f83-b5f9-fa100d155b22'
workspace_name = 'kubeflow_ws_1'
resource_group = 'kubeflow-demo'

# Specify an available AML compute target in the workspace
pipeline_compute = "kubeflow-aks"
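
A typo in these values usually only surfaces later as an authentication or lookup error, so a quick local sanity check can save a round trip. This is a minimal sketch, not part of the SDK: the function `check_workspace_config` and the placeholder values are assumptions for illustration, and the check only confirms that the values look plausible, not that the workspace exists.

```python
import uuid


def check_workspace_config(subscription_id: str, workspace_name: str, resource_group: str) -> list:
    """Return a list of human-readable problems; an empty list means the values look plausible."""
    problems = []
    try:
        # Azure subscription IDs are GUIDs, so they should parse as a UUID.
        uuid.UUID(subscription_id)
    except ValueError:
        problems.append(f"subscription_id is not a valid GUID: {subscription_id!r}")
    if not workspace_name.strip():
        problems.append("workspace_name is empty")
    if not resource_group.strip():
        problems.append("resource_group is empty")
    return problems


# Placeholder values for illustration only; substitute your own.
print(check_workspace_config("00000000-0000-0000-0000-000000000000", "my_workspace", "my-resource-group"))
print(check_workspace_config("not-a-guid", "", "my-resource-group"))
```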

In [2]:
# Log in, select the subscription, and attach the current folder to your AML workspace

!az login 
!az account set -s $subscription_id 
!az ml folder attach -w $workspace_name -g $resource_group 

tId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "id": "10a1cf3f-9a29-4c5d-aef4-0f6a234190af",
    "isDefault": false,
    "managedByTenants": [],
    "name": "AML V1 INT 1",
    "state": "Enabled",
    "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "user": {
      "name": "jietong@microsoft.com",
      "type": "user"
    }
  },
  {
    "cloudName": "AzureCloud",
    "homeTenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "id": "d128f140-94e6-4175-87a7-954b9d27db16",
    "isDefault": false,
    "managedByTenants": [
      {
        "tenantId": "2f4a9838-26b7-47ee-be60-ccc1fdec5953"
      }
    ],
    "name": "AML V1 Personal 1",
    "state": "Enabled",
    "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "user": {
      "name": "jietong@microsoft.com",
      "type": "user"
    }
  },
  {
    "cloudName": "AzureCloud",
    "homeTenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "id": "4f455bd0-f95a-4b7d-8d08-078611508e0b",
    "isDefault": false,
    "managedByT

In [3]:
!az ml module register --spec-file=https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/sar_train.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/stratified_splitter.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/sar_score.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/recall_at_k.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/ndcg.yaml --set-as-default-version
!az ml module register --spec-file=https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml --set-as-default-version

{
  "contact": null,
  "defaultVersion": "1.1.1",
  "description": "SAR Train from Recommenders repo: https://github.com/Microsoft/Recommenders.",
  "helpDocument": null,
  "lastUpdatedOn": "2020-05-19T23:53:08.3602071Z",
  "name": "SAR Training",
  "namespace": "microsoft.com/cat",
  "registeredBy": "Jie Tong",
  "registeredOn": "2020-04-16T08:01:19.6800299Z",
  "status": "Active",
  "tags": [
    "Recommenders"
  ],
  "version": "1.1.1",
  "versions": [
    "1.1.0",
    "1.1.1"
  ],
  "yamlLink": "https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/sar_train.yaml"
}
[K[33mVersion 1.1.1 has already exist in module Stratified Splitter (namespace: microsoft.com/cat)[0m
{}
[K[91mAzure-cli-ml Version: 0.1.0.13766063

Learn more about optional inputs: https://aka.ms/azureml-module-optional-inputs[0m
[K{
  "contact": null,
  "defaultVersion": "1.1.1",
  "description": "Recall at K metric from Recommenders repo: https://git
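
The seven registration commands above differ only in the spec file name, so they could also be generated from a list. The sketch below only builds the command strings; the base URL, file names, and flags are copied from the registration cell, while the names `SPEC_BASE_URL`, `SPEC_FILES`, and `register_command` are assumptions for illustration.

```python
# Base URL and spec file names are copied from the registration cell above.
SPEC_BASE_URL = (
    "https://github.com/microsoft/recommenders/blob/master/"
    "reco_utils/azureml/azureml_designer_modules/module_specs"
)
SPEC_FILES = [
    "sar_train.yaml",
    "stratified_splitter.yaml",
    "sar_score.yaml",
    "recall_at_k.yaml",
    "map.yaml",
    "ndcg.yaml",
    "precision_at_k.yaml",
]


def register_command(spec_file: str) -> str:
    """Build the `az ml module register` command line for one spec file."""
    return f"az ml module register --spec-file={SPEC_BASE_URL}/{spec_file} --set-as-default-version"


commands = [register_command(name) for name in SPEC_FILES]
for command in commands:
    print(command)
```

In a notebook, each generated string could then be executed with `!{command}`, which keeps the list of modules in one place.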

## Create pipeline
You can build a pipeline with the SDK, or by drag-and-drop in the [Designer](https://ml.azure.com/visualinterface?wsid=/subscriptions/74eccef0-4b8d-4f83-b5f9-fa100d155b22/resourcegroups/kubeflow-demo/workspaces/kubeflow_ws_1&flight=cm,nml,newGraphDetail,newGraphAuthoring,all&tid=72f988bf-86f1-41af-91ab-2d7cd011db47) in the workspace portal.

The new SDK:
* Simplifies the syntax to provide an experience consistent with drag-and-drop
* Supports IntelliSense and docstrings, so you no longer have to work with dicts all the time
* Supports creating a pipeline with unpublished modules
In [3]:
from azureml.core import Workspace, Run, Dataset, Datastore
from azureml.pipeline.wrapper import Pipeline, Module, dsl

ws = Workspace.get(name=workspace_name, subscription_id=subscription_id, resource_group=resource_group)

# get modules
sar_split = Module.load(ws, namespace='microsoft.com/cat', name='Stratified Splitter')
sar_train = Module.load(ws, namespace='microsoft.com/cat', name='SAR Training')
sar_score = Module.load(ws, namespace='microsoft.com/cat', name='SAR Scoring')
sar_map = Module.load(ws, namespace='microsoft.com/cat', name='MAP')
sar_ndcg = Module.load(ws, namespace='microsoft.com/cat', name='nDCG')
sar_recall = Module.load(ws, namespace='microsoft.com/cat', name='Recall at K')
sar_precision = Module.load(ws, namespace='microsoft.com/cat', name='Precision at K')

# get dataset, registering it from the global datastore if it is not in the workspace yet
data_name = 'Movie_Ratings'

if data_name not in ws.datasets:
    global_datastore = Datastore(ws, name="azureml_globaldatasets")
    movie_ratings_data = Dataset.File.from_files(
        global_datastore.path('GenericCSV/Movie_Ratings')
    ).register(workspace=ws, name=data_name, description='movie rating data')
    print('Registered')

movie_ratings_data = ws.datasets[data_name]
print('Training dataset found in workspace')

Training dataset found in workspace


In [4]:
# define a sub-pipeline that computes the four ranking metrics
@dsl.pipeline(name='recommender-evaluation',
              description='recall, precision, ndcg, map')
def sar_eval(user_col, item_col, rating_col, prediction_col, method, top_n, threshold, test_input, score_input):
    # all four metric modules are called with the same arguments
    recall = sar_recall(user_column=user_col, item_column=item_col, rating_column=rating_col, prediction_column=prediction_col,
                        relevancy_method=method, top_k=top_n, threshold=threshold,
                        rating_true=test_input,
                        rating_pred=score_input)

    # named map_score to avoid shadowing the built-in map()
    map_score = sar_map(user_column=user_col, item_column=item_col, rating_column=rating_col, prediction_column=prediction_col,
                        relevancy_method=method, top_k=top_n, threshold=threshold,
                        rating_true=test_input,
                        rating_pred=score_input)

    precision = sar_precision(user_column=user_col, item_column=item_col, rating_column=rating_col, prediction_column=prediction_col,
                              relevancy_method=method, top_k=top_n, threshold=threshold,
                              rating_true=test_input,
                              rating_pred=score_input)

    ndcg = sar_ndcg(user_column=user_col, item_column=item_col, rating_column=rating_col, prediction_column=prediction_col,
                    relevancy_method=method, top_k=top_n, threshold=threshold,
                    rating_true=test_input,
                    rating_pred=score_input)

    return {'recall': recall.outputs.score, 'precision': precision.outputs.score, 'ndcg': ndcg.outputs.score, 'map': map_score.outputs.score}
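
Because the four metric steps receive identical keyword arguments, the repetition can be factored into one shared dictionary that is unpacked into each call. The sketch below shows the pattern in plain Python only: `make_metric_step` is a hypothetical stand-in that records its arguments, not the real module objects returned by `Module.load`, and whether the SDK's pipeline tracing accepts steps built in a loop is not confirmed by this notebook.

```python
def make_metric_step(name):
    """Hypothetical stand-in for a loaded metric module; it only records its arguments."""
    def step(**kwargs):
        return {"module": name, "arguments": kwargs}
    return step


# Stand-ins for the four metric modules (not the real SDK objects).
metric_modules = {
    "recall": make_metric_step("Recall at K"),
    "map": make_metric_step("MAP"),
    "precision": make_metric_step("Precision at K"),
    "ndcg": make_metric_step("nDCG"),
}


def build_eval_steps(user_col, item_col, rating_col, prediction_col,
                     method, top_n, threshold, test_input, score_input):
    """Call every metric module with one shared set of keyword arguments."""
    shared = dict(
        user_column=user_col, item_column=item_col, rating_column=rating_col,
        prediction_column=prediction_col,
        relevancy_method=method, top_k=top_n, threshold=threshold,
        rating_true=test_input, rating_pred=score_input,
    )
    return {key: module(**shared) for key, module in metric_modules.items()}


steps = build_eval_steps('UserId', 'MovieId', 'Rating', 'prediction',
                         'top_k', 10, 1.0, 'test-data', 'score-data')
for key, step in steps.items():
    print(key, step["module"], step["arguments"]["top_k"])
```

Keeping the arguments in one place makes it harder for the four calls to drift apart when a column name changes.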

In [5]:
# define the main pipeline
@dsl.pipeline(name='recommender pipeline',
              description='recommender pipeline with SAR algo',
              default_compute_target=pipeline_compute)
def sar_pipeline(top_n, threshold):
    split = sar_split(input_path=movie_ratings_data,
                      ratio=0.75, seed=42,
                      user_column='UserId', item_column='MovieId')

    train = sar_train(input_path=split.outputs.output_train_data,
                      user_column='UserId', item_column='MovieId', rating_column='Rating', timestamp_column='Timestamp',
                      time_decay=False, normalize=False)

    score = sar_score(trained_model=train.outputs.output_model, dataset_to_score=split.outputs.output_test_data,
                      score_type='Item recommendation', ranking_metric='Rating', top_k=top_n, sort_top_k=True)

    # the evaluation sub-pipeline is called with fixed top_k=10 and threshold=1.0
    # (named evaluation to avoid shadowing the built-in eval())
    evaluation = sar_eval('UserId', 'MovieId', 'Rating', 'prediction',
                          'top_k', 10, 1.0,
                          split.outputs.output_test_data, score.outputs.score_result)

    return {**evaluation.outputs}

pipeline = sar_pipeline(10, 1.0)
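
Wiring one step's outputs into another step's inputs is what defines the pipeline graph. As an illustration only, the dependencies implied by the cell above can be written down by hand and ordered with the standard library; this is not how the SDK schedules steps (the notebook does not show that), and the step names in the dictionary are labels chosen for this sketch.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps whose outputs it consumes,
# read off the keyword arguments in the pipeline definition above.
dependencies = {
    "split": set(),
    "train": {"split"},
    "score": {"train", "split"},
    "recall": {"split", "score"},
    "map": {"split", "score"},
    "precision": {"split", "score"},
    "ndcg": {"split", "score"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The four metric steps depend only on `split` and `score`, so nothing in the graph forces an order among them.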

In [8]:
# validate pipeline and visualize the graph
pipeline.validate()

{'result': 'validation passed', 'errors': []}
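
When the notebook runs unattended, it can be useful to stop before saving or submitting if validation reports problems. The sketch below assumes only the dictionary shape shown in the output above (an `errors` list that is empty on success); the helper name `ensure_valid` and the sample error entry are assumptions, since the notebook does not show what a failing result looks like.

```python
def ensure_valid(validation_result: dict) -> None:
    """Raise ValueError if the validation result reports any errors."""
    errors = validation_result.get("errors", [])
    if errors:
        raise ValueError(f"Pipeline validation failed with {len(errors)} error(s): {errors}")


# Result copied from the output above: passes silently.
ensure_valid({'result': 'validation passed', 'errors': []})
print("ok to submit")

# A made-up failing result, for illustration only.
try:
    ensure_valid({'result': 'validation failed', 'errors': ['example error']})
except ValueError as exc:
    print(exc)
```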

In [6]:
# save as a draft
pipeline.save(experiment_name = 'recommender-pipeline-sar', id = '767f99ef-9d0d-4518-b9aa-9faf0170b954')

Name,Id,Tags,Properties,Last Submitted Pipeline Run Id
recommender pipeline,767f99ef-9d0d-4518-b9aa-9faf0170b954,,,


In [6]:
run = pipeline.submit(experiment_name = 'recommender-pipeline-sar')
run.wait_for_completion()

{'runId': '95bf343c-426f-4215-8a85-2adbcf8b7876',
 'status': 'Completed',
 'startTimeUtc': '2020-05-21T03:10:47.238577Z',
 'endTimeUtc': '2020-05-21T03:21:11.548008Z',
 'properties': {'azureml.runsource': 'azureml.PipelineRun',
  'runSource': 'Designer',
  'runType': 'HTTP',
  'azureml.parameters': '{"top_n":"10"}'},
 'inputDatasets': [],
 'logFiles': {'azureml-logs/kfcloud.txt': 'https://kubeflowws17354384724.blob.core.windows.net/azureml/ExperimentRun/dcid.95bf343c-426f-4215-8a85-2adbcf8b7876/azureml-logs/kfcloud.txt?sv=2019-02-02&sr=b&sig=FmnNbs%2FGTw01PI%2BaZ64Z%2B1TllZ3A2R14fJ200OY4OaM%3D&st=2020-05-21T03%3A11%3A15Z&se=2020-05-21T11%3A21%3A15Z&sp=r',
  'azureml-logs/kubeflow.yaml': 'https://kubeflowws17354384724.blob.core.windows.net/azureml/ExperimentRun/dcid.95bf343c-426f-4215-8a85-2adbcf8b7876/azureml-logs/kubeflow.yaml?sv=2019-02-02&sr=b&sig=55Av6LTVlgZ%2BWfEN3DzqO5zjiAEwhtH%2Bwc13F2v6ra4%3D&st=2020-05-21T03%3A11%3A15Z&se=2020-05-21T11%3A21%3A15Z&sp=r',
  'azureml-logs/manifes
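
The dictionary returned by `wait_for_completion()` includes start and end timestamps, so the run duration can be computed directly. This is a minimal sketch that assumes the timestamp format shown in the output above (UTC, microsecond precision, trailing `Z`); the helper name `run_duration_seconds` is an assumption, and the two values are copied from that output.

```python
from datetime import datetime

# Format of the timestamps as they appear in the run details above.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration_seconds(details: dict) -> float:
    """Return the elapsed time between startTimeUtc and endTimeUtc in seconds."""
    start = datetime.strptime(details["startTimeUtc"], TIMESTAMP_FORMAT)
    end = datetime.strptime(details["endTimeUtc"], TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


# Values copied from the run details above.
details = {
    'startTimeUtc': '2020-05-21T03:10:47.238577Z',
    'endTimeUtc': '2020-05-21T03:21:11.548008Z',
}
duration = run_duration_seconds(details)
print(f"{duration:.1f} s")  # prints "624.3 s"
```

For this run the pipeline took a little over ten minutes from start to completion.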