Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Develop your own Azure Machine Learning component using component CLI & dsl.component

In this notebook, you learn how to create a simple machine learning component from scratch and use it in an ML pipeline.

* Initialize a component from an existing Python script with the component CLI.
* Optionally run a local test to make sure the code works correctly.
* Register the component to your Machine Learning workspace.



## Prerequisites
* Install the Azure CLI with the azure-cli-ml extension by following the [instructions here](setup-environment.ipynb).


## Setup workspace

Log in to Azure with the CLI and set the default workspace using the `az ml folder attach` command.

After this operation, the workspace can be retrieved with `Workspace.from_config()` for SDK usage.

In [None]:
# NOTE: Update the following information with your environment

SUBSCRIPTION_ID = '<your subscription ID>'
WORKSPACE_NAME = '<your workspace name>'
RESOURCE_GROUP_NAME = '<your resource group>'

In [None]:
!az login -o none # The first time you use the Azure CLI, log in so that you can access your subscription and workspace
!az account set -s $SUBSCRIPTION_ID 
!az ml folder attach -w $WORKSPACE_NAME -g $RESOURCE_GROUP_NAME 

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
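
If `Workspace.from_config()` cannot find a workspace, it can help to check which config file is visible from the notebook's working directory. The sketch below is not part of the SDK; it assumes the config is a `config.json` file stored either in a `.azureml` folder or directly in the current directory or one of its parents, and that it holds the keys `subscription_id`, `resource_group`, and `workspace_name`. The helper names are hypothetical.

```python
import json
from pathlib import Path


def find_workspace_config(start="."):
    """Search `start` and its parent directories for a workspace config.json.

    Assumption: the file lives in `.azureml/config.json` or `config.json`.
    """
    start_dir = Path(start).resolve()
    for directory in [start_dir, *start_dir.parents]:
        for candidate in (directory / ".azureml" / "config.json", directory / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def describe_workspace_config(start="."):
    """Return the workspace identifiers stored in the nearest config file, or None."""
    path = find_workspace_config(start)
    if path is None:
        return None
    config = json.loads(path.read_text())
    keys = ("subscription_id", "resource_group", "workspace_name")
    return {key: config.get(key) for key in keys}
```

Calling `describe_workspace_config()` in the notebook shows which subscription, resource group, and workspace the nearest config file points to, or `None` if no file was found.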

## Init a component with an existing python script

We will initialize a component from an existing Python script with the azureml CLI. The script is in [components/creation/prepare_data](components/creation/prepare_data). Print the script to see what it contains.

In [None]:
with open('components/creation/prepare_data/prepare_data.py') as fin:
    print(fin.read())

Now initialize a dsl component with the `az ml component init` command. In the following cell, `--source` defines the source script to initialize the dsl component from (the script must parse its command-line arguments with argparse), `--inputs`/`--outputs` define which of the source script's argparse parameters are inputs and outputs, `--source-dir` defines the source directory in which the initialized files are generated, and `--entry-only` means that only the entry script is generated.

Run `az ml component init --help` to learn more about how to use this command.

In [None]:
!az ml component init --source prepare_data.py --inputs input_data --outputs output_data --source-dir components/creation/prepare_data --entry-only

Note: This tutorial runs the `az ml component init` command with the `--entry-only` argument, which means it generates only the entry file. Without `--entry-only`, the following files are generated to help with component development and testing.

|File name|Purpose|
| -----------| ----------- |
|prepare_data_entry.py|component entry script wrapped by dsl component decorator, which defines the interface and main logic of the component|
|prepare_data.spec.yaml|component spec in yaml format, use it to register the component|
|conda.yaml|YAML file that manages the conda environment. It is referenced by the component YAML spec|
|prepare_data_entry.test.ipynb| Test notebook for this component. It contains sample code on how to test the component|
|config.json|A blank AML workspace config.json file. Copy your AML workspace config here|
|./tests|The template test folder. It contains a template unit test script for the component.|
|./data|The placeholder data folder to put local test data.|


The `az ml component init --source` option requires the source script to parse its command line using argparse. If your script does not use argparse, we suggest initializing a template dsl component (by not defining a source) and editing the component script manually.
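
For reference, a source script of the shape that `--source` expects might look roughly like the sketch below. This is not the actual `prepare_data.py`: only the argument names `input_data` and `output_data` are taken from the init command above, and the copy logic is a placeholder assumed purely for illustration.

```python
import argparse
import shutil
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line with argparse, as `--source` requires."""
    parser = argparse.ArgumentParser(description="Hypothetical data preparation script")
    parser.add_argument("--input_data", type=str, required=True,
                        help="Directory that contains the raw input files")
    parser.add_argument("--output_data", type=str, required=True,
                        help="Directory to write the prepared files to")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    output_dir = Path(args.output_data)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder logic (an assumption): copy each input file to the output unchanged.
    for path in Path(args.input_data).iterdir():
        if path.is_file():
            shutil.copy(path, output_dir / path.name)
```

A real script would end with the usual `if __name__ == "__main__": main()` guard so that it can be run from the command line.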

Print the generated component entry file:

In [None]:
with open('components/creation/prepare_data/prepare_data_entry.py') as fin:
    print(fin.read())

For this tutorial, the initialized entry script parses all component arguments correctly, so there is no need to edit it further. For a different component, you may need to adjust the following parts of the entry script:

* Put the meta information in `@dsl.component()`. See the [component spec](https://aka.ms/azureml-component-specs) for more details.
* Adjust the function arguments to define the component interface. Use InputFile/InputDirectory/OutputFile/OutputDirectory to mark inputs and outputs; all other arguments are treated as parameters.
* Adjust the logic in the function body, which is the main logic of the component.

In AzureML, the script is executed with command-line arguments. In the main part of the entry script, we call `componentExecutor(prepare_data).execute(sys.argv)` to parse the command line into function arguments and pass them to the function.
This line must therefore be left unchanged so that the component runs correctly in AzureML.
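
To make the idea concrete, here is a toy stand-in for that step. It is not the library's implementation; it only illustrates, under simplifying assumptions, how `--name value` pairs on the command line can be mapped onto a function's arguments. The `execute` helper and the `sample_rate` parameter are hypothetical, and the real executor also has to handle input/output types such as InputDirectory.

```python
import argparse
import inspect


def execute(func, argv):
    """Toy stand-in: parse `--name value` pairs and call `func` with them."""
    parser = argparse.ArgumentParser()
    for name, param in inspect.signature(func).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        parser.add_argument(
            f"--{name}",
            required=not has_default,
            default=param.default if has_default else None,
            # Reuse the default's type for conversion; fall back to plain strings.
            type=type(param.default) if has_default and param.default is not None else str,
        )
    args = parser.parse_args(argv)
    return func(**vars(args))


def prepare_data(input_data, output_data, sample_rate=1.0):
    # Hypothetical component function; it returns its arguments so the mapping is visible.
    return input_data, output_data, sample_rate


result = execute(prepare_data, ["--input_data", "in_dir", "--output_data", "out_dir"])
print(result)
```

Because the function signature drives the parser, adding or renaming a function argument changes the accepted command-line options without any further code.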

## Run the component locally

After developing the component, we can load it and run it locally to make sure it works correctly.

`component.from_func` loads a component from a Python function.
`component.from_yaml` loads a component object from a YAML spec.

These two methods are designed to load a component quickly for testing purposes.

In [None]:
from azureml.pipeline.wrapper import component
import sys
sys.path.insert(0, 'components/creation/prepare_data/')  # Add components/creation/prepare_data to sys.path so the entry script can be imported.
from prepare_data_entry import prepare_data

prepare_component_func = component.from_func(ws, prepare_data)
help(prepare_component_func)

Run the component with local test data. `component.run` supports running the component either in the local Python environment or in a local container, as configured by the `use_docker` parameter. Note that the input type in the component entry script is InputDirectory, so a directory is passed here for the local test.

In [None]:
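local_sample = 'components/creation/prepare_data_small_input'
# Use a distinct variable name so the imported `component` is not shadowed; it is needed again to load the registered component.
prepare_data_component = prepare_component_func(input_data=local_sample)
prepare_data_component.run(experiment_name='prepare_data_local_run', use_docker=False, track_run_history=True)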

## Register the component
Once we have successfully tested the component, we can register it to the workspace with the following commands. After registration, the component is available to all users who have access to the workspace, and it can be consumed both in the designer UI and in the SDK.
- `az ml component build` automatically generates a YAML spec from a component entry script that is wrapped as a dsl component.
- `az ml component register` registers the component to an Azure Machine Learning workspace.


In [None]:
!az ml component build --target components/creation/prepare_data/prepare_data_entry.py --source-dir components/creation/prepare_data
!az ml component register --set-as-default-version --spec-file components/creation/prepare_data/prepare_data_entry.spec.yaml

In [None]:
# Check whether the component is successfully registered
component_func = component.load(ws, name='Prepare data')
help(component_func)

It's also possible to consume the registered component in the designer UI. Add `&flight=cm` to the end of your URL to see custom components in the designer.
![consume-in-designer](./components/media/consume-in-designer.png)