#### Workflow
1. Initialize Workspace & create workspace handle
2. Initialize
    - Compute Cluster
    - Environment
3. Create .py scripts for Data Processing & Model Training
4. Create Components
5. Build Pipeline using Components
6. Get Data Path
7. Initiate Pipeline
8. Submit Job

##### Step 1: Initialize Workspace and Create Workspace handle

In [1]:
from azureml.core import Workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Initialize workspace (reads the workspace details from config.json)
ws = Workspace.from_config()

# Get a handle to the workspace
credential = DefaultAzureCredential()  # authenticate
ml_client = MLClient(
    credential=credential,
    subscription_id=ws.subscription_id,
    resource_group_name=ws.resource_group,
    workspace_name=ws.name,
)
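
`Workspace.from_config()` takes the workspace details from a `config.json` file. As a minimal sketch of that lookup, the snippet below reads such a file with the standard library only and returns keyword arguments in the shape used by the `MLClient(...)` call above. The helper name `load_workspace_config` is an assumption for illustration; the three keys are the ones found in the `config.json` downloaded from the Azure portal.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Hypothetical helper: read workspace details from a config.json file."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing keys: {missing}")
    # Rename the keys to the keyword names that MLClient(...) takes.
    return {
        "subscription_id": config["subscription_id"],
        "resource_group_name": config["resource_group"],
        "workspace_name": config["workspace_name"],
    }
```

Under these assumptions, `MLClient(credential=credential, **load_workspace_config())` should build the same handle without importing `azureml.core`. The v2 SDK also provides `MLClient.from_config`, which may be simpler if your installed version supports it.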


##### Step 2: Initialize Compute Cluster & Environment

In [2]:
from azure.ai.ml.entities import AmlCompute

# Name assigned to the compute cluster
compute = "ML-Pipeline-Cluster"

try:
    cpu_cluster = ml_client.compute.get(compute)
    print(f"You already have a cluster named {compute}, we'll reuse it as is.")

except Exception:
    print("Creating a new cpu compute target...")
    cpu_cluster = AmlCompute(
        name=compute,
        type="amlcompute",
        size="STANDARD_DS3_V2",
        min_instances=0,
        max_instances=4,
        idle_time_before_scale_down=300,
        tier="Dedicated",
    )
    
    print(f"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}")
    # Pass the object to MLClient's begin_create_or_update method. It returns a poller,
    # so call .result() to wait for provisioning and get the compute object back
    cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()

You already have a cluster named ML-Pipeline-Cluster, we'll reuse it as is.


##### Environment

In [3]:
import os
from azure.ai.ml.entities import Environment

custom_env_name  = "ENV-SDKv2"
# dependencies_dir = '../dependencies'
# env = Environment( name=custom_env_name,
#                    description="Environment for python SDKv2 Execution",
#                    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
#                    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
#                  )
# env = ml_client.environments.create_or_update(env)

# GET ENVIRONMENT
# use 'label' parameter to get latest environment for example label='latest'
# use 'version' parameter to get specific version environment, for example version=2
env = ml_client.environments.get(name=custom_env_name, label='latest') 

print(f"Environment with name {env.name} is registered to workspace, the environment version is {env.version}")

Environment with name ENV-SDKv2 is registered to workspace, the environment version is 9


##### Step 3: Create Components to Build Pipeline

Now that you have all assets required to run your pipeline, it's time to build the pipeline itself.

Azure Machine Learning pipelines are reusable ML workflows that usually consist of several components. The typical life of a component is:

- Write the yaml specification of the component, or create it programmatically using `ComponentMethod`.
- Optionally, register the component with a name and version in your workspace, to make it reusable and shareable.
- Load that component from the pipeline code.
- Implement the pipeline using the component's inputs, outputs and parameters.
- Submit the pipeline.

There are two ways to create a component: programmatically or with a YAML definition. This tutorial defines both components programmatically, using the `command()` function.

> [!NOTE]
> In this tutorial for simplicity we are using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).

In [4]:
from azure.ai.ml import command
from azure.ai.ml import Input, Output

scripts_dir = "../src"

# Component 1: data preparation
data_prep_component = command(
    name="data prep pima diabetes detection",
    display_name="Data preparation for training",
    description="reads input data & preprocesses it",
    inputs={
        "data": Input(type="uri_folder"),
        "train_test_ratio": Input(type="number"),
    },
    outputs=dict(processed_data=Output(type="uri_folder", mode="rw_mount")),
    code=scripts_dir,  # The source folder of the component
    command="""python pima_dataProcessing_SDKv2.py \
            --data ${{inputs.data}} \
            --train_test_ratio ${{inputs.train_test_ratio}} \
            --processed_data ${{outputs.processed_data}} \
            """,
    environment=f"{env.name}:{env.version}",
)

# Component 2: model training and registration
train_component = command(
    name="pima diabetes training  model",
    display_name="Training Model",
    inputs={
        "processed_data": Input(type="uri_folder"),
        "registered_model_name": Input(type="string"),
    },
    outputs=dict(model=Output(type="uri_folder", mode="rw_mount")),
    code=scripts_dir,
    command="""python pima_model_Train_andRegister_SDKv2.py \
            --input_data ${{inputs.processed_data}} \
            --registered_model_name ${{inputs.registered_model_name}} \
            --model ${{outputs.model}} \
            """,
    environment=f"{env.name}:{env.version}",
)
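
The `command` string only passes arguments, so each script has to parse them itself. The scripts are not shown in this notebook, so the following is a minimal sketch of how an entry point like `pima_dataProcessing_SDKv2.py` could read its three arguments and split the rows into train and test sets. The function names, the `train.csv`/`test.csv` file names, the shuffle-based split, and the assumption that `--data` resolves to a single CSV file with a header row are all illustrative and not taken from the original script. Only the standard library is used.

```python
import argparse
import csv
import random
from pathlib import Path


def parse_args(argv=None):
    # Argument names mirror the flags in the component's command string.
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to the input CSV (assumed)")
    parser.add_argument("--train_test_ratio", type=float, default=0.25)
    parser.add_argument("--processed_data", type=str, help="output folder")
    return parser.parse_args(argv)


def split_rows(rows, test_ratio, seed=0):
    """Shuffle rows and hold out `test_ratio` of them as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]


def main(argv=None):
    args = parse_args(argv)
    with open(args.data, newline="") as f:
        header, *rows = list(csv.reader(f))
    train, test = split_rows(rows, args.train_test_ratio)

    # Write both splits into the folder mounted for the component output.
    out_dir = Path(args.processed_data)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in (("train.csv", train), ("test.csv", test)):
        with open(out_dir / name, "w", newline="") as f:
            csv.writer(f).writerows([header, *part])
```

A real script would call `main()` under an `if __name__ == "__main__":` guard. Passing `argv` explicitly, as in `main(["--data", "in.csv", "--processed_data", "out"])`, makes the entry point easy to exercise locally before submitting it to the cluster.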

##### Step 4: Build Pipeline using Components

To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, we can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.

Here, we use *input data*, *split ratio* and *registered model name* as input variables. We then call the components and connect them through their input and output identifiers. The outputs of each step can be accessed via the `.outputs` property.


In [5]:
# the dsl decorator tells the sdk that we are defining an Azure Machine Learning pipeline
from azure.ai.ml import dsl

@dsl.pipeline(compute=compute, description="Building Training Pipeline using SDKv2")
def pima_diabetes_detection_pipeline(input_data, train_test_ratio, registered_model_name):
    # call data_prep_component like a Python function with its own inputs
    data_prep_job = data_prep_component(data=input_data, train_test_ratio=train_test_ratio)

    # call train_component the same way; its input is the output of the previous step
    train_job = train_component(
        processed_data=data_prep_job.outputs.processed_data,
        registered_model_name=registered_model_name,
    )

    # a pipeline can optionally return a dictionary of outputs
    # return {"processed_data": data_prep_job.outputs.processed_data}
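
To make the wiring concrete, here is a toy illustration in plain Python (not the Azure ML implementation) of how the `${{inputs.*}}` and `${{outputs.*}}` placeholders in the two command strings end up pointing at the same location when one step's output feeds the next step's input. The `resolve` helper and the `/mnt/...` paths are hypothetical; the real service chooses the storage locations at run time.

```python
import re

PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def resolve(command, inputs, outputs):
    """Replace ${{inputs.x}} / ${{outputs.y}} with concrete values."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        kind, name = match.groups()
        return str(values[kind][name])

    # Collapse the whitespace left over from the multi-line string.
    return " ".join(PLACEHOLDER.sub(substitute, command).split())


prep_cmd = """python pima_dataProcessing_SDKv2.py \
    --data ${{inputs.data}} \
    --train_test_ratio ${{inputs.train_test_ratio}} \
    --processed_data ${{outputs.processed_data}}"""

train_cmd = """python pima_model_Train_andRegister_SDKv2.py \
    --input_data ${{inputs.processed_data}} \
    --registered_model_name ${{inputs.registered_model_name}} \
    --model ${{outputs.model}}"""

# Hypothetical locations, for illustration only.
prep_outputs = {"processed_data": "/mnt/step1/processed_data"}
prep_line = resolve(prep_cmd, {"data": "/mnt/data", "train_test_ratio": 0.25}, prep_outputs)

# The training step receives the prep step's output as its input.
train_line = resolve(
    train_cmd,
    {
        "processed_data": prep_outputs["processed_data"],
        "registered_model_name": "pima_pipeline_model_SDKv2_03",
    },
    {"model": "/mnt/step2/model"},
)

print(prep_line)
print(train_line)
```

The point of the sketch is that the folder written through `--processed_data` in the first step is the same folder read through `--input_data` in the second, which is what passing `data_prep_job.outputs.processed_data` to `train_component` expresses.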

##### Step 6: Get Data

In [6]:
# FETCH DATA
dataset_name = "pima-sdk-v2"
pima_data = ml_client.data.get(name=dataset_name, label="latest")

##### Step 7: Initiate Pipeline

In [7]:
# Name under which the trained model will be registered
registered_model_name = "pima_pipeline_model_SDKv2_03"

# Instantiate the pipeline with the parameters of our choice
pipeline = pima_diabetes_detection_pipeline(
    input_data=Input(type="uri_file", path=pima_data.path),
    train_test_ratio=0.25,
    registered_model_name=registered_model_name,
)

##### Step 8: Submit Job

In [8]:
# submit the pipeline job and stream its logs until it finishes
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="pima_pipeline_training_sdk_v2",
)
ml_client.jobs.stream(pipeline_job.name)

Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Uploading src (0.02 MBs): 100%

RunId: green_spade_yqfwm3dj9n
Web View: https://ml.azure.com/runs/green_spade_yqfwm3dj9n?wsid=/subscriptions/ba5d6a04-af22-45ea-bc5a-946ef1c32949/resourcegroups/us_azure_practice/workspaces/us_azure

Streaming logs/azureml/executionlogs.txt

[2023-11-15 12:57:58Z] Submitting 1 runs, first five are: d0b3211e:f8889470-4d0c-4a84-af2d-24c8921b01ee
[2023-11-15 13:02:27Z] Completing processing run id f8889470-4d0c-4a84-af2d-24c8921b01ee.
[2023-11-15 13:02:27Z] Submitting 1 runs, first five are: 68708bbf:bb236f81-49ff-4c91-bc0a-62f7a042336a
[2023-11-15 13:03:20Z] Completing processing run id bb236f81-49ff-4c91-bc0a-62f7a042336a.

Execution Summary
RunId: green_spade_yqfwm3dj9n
Web View: https://ml.azure.com/runs/green_spade_yqfwm3dj9n?wsid=/subscriptions/ba5d6a04-af22-45ea-bc5a-946ef1c32949/resourcegroups/us_azure_practice/workspaces/us_azure
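
The streamed execution log is enough to estimate how long each step ran. The sketch below parses the four timestamped lines above with the standard library and pairs each `Submitting` line with its matching `Completing` line. The regular expressions and the `step_durations` helper are assumptions based on the format of this particular log, not a documented log schema.

```python
import re
from datetime import datetime

LOG = """\
[2023-11-15 12:57:58Z] Submitting 1 runs, first five are: d0b3211e:f8889470-4d0c-4a84-af2d-24c8921b01ee
[2023-11-15 13:02:27Z] Completing processing run id f8889470-4d0c-4a84-af2d-24c8921b01ee.
[2023-11-15 13:02:27Z] Submitting 1 runs, first five are: 68708bbf:bb236f81-49ff-4c91-bc0a-62f7a042336a
[2023-11-15 13:03:20Z] Completing processing run id bb236f81-49ff-4c91-bc0a-62f7a042336a.
"""

RUN_ID = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z\]")


def step_durations(log_text):
    """Return {run_id: seconds between its Submitting and Completing lines}."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        stamp, run = STAMP.search(line), RUN_ID.search(line)
        if not (stamp and run):
            continue  # skip lines without a timestamp or a run id
        when = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S")
        if "Submitting" in line:
            started[run.group()] = when
        elif "Completing" in line and run.group() in started:
            durations[run.group()] = (when - started[run.group()]).total_seconds()
    return durations


for run_id, seconds in step_durations(LOG).items():
    print(run_id, f"{seconds:.0f}s")
```

For this run the first step (data preparation) took 269 seconds and the second (training) 53 seconds. The first figure may include time spent waiting for compute, since the cluster is configured with `min_instances=0` and has to scale up from zero nodes when idle.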

