#### Workflow
1. Initialize the Workspace & create a workspace handle
2. Initialize
    - Compute Cluster
    - Environment
3. Create the .py scripts for Data Processing & Model Prediction
4. Create Components
5. Build Pipeline using Components
6. Get Data Path
7. Initiate Pipeline
8. Submit Job

##### Step 1: Initialize Workspace and Create Workspace handle

In [14]:
from azureml.core import Workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Initialize workspace from the local config.json
ws = Workspace.from_config()

# Get a handle to the workspace
credential = DefaultAzureCredential()  # authenticate
ml_client = MLClient(
    credential=credential,
    subscription_id=ws.subscription_id,
    resource_group_name=ws.resource_group,
    workspace_name=ws.name,
)


##### Step 2: Initialize Compute Cluster & Environment

In [15]:
from azure.ai.ml.entities import AmlCompute

# Name assigned to the compute cluster
compute = "ML-Pipeline-Cluster"

try:
    cpu_cluster = ml_client.compute.get(compute)
    print(f"You already have a cluster named {compute}, we'll reuse it as is.")

except Exception:
    print("Creating a new cpu compute target...")
    cpu_cluster = AmlCompute(
        name=compute,
        type="amlcompute",
        size="STANDARD_DS3_V2",
        min_instances=0,
        max_instances=4,
        idle_time_before_scale_down=300,
        tier="Dedicated",
    )
    
    print(f"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}")
    # Pass the object to MLClient's begin_create_or_update method;
    # .result() waits for provisioning and returns the compute object instead of a poller
    cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()

You already have a cluster named ML-Pipeline-Cluster, we'll reuse it as is.


##### Environment

In [16]:
import os
from azure.ai.ml.entities import Environment

custom_env_name  = "ENV-SDKv2"
# dependencies_dir = '../dependencies'
# env = Environment( name=custom_env_name,
#                    description="Environment for Python SDKv2 execution",
#                    conda_file=os.path.join(dependencies_dir, "conda.yaml"),
#                    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
#                  )
# env = ml_client.environments.create_or_update(env)

# GET ENVIRONMENT
# use 'label' parameter to get latest environment for example label='latest'
# use 'version' parameter to get specific version environment, for example version=2
env = ml_client.environments.get(name=custom_env_name, label='latest') 

print(f"Environment with name {env.name} is registered to workspace, the environment version is {env.version}")

Environment with name ENV-SDKv2 is registered to workspace, the environment version is 9


##### Step 4: Create Components to Build Pipeline

Now that you have all assets required to run your pipeline, it's time to build the pipeline itself.

Azure Machine Learning pipelines are reusable ML workflows that usually consist of several components. The typical life of a component is:

- Write the yaml specification of the component, or create it programmatically using `ComponentMethod`.
- Optionally, register the component with a name and version in your workspace, to make it reusable and shareable.
- Load that component from the pipeline code.
- Implement the pipeline using the component's inputs, outputs and parameters.
- Submit the pipeline.

There are two ways to create a component: programmatically or from a YAML definition. This notebook creates both of its components programmatically.

> [!NOTE]
> In this tutorial for simplicity we are using the same compute for all components. However, you can set different computes for each component, for example by adding a line like `train_step.compute = "cpu-cluster"`. To view an example of building a pipeline with different computes for each component, see the [Basic pipeline job section in the cifar-10 pipeline tutorial](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/pipelines/2b_train_cifar_10_with_pytorch/train_cifar_10_with_pytorch.ipynb).

In [27]:
from azure.ai.ml import command
from azure.ai.ml import Input, Output

scripts_dir = "../src"
data_prep_component = command( name="inference_data_prep_pima_diabetes_detection",  # lowercase letters, digits and underscores only
                               display_name ="Data preparation for inference",
                               description  ="reads input data & preprocesses it",
                               inputs= { "data": Input(type="uri_file")},  # matches the uri_file input passed when the pipeline is instantiated

                               outputs=dict( processed_data=Output(type="uri_folder", mode="rw_mount")),
                               code=scripts_dir, # The source folder of the component
                               command="""python pima_inference_dataProcessing_SDKv2.py \
                                        --data ${{inputs.data}} \
                                        --output ${{outputs.processed_data}} \
                                        """,
                               environment=f"{env.name}:{env.version}",
                            )

# , "output": Input(type="uri_folder")

prediction_component = command( name="pima_diabetes_model_inference",  # lowercase letters, digits and underscores only
                            display_name ="Model inference",
                            inputs= { "processed_data": Input(type="uri_folder")
                                    },
                            outputs=dict(output=Output(type="uri_folder", mode="rw_mount")),

                            code=scripts_dir,
                            command="""python pima_modelPrediction_SDKv2.py \
                                    --processed_data ${{inputs.processed_data}} \
                                    --output ${{outputs.output}} \
                                    """,
                            environment=f"{env.name}:{env.version}",
                            )
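
The two scripts in `../src` are not shown in this notebook. As a rough sketch only, a data-preparation script compatible with the `--data` and `--output` flags in the component command above might be structured as follows. The function names, the CSV assumption, and the whitespace-stripping step are hypothetical placeholders, not the actual contents of `pima_inference_dataProcessing_SDKv2.py`.

```python
import argparse
import csv
from pathlib import Path


def parse_args(argv=None):
    """Parse the two flags that the component command passes to the script."""
    parser = argparse.ArgumentParser(description="Inference data preparation (sketch)")
    parser.add_argument("--data", type=str, required=True, help="input path supplied by the job")
    parser.add_argument("--output", type=str, required=True, help="output folder supplied by the job")
    return parser.parse_args(argv)


def find_csv_files(data_path):
    """Accept either a single file or a folder containing CSV files."""
    path = Path(data_path)
    if path.is_file():
        return [path]
    return sorted(path.glob("*.csv"))


def preprocess_row(row):
    """Placeholder transformation (assumption): strip whitespace from names and values."""
    return {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}


def run(data_path, output_dir):
    """Read each CSV, apply the placeholder step, and write the result to output_dir."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in find_csv_files(data_path):
        with src.open(newline="") as fin:
            rows = [preprocess_row(r) for r in csv.DictReader(fin)]
        if not rows:
            continue  # nothing to write for an empty file
        dst = out_dir / src.name
        with dst.open("w", newline="") as fout:
            writer = csv.DictWriter(fout, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(dst)
    return written


def main(argv=None):
    """Entry point: parse the flags, then process the data."""
    args = parse_args(argv)
    return run(args.data, args.output)
```

In a real script the file would end with `if __name__ == "__main__": main()`, so that the job can run it with the arguments from the component's `command` string. Accepting either a file or a folder keeps the sketch usable whichever input type the component declares.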

##### Step 5: Build Pipeline using Components

To code the pipeline, you use a specific `@dsl.pipeline` decorator that identifies the Azure Machine Learning pipelines. In the decorator, we can specify the pipeline description and default resources like compute and storage. Like a Python function, pipelines can have inputs. You can then create multiple instances of a single pipeline with different inputs.

Here, the pipeline takes a single input variable, *input data*. We then call the components and connect them via their input/output identifiers. The outputs of each step can be accessed via the `.outputs` property.


In [28]:
# the dsl decorator tells the sdk that we are defining an Azure Machine Learning pipeline
from azure.ai.ml import dsl

@dsl.pipeline(compute=compute, description="Building pima inference Pipeline using SDKv2")
def pima_inference_pipeline(input_data):
    # call data_prep_component like a Python function with its own inputs
    data_prep_job = data_prep_component(data=input_data)

    # call prediction_component on the output of the previous step
    prediction_job = prediction_component(
        processed_data=data_prep_job.outputs.processed_data,
    )

    # a pipeline can optionally return a dictionary of outputs, for example:
    # return {"processed_data": data_prep_job.outputs.processed_data}

##### Step 6: Get Data

In [29]:
# FETCH DATA
dataset_name = "test_pima_data_typeFile_SDKv2"  
pima_data  = ml_client.data.get(name = dataset_name, label = "latest")
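
Job submission rejects an input whose path is empty (the error reads `Input path can't be empty for jobs.`), so it can help to check the fetched asset before wiring it into the pipeline. The helper below is a minimal sketch: `require_asset_path` is a hypothetical name, and the `SimpleNamespace` object with its `azureml://` path only stands in for `pima_data` so that the snippet runs without a workspace.

```python
from types import SimpleNamespace


def require_asset_path(asset):
    """Return asset.path, or raise a descriptive error if it is missing or blank."""
    path = getattr(asset, "path", None)
    if not isinstance(path, str) or not path.strip():
        name = getattr(asset, "name", "<unknown>")
        version = getattr(asset, "version", "<unknown>")
        raise ValueError(
            f"Data asset {name!r} (version {version}) has no usable path: {path!r}"
        )
    return path


# Stand-in object for illustration only (hypothetical name and path).
demo_asset = SimpleNamespace(
    name="demo", version="1", path="azureml://datastores/demo/paths/data.csv"
)
checked_path = require_asset_path(demo_asset)
```

In this notebook the call would be `require_asset_path(pima_data)`, placed before the pipeline is instantiated, so that an empty path fails early with the asset's name and version in the message.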

##### Step 7: Initiate Pipeline

In [30]:
# Let's instantiate the pipeline with the parameters of our choice
pipeline = pima_inference_pipeline(input_data=Input(type="uri_file", path=pima_data.path))

##### Step 8: Submit Job

In [31]:
# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(pipeline, experiment_name="pima_pipeline_inference_sdk_v2")
ml_client.jobs.stream(pipeline_job.name)

Exception: 
1) At least one required parameter is missing

Details: 

(x) Input path can't be empty for jobs.

Resolutions: 
1) Ensure all parameters required by the Job schema are specified.
If using the CLI, you can also check the full log in debug mode for more details by adding --debug to the end of your command

Additional Resources: The easiest way to author a yaml specification file is using IntelliSense and auto-completion Azure ML VS code extension provides: https://code.visualstudio.com/docs/datascience/azure-machine-learning. To set up VS Code, visit https://docs.microsoft.com/azure/machine-learning/how-to-setup-vs-code


In [27]:
# from azureml.core import Run, Model
# #run= Run.get_context()
# #ws = run.experiment.workspace
            
# model_obj  = Model(ws, name= 'pima_pipeline_model_SDKv2_03') # by default takes the latest version
# artifacts_path = model_obj.download(exist_ok = True)

# import joblib
# joblib.load(os.path.join(artifacts_path, 'unique_values_train.pkl'))

{'BM_DESC': ['Obese', 'Over', 'Healthy', 'Under']
 Categories (4, object): ['Obese', 'Over', 'Healthy', 'Under'],
 'INSULIN_DESC': ['Normal', 'Abnormal']
 Categories (2, object): ['Normal', 'Abnormal']}