#### Deploy the Model as a Batch Endpoint

##### Workflow
1. Initialize
   - Workspace
   - Environment
   - Cluster
   - Experiment
2. Get reference to input data
3. Create a scoring script
4. Create and submit the pipeline step
5. Download the predictions to a local folder (optional)
6. Publish the pipeline
7. Invoke the pipeline endpoint


##### Step 1: Initializing Workspace

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

In [1]:
from azureml.core import Workspace

# Initializing Workspace
ws = Workspace.from_config()

##### Create compute

In [2]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

compute = "ML-Pipeline-Cluster"

try:
    # Check for existing compute target
    inference_cluster = ComputeTarget(workspace=ws, name=compute)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        inference_cluster = ComputeTarget.create(ws, compute, compute_config)
        inference_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)
    

Found existing cluster, use it.


##### Create or Get Environment

In [3]:
# Creating an environment
from azureml.core import Environment

# ---- Create the environment from a .yaml file
env_name = 'ENV-SDKv1'
# python_packages = Environment.from_conda_specification(env_name, '../dependencies/conda.yaml')
# # Register the environment
# python_packages.register(workspace=ws)

# Retrieve the registered environment by name
reg_env = Environment.get(ws, env_name)

##### Step 2: Get Reference to Input Data

In [4]:
from azureml.core import Dataset

# ---- Getting data
dataset_name = 'pima_test_typeTabular_SDKv1'
# Load the registered tabular dataset by name
df_tb = Dataset.get_by_name(workspace=ws, name=dataset_name)

##### Step 3: Create a Scoring Script
The scoring script must define two functions, **init()** and **run(mini_batch)**:
- **init()**: Runs once when each worker process starts. Use it for costly one-time setup, such as loading the model into a global variable.
- **run(mini_batch)**: Runs once for each mini-batch and returns the results for that mini-batch.
- **mini_batch**: ParallelRunStep invokes the run method and passes either a list or a pandas DataFrame as its argument. For a **FileDataset** input, mini_batch is a list in which each entry is a **file path**; for a **TabularDataset** input, mini_batch is a **pandas DataFrame**.
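
The scoring script itself is not shown in this notebook, so the following is only a minimal sketch of the shape such a script might take for a TabularDataset input. The `ThresholdModel` class and the `Glucose` column are hypothetical stand-ins used to keep the example self-contained; a real script would load the registered model inside `init()` instead.

```python
import pandas as pd

model = None  # populated once per worker process by init()


class ThresholdModel:
    """Hypothetical stand-in for a trained classifier (not part of the notebook)."""

    def predict(self, features: pd.DataFrame):
        # Flag rows whose (assumed) 'Glucose' column exceeds an arbitrary cut-off.
        return (features["Glucose"] > 125).astype(int).tolist()


def init():
    """Called once per worker process: do the expensive setup here."""
    global model
    # A real script would load the registered model instead, for example:
    # model = joblib.load(Model.get_model_path("<registered model name>"))
    model = ThresholdModel()


def run(mini_batch: pd.DataFrame) -> pd.DataFrame:
    """Called once per mini-batch; for a TabularDataset the argument is a DataFrame."""
    result = mini_batch.copy()
    result["Prediction"] = model.predict(mini_batch)
    # With output_action="append_row", each returned row becomes one line
    # of the collated output file.
    return result
```

Returning the input columns with the prediction appended as the last column is an assumption, chosen because it matches how the download step later labels the output columns.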



##### Step 4: Create a pipeline for batch inferencing
You're going to use a pipeline to run the batch prediction script, generate predictions from the input data, and save the results as a text file in the output folder. To do this, you can use a **ParallelRunStep**, which enables the batch data to be processed in parallel and the results collated in a single output file named *parallel_run_step.txt*.

**Important**: For more details on batch inferencing and on testing the scoring script locally, see the [ParallelRunStep troubleshooting guide](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-debug-parallel-run-step?view=azureml-api-1#testing-scripts-locally).
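
To build intuition for how mini-batch results end up in a single file, the collation can be approximated locally. This is only a rough simulation under stated assumptions: the real step splits tabular input by data size rather than row count and spreads the work across nodes. The names `simulate_append_row` and `rows_per_batch`, the dummy `run` function, and the sample columns are all hypothetical.

```python
import io

import pandas as pd


def run(mini_batch: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical scoring function: appends a dummy 'Prediction' column."""
    result = mini_batch.copy()
    result["Prediction"] = (result["Glucose"] > 125).astype(int)
    return result


def simulate_append_row(data: pd.DataFrame, rows_per_batch: int) -> str:
    """Score `data` in row-count mini-batches and collate the rows as space-delimited text."""
    buffer = io.StringIO()
    for start in range(0, len(data), rows_per_batch):
        mini_batch = data.iloc[start:start + rows_per_batch]
        # Each returned row is appended to the shared output, without a header.
        run(mini_batch).to_csv(buffer, sep=" ", header=False, index=False)
    return buffer.getvalue()


sample = pd.DataFrame({"Glucose": [90, 140, 130], "BMI": [22.5, 31.0, 28.4]})
collated = simulate_append_row(sample, rows_per_batch=2)
print(collated)
```

The space delimiter and missing header mirror how the results file is read back with `pd.read_csv(..., delimiter=" ", header=None)` later in this notebook.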

In [7]:
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep
from azureml.data import OutputFileDatasetConfig

output_dir = OutputFileDatasetConfig(name='Pima_BatchEndpoint_Output')

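parallel_run_config = ParallelRunConfig(
    source_directory='../src',
    entry_script="pima_scoreBatchEndpoint_SDKv1.py",
    mini_batch_size='10MB',        # approximate amount of tabular data per run() call
    error_threshold=-1,            # -1 ignores all record failures during processing
    output_action="append_row",    # collate all results into parallel_run_step.txt
    environment=reg_env,
    compute_target=compute,
    node_count=2
)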

parallelrun_step = ParallelRunStep(
    name='pima-batch-endpoint-SDKv1',
    parallel_run_config=parallel_run_config,
    inputs=[df_tb.as_named_input('pima_batch_data')],
    output=output_dir,
    # arguments=[],
    allow_reuse=False
)

print('Steps defined')

Steps defined


ParallelRunStep requires azureml-dataset-runtime[fuse,pandas] for tabular dataset.
Please add relevant package in CondaDependencies.


Put the parallelrun_step into a pipeline and run it.


In [8]:
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=ws, steps=[parallelrun_step])
pipeline_run = Experiment(ws, 'Pima_Batch_Experiments_Training_SDK_v1').submit(pipeline)

Created step pima-batch-endpoint-SDKv1 [4fe241a0][4251a177-13e8-4bf4-97d8-0bc91e4ff5fb], (This step will run and generate new outputs)
Submitted PipelineRun 96ae44f8-6ac2-4cc9-b821-dd655a79e73f
Link to Azure Machine Learning Portal: https://ml.azure.com/runs/96ae44f8-6ac2-4cc9-b821-dd655a79e73f?wsid=/subscriptions/ba5d6a04-af22-45ea-bc5a-946ef1c32949/resourcegroups/us_azure_practice/workspaces/us_azure&tid=5ac231ff-07da-46e9-9b1d-c924625f23bd


##### Step 5: Download predictions to a local folder (optional)
When the pipeline has finished running, the resulting predictions will have been saved in the outputs of the experiment associated with the first (and only) step in the pipeline. You can retrieve it as follows:

In [None]:
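import os
import pandas as pd

# Wait for the pipeline to finish before looking for its output
pipeline_run.wait_for_completion(show_output=True)

# Get the run for the first step and download its output
prediction_run = next(pipeline_run.get_children())
prediction_output = prediction_run.get_output_data('Pima_BatchEndpoint_Output')
prediction_output.download(local_path='../batchprediction_output')

# Traverse the folder hierarchy and find the results file
result_file = None
for root, dirs, files in os.walk('../batchprediction_output'):
    for file in files:
        if file.endswith('parallel_run_step.txt'):
            result_file = os.path.join(root, file)

# Fail with a clear message if the download did not contain the results file
if result_file is None:
    raise FileNotFoundError('parallel_run_step.txt not found in ../batchprediction_output')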

# The results file is space-delimited and has no header row
df = pd.read_csv(result_file, delimiter=" ", header=None)

# Add column names to the DataFrame: the input columns followed by the prediction
df_main = df_tb.to_pandas_dataframe()
df_col = df_main.columns.tolist()
pred_col = ['Prediction']
df.columns = df_col + pred_col

# Display the first 20 results
df.head(20)

##### Step 6: Publish the Pipeline

Now that you have a working pipeline for batch inferencing, you can publish it and use a REST endpoint to run it from an application.

In [None]:
published_pipeline = pipeline_run.publish_pipeline(
    name='pima_pipelineEndpoint_BatchPrediction_SDKv1',
    description='Batch scoring of diabetes data',
    version='1.0'
)
# Get the REST endpoint of the published pipeline
rest_endpoint = published_pipeline.endpoint
print(rest_endpoint)

##### Step 7: Invoke the pipeline endpoint

In [None]:
from azureml.core.authentication import InteractiveLoginAuthentication
import requests

# Authenticate
interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()
print('Authentication header ready.')


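# Invoke the published pipeline through its REST endpoint
rest_endpoint = published_pipeline.endpoint
response = requests.post(rest_endpoint,
                         headers=auth_header,
                         json={"ExperimentName": "Pima_Batch_Experiments_Training_SDK_v1"})
response.raise_for_status()  # fail early on an HTTP error instead of a KeyError below
run_id = response.json()["Id"]
run_id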
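
When the endpoint is called from an application rather than a notebook, the response handling can be factored into a small helper that fails with context if the service returns an error or a body without an `Id` field. This is a sketch, not part of the Azure ML SDK: `parse_submission_response` is a hypothetical name, and it takes the status code and decoded JSON body as plain values so that it can be exercised without a network call.

```python
def parse_submission_response(status_code: int, body: dict) -> str:
    """Return the pipeline run ID from a submission response, or raise with context."""
    if status_code >= 400:
        raise RuntimeError(f"Pipeline submission failed with HTTP {status_code}: {body}")
    run_id = body.get("Id")
    if not run_id:
        raise KeyError("Response body has no 'Id' field; the run may not have been created.")
    return run_id


# Usage with the `response` object returned by requests.post(...):
# run_id = parse_submission_response(response.status_code, response.json())
example_id = parse_submission_response(200, {"Id": "example-run-id"})
print(example_id)
```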