# Exercise08 : Publish as a Web Service

Finally we publish our model as a web service.

Before running this code, **complete the model registration in "[Exercise04 : Train on Remote GPU Virtual Machine](./exercise04_train_remote.ipynb)"**.

*back to [index](https://github.com/tsmatz/azureml-tutorial/)*

## Initialize MLClient

Replace the bracketed strings below with your subscription ID, resource group name, and AML workspace name.<br>
(Note that creating ```MLClient``` does not connect to the AML workspace; the client is initialized lazily.)

In [1]:
from azure.ai.ml import MLClient
from azure.identity import DeviceCodeCredential, TokenCachePersistenceOptions

# When you run on remote
cache_opt = TokenCachePersistenceOptions(allow_unencrypted_storage=True)
cred = DeviceCodeCredential(cache_persistence_options=cache_opt)

# # When you run on Azure ML Notebook
# from azure.identity import DefaultAzureCredential
# cred = DefaultAzureCredential()

# Get a handle to the workspace
ml_client = MLClient(
    credential=cred,
    subscription_id="{SUBSCRIPTION ID}",
    resource_group_name="{RESOURCE GROUP NAME}",
    workspace_name="{AML WORKSPACE NAME}",
)
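If you prefer not to hard-code the workspace identifiers in the notebook, one option is to read them from environment variables. The following is a minimal sketch; the variable names (`AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, `AZUREML_WORKSPACE_NAME`) and the helper `workspace_settings` are assumptions made for illustration, not names defined by the SDK.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
ENV_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZUREML_WORKSPACE_NAME",
}

def workspace_settings(env=None):
    """Collect the MLClient keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in ENV_VARS.items()}

# Demonstration with a fake environment (placeholder values, not real IDs)
example_settings = workspace_settings({
    "AZURE_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
    "AZURE_RESOURCE_GROUP": "example-rg",
    "AZUREML_WORKSPACE_NAME": "example-ws",
})
print(example_settings)
```

With the real variables set, the client could then be created as `MLClient(credential=cred, **workspace_settings())`.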

## Create entry script (.py)

To deploy a web service, we first write the following scoring code.<br>
The entry script in AML must define both ```init()``` and ```run()```: ```init()``` runs once when the service starts, and ```run()``` handles each request.

> Note : The serving compute (VM) provides a [managed identity endpoint](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token). (Your script can use both a system-assigned identity and a user-assigned identity.) Your script can then obtain access to Azure resources without embedding any secrets.

In [2]:
import os
script_folder = './inference_script'
os.makedirs(script_folder, exist_ok=True)

In [3]:
%%writefile inference_script/score.py
import os
import json
import numpy as np
import tensorflow as tf

def init():
    global loaded_model
    ## model_path = azureml.core.Model.get_model_path(model_name='mnist_model_test')
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "mnist_tf_model"
    )
    loaded_model = tf.keras.models.load_model(model_path)

def run(raw_data):
    try:
        data = json.loads(raw_data)["data"]
        pred_output = loaded_model(np.array(data))
        pred_list = tf.math.argmax(pred_output, axis=-1).numpy().tolist()
        return pred_list
    except Exception as e:
        result = str(e)
        return 'Internal Exception : ' + result

Writing inference_script/score.py
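Before deploying, the ```init()``` / ```run()``` contract can be exercised locally. The sketch below mirrors the structure of ```score.py``` but replaces the TensorFlow model with a hypothetical stand-in (`FakeModel`) so that it runs without any dependencies; it illustrates the call sequence only and is not the deployed script.

```python
import json

class FakeModel:
    """Hypothetical stand-in for the Keras model: picks the largest value per row."""
    def __call__(self, batch):
        return [max(range(len(row)), key=row.__getitem__) for row in batch]

loaded_model = None

def init():
    # Called once when the serving container starts: load the model into a global.
    global loaded_model
    loaded_model = FakeModel()

def run(raw_data):
    # Called for each request with the raw request body as a string.
    try:
        data = json.loads(raw_data)["data"]
        return loaded_model(data)
    except Exception as e:
        return "Internal Exception : " + str(e)

init()
ok_result = run(json.dumps({"data": [[0.1, 0.9], [0.7, 0.3]]}))
bad_result = run("not json")
print(ok_result)
print(bad_result)
```

Note that, as in ```score.py```, a failure inside ```run()``` is returned as a string rather than raised, so the caller has to inspect the response body to detect errors.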


## Create a managed endpoint

A managed online endpoint has two layers: the **endpoint** and the **deployment**.<br>
You can run multiple deployments in a single endpoint and allocate a share of the traffic to each of them.

First, create a managed endpoint as the deployment target.<br>
Note that **```name``` must be unique, so specify an arbitrary unique name**.
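One simple way to obtain a name that is unlikely to collide is to append a random suffix to a fixed prefix. This is only a sketch: the helper `make_endpoint_name` and the prefix `my-mnist` are illustrative choices, and any naming rules enforced by the service (allowed characters, length limits) still apply.

```python
import uuid

def make_endpoint_name(prefix="my-mnist"):
    """Return the prefix followed by 8 random hexadecimal characters."""
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix}-{suffix}"

example_endpoint_name = make_endpoint_name()
print(example_endpoint_name)
```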

In [4]:
endpoint_name = "{UNIQUE_ENDPOINT_NAME}"

Replace ```{UNIQUE_ENDPOINT_NAME}``` above with your own name before creating the endpoint.

In [5]:
from azure.ai.ml.entities import ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    auth_mode="key",
)
ml_client.begin_create_or_update(endpoint)

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code AFS9GTWSZ to authenticate.


ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://my-mnist-test123.eastus.inference.ml.azure.com/score', 'swagger_uri': 'https://my-mnist-test123.eastus.inference.ml.azure.com/swagger.json', 'name': 'my-mnist-test123', 'description': None, 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourcegroups/aml-rg/providers/microsoft.machinelearningservices/workspaces/ws01/onlineendpoints/my-mnist-test123', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/providers/Microsoft.MachineLearningServices/locations/eastus/mfeOperationsStatus/oe:030b8dcf-1ffc-4245-9b2d-faabb1c4e31a:4c422361-d6f7-4ef8-8d19-b9025468e4dc?api-version=2022-02-01-preview'}, 'id': '/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/onlineEndpoints/my-mnist-te

Go to [AML Studio UI](https://ml.azure.com/), click "Endpoints", and select the above endpoint.<br>
Please wait until the provisioning state becomes "Succeeded".

![Endpoint status](https://tsmatz.github.io/images/github/azure-ml-tensorflow-complete-sample/20221220_Endpoint_Status.jpg)
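Instead of watching the Studio UI, you can poll the state from code. The helper below is a generic sketch: `wait_for_state` is a hypothetical function, and the state names other than "Succeeded" (which appears in the output above) are assumptions. It is demonstrated here with a fake state source so that it runs without a workspace.

```python
import time

def wait_for_state(get_state, target="Succeeded", failed=("Failed", "Canceled"),
                   interval=15.0, timeout=1800.0, sleep=time.sleep):
    """Poll get_state() until it returns target; raise on failure or timeout."""
    waited = 0.0
    while True:
        state = get_state()
        if state == target:
            return state
        if state in failed:
            raise RuntimeError(f"Provisioning ended in state {state!r}")
        if waited >= timeout:
            raise TimeoutError(f"Still {state!r} after {waited:.0f} seconds")
        sleep(interval)
        waited += interval

# Demonstration with a fake state source instead of a real endpoint
fake_states = iter(["Creating", "Creating", "Succeeded"])
final_state = wait_for_state(lambda: next(fake_states), sleep=lambda seconds: None)
print(final_state)
```

Against a real workspace, the state source could be `lambda: ml_client.online_endpoints.get(endpoint_name).provisioning_state`. In SDK versions where ```begin_create_or_update``` returns a poller object, calling `.result()` on that poller blocks until the operation finishes and makes manual polling unnecessary.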

## Deploy as web service

Next we deploy the serving code (```score.py```) as a web service to the endpoint created above.

Before deployment, create a conda configuration for the serving environment.

In [6]:
%%writefile 08_conda_env.yml
name: serving_example
dependencies:
- python=3.8
- pip:
  - azureml-defaults
  - tensorflow==2.10.0
  - numpy
channels:
- anaconda
- conda-forge

Writing 08_conda_env.yml


Now we deploy the web service, using the conda YAML file above for the serving environment.

When you change the model (or code) in a managed endpoint, you can submit multiple deployments and shift the traffic allocation between them without causing any disruption.<br>
In this example there is a single deployment, and all traffic (100%) is allocated to it in a later step by setting ```endpoint.traffic```.

In this example, I use the model trained in Exercise04, so **run "[Exercise04 : Train on Remote GPU Virtual Machine](./exercise04_train_remote.ipynb)" before running this code.**

The deployment targets the endpoint named by the ```endpoint_name``` variable defined above.

> Note : You can scale computes by increasing the following ```instance_count```. (You can also define auto-scale settings.)

In [7]:
from azure.ai.ml.entities import ManagedOnlineDeployment, Environment, CodeConfiguration

model = ml_client.models.get("mnist_model_test", version=1)

deployment = ManagedOnlineDeployment(
    name="my-mnist-deployment-v1",
    endpoint_name=endpoint_name,
    model=model,
    environment=Environment(
        conda_file="08_conda_env.yml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
    code_configuration=CodeConfiguration(
        code="./inference_script", scoring_script="score.py"
    ),
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

ml_client.begin_create_or_update(deployment)

Check: endpoint my-mnist-test123 exists
Uploading inference_script (0.0 MBs): 100%|█████████████████████████████████| 659/659 [00:00<00:00, 143833.39it/s]

data_collector is not a known attribute of class <class 'azure.ai.ml._restclient.v2022_02_01_preview.models._models_py3.ManagedOnlineDeployment'> and will be ignored
Creating/updating online deployment my-mnist-deployment-v1 

..........................................................................................................................................................................

Done (15m 18s)


Please wait until the deployment state becomes "Succeeded".

![Deployment status](https://tsmatz.github.io/images/github/azure-ml-tensorflow-complete-sample/20221220_Deployment_Status.jpg)

Assign all traffic (100%) to this deployment.

In [8]:
endpoint.traffic = {"my-mnist-deployment-v1": 100}
ml_client.begin_create_or_update(endpoint)

ManagedOnlineEndpoint({'public_network_access': 'Enabled', 'provisioning_state': 'Succeeded', 'scoring_uri': 'https://my-mnist-test123.eastus.inference.ml.azure.com/score', 'swagger_uri': 'https://my-mnist-test123.eastus.inference.ml.azure.com/swagger.json', 'name': 'my-mnist-test123', 'description': None, 'tags': {}, 'properties': {'azureml.onlineendpointid': '/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourcegroups/aml-rg/providers/microsoft.machinelearningservices/workspaces/ws01/onlineendpoints/my-mnist-test123', 'AzureAsyncOperationUri': 'https://management.azure.com/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/providers/Microsoft.MachineLearningServices/locations/eastus/mfeOperationsStatus/oe:030b8dcf-1ffc-4245-9b2d-faabb1c4e31a:c00fd922-c45a-424c-8c24-87a7ce87f072?api-version=2022-02-01-preview'}, 'id': '/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/AML-rg/providers/Microsoft.MachineLearningServices/workspaces/ws01/onlineEndpoints/my-mnist-te

Please wait until the traffic becomes 100%.

![Deployment traffic](https://tsmatz.github.io/images/github/azure-ml-tensorflow-complete-sample/20221220_Deployment_Traffic.jpg)
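The same ```traffic``` dictionary can split requests between deployments, for example when rolling out a new model version gradually. The sketch below only builds the dictionary; `split_traffic` is a hypothetical helper, and `my-mnist-deployment-v2` is a made-up name for a second deployment that does not exist in this exercise.

```python
def split_traffic(current_name, new_name, new_percent):
    """Return a traffic dict that sends new_percent of requests to new_name."""
    if not isinstance(new_percent, int) or not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be an integer between 0 and 100")
    return {current_name: 100 - new_percent, new_name: new_percent}

# Send 10% of requests to a (hypothetical) second deployment
canary_traffic = split_traffic("my-mnist-deployment-v1", "my-mnist-deployment-v2", 10)
print(canary_traffic)
```

Assuming both deployments exist in the endpoint, the result would be applied in the same way as above: assign it to ```endpoint.traffic``` and call ```ml_client.begin_create_or_update(endpoint)```.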

For debugging purposes, you can also run your deployment on a local Docker runtime. (See [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-managed-online-endpoints).)<br>
With Visual Studio Code, you can also attach a debugger to the local deployment.

## Test your web service

Let's invoke the deployed web service and check the returned results in Python.

You can get the endpoint URI (address) as follows.

In [9]:
ml_client.online_endpoints.get(endpoint_name).scoring_uri

'https://my-mnist-test123.eastus.inference.ml.azure.com/score'

You can also retrieve the endpoint's key credential as follows.

In [10]:
ml_client.online_endpoints.get_keys(endpoint_name).primary_key

'IQlK8NXVaqyjPfqdIgNf5tLtmjQcJ6tm'

Now let's invoke the scoring web service in Python.

In [11]:
import requests
import json

import tensorflow as tf

SERVING_URI = ml_client.online_endpoints.get(endpoint_name).scoring_uri
API_KEY = ml_client.online_endpoints.get_keys(endpoint_name).primary_key

# Load the saved test dataset
test_data = tf.data.Dataset.load("./data/test")

# Generate data
image_arr = []
label_arr = []
for image, label in test_data.take(3):
    image_arr.append(image.numpy().tolist())
    label_arr.append(label.numpy().item())

# Invoke web service !
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + API_KEY,
}
# Wrap the images in the {"data": ...} envelope that run() in score.py expects
input_data = json.dumps({"data": image_arr})
http_res = requests.post(
    SERVING_URI,
    data=input_data,
    headers=headers)
print('Predicted : ', http_res.text)
print('Actual    : ', label_arr)

2022-10-05 03:09:28.146497: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-05 03:09:28.337236: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-05 03:09:28.337283: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-10-05 03:09:28.377910: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-05 03:09:29.517958: W tensorflow/stream_executor/platform/de

Predicted :  [7, 2, 1]
Actual    :  [7, 2, 1]
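Because ```run()``` in ```score.py``` returns an error message as a string instead of raising, a successful HTTP status does not guarantee a valid prediction. The following sketch parses the response body and compares it with the labels; `parse_predictions` and `accuracy` are hypothetical helpers, and it is an assumption that a failed request comes back either as a JSON-encoded string or as plain text.

```python
import json

def parse_predictions(response_text):
    """Return the list of predicted labels, or raise if the body is not a list."""
    try:
        result = json.loads(response_text)
    except ValueError:
        raise RuntimeError("Unexpected response: " + response_text)
    if not isinstance(result, list):
        raise RuntimeError("Scoring failed: " + str(result))
    return result

def accuracy(predicted, actual):
    """Fraction of positions where the prediction equals the label."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# The response text and labels shown in the output above
predicted = parse_predictions("[7, 2, 1]")
print(predicted, accuracy(predicted, [7, 2, 1]))
```

In the notebook, the input would be `http_res.text` and `label_arr` from the cell above.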


You can also invoke the web service by using the AML Python SDK.

In [12]:
with open("sample-request.json", 'w') as f:
    f.write(input_data)

In [13]:
ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name="my-mnist-deployment-v1",
    request_file="./sample-request.json",
)

'[7, 2, 1]'

## Remove endpoint

In [14]:
ml_client.online_endpoints.begin_delete(name=endpoint_name)

Deleting endpoint my-mnist-test123 


..........................................................................................................

Done (9m 5s)
