# Exercise08 : Publish as a Web Service

Finally, we publish our model as a web service.

Before running this code, **complete the model registration in "[Exercise04 : Train on Remote GPU Virtual Machine](./exercise04_train_remote.ipynb)"**.

*back to [index](https://github.com/tsmatz/azureml-tutorial/)*

## Variable Settings

Replace the bracketed placeholder strings below with your own values to set the required variables.

> Note : If you run the following ```az configure --defaults``` command, you can omit the ```--resource-group``` and ```--workspace-name``` options in each ```az ml``` command.<br>
> ```az configure --defaults group=$my_resource_group workspace=$my_workspace```

In [1]:
my_resource_group = "{AML-RESOURCE-GROUP-NAME}"
my_workspace = "{AML-WORKSPACE-NAME}"

## Create entry script (.py)

To deploy the model as a web service, we first create the following scoring code.<br>
An entry script in AML must define both ```init()``` and ```run()```.

> Note : The serving compute (VM) provides a [managed identity endpoint](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-to-use-vm-token). (Your script can use both system-assigned and user-assigned identities.) Your script can then obtain access to Azure resources without embedding any secrets.

In [2]:
import os
script_folder = './inference_script'
os.makedirs(script_folder, exist_ok=True)

In [3]:
%%writefile inference_script/score.py
import os
import json
import numpy as np
import tensorflow as tf

def init():
    # Called once when the serving container starts: load the registered model.
    global loaded_model
    ## model_path = azureml.core.Model.get_model_path(model_name='mnist_model_test')
    model_path = os.path.join(
        os.getenv("AZUREML_MODEL_DIR"), "mnist_tf_model"
    )
    loaded_model = tf.keras.models.load_model(model_path)

def run(raw_data):
    # Called for every scoring request; raw_data is the JSON request body.
    try:
        data = json.loads(raw_data)["data"]
        pred_output = loaded_model(np.array(data))
        # Pick the index of the highest score for each input.
        pred_list = tf.math.argmax(pred_output, axis=-1).numpy().tolist()
        return pred_list
    except Exception as e:
        result = str(e)
        return 'Internal Exception : ' + result

Writing inference_script/score.py
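
Before deploying, it can help to exercise the ```run()``` contract locally. The following is a minimal sketch and not part of the original exercise: ```fake_model``` is a hypothetical stand-in for the loaded Keras model, NumPy's ```argmax``` replaces ```tf.math.argmax```, and the 28x28 input shape is an assumption. It only illustrates that ```run()``` receives the raw JSON body, reads its ```"data"``` field, and returns one class index per input (or an error string when the body is malformed).

```python
import json
import numpy as np

def fake_model(batch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the Keras model.

    Returns one row of 10 class scores per input; the "predicted" class is
    simply the pixel sum modulo 10, so the result is easy to check by hand.
    """
    n = batch.shape[0]
    labels = batch.reshape(n, -1).sum(axis=1).astype(int) % 10
    scores = np.zeros((n, 10))
    scores[np.arange(n), labels] = 1.0
    return scores

def run(raw_data: str):
    # Same structure as run() in score.py, with the stand-in model.
    try:
        data = json.loads(raw_data)["data"]
        pred_output = fake_model(np.array(data))
        return np.argmax(pred_output, axis=-1).tolist()
    except Exception as e:
        return "Internal Exception : " + str(e)

# Two assumed 28x28 inputs whose pixel sums are 3 and 7.
first = np.zeros((28, 28))
first[0, 0] = 3.0
second = np.zeros((28, 28))
second[0, 0] = 7.0

body = json.dumps({"data": [first.tolist(), second.tolist()]})
print(run(body))   # one class index per input
print(run("{}"))   # a body without "data" yields the error string
```

Because ```run()``` catches every exception and returns a string, a malformed request still produces a normal response, so clients should check the type of the result.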


## Create a managed endpoint

A managed online endpoint has a two-level deployment topology: the **endpoint** and its **deployments**.<br>
You can run multiple deployments under a single endpoint and allocate a share of the traffic to each of them.

First, create a managed endpoint as the deployment target.<br>
Note that **```name``` must be unique, so specify an arbitrary unique name**.

In [4]:
endpoint_name = "{UNIQUE_ENDPOINT_NAME}"
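
If you do not have a name in mind, one option is to generate it. This is a minimal sketch under stated assumptions: the ```my-mnist``` prefix and both helper functions are hypothetical, and the naming rule encoded in the pattern (3 to 32 characters, starting with a letter, only letters, digits, and hyphens) is an assumption that should be checked against the current Azure ML documentation.

```python
import re
import uuid

# Assumed naming rule (verify against the Azure ML documentation):
# 3-32 characters, starts with a letter, only letters, digits, and hyphens.
_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")

def make_endpoint_name(prefix: str = "my-mnist") -> str:
    """Append a short random suffix so the name is unlikely to collide."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"

def is_valid_endpoint_name(name: str) -> bool:
    """Check a candidate name against the assumed naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None

endpoint_name = make_endpoint_name()
print(endpoint_name, is_valid_endpoint_name(endpoint_name))
```

A random suffix does not guarantee uniqueness, so the create command can still fail if the name is already taken.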

Replace ```UNIQUE_ENDPOINT_NAME``` in the following YAML with the endpoint name you set above.

In [5]:
%%writefile 08_managed_endpoint.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: {UNIQUE_ENDPOINT_NAME}
auth_mode: key

Writing 08_managed_endpoint.yml
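
Because ```%%writefile``` writes the cell text literally, the placeholder has to be edited by hand. As an alternative, the same YAML can be rendered from a Python variable. This is only a sketch: ```render_endpoint_yaml``` is a hypothetical helper, and ```my-mnist-test123``` is the example name that appears in the outputs of this notebook.

```python
ENDPOINT_TEMPLATE = """\
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: {endpoint_name}
auth_mode: key
"""

def render_endpoint_yaml(endpoint_name: str) -> str:
    """Fill the endpoint name into the YAML shown in the cell above."""
    return ENDPOINT_TEMPLATE.format(endpoint_name=endpoint_name)

yaml_text = render_endpoint_yaml("my-mnist-test123")
print(yaml_text)

# To use it in place of the %%writefile cell, write the text to the same file:
# from pathlib import Path
# Path("08_managed_endpoint.yml").write_text(yaml_text)
```

Rendering from the ```endpoint_name``` variable keeps the YAML file and the later ```az ml``` commands in sync.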


In [6]:
!az ml online-endpoint create --file 08_managed_endpoint.yml \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

{
  "auth_mode": "key",
  "id": "/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/rg-AML/providers/Microsoft.MachineLearningServices/workspaces/ws01/onlineEndpoints/my-mnist-test123",
  "identity": {
    "principal_id": "65f96799-6bf2-49ae-841a-246d2acd5bd1",
    "tenant_id": "72f988bf-86f1-41af-91ab-2d7cd011db47",
    "type": "system_assigned"
  },
  "kind": "Managed",
  "location": "eastus",
  "mirror_traffic": {},
  "name": "my-mnist-test123",
  "properties": {
    "AzureAsyncOperationUri": "https://management.azure.com/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/providers/Microsoft.MachineLearningServices/locations/eastus/mfeOperationsStatus/oe:161509fe-37ed-4417-a7dc-90cfeccf6f44:f723d046-1e90-4f61-a23b-d2a1a29ce856?api-version=2022-02-01-preview",
    "azureml.onlineendpointid": "/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourcegroups/rg-aml/providers/microsoft.machinelearningservices/workspaces/ws01/onlineendpoints/my-mnist-test1

## Deploy as web service

Next, we deploy the scoring code (```score.py```) as a web service on the endpoint created above.

Before deployment, create a conda configuration for the serving environment.

In [7]:
%%writefile 08_conda_env.yml
name: serving_example
dependencies:
- python=3.8
- pip:
  - azureml-defaults
  - tensorflow==2.10.0
  - numpy
channels:
- anaconda
- conda-forge

Writing 08_conda_env.yml


Now we deploy the web service using a YAML configuration for the deployment.

When you change the model (or code) behind a managed endpoint, you can create additional deployments and shift the traffic allocation between them without causing any disruption.<br>
With the ```--all-traffic``` option below, all traffic (100%) is allocated to this single deployment.
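
To make the traffic allocation concrete, the sketch below builds an ```az ml online-endpoint update --traffic``` command that splits traffic between two deployments. It only constructs and prints the command. The second deployment name ```my-mnist-deployment-v2``` is hypothetical, the helper function is not part of any library, the rule that percentages must add up to 100 is an assumption of this sketch, and the resource group and workspace are assumed to come from ```az configure --defaults```.

```python
import shlex

def traffic_update_command(endpoint_name: str, traffic: dict) -> list:
    """Build an `az ml online-endpoint update` command that splits traffic."""
    if any(percent < 0 for percent in traffic.values()):
        raise ValueError("traffic percentages must not be negative")
    if sum(traffic.values()) != 100:
        # Assumption of this sketch: the split has to add up to 100 percent.
        raise ValueError("traffic percentages must add up to 100")
    split = " ".join(f"{name}={percent}" for name, percent in traffic.items())
    return ["az", "ml", "online-endpoint", "update",
            "--name", endpoint_name,
            "--traffic", split]

command = traffic_update_command(
    "my-mnist-test123",
    {
        "my-mnist-deployment-v1": 90,
        "my-mnist-deployment-v2": 10,  # hypothetical second deployment
    },
)
print(shlex.join(command))
```

Sending a small share of traffic to a new deployment first, then raising it step by step, is one way to roll out a changed model gradually.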

This example uses the model trained in Exercise04, so **run "[Exercise04 : Train on Remote GPU Virtual Machine](./exercise04_train_remote.ipynb)" before running this code.**

Replace ```UNIQUE_ENDPOINT_NAME``` in the following YAML with your endpoint name.

> Note : You can scale out the compute by increasing ```instance_count``` below. (You can also define auto-scale settings.)

In [8]:
%%writefile 08_managed_deployment.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: my-mnist-deployment-v1
endpoint_name: {UNIQUE_ENDPOINT_NAME}
model: azureml:mnist_model_test@latest
code_configuration:
  code: ./inference_script
  scoring_script: score.py
environment: 
  conda_file: 08_conda_env.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
instance_type: Standard_DS2_v2
instance_count: 1

Writing 08_managed_deployment.yml


In [9]:
!az ml online-deployment create --file 08_managed_deployment.yml \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace \
  --all-traffic

All traffic will be set to deployment my-mnist-deployment-v1 once it has been provisioned.
If you interrupt this command or it times out while waiting for the provisioning, you can try to set all the traffic to this deployment later once its has been provisioned.
Check: endpoint my-mnist-test123 exists
Uploading inference_script (0.0 MBs): 100%|█| 683/683 [00:00<00:00, 78500.25it/s

Creating/updating online deployment my-mnist-deployment-v1 ................................................................................................................................................................................Done (15m 41s)
{
  "app_insights_enabled": false,
  "code_configuration": {
    "code": "/subscriptions/b3ae1c15-4fef-4362-8c3a-5d804cdeb18d/resourceGroups/rg-AML/providers/Microsoft.MachineLearningServices/workspaces/ws01/codes/ec542dcc-ce84-4df7-8fa2-d28b986519ed/versions/1",
    "scoring_script": "score.py"
  },
  "endpoint_name": "my-mnist-test123",
  "envir

[Optional] If an error occurs, you can view the deployment logs as follows.<br>
For instance, this log output can reveal a Python import error in the entry script.

In [9]:
!az ml online-deployment get-logs \
  --endpoint-name $endpoint_name \
  --name my-mnist-deployment-v1 \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

Command group 'ml online-deployment' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Instance status:
SystemSetup: Succeeded
UserContainerImagePull: Succeeded
ModelDownload: Succeeded
UserContainerStart: InProgress

Container logs:
2022-02-28T07:19:01,824500903+00:00 - rsyslog/run 
2022-02-28T07:19:01,827055025+00:00 - gunicorn/run 
Dynamic Python package installation is disabled.
Starting HTTP server
2022-02-28T07:19:01,844367081+00:00 - nginx/run 
2022-02-28T07:19:01,856197687+00:00 - iot-server/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-02-28T07:19:02,048559714+00:00 - iot-server/finish 1 0
2022-02-28T07:19:02,050840535+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (12)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 37
SPARK_HOME not set. Skipping PySpark Initialization.
Initializing log
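
When a deployment hangs, the ```Instance status:``` section at the top of the log is usually the first thing to read. The sketch below extracts the stages that have not succeeded yet. It assumes the log layout shown in the output above, which may differ between CLI versions; ```unfinished_stages``` is a hypothetical helper, and ```LOG_TEXT``` is an abbreviated copy of that output.

```python
LOG_TEXT = """\
Instance status:
SystemSetup: Succeeded
UserContainerImagePull: Succeeded
ModelDownload: Succeeded
UserContainerStart: InProgress

Container logs:
Starting gunicorn 20.1.0
"""

def unfinished_stages(log_text: str) -> dict:
    """Return the instance stages whose status is not 'Succeeded'."""
    stages = {}
    in_status_section = False
    for line in log_text.splitlines():
        line = line.strip()
        if line == "Instance status:":
            in_status_section = True
            continue
        if in_status_section:
            if not line:  # a blank line ends the status section
                break
            name, _, status = line.partition(":")
            stages[name.strip()] = status.strip()
    return {name: status for name, status in stages.items()
            if status != "Succeeded"}

print(unfinished_stages(LOG_TEXT))  # {'UserContainerStart': 'InProgress'}
```

Here the container has been pulled and the model downloaded, so a failure at this point is likely to come from the entry script or its environment.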

Before submitting a deployment to the cloud, you can run and debug it on a local Docker runtime. (See [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-managed-online-endpoints).)<br>
With Visual Studio Code, you can also attach a debugger to the local deployment.

## Test your web service

Let's invoke the deployed web service from Python and check the returned results.

First, get the scoring URI (address) of your deployed web service.

In [10]:
!az ml online-endpoint show \
  --name $endpoint_name \
  --query scoring_uri \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

"https://my-mnist-test123.eastus.inference.ml.azure.com/score"

Get the key credentials for this endpoint.

In [11]:
!az ml online-endpoint get-credentials \
  --name $endpoint_name \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace

{
  "primaryKey": "bbH40mTXMkYMCUyibUbK9Ta5beXAdaxR",
  "secondaryKey": "GZt56bE2XeRaHTvqRAtFG2W2HLiVtJ9R"
}

Now let's invoke scoring web service in Python.<br>
(**Replace the following ```UNIQUE_ENDPOINT_NAME``` and ```API_KEY``` with yours**.)
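
As an alternative to pasting these values by hand, the text printed by the two CLI commands above can be parsed in Python. This is a sketch under assumptions: ```build_request_settings``` is a hypothetical helper, the two input strings stand in for captured command output (in a notebook, for example, via ```output = !az ...``` joined into one string), and the keys are placeholders rather than real credentials. Terminal color codes are stripped first because they would otherwise break the JSON parsing.

```python
import json
import re

# Matches terminal color codes that may surround the CLI output.
ANSI_CODES = re.compile(r"\x1b\[[0-9;]*m")

def build_request_settings(show_output: str, credentials_output: str):
    """Turn the CLI output text into a scoring URI and request headers."""
    scoring_uri = json.loads(ANSI_CODES.sub("", show_output))
    keys = json.loads(ANSI_CODES.sub("", credentials_output))
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + keys["primaryKey"],
    }
    return scoring_uri, headers

# Stand-ins for the captured output of the two commands above.
show_output = '"https://my-mnist-test123.eastus.inference.ml.azure.com/score"\n\x1b[0m'
credentials_output = (
    '{"primaryKey": "<PRIMARY_KEY>", "secondaryKey": "<SECONDARY_KEY>"}\n\x1b[0m'
)

uri, headers = build_request_settings(show_output, credentials_output)
print(uri)
print(sorted(headers))
```

Taking the URI from the command output also avoids hard-coding the region in the address.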

In [12]:
import requests
import json

import tensorflow as tf

# Use the scoring URI shown above; the region part (eastus here) depends on your workspace
SERVING_URI = "https://{UNIQUE_ENDPOINT_NAME}.eastus.inference.ml.azure.com/score"
API_KEY = "{API_KEY}"

# Read data by tensor
test_data = tf.data.Dataset.load("./data/test")

# Generate data
image_arr = []
label_arr = []
for image, label in test_data.take(3):
    image_arr.append(image.numpy().tolist())
    label_arr.append(label.numpy().item())

# Invoke web service !
headers = {
    'Content-Type':'application/json',
    'Authorization':('Bearer '+ API_KEY)
}
# Build the request body {"data": [...]} expected by run() in score.py
input_data = json.dumps({"data": image_arr})
http_res = requests.post(
    SERVING_URI,
    data=input_data,
    headers=headers)
print('Predicted : ', http_res.text)
print('Actual    : ', label_arr)


Predicted :  [7, 2, 1]
Actual    :  [7, 2, 1]
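
The response body is JSON text, so it can be parsed and compared with the labels programmatically. The following is a minimal sketch: ```score_response``` is a hypothetical helper, and it assumes that a failure inside ```run()``` comes back as a string (the ```'Internal Exception : ...'``` message) rather than a list.

```python
import json

def score_response(response_text: str, labels: list) -> float:
    """Parse the scoring response and return the fraction of correct predictions."""
    predictions = json.loads(response_text)
    if not isinstance(predictions, list):
        # run() returns an error string instead of a list when it fails.
        raise RuntimeError(f"Scoring failed: {predictions}")
    if len(predictions) != len(labels):
        raise ValueError("number of predictions does not match number of labels")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Values taken from the output above.
accuracy = score_response("[7, 2, 1]", [7, 2, 1])
print(accuracy)  # 1.0
```

In the notebook, ```http_res.text``` and ```label_arr``` would take the place of the two literal arguments.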




## Remove endpoint

In [13]:
!az ml online-endpoint delete \
  --name $endpoint_name \
  --resource-group $my_resource_group \
  --workspace-name $my_workspace \
  --yes

 Delete request initiated. If you interrupt this command or it times out while waiting for deletion to complete, status can be checked using `az ml online-endpoint show -n my-mnist-test123`

Deleting endpoint my-mnist-test123 
................................................................................................................Done (9m 35s)