# Deploy Document Classification Custom Skill

This tutorial shows how to deploy a document classification custom skill for Cognitive Search. It uses the document classifier created by *01_Train_AML_Model.ipynb*. If you have not already done so, run that notebook first.

For more information on using custom skills with Cognitive Search, see the [custom skill interface documentation](https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface).

### 0.0 Important Variables you need to set for this tutorial

Enter your workspace, resource group, and subscription details below.


In [1]:
# Machine Learning Service Workspace configuration
my_workspace_name = 'tg-aml-l400'
my_azure_subscription_id = '80a3336a-33ac-4098-a7e7-64eb71d80cee'
my_resource_group = 'tg-l400-train'

# Azure Kubernetes Service configuration
my_aks_location = 'australiaeast'
my_aks_compute_target_name = 'aks-comptarget1'
my_aks_service_name = 'aks-service1'     
my_leaf_domain_label = 'ssl1'   # web service url prefix

### 1.0 Import Packages

In [2]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import numpy as np

import azureml
from azureml.core import Workspace, Run

# display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.27.0


### 2.0 Connect to Workspace
Create a workspace object. If you already have a workspace and a config.json file you can use `ws = Workspace.from_config()` instead.
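
`Workspace.from_config()` reads the workspace details from a `config.json` file. As a rough sketch, such a file could be generated as shown below. The placeholder values and the temporary output directory are assumptions for illustration; in practice the values would be the ones set in section 0.0.

```python
import json
import os
import tempfile

# Placeholder values; substitute the variables from section 0.0.
workspace_config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Written to a temporary directory here; in practice the file would
# typically sit next to the notebook (or in a .azureml subfolder).
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "config.json")
with open(config_path, "w") as f:
    json.dump(workspace_config, f, indent=4)

with open(config_path) as f:
    print(f.read())
```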

In [3]:
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.get(name = my_workspace_name, resource_group = my_resource_group, subscription_id = my_azure_subscription_id)
print(ws.name, ws.location, ws.resource_group, sep = '\t')

tg-aml-l400	australiaeast	tg-l400-train


### 3.0 Register Model
The last step in the training notebook wrote the model file to outputs/newsgroup_classifier.pkl.

Register the model in the workspace so that you (or other collaborators) can query, examine, and deploy this model.

In [4]:
model = Model.register(model_path="outputs/newsgroup_classifier.pkl",
                        model_name="newsgroup_classifier",
                        tags={"data": "newsgroup", "document": "classification"},
                        description="document classifier for newsgroup20",
                        workspace=ws)

print(model.id)

Registering model newsgroup_classifier
newsgroup_classifier:3


### 4.0 Create Scoring Script
Create the scoring script, score.py, which the web service calls to run the model.

The scoring script must include two functions:
- The init() function, which typically loads the model into a global object. This function runs only once, when the Docker container starts.
- The run(input_data) function, which uses the model to predict a value from the input data. Inputs and outputs typically use JSON for serialization and deserialization, but other formats are supported.

*The **run function** has been specifically tailored to deploy the model as a custom skill. This means that inputs & outputs are formatted correctly and any errors will be returned in a format usable by Cognitive Search*.
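
To make the expected input shape concrete, here is a minimal sketch of a validator for a custom skill request body. The function name `validate_skill_request` is hypothetical and is not part of score.py; it only checks the envelope that `run` relies on (a `values` list whose items carry a `recordId` and a `data` object).

```python
import json


def validate_skill_request(raw_data):
    """Return a list of problems found in a custom skill request body.

    Hypothetical helper for illustration; an empty list means the
    envelope has the shape that run() expects.
    """
    problems = []
    try:
        payload = json.loads(raw_data)
    except ValueError as ex:
        return [f"body is not valid JSON: {ex}"]

    values = payload.get("values") if isinstance(payload, dict) else None
    if not isinstance(values, list):
        return ["top-level 'values' list is missing"]

    for position, record in enumerate(values):
        if not isinstance(record, dict):
            problems.append(f"record {position} is not an object")
            continue
        if "recordId" not in record:
            problems.append(f"record {position} has no 'recordId'")
        if not isinstance(record.get("data"), dict):
            problems.append(f"record {position} has no 'data' object")
    return problems


good = json.dumps({"values": [{"recordId": "0", "data": {"content": "some text"}}]})
bad = json.dumps({"values": [{"data": {"content": "some text"}}]})

print(validate_skill_request(good))
print(validate_skill_request(bad))
```

A check like this could be run against sample payloads before deployment, so that format problems surface locally rather than in the indexer.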

In [5]:
%%writefile score.py
import json
import numpy as np
import pandas as pd
import os
import pickle
import joblib

from azureml.core.model import Model

def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path(model_name='newsgroup_classifier')
    model = joblib.load(model_path)
    
def convert_to_df(my_dict):
    df = pd.DataFrame(my_dict["values"])
    data = df['data'].tolist()
    index = df['recordId'].tolist()
    return pd.DataFrame(data, index = index)

def run(raw_data):
    data = json.loads(raw_data)
    # Converting the input dictionary to a dataframe
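    try:
        df = convert_to_df(data)
    # Return an error message for each item in the batch if the data is not in the correct format
    except Exception:
        # Read the record IDs directly from the payload; it may be the usual
        # {"values": [...]} envelope or a bare list of records
        records = data.get('values') if isinstance(data, dict) else data
        if not isinstance(records, list):
            records = []
        index = [r.get('recordId') for r in records if isinstance(r, dict)]
        message = "Request for batch is not in correct format"
        output_list = [{'recordId': i, 'data': {}, "errors": [{'message': message}]} for i in index]
        return {'values': output_list}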
    
    output_list = []
    for index, row in df.iterrows():
        output = {'recordId': index, 'data': {}}
        try:
            output['data']['type'] = str(model.predict([row['content']])[0])
        # Returning exception if an error occurs
        except Exception as ex:
            output['errors'] = [{'message': str(ex)}]
        output_list.append(output)

    return {'values': output_list}    

Overwriting score.py


### 5.0 Create Environment File
Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file ensures that those dependencies are installed in the Docker image. This model needs scikit-learn and pandas, plus the azureml-defaults package that CondaDependencies includes by default.
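
Pickled scikit-learn models can be sensitive to the library version, so it may be worth pinning the versions used at training time. The sketch below is one way to build pinned package specs; `conda_spec` is a hypothetical helper and the version numbers are placeholders, not the versions used in this tutorial.

```python
def conda_spec(package, version=None):
    """Build a conda package spec such as 'scikit-learn=0.24.1'.

    Hypothetical helper; returns the bare package name when no
    version is given.
    """
    return f"{package}={version}" if version else package


# Placeholder versions; replace them with the versions from the training environment.
training_versions = {"scikit-learn": "0.24.1", "pandas": None}

specs = [conda_spec(name, version) for name, version in training_versions.items()]
print(specs)

# Each spec could then be passed to myenv.add_conda_package(spec)
# in place of the bare package names.
```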

In [6]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
myenv.add_conda_package("pandas")

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())

In [7]:
with open("myenv.yml","r") as f:
    print(f.read())

# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
    # Required packages for AzureML execution, history, and data preparation.
  - azureml-defaults

- scikit-learn
- pandas
channels:
- anaconda
- conda-forge



### 6.0 Create Azure Kubernetes Service Configuration File
Estimated time to complete: about 10 minutes

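Create the Azure Kubernetes Service (AKS) compute target and deployment configuration. Cognitive Search only accepts secure (HTTPS) endpoints for custom skills, so SSL needs to be enabled on the cluster before the service is used as a skill; the commented-out cell later in this section shows the `enable_ssl` call.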

In [9]:
# Combine script and environment in an InferenceConfig
from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(runtime= "python",
                                              source_directory = 'service_files',
                                              entry_script="score.py",
                                              conda_file="myenv.yml")

In [10]:
# Define a deployment configuration
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='australiaeast')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)

Creating...................................................................
SucceededProvisioning operation finished, operation "Succeeded"


In [11]:
# Set the target-specific compute specification for deployment

from azureml.core.webservice import AksWebservice

classifier_deploy_config = AksWebservice.deploy_configuration(cpu_cores = 1,
                                                              memory_gb = 1)

In [13]:
# Deploy the model
from azureml.core.model import Model

model = ws.models['newsgroup_classifier']
service = Model.deploy(workspace=ws,
                       name = 'classifier-service',
                       models = [model],
                       inference_config = classifier_inference_config,
                       deployment_config = classifier_deploy_config,
                       deployment_target = production_cluster)
service.wait_for_deployment(show_output = True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-05-23 03:06:00+00:00 Creating Container Registry if not exists.
2021-05-23 03:06:00+00:00 Registering the environment.
2021-05-23 03:06:03+00:00 Building image..
2021-05-23 03:13:22+00:00 Creating resources in AKS.
2021-05-23 03:13:23+00:00 Submitting deployment to compute.
2021-05-23 03:13:23+00:00 Checking the status of deployment classifier-service..
2021-05-23 03:15:53+00:00 Checking the status of inference endpoint classifier-service.
Succeeded
AKS service creation operation finished, operation "Succeeded"


In [14]:
print('Scoring Uri: ' + service.scoring_uri)

Scoring Uri: http://20.40.181.245:80/api/v1/service/classifier-service/score
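
Because Cognitive Search only accepts secure endpoints for custom skills, it can help to check the scheme of the scoring URI before wiring it into a skillset. The following is a minimal sketch; `is_secure_endpoint` is a hypothetical helper and the second URI is a made-up example.

```python
from urllib.parse import urlparse


def is_secure_endpoint(uri):
    """Return True when the URI uses the https scheme."""
    return urlparse(uri).scheme.lower() == "https"


# The first URI is the one printed above; the second is a made-up example.
printed_uri = "http://20.40.181.245:80/api/v1/service/classifier-service/score"
example_uri = "https://example.com/api/v1/service/classifier-service/score"

print(is_secure_endpoint(printed_uri))
print(is_secure_endpoint(example_uri))
```

The URI printed above uses plain `http`, so the check fails for it. SSL would need to be enabled on the cluster (as in the commented-out `enable_ssl` call) before the endpoint can serve as a custom skill.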


In [11]:
# # create AKS compute target
# from azureml.core.compute import ComputeTarget, AksCompute
# config = AksCompute.provisioning_configuration(location= my_aks_location)
# config.enable_ssl(leaf_domain_label= my_leaf_domain_label, overwrite_existing_domain=True)

# aks = ComputeTarget.create(ws, my_aks_compute_target_name, config)
# aks.wait_for_completion(show_output=True)

# # if you already created a configuration file, you can just attach: 
# #config = AksCompute.attach_configuration(resource_group= my_resource_group, cluster_name='enter cluster name here')
# #config.enable_ssl(leaf_domain_label= my_leaf_domain_label, overwrite_existing_domain=True)
# #aks = ComputeTarget.attach(ws, my_aks_compute_target_name, config)
# #aks.wait_for_completion(show_output=True)

# print(aks.ssl_configuration.cname, aks.ssl_configuration.status)

Creating............................................................................................................................................
SucceededProvisioning operation finished, operation "Succeeded"
ssl1wrpf4w.australiaeast.cloudapp.azure.com Auto


### 7.0 Create Container Image

Estimated time to complete: about 7-8 minutes

Build an image using:
1. The scoring file (score.py)
1. The environment file (myenv.yml)
1. The model file


In [19]:
# from azureml.core.webservice import  AksWebservice, Webservice
# from azureml.core.image import ContainerImage

# # build the image
# image_config = ContainerImage.image_configuration(execution_script = "score.py",
#                                                  runtime = "python",
#                                                  conda_file = "myenv.yml")

# image = ContainerImage.create(name = "sklearn-newsgroup-classifier",
#                               models = [model], 
#                               image_config = image_config, 
#                               workspace = ws)

# image.wait_for_creation(show_output=True)



Creating image
Running..........................................................................
Succeeded
Image creation operation finished for image sklearn-newsgroup-classifier:4, operation "Succeeded"


### 8.0 Deploy a web service
Deploy a web service using the AKS image. Then get the web service HTTPS endpoint and the key used to call the service.

In [18]:
# from azureml.core.webservice import  AksWebservice, Webservice
# from azureml.core.image import ContainerImage

# image.update_creation_state()

# # deploy an AKS web service using the image (unsure why we have to deploy a new service for the image - perhaps a different way of testing)
# aks_config = AksWebservice.deploy_configuration()
# service = Webservice.deploy_from_image(workspace = ws,
#                                        name = "aks-image",
#                                        image = image,
#                                        deployment_config = aks_config,
#                                        deployment_target = aks)


# service.wait_for_deployment(show_output = True)
primary, secondary = service.get_keys()
# print('Scoring Uri: ' + service.scoring_uri)
# print('Primary key: ' + primary)
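
With the endpoint and key in hand, the service could be referenced from a Cognitive Search skillset as a custom Web API skill. The definition below is a sketch under assumptions: the URI and key are placeholders, and the `documentType` target name and `/document/content` source path are illustrative choices, not values from this tutorial.

```python
import json

# Placeholders: substitute the HTTPS scoring URI and the primary key.
scoring_uri = "https://<your-service-host>/api/v1/service/classifier-service/score"
primary_key = "<primary-key>"

classifier_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "description": "Classify documents with the newsgroup classifier",
    "uri": scoring_uri,
    "httpMethod": "POST",
    "httpHeaders": {"Authorization": f"Bearer {primary_key}"},
    "batchSize": 1,
    "context": "/document",
    # 'content' matches the field that run() reads from each record.
    "inputs": [{"name": "content", "source": "/document/content"}],
    # 'type' matches the field that run() writes; 'documentType' is an assumed target name.
    "outputs": [{"name": "type", "targetName": "documentType"}],
}

print(json.dumps(classifier_skill, indent=2))
```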

### 9.0 Test Deployed Service

#### 9.1 Import 20newsgroups Test Dataset

In [19]:
from sklearn.datasets import fetch_20newsgroups

categories = ['comp.graphics', 'sci.space']
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)

X_test = newsgroups_test.data
y_test = [categories[x] for x in newsgroups_test.target]

#### 9.2 Format Data in Correct Structure for Cognitive Search
For more information on custom skills see this [link](https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface).
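
A custom skill request can carry several records in one batch. As a small sketch, the helper below wraps a list of texts in the request envelope; `build_skill_request` is a hypothetical name, and using list positions as record IDs is an assumption made for illustration.

```python
import json


def build_skill_request(documents):
    """Wrap a list of texts in the custom skill request envelope.

    Hypothetical helper; record IDs are the list positions as strings.
    """
    return {
        "values": [
            {"recordId": str(i), "data": {"content": text}}
            for i, text in enumerate(documents)
        ]
    }


batch = build_skill_request(["first document text", "second document text"])
print(json.dumps(batch, indent=2))
```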

In [20]:
# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test))  # the upper bound is exclusive

input_data = {"values":[{"recordId": "0", "data": {"content": newsgroups_test.data[random_index]}}]}
print(input_data)

{'values': [{'recordId': '0', 'data': {'content': 'Subject: PHIGS User Group Conference\nFrom: hamlin@ug.eds.com (Griff Hamlin)\nReply-To: hamlin@ug.eds.com (Griff Hamlin)\nDistribution: world\nOrganization: EDS Unigraphics, Cypress CA\nNntp-Posting-Host: 134.244.15.158\nLines: 173\n\n\n\n                FIRST ANNUAL PHIGS USER GROUP CONFERENCE\n\n          The First Annual PHIGS User Group Conference was held March 21-24\n          in Orlando, Florida.  The conference was organized by the Rensse-\n          laer Design Research Center in co-operation with  IEEE  and  SIG-\n          GRAPH.   Attendees  came  from five countries spanning three con-\n          tinents.   A  good  cross-section  of  the  PHIGS  community  was\n          represented  at this conference with participants including PHIGS\n          users, workstation vendors, third-party PHIGS implementors, stan-\n          dards  committee  members,  and  researchers  from  industry  and\n          academia.  The opening s

#### 9.3 Send HTTP Request and View Results

In [22]:
import requests
import json

input_json = json.dumps(input_data)

# The service key retrieved earlier with service.get_keys() is sent as a bearer token
headers = {'Content-Type': 'application/json',
           'Authorization': f'Bearer {primary}'}

resp = requests.post(service.scoring_uri, input_json, headers=headers)

print("POST to url", service.scoring_uri)
print("label:", y_test[random_index])
print("prediction:", resp.text)

POST to url http://20.40.181.245:80/api/v1/service/classifier-service/score
label: comp.graphics
prediction: {"values": [{"recordId": "0", "data": {"type": "comp.graphics"}}]}
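
The response uses the same envelope as the request, so the predictions can be pulled out per record. The sketch below shows one way to do that; `parse_skill_response` is a hypothetical helper, and mapping failed records to `None` is an illustrative choice.

```python
import json


def parse_skill_response(response_text):
    """Map each recordId to its predicted type, or to None if it has errors.

    Hypothetical helper for illustration.
    """
    results = {}
    for record in json.loads(response_text)["values"]:
        if record.get("errors"):
            results[record["recordId"]] = None
        else:
            results[record["recordId"]] = record["data"].get("type")
    return results


# Response text copied from the output above.
sample = '{"values": [{"recordId": "0", "data": {"type": "comp.graphics"}}]}'
predictions = parse_skill_response(sample)
print(predictions)
```

In the notebook, `resp.text` could be passed in place of `sample`, and the result compared against `y_test[random_index]`.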
