# Spark ML model operationalization with Azure Machine Learning service

In this lab, you will operationalize the Spark ML model developed in Lab 03 as a REST web service running in Azure Container Instances (ACI).
Azure Machine Learning service helps you orchestrate machine learning workflows using the architecture depicted in the diagram below.

![AML workflow](https://github.com/jakazmie/images-for-hands-on-labs/raw/master/amlarch.png)

## Install Azure ML SDK 

Before you can use Azure ML service features from Azure Databricks, you need to install the Azure ML SDK as an Azure Databricks library. Follow these instructions:

https://docs.databricks.com/user-guide/libraries.html 

and add `azureml-sdk[databricks]` as a PyPI package. You can attach the library to all clusters or to a single cluster.

### Check SDK Version

In [4]:
import azureml.core

print("SDK version:", azureml.core.VERSION)

## Connect to Azure ML workspace

Follow the instructor to create an Azure ML workspace using the Azure portal.

After the workspace has been provisioned, run the cells below to connect to the workspace and save the connection information on the driver node.

In [6]:
# Set connection parameters
subscription_id = "<your subscription>"
resource_group = "<your resource group>"
workspace_name = "<your workspace name>"


In [7]:
# Connect to the workspace
from azureml.core import Workspace

ws = Workspace(workspace_name = workspace_name,
               subscription_id = subscription_id,
               resource_group = resource_group)

# persist the subscription id, resource group name, and workspace name in aml_config/config.json.
ws.write_config()

Review the AML config file.

In [9]:
%sh
cat /databricks/driver/aml_config/config.json
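
The saved file can also be checked from Python. The following is a minimal sketch, assuming the file holds the three keys `subscription_id`, `resource_group`, and `workspace_name`. The helper `load_aml_config` and the sample values are hypothetical, and the demo writes a throwaway copy to a temporary directory instead of reading the real file.

```python
import json
import os
import tempfile

# Keys the config file is assumed to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def load_aml_config(path):
    """Load a workspace config file and fail early if a key is missing or empty."""
    with open(path, "r") as f:
        config = json.load(f)
    missing = [k for k in REQUIRED_KEYS if not config.get(k)]
    if missing:
        raise ValueError("config is missing: " + ", ".join(missing))
    return config

# Demonstration with a throwaway file; all values are placeholders.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "config.json")
with open(demo_path, "w") as f:
    json.dump({"subscription_id": "0000",
               "resource_group": "my-rg",
               "workspace_name": "my-ws"}, f)

config = load_aml_config(demo_path)
print(config["workspace_name"])
```

On the driver node, the same helper could be pointed at the path shown in the cell above.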

## Register your model 

One of the key features of Azure Machine Learning service is the **Model Registry**. You can use the model registry to manage versions of models, including arbitrary metadata about them.

Before you call the AML model registration API, copy the model to the driver node, because the API searches for model files in the local (driver) file system.

In [11]:
import os

model_dbfs_path = '/models/churn_classifier'
model_name = 'ChurnClassifierML'
model_local = 'file:' + os.getcwd() + '/' + model_name

dbutils.fs.cp(model_dbfs_path, model_local, True)

In [12]:
%sh

ls -la .
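
A saved Spark ML `PipelineModel` is a directory rather than a single file, and it normally contains `metadata` and `stages` subdirectories. As an optional complement to the listing above, the sketch below checks for that layout. The helper `looks_like_pipeline_model` is hypothetical, and the demo runs against a throwaway directory that only imitates the layout.

```python
import os
import tempfile

def looks_like_pipeline_model(path):
    """Return True if path has the subdirectories a saved PipelineModel normally contains."""
    return all(os.path.isdir(os.path.join(path, name))
               for name in ("metadata", "stages"))

# Demonstration on a throwaway directory that imitates the expected layout.
demo_model = os.path.join(tempfile.mkdtemp(), "ChurnClassifierML")
os.makedirs(os.path.join(demo_model, "metadata"))
os.makedirs(os.path.join(demo_model, "stages"))

print(looks_like_pipeline_model(demo_model))
```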

You can now register the model.

In [14]:
from azureml.core.model import Model

mymodel = Model.register(model_path=model_name, 
                         model_name=model_name,
                         description='Spark ML classifier model for customer churn prediction',
                         workspace=ws
                        )

print(mymodel.name, mymodel.description, mymodel.version)

The model has been registered with the model registry. The next step is to deploy the model to Azure Container Instances (ACI).

## Deploy the model to ACI

To build the correct environment for ACI deployment, you need to provide the following:
* A scoring script that invokes the model
* An environment file to show what packages need to be installed
* A configuration file to build the ACI
* The serialized model 

### Create scoring script

Create the scoring script, called `score.py`, which the web service uses to invoke the model.

You must include two required functions in the scoring script:
* The `init()` function, which loads the model into a global object. This function is run only once when the Docker container is started. 

* The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats can be used.
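
The contract between the two functions can be illustrated without Spark or Azure ML. The following is a minimal sketch, not the lab's scoring script: `StubModel` is a hypothetical stand-in for the loaded pipeline, and the `tenure` field is an invented input.

```python
import json

class StubModel:
    """Hypothetical stand-in for a trained model; flags short-tenure customers."""
    def predict(self, rows):
        return [1.0 if row["tenure"] < 12 else 0.0 for row in rows]

model = None  # populated once by init()

def init():
    # Runs once when the container starts: load the model into a global.
    global model
    model = StubModel()

def run(input_json):
    # Runs on every request: deserialize the input, score it, serialize the output.
    try:
        rows = json.loads(input_json)
        result = [str(p) for p in model.predict(rows)]
    except Exception as e:
        result = str(e)
    return json.dumps({"result": result})

init()
response = run(json.dumps([{"tenure": 3}, {"tenure": 40}]))
print(response)
```

Keeping the model in a global means the load cost is paid once per container rather than once per request.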

In [16]:
%%writefile score.py

import json
import pyspark
from azureml.core.model import Model
from pyspark.ml import PipelineModel


def init():
    try:
        # One-time initialization of PySpark and predictive model
        
        global trainedModel
        global spark
        
        spark = pyspark.sql.SparkSession.builder.appName("Churn prediction").getOrCreate()
        model_name = "<<model_name>>" 
        model_path = Model.get_model_path(model_name)
        trainedModel = PipelineModel.load(model_path)
    except Exception as e:
        trainedModel = e

def run(input_json):
    if isinstance(trainedModel, Exception):
        return json.dumps({"trainedModel":str(trainedModel)})
      
    try:
        sc = spark.sparkContext
        input_list = json.loads(input_json)
        input_rdd = sc.parallelize(input_list)
        input_df = spark.read.json(input_rdd)
    
        # Compute prediction
        prediction = trainedModel.transform(input_df)
        #result = prediction.first().prediction
        predictions = prediction.collect()

        #Get each scored result
        preds = [str(x['prediction']) for x in predictions]
        # result = ",".join(preds)
        result = preds
    except Exception as e:
        result = str(e)
    return json.dumps({"result":result})        

In [17]:
%sh
cat score.py

Substitute the actual *model name* in the script file.

In [19]:
script_file_name = 'score.py'

with open(script_file_name, 'r') as cefr:
    content = cefr.read()
    
with open(script_file_name, 'w') as cefw:
    cefw.write(content.replace('<<model_name>>', mymodel.name))
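
A plain `str.replace` succeeds silently even when the placeholder is absent, for example because of a typo in the placeholder text. As an optional safeguard, the sketch below checks for the placeholder before rewriting the file. `substitute_placeholder` is a hypothetical helper, and the demo uses a temporary file rather than the real `score.py`.

```python
import os
import tempfile

def substitute_placeholder(path, placeholder, value):
    """Replace placeholder in the file at path; raise if it is not present."""
    with open(path, "r") as f:
        content = f.read()
    if placeholder not in content:
        raise ValueError("placeholder %r not found in %s" % (placeholder, path))
    with open(path, "w") as f:
        f.write(content.replace(placeholder, value))

# Demonstration on a throwaway file that contains the placeholder.
demo_path = os.path.join(tempfile.mkdtemp(), "score_demo.py")
with open(demo_path, "w") as f:
    f.write('model_name = "<<model_name>>"\n')

substitute_placeholder(demo_path, "<<model_name>>", "ChurnClassifierML")
with open(demo_path, "r") as f:
    updated = f.read()
print(updated)
```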

In [20]:
%sh
cat score.py

### Create a Docker image encapsulating the model

The Docker image that encapsulates our model is based on a standard AML image that contains the PySpark runtime and a web service wrapper. It must also include the scoring script, any dependencies required by the scoring script, and `azureml-defaults`.

In [22]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies()

with open("myenv.yml","w") as f:
    f.write(myenv.serialize_to_string())
    
# Review Conda dependencies file
with open("myenv.yml","r") as f:
    print(f.read())

In [23]:
from azureml.core.image import ContainerImage, Image

runtime = "spark-py"
scoring_script = "score.py"

# Configure the image
image_config = ContainerImage.image_configuration(execution_script=scoring_script, 
                                                  runtime=runtime, 
                                                  conda_file="myenv.yml",
                                                  description="Churn prediction web service",
                                                  tags={"Classifier": "GBT"})

# Create image
image = Image.create(name = "churn-classifier",
                     # this is the model object 
                     models = [mymodel],
                     image_config = image_config, 
                     workspace = ws)

image.wait_for_creation(show_output = True)

### Define ACI configuration

Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed for your ACI container. The default is 1 core and 1 gigabyte of RAM. In this lab we will use the defaults, but you should always go through a proper performance planning exercise to find the right configuration.

In [25]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"Model": "GBT"}, 
                                               description='Predict customer churn')

### Deploy in ACI

Deploy the image as a web service in Azure Container Instances.

In [27]:
from azureml.core.webservice import Webservice

aci_service_name = 'churn-classifier'
print(aci_service_name)
aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                           image = image,
                                           name = aci_service_name,
                                           workspace = ws)
aci_service.wait_for_deployment(True)

In [28]:
print(aci_service.get_logs())

### Test the prediction web service

The web service encapsulating the model has been started and is accessible at the following URL.

In [30]:
print(aci_service.scoring_uri)
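
Because the service is a plain HTTP endpoint, any client that can POST JSON should be able to call it, not only the SDK. The sketch below builds such a request with the standard library, assuming authentication is not enabled on the service. The URI is a placeholder, `build_scoring_request` is a hypothetical helper, and the actual network call is left commented out so the sketch runs without a deployed service.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, payload_json):
    """Build a POST request carrying a JSON payload for a scoring endpoint."""
    return urllib.request.Request(
        scoring_uri,
        data=payload_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder URI; in the notebook this would be aci_service.scoring_uri.
request = build_scoring_request("http://example.invalid/score", json.dumps(["{}"]))
print(request.get_method(), request.full_url)

# To send the request for real (requires a deployed service):
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```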

To test the service we will use 5 rows from the testing dataset that was saved as a parquet file in the previous lab. As you recall, the `run` function in the scoring script assumes that the data is formatted as JSON.

In [32]:
import json

# Read 5 rows from the test dataset
test_data = spark.read.parquet("/datasets/churn_test_data").limit(5)

# Convert it to JSON
test_json = json.dumps(test_data.toJSON().collect())
print(test_json)
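
Note that `toJSON().collect()` returns a list of JSON strings, one per row, so `test_json` is a JSON array whose elements are themselves JSON-encoded strings. The sketch below mimics that double encoding with the standard library only. The column names `tenure` and `monthly_charges` are invented for illustration and are not taken from the lab's dataset.

```python
import json

# What toJSON().collect() would return: one JSON string per row (sample values).
row_strings = [
    '{"tenure": 3, "monthly_charges": 70.5}',
    '{"tenure": 40, "monthly_charges": 20.0}',
]

# What the notebook sends: a JSON array of strings.
payload = json.dumps(row_strings)

# What run() recovers with json.loads: the list of row strings,
# which spark.read.json can then parse record by record.
decoded = json.loads(payload)
records = [json.loads(s) for s in decoded]
print(records[0]["tenure"])
```

This is why the scoring script passes the decoded list through `sc.parallelize` and `spark.read.json` rather than treating it as a list of dictionaries.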


Call the web service.

In [34]:
aci_service.run(input_data=test_json)
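
The scoring script returns a JSON document with a `result` key that holds a list of predictions on success and an error message on failure, or a `trainedModel` key if initialization failed. The sketch below shows one way to unpack such a response. `extract_predictions` is a hypothetical helper; it accepts either a JSON string or an already decoded dictionary, because the exact return type of the call is an assumption here, and the sample response is made up.

```python
import json

def extract_predictions(response):
    """Return predictions as floats from a scoring response (str or dict)."""
    if isinstance(response, str):
        response = json.loads(response)
    if "result" not in response:
        # The scoring script reports init() failures under another key.
        raise RuntimeError("service reported a failure: %s" % response)
    result = response["result"]
    if isinstance(result, str):
        # The scoring script puts the exception text here when scoring fails.
        raise RuntimeError("scoring failed: " + result)
    return [float(p) for p in result]

# Made-up response in the shape produced by the scoring script.
sample = json.dumps({"result": ["0.0", "1.0", "0.0"]})
predictions = extract_predictions(sample)
print(predictions)
```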

In [35]:
print(aci_service.get_logs())

### Clean up

In [37]:
# Delete the service
aci_service.delete()