Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Creating and Updating a Docker Image before Deployment as a Webservice

This notebook demonstrates how to make changes to an existing Docker image before deploying it as a webservice.

Knowing how to do this is helpful when, for example, you need to debug the execution script of a webservice you're developing and debugging takes several iterations of code changes. Redeploying your application as a webservice at every iteration is impractical, because the time each deployment takes will slow you down significantly. In some cases it may be easier to simply run the execution script on the command line, but this is not an option if your script accumulates data across individual calls.

**Note:** This code was tested on a Data Science Virtual Machine ([DSVM](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/)), running Ubuntu Linux 16.04 (Xenial).

## Configure your Azure Workspace

We need to set up your workspace, and make sure we have access to it from here.

This requires that you have downloaded the ***config.json*** configuration file for your Azure workspace.

Follow this [Quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) to set up your workspace and to download the config.json file, which contains information about the workspace you just created. Save the file in the same directory as this notebook.
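
Before loading the workspace, it can help to confirm that the file is in place and has the expected shape. The following is a minimal sketch, assuming that `config.json` holds the keys `subscription_id`, `resource_group`, and `workspace_name`; the helper name `missing_config_keys` is hypothetical and not part of the Azure ML SDK.

```python
import json
from pathlib import Path

# Keys we assume the downloaded config.json contains.
EXPECTED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def missing_config_keys(path="config.json"):
    """Return the expected keys that are absent from the config file."""
    config_path = Path(path)
    if not config_path.is_file():
        # Nothing to inspect: report every key as missing.
        return list(EXPECTED_KEYS)
    with config_path.open() as f:
        config = json.load(f)
    return [key for key in EXPECTED_KEYS if key not in config]
```

An empty list means the file was found and contains all three keys; any other result names what is missing.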

In [None]:
%matplotlib inline  
from azureml.core import Workspace

ws = Workspace.from_config()

Let's make sure that you have the correct version of the Azure ML SDK installed on your workstation or VM.  If you don't have the right version, please follow these [Installation Instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python#install-the-sdk).

In [None]:
import azureml.core  # import the submodule explicitly so this cell also works on its own

# display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

if azureml.core.VERSION == '0.1.59':
    print("Looks like you have the correct version. We are good to go.")
else:
    print("There is a version mismatch, this notebook may not work as expected!")
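
An exact string comparison rejects every other release, including newer ones. If a minimum version is what matters, the check could compare parsed version numbers instead. This is only a sketch under that assumption: `version_tuple` and `meets_minimum` are hypothetical helpers, and the notebook itself was tested against 0.1.59 only.

```python
def version_tuple(version):
    """Turn a dotted version string such as '0.1.59' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(installed, required="0.1.59"):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)

print(meets_minimum("0.1.59"))  # → True
print(meets_minimum("0.1.8"))   # → False
```

Comparing tuples of integers avoids a trap of plain string comparison, where `"0.1.8"` sorts after `"0.1.59"`. Version strings with non-numeric parts would raise a `ValueError` in this sketch.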

## Create a Docker image using the Azure ML SDK

### Create a template execution script for your application

We are going to start with just an execution script for your webservice that ingests one value at a time and returns a running average.

In [None]:
%%writefile score.py

import json # we use json to exchange data with the webservice via a RESTful API

# The init function is only run once, when the webservice (or Docker container) is started
def init():
    global running_avg, curr_n
    
    running_avg = 0.0
    curr_n = 0
    
    pass

# the run function is run every time we interact with the service
def run(raw_data):
    """
    Updates a running average using the incremental mean update from Welford's online algorithm.
    https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online

    :param raw_data: raw_data should be a json query containing a dictionary with the key 'value'
    :return: running_avg (float, json response)
    """
    global running_avg, curr_n
    
    value = json.loads(raw_data)['value']
    n_arg = 5 # we calculate the average over the last "n" measures
    
    curr_n += 1
    n = min(curr_n, n_arg) # in case we don't have "n" measures yet
    
    running_avg += (value - running_avg) / n
    
    return json.dumps(running_avg)
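
The update rule in `run` can be tried without a webservice or a container. The sketch below reimplements the same arithmetic as a plain function (the name `capped_running_average` is hypothetical). Note that once `n` reaches its cap, the result is no longer an exact mean of the last `n` values: each new value is blended in with weight `1/n`, which behaves like an exponentially weighted average.

```python
def capped_running_average(values, n_max=5):
    """Apply the same update as run() to a sequence and return every intermediate average."""
    running_avg = 0.0
    averages = []
    for count, value in enumerate(values, start=1):
        n = min(count, n_max)  # plain mean until n_max values have been seen
        running_avg += (value - running_avg) / n
        averages.append(running_avg)
    return averages

# With fewer than n_max values, this is the ordinary mean of everything seen so far.
print(capped_running_average([1, 2, 3]))  # → [1.0, 1.5, 2.0]
# After the cap, older values fade out gradually instead of dropping out of a window.
print(capped_running_average([1, 2, 3, 4], n_max=2))  # → [1.0, 1.5, 2.25, 3.125]
```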

### Create environment file for your Conda environment

Next, create an environment file (environment.yml) that specifies all the Python dependencies of your script. This file is used to ensure that all of those dependencies are installed in the Docker image.  Let's assume your webservice requires ``azureml-sdk``, ``scikit-learn``, and ``pynacl``.

In [None]:
from azureml.core.conda_dependencies import CondaDependencies 

myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
myenv.add_pip_package("pynacl==1.2.1")

with open("environment.yml","w") as f:
    f.write(myenv.serialize_to_string())

Review the content of the `environment.yml` file.

In [None]:
with open("environment.yml","r") as f:
    print(f.read())

### Create the initial Docker image

We use the ``environment.yml`` and ``score.py`` files from above, to create an initial Docker image.

In [None]:
%%time

from azureml.core.image import ContainerImage

# configure the image
image_config = ContainerImage.image_configuration(execution_script = "score.py", 
                                                  runtime = "python",
                                                  conda_file = "environment.yml")

# create the docker image. this should take less than 5 minutes
image = ContainerImage.create(name = "my-docker-image",
                              image_config = image_config,
                              models = [],
                              workspace = ws)

# we wait until the image has been created
image.wait_for_creation(show_output=True)

# let's save the image location
imageLocation = image.serialize()['imageLocation']

## Test the application by running the Docker container locally

### Download the created Docker image from the Azure Container Registry ([ACR](https://azure.microsoft.com/en-us/services/container-registry/))

Here we use some [cell magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html) to exchange variables between python and bash.

In [None]:
%%bash -s "$imageLocation" 

# get the location of the docker image in ACR
imageLocation=$1

# extract the address of the repository within ACR
repository=$(echo $imageLocation | cut -f 1 -d ".")

echo "Attempting to login to repository $repository"
az acr login --name $repository
echo

echo "Trying to pull image $imageLocation"
docker pull $imageLocation
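
The `cut` call above takes everything before the first dot as the registry name, and the cells that follow split the same string at the colon to get the image name and tag. As an illustration of that parsing in Python, here is a small sketch. It assumes the location has the form `<registry>.azurecr.io/<image name>:<tag>` with a single colon; `split_image_location` is a hypothetical helper, and the location string in the example is made up.

```python
def split_image_location(image_location):
    """Split '<registry>.azurecr.io/<name>:<tag>' into registry, image name, and tag."""
    registry = image_location.split(".", 1)[0]          # like: cut -f 1 -d "."
    image_name, _, tag = image_location.partition(":")  # like: cut -f 1 and -f 2 with -d ":"
    return registry, image_name, tag

# Made-up location, for illustration only.
print(split_image_location("myregistry.azurecr.io/my-docker-image:1"))
# → ('myregistry', 'myregistry.azurecr.io/my-docker-image', '1')
```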

### Start the docker container

We use standard Docker commands to start the container locally.

In [None]:
%%bash -s "$imageLocation"

# extract image name and tag from imageLocation
image_name=$(echo $1 | cut -f 1 -d ":")
tag=$(echo $1 | cut -f 2 -d ":")
echo "Image name: $image_name, tag: $tag"

# extract image ID from list of downloaded docker images
image_id=$(docker images $image_name:$tag --format "{{.ID}}")
echo "Image ID: $image_id"

# we forward TCP port 5001 of the docker container to local port 8080 for testing
echo "Starting docker container"
docker run -d -p 8080:5001 $image_id

sleep 1
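
A fixed `sleep 1` may not be long enough for the server inside the container to be ready for requests. One alternative is to retry the first call a few times on the Python side. The following is a generic sketch using only the standard library; `call_with_retries` is a hypothetical helper, not part of Docker or the Azure ML SDK.

```python
import time

def call_with_retries(func, attempts=10, delay=1.0):
    """Call func() until it succeeds, sleeping between failed attempts."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            time.sleep(delay)
    raise RuntimeError(f"no success after {attempts} attempts") from last_error
```

The first `requests.post` call to the container could be wrapped this way, for example as `call_with_retries(lambda: requests.post(url, json=payload))` with `url` and `payload` defined by the caller. Only calls that raise are retried, so an HTTP error status would still need its own check.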

### Test the docker container

We test the docker container by sending it some data and checking how it responds, just as we would with a webservice.

In [None]:
import json
import requests
import numpy as np
import matplotlib.pyplot as plt

values = np.random.normal(0,1,100)
values = np.cumsum(values)


running_avgs = []

for value in values:
    raw_data = {"value": value}

    r = requests.post('http://localhost:8080/score', json=raw_data)

    result = json.loads(r.json())
    running_avgs.append(result)

plt.plot(values)
plt.plot(running_avgs)
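
The response is decoded twice in the loop above: `r.json()` parses the HTTP body, and `json.loads` parses the string that `run` produced with `json.dumps`. The sketch below reproduces that round trip without a server. It assumes the service wraps the string returned by `run` in one more layer of JSON, as the double decoding suggests.

```python
import json

running_avg = 1.5

returned_by_run = json.dumps(running_avg)    # what run() returns: the string '1.5'
response_body = json.dumps(returned_by_run)  # assumed wire format: a JSON string holding that text

first_pass = json.loads(response_body)       # equivalent of r.json(): still a str
second_pass = json.loads(first_pass)         # finally a float

print(type(first_pass).__name__, type(second_pass).__name__)  # → str float
```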

## Modifying the container

Let's make a change to the execution script: we want to add an input argument to ``score.py`` that sets how many previous values to consider in the running average.

In [None]:
%%writefile score.py

import json # we use json to exchange data with the webservice via a RESTful API

# The init function is only run once, when the webservice (or Docker container) is started
def init():
    global running_avg, curr_n
    
    running_avg = 0.0
    curr_n = 0
    
    pass

# the run function is run every time we interact with the service
def run(raw_data):
    """
    Updates a running average using the incremental mean update from Welford's online algorithm.
    https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online

    :param raw_data: raw_data should be a json query containing a dictionary with the keys 'value' and 'n'
    :return: running_avg (float, json response)
    """
    global running_avg, curr_n
    
    payload = json.loads(raw_data) # parse the request only once
    value = payload['value']
    n_arg = payload['n'] # we calculate the average over the last "n" measures
    
    curr_n += 1
    n = min(curr_n, n_arg) # in case we don't have "n" measures yet
    
    running_avg += (value - running_avg) / n
    
    return json.dumps(running_avg)
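
With this version of `run`, a request that omits `n` fails with a `KeyError`, and `n = 0` leads to a division by zero. If older clients should keep working, the key could be made optional and validated. This is a sketch of that variation rather than the notebook's script; `parse_request` is a hypothetical helper, and the default of 5 is taken from the value hard-coded in the first version.

```python
import json

DEFAULT_N = 5  # window used by the first version of score.py

def parse_request(raw_data):
    """Return (value, n) from a JSON request, falling back to DEFAULT_N when 'n' is absent."""
    payload = json.loads(raw_data)  # parse the request once
    value = payload["value"]
    n = int(payload.get("n", DEFAULT_N))
    if n < 1:
        raise ValueError("'n' must be a positive integer")
    return value, n

print(parse_request('{"value": 2.5}'))          # → (2.5, 5)
print(parse_request('{"value": 2.5, "n": 2}'))  # → (2.5, 2)
```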

### Update container image

Copy the changed ``score.py`` into the running docker container and commit the changes to the container image.

In [None]:
%%bash -s "$imageLocation"

image_location=$1

# extract image name and tag from imageLocation
image_name=$(echo $image_location | cut -f 1 -d ":")
tag=$(echo $image_location | cut -f 2 -d ":")

echo "Image name: $image_name, tag: $tag"

# extract image ID from list of downloaded docker images
image_id=$(docker images $image_name:$tag --format "{{.ID}}")

echo "Image ID: $image_id"

# extract container ID
container_id=$(docker ps | tail -n1 | cut -f 1 -d " ")
echo "Container ID: $container_id"

# copy the modified scoring script into the running container
docker cp score.py $container_id:/var/azureml-app/

sleep 1
# commit changes made in the container to the local copy of the image
docker commit $container_id $image_location

# let's wait for one second here
sleep 1

# restart the container so that it runs with the updated scoring script
docker restart $container_id
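
The order of the three Docker commands matters: the file has to be in the container before the commit, and the container is restarted afterwards so that the server starts again with the new file. The sketch below only assembles the commands as argument lists that could be passed to `subprocess.run`; it does not execute anything. `update_commands` is a hypothetical helper, and the container ID and image location in the example are made up.

```python
def update_commands(container_id, image_location, script="score.py",
                    target_dir="/var/azureml-app/"):
    """Build the docker cp / commit / restart commands used in the cell above."""
    return [
        ["docker", "cp", script, f"{container_id}:{target_dir}"],
        ["docker", "commit", container_id, image_location],
        ["docker", "restart", container_id],
    ]

# Made-up identifiers, for illustration only.
for command in update_commands("0123456789ab", "myregistry.azurecr.io/my-docker-image:1"):
    print(" ".join(command))
```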



### Test the container

**Note:** You may have to run the above cell twice for the change to ``score.py`` to take effect.

In [None]:
import json
import requests
import numpy as np
import matplotlib.pyplot as plt

n = 2  # set the number of values going into the running avg
values = np.random.normal(0,1,100)
values = np.cumsum(values)


running_avgs = []

for value in values:
    raw_data = {"value": value, "n": n}

    r = requests.post('http://localhost:8080/score', json=raw_data)

    result = json.loads(r.json())
    running_avgs.append(result)

plt.plot(values)
plt.plot(running_avgs)

### Push the updated container to ACR

**First**, test your Docker container again (run the JSON query above) to ensure that the changes have the expected effect.

**Then** you can push the image into ACR, so that it can be retrieved by the Azure ML SDK when you want to deploy your Webservice.

In [None]:
%%bash -s "$imageLocation"

image_location=$1

# extract container ID
container_id=$(docker ps | tail -n1 | cut -f 1 -d " ")
echo "Container ID: $container_id"

sleep 1
# commit changes made in the container to the local copy of the image
docker commit $container_id $image_location

docker push $image_location

Let's try to deploy the image to Azure Container Instances (ACI), just to make sure everything behaves as expected.

In [None]:
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
from azureml.core.webservice import AciWebservice

# create configuration for ACI
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "some data",  "method" : "machine learning"}, 
                                               description="Does machine learning on some data")
# pull the image
image = ContainerImage(ws, name='my-docker-image', tags='1')

# deploy webservice
service_name = 'my-web-service'
service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                            image = image,
                                            name = service_name,
                                            workspace = ws)
service.wait_for_deployment(show_output = True)
print(service.state)


In [None]:
import json
import requests
import numpy as np
import matplotlib.pyplot as plt

n = 2  # set the number of values going into the running avg
values = np.random.normal(0,1,100)
values = np.cumsum(values)


running_avgs = []

for value in values:
    raw_data = json.dumps({"value": value, "n": n})
    raw_data = bytes(raw_data, encoding = 'utf8')
    
    # predict using the deployed model
    result = json.loads(service.run(input_data=raw_data))

    running_avgs.append(result)

plt.plot(values)
plt.plot(running_avgs)
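
Here `service.run` is given the request as UTF-8 bytes, whereas the local test let `requests` do the JSON encoding. A small helper can keep the two payloads identical. `encode_payload` is a hypothetical name; the explicit `float` and `int` conversions are a precaution, since some NumPy scalar types (for example `numpy.int64`) are not accepted by `json.dumps`.

```python
import json

def encode_payload(value, n):
    """Serialize one request for the running-average service as UTF-8 bytes."""
    payload = {"value": float(value), "n": int(n)}
    return json.dumps(payload).encode("utf-8")

print(encode_payload(0.25, 2))  # → b'{"value": 0.25, "n": 2}'
```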

## Clean up resources

To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:

In [None]:
service.delete()

# The end
