# Deploy a TensorFlow SavedModel trained elsewhere to Amazon SageMaker

Many of the steps below are adapted from [this blog post](https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/), which explains how to take advantage of Amazon SageMaker deployment capabilities such as selecting the type and number of instances, performing A/B testing, and Auto Scaling. Auto Scaling clusters are spread across multiple Availability Zones to deliver high performance and high availability.

In this notebook we'll deploy Microsoft's MegaDetector model, saved in the SavedModel format for TF Serving, which can be downloaded [here](https://github.com/microsoft/CameraTraps/blob/master/megadetector.md#downloading-the-models). The blog post above also demonstrates how to deploy Keras models (JSON architecture plus HDF5 weights) to SageMaker, but that is outside the scope of this notebook.

For more on training the model on SageMaker and deploying, refer to https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_distributed_mnist.ipynb

### Step 1. Set up

If you're already reading this in a SageMaker notebook instance, just execute the code block below to get the SageMaker execution role.

If not, set up a SageMaker notebook instance first: in the AWS Management Console, open the Amazon SageMaker console, choose Notebook Instances, and create a new notebook instance. Associate it with the animl-ml git repo (https://github.com/tnc-ca-geo/animl-ml), and set the kernel to conda_tensorflow_p36.

The ```get_execution_role``` function retrieves the AWS Identity and Access Management (IAM) role you created at the time of creating your notebook instance.

In [101]:
import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()
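
Outside a notebook instance, ```get_execution_role``` may not be able to resolve a role. A minimal sketch of one fallback, assuming you already know the role's name (```MySageMakerRole``` below is a placeholder, not a name from this repo), is to look up the ARN through the IAM ```get_role``` call. The client is passed in as an argument so the helper can be exercised without AWS credentials.

```python
def resolve_role_arn(iam_client, role_name):
    """Return the ARN of an IAM role by name, using an IAM client's get_role call."""
    response = iam_client.get_role(RoleName=role_name)
    return response["Role"]["Arn"]


# Usage (requires AWS credentials; the role name is a placeholder):
#   import boto3
#   role = resolve_role_arn(boto3.client("iam"), "MySageMakerRole")
```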

### Step 2. Convert TensorFlow model to a SageMaker readable format

Download the MegaDetector model, unzip it, and rename the .pb file to ```saved_model.pb```. Note: you may have already done this if you ran the ```get_models.sh``` script locally.

Create an export directory structure in the Jupyter environment (```animl-ml/notebooks/export/Servo/1```), and upload the contents of the downloaded model there, including the empty ```variables``` directory. Then create a code directory in the export folder (```animl-ml/notebooks/export/code```), and **copy** the contents of ```animl-ml/code``` (the ```inference.py``` and ```requirements.txt``` files) into it. ```inference.py``` is a pre/post-processing script, and the dependencies listed in ```requirements.txt``` are installed in the endpoint container when it is initialized. More details and examples are [here](https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing).
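
To make the role of ```inference.py``` more concrete, here is a minimal sketch of the two handlers described in the container README linked above. It is not the script from ```animl-ml/code```, whose contents are not shown here. The handler names and the attributes read from ```context``` and ```response``` follow that README; ```load_image_as_array``` is a hypothetical placeholder for fetching and decoding the image, and the ```Fake...``` objects are stand-ins used only to try the handlers locally.

```python
import io
import json
from collections import namedtuple


def load_image_as_array(image_key):
    """Hypothetical placeholder: a real script would fetch the image named by
    image_key from S3 and decode it into a height x width x 3 pixel array."""
    return [[[0, 0, 0]]]  # a single black pixel, for illustration only


def input_handler(data, context):
    """Pre-process the request before it is forwarded to TensorFlow Serving."""
    if context.request_content_type != "application/json":
        raise ValueError("Unsupported content type: {}".format(context.request_content_type))
    image_key = json.loads(data.read().decode("utf-8"))
    return json.dumps({"instances": [load_image_as_array(image_key)]})


def output_handler(response, context):
    """Post-process the TensorFlow Serving response before returning it to the caller."""
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    return response.content, context.accept_header


# Local smoke check with stand-ins for the objects the container passes in.
FakeContext = namedtuple("FakeContext", ["request_content_type", "accept_header"])
FakeResponse = namedtuple("FakeResponse", ["status_code", "content"])
ctx = FakeContext("application/json", "application/json")
request_body = input_handler(io.BytesIO(b'"some-image.jpg"'), ctx)
body, content_type = output_handler(FakeResponse(200, b'{"predictions": []}'), ctx)
```

In this sketch the request body is just the image key as a JSON string, which matches how the endpoint is invoked later in this notebook.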

The export directory structure should look like this:


```
notebooks
├─ deploy.ipynb
└─ export
   ├─ Servo
   │  └─ 1
   │     ├─ saved_model.pb
   │     └─ variables
   └─ code
      ├─ inference.py
      └─ requirements.txt
```

In [None]:
!mkdir export

In [None]:
!mkdir export/Servo

In [None]:
!mkdir export/Servo/1

In [None]:
!mkdir export/Servo/1/variables

In [None]:
!mkdir export/code
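
The same folders can also be created from Python in one step, which has the advantage of not failing when a folder already exists. This is a minimal sketch rather than a replacement for the cells above; the helper name ```make_export_dirs``` is made up for this example, and the demonstration runs in a throwaway directory.

```python
import os
import tempfile


def make_export_dirs(root="export"):
    """Create the Servo/1/variables and code folders under root; return the created paths."""
    paths = [
        os.path.join(root, "Servo", "1", "variables"),
        os.path.join(root, "code"),
    ]
    for path in paths:
        # exist_ok=True makes the call safe to re-run.
        os.makedirs(path, exist_ok=True)
    return paths


# Demonstrated in a temporary directory; in the notebook you would call make_export_dirs().
with tempfile.TemporaryDirectory() as tmp:
    created = make_export_dirs(os.path.join(tmp, "export"))
    all_exist = all(os.path.isdir(p) for p in created)
```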

#### Tar the entire directory and upload to S3
Yeeehaw, now we're ready to zip it all up...

In [102]:
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)
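
Before uploading, it can be worth checking that the archive has the layout you expect. A minimal sketch, assuming nothing beyond the standard library: the helper name ```archive_members``` is made up for this example, and the demonstration builds a throwaway copy of the layout instead of touching the real ```model.tar.gz```.

```python
import os
import tarfile
import tempfile


def archive_members(archive_path):
    """Return the sorted member names of a gzipped tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(archive.getnames())


# Demonstration on a throwaway copy of the layout; in the notebook you would
# simply call archive_members('model.tar.gz').
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "export", "Servo", "1")
    os.makedirs(os.path.join(model_dir, "variables"))
    open(os.path.join(model_dir, "saved_model.pb"), "wb").close()  # empty stand-in file
    demo_archive = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(demo_archive, mode="w:gz") as archive:
        archive.add(os.path.join(tmp, "export"), arcname="export", recursive=True)
    members = archive_members(demo_archive)
```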

In [103]:
import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

### Step 3. Deploy the trained model

There are [two ways to deploy models to SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-deploy-model.html): using the SageMaker Python SDK (what we use below) or using the AWS SDK for Python (Boto 3), which offers lower-level configuration controls. Documentation on using the SageMaker Python SDK for deployment can be found [here](https://sagemaker.readthedocs.io/en/stable/using_tf.html#deploy-to-a-sagemaker-endpoint). The ```model.deploy()``` function returns a predictor that you can use to test inference right away.

TODO:
- look into using Elastic Inference (https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html) for low-cost fast inference without using a GPU instance

NOTE: Ignore the warning about Python 3, and do not set the ```py_version``` argument.
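
For comparison, the lower-level Boto 3 route boils down to three calls: create a model, create an endpoint configuration, and create the endpoint. The following is only a rough sketch under stated assumptions, not what this notebook uses: the helper name ```deploy_with_boto3``` and the variant name ```AllTraffic``` are made up for this example, the container image URI must be supplied by you, and the sketch does not deal with how ```inference.py``` reaches the container.

```python
def deploy_with_boto3(sm_client, name, image_uri, model_data_url, role_arn,
                      instance_type="ml.m4.xlarge", instance_count=1):
    """Create a model, an endpoint config, and an endpoint that all share one name."""
    sm_client.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    sm_client.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # arbitrary label chosen for this sketch
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )
    return sm_client.create_endpoint(EndpointName=name, EndpointConfigName=name)


# Usage (requires AWS credentials; every argument value here is a placeholder):
#   import boto3
#   deploy_with_boto3(boto3.client("sagemaker"), "my-endpoint",
#                     "<tensorflow-serving-image-uri>",
#                     "s3://<bucket>/model/model.tar.gz", role)
```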

In [104]:
from sagemaker.tensorflow.serving import Model

sagemaker_model = Model(
    model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
    role=role,
    framework_version='1.12',
    entry_point='inference.py',
    source_dir='export/code',
)

In [105]:
%%time
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge',
                                  )

-----------!
CPU times: user 18.5 s, sys: 2.03 s, total: 20.5 s
Wall time: 5min 51s


### Step 4. Invoke the endpoint

Grab the newly created endpoint name from the Amazon SageMaker console (https://us-west-1.console.aws.amazon.com/sagemaker/home?region=us-west-1#/endpoints) and plug it in below:

In [106]:
endpoint_name = 'sagemaker-tensorflow-serving-2020-03-27-03-18-29-899'
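
If you would rather not copy the name from the console, it can also be looked up with the SageMaker ```list_endpoints``` call. A minimal sketch, assuming the endpoint you want is the most recently created one whose name contains the default ```sagemaker-tensorflow-serving``` prefix seen above; the helper name ```latest_endpoint_name``` is made up for this example.

```python
def latest_endpoint_name(sm_client, name_contains="sagemaker-tensorflow-serving"):
    """Return the name of the newest endpoint whose name contains the filter, or None."""
    response = sm_client.list_endpoints(
        SortBy="CreationTime",
        SortOrder="Descending",
        NameContains=name_contains,
        MaxResults=1,
    )
    endpoints = response["Endpoints"]
    return endpoints[0]["EndpointName"] if endpoints else None


# Usage (requires AWS credentials):
#   import boto3
#   endpoint_name = latest_endpoint_name(boto3.client("sagemaker"))
```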

#### Create a predictor from the endpoint
This is only necessary if you didn't just deploy an endpoint and create a predictor in the step above.

In [None]:
# Use the TF Serving predictor class, matching the Model class used to deploy above.
from sagemaker.tensorflow.serving import Predictor
predictor = Predictor(endpoint_name, sagemaker_session)

#### Invoke the SageMaker endpoint using a boto3 client
Replace ```TEST_IMG``` with an image object key that you know is in the ```s3://animl-images``` bucket, and then let it rip.

In [107]:
import json
import boto3

BUCKET = "animl-images"
TEST_IMG = "0fe0cecc40c04d916dfde886d7a5a9c7.jpg"
# RENDER_THRESHOLD = 0.8
# MODEL_NAME = "saved_model_megadetector_v3_tf19"

client = boto3.client('runtime.sagemaker')

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json', 
    Body=json.dumps(TEST_IMG)
)

response_body = response['Body']
print(response_body.read())
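
The raw bytes are hard to read, so in practice you will probably want to parse the body and keep only confident detections. A minimal sketch, assuming the response has the ```predictions``` / ```detection_scores``` shape printed by this cell; the helper name ```scores_above``` is made up for this example, and the default threshold reuses the commented-out ```RENDER_THRESHOLD``` value. Note that ```read()``` consumes the response stream, so store its result in a variable if you want to both print and parse it.

```python
import json

RENDER_THRESHOLD = 0.8  # same value as the commented-out constant in the cell above


def scores_above(raw_body, threshold=RENDER_THRESHOLD):
    """Return, for each prediction, the detection scores at or above the threshold."""
    payload = json.loads(raw_body)
    return [
        [score for score in prediction["detection_scores"] if score >= threshold]
        for prediction in payload["predictions"]
    ]


# Usage in the notebook:
#   raw = response['Body'].read()
#   print(scores_above(raw))
```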

b'{\n    "predictions": [\n        {\n            "detection_scores": [0.0829493, 0.0828338, 0.0691808, 0.0447764, 0.0343378, 0.0319628, 0.028849, 0.0258857, 0.0161797, 0.0150743, 0.0138039, 0.0136431, 0.0131869, 0.0107709, 0.0106035, 0.00831678, 0.00705173, 0.00585848, 0.00576879, 0.00550011, 0.0052371, 0.00415652, 0.00409158, 0.00395912, 0.00304632, 0.00253824, 0.00251295, 0.00194808, 0.00192088, 0.00161899, 0.00153268, 0.00142408, 0.00141167, 0.00105104, 0.000982431, 0.000929584, 0.000844995, 0.000684777, 0.000607615, 0.000567728, 0.000552413, 0.00054706, 0.000537264, 0.000452302, 0.000437536, 0.000420866, 0.000407217, 0.000380945, 0.000380073, 0.000367915, 0.000332465, 0.000310611, 0.000287382, 0.000281439, 0.000271631, 0.000255717, 0.000242343, 0.000240265, 0.000228235, 0.000220861, 0.000206343, 0.000184694, 0.000177503, 0.000173855, 0.000166879, 0.00013705, 0.000130971, 0.000129523, 0.00012885, 0.000128738, 0.000118365, 0.000116286, 0.000112218, 0.000110522, 0.000109341, 0.000103