# Deploy a TensorFlow SavedModel trained elsewhere to Amazon SageMaker

Many of the steps below are taken from [this blog post](https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/), which explains how to take advantage of Amazon SageMaker deployment capabilities such as selecting the type and number of instances, performing A/B testing, and Auto Scaling. Auto Scaling clusters are spread across multiple Availability Zones to deliver high performance and high availability.

In this notebook we'll deploy Microsoft's Megadetector model, saved in the SavedModel format for TF Serving, which can be downloaded [here](https://github.com/microsoft/CameraTraps/blob/master/megadetector.md#downloading-the-models). The blog post above also demonstrates how to deploy Keras models (JSON architecture plus HDF5 weights) to SageMaker, but that is outside the scope of this notebook.

For more on training the model on SageMaker and deploying, refer to https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_distributed_mnist.ipynb

### Step 1. Set up

If you're already reading this in a SageMaker notebook instance, just execute the code block below to get the SageMaker execution role.

If not, and you need to set up a SageMaker notebook, open the Amazon SageMaker console from the AWS Management Console. Choose Notebook Instances, and create a new notebook instance. Associate it with the animl-ml git repo (https://github.com/tnc-ca-geo/animl-ml), and set the kernel to conda_tensorflow_p36.

The ```get_execution_role``` function retrieves the AWS Identity and Access Management (IAM) role you created at the time of creating your notebook instance.

In [1]:
import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()

### Step 2. Convert TensorFlow model to a SageMaker readable format

Download the megadetector model, unzip it, and make sure the .pb file is named ```saved_model.pb```. Note - you may have already done this if you ran the ```get_models.sh``` script locally.

Create an export directory structure in the Jupyter environment (```animl-ml/notebooks/export/Servo/1```), and upload the contents of the downloaded model there. Create a code directory in the export folder (```animl-ml/notebooks/export/code```), and **copy** the contents of ```animl-ml/code``` (the ```inference.py``` and ```requirements.txt``` files) into it. ```inference.py``` is a pre/post-processing script, and the dependencies listed in ```requirements.txt``` are installed in the endpoint container when it is initialized. More on that, with examples, [here](https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing).

The export directory structure should look like this:


```
notebook
     ├─ deploy.ipynb
     └─ export
           ├─ Servo
           │     └─ 1
           │           ├─ saved_model.pb
           │           └─ variables
           └─ code
                 ├─ inference.py
                 └─ requirements.txt
```
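
The same layout can also be created from Python rather than one shell cell per directory. This is a minimal sketch, assuming the notebook's working directory is where ```export``` should live; ```make_export_dirs``` is a hypothetical helper name and not part of any SDK.

```python
from pathlib import Path


def make_export_dirs(root="export", model_name="Servo", version="1"):
    """Create <root>/<model_name>/<version>/variables and <root>/code."""
    root = Path(root)
    variables_dir = root / model_name / version / "variables"
    code_dir = root / "code"
    # parents=True creates intermediate directories; exist_ok=True makes reruns safe
    variables_dir.mkdir(parents=True, exist_ok=True)
    code_dir.mkdir(parents=True, exist_ok=True)
    # Return the versioned model directory (where saved_model.pb goes) and the code directory
    return variables_dir.parent, code_dir
```

Calling ```make_export_dirs()``` with the defaults produces the empty directories shown in the tree; the model files and the two code files still have to be copied in afterwards.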

In [2]:
!mkdir export

In [3]:
!mkdir export/Servo

In [4]:
!mkdir export/Servo/1

In [5]:
!mkdir export/Servo/1/variables

In [7]:
!cp -r ../models/megadetector/code export/

####  Tar the entire directory and upload to S3
Yeeehaw, now we're ready to zip it all up and upload it to S3...

In [36]:
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)
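
Before uploading, it can be worth checking that the archive really contains the paths from the directory tree above. The following is a small sketch using only the standard library; ```list_archive```, ```missing_members```, and ```REQUIRED``` are illustrative names, and the list of required paths is an assumption based on the layout this notebook describes.

```python
import tarfile

# Paths this notebook expects inside the archive (taken from the directory tree)
REQUIRED = [
    "export/Servo/1/saved_model.pb",
    "export/code/inference.py",
    "export/code/requirements.txt",
]


def list_archive(archive_path):
    """Return the member names of a tar archive (compression is auto-detected)."""
    with tarfile.open(archive_path, mode="r:*") as archive:
        return archive.getnames()


def missing_members(names, required=REQUIRED):
    """Return the required paths that are absent from the archive listing."""
    present = set(names)
    return [path for path in required if path not in present]
```

If ```missing_members(list_archive('model.tar.gz'))``` returns an empty list, the archive has the expected files in the expected places.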

In [37]:
import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

### Step 3. Deploy the trained model

There are [two ways to deploy models to SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-deploy-model.html): using the SageMaker Python SDK (what we use below), or using the AWS SDK for Python (Boto 3), which offers lower-level configuration controls. Documentation on using the SageMaker Python SDK for deployment can be found [here](https://sagemaker.readthedocs.io/en/stable/using_tf.html#deploy-to-a-sagemaker-endpoint). The ```model.deploy()``` function returns a predictor that you can use to test inference right away.
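
For comparison, the Boto 3 route comes down to three calls on the ```sagemaker``` client: ```create_model```, ```create_endpoint_config```, and ```create_endpoint```. The sketch below is not what this notebook runs. It assumes you have already looked up a TensorFlow Serving container image URI for your region (```image_uri``` is a placeholder argument), and it does not reproduce the repacking of ```inference.py``` that the Python SDK performs for ```entry_point```. The helper names are hypothetical.

```python
def build_deploy_requests(name, image_uri, model_data_url, role_arn,
                          instance_type="ml.m4.xlarge", instance_count=1):
    """Build keyword arguments for the three calls that create an endpoint."""
    create_model = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }
    create_endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return create_model, create_endpoint_config, create_endpoint


def deploy_with_boto3(name, image_uri, model_data_url, role_arn):
    """Create the model, endpoint config, and endpoint (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without AWS access

    client = boto3.client("sagemaker")
    model_kw, config_kw, endpoint_kw = build_deploy_requests(
        name, image_uri, model_data_url, role_arn)
    client.create_model(**model_kw)
    client.create_endpoint_config(**config_kw)
    return client.create_endpoint(**endpoint_kw)
```

Keeping the request dictionaries separate from the calls makes it easy to inspect or adjust them (for example, adding a second production variant for A/B testing) before anything is created.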

TODO: 
- look into using Elastic Inference (https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html) for low-cost fast inference without using a GPU instance

NOTES: Ignore the warning about Python 3, and do not set the ```py_version``` argument.

In [44]:
from sagemaker.tensorflow.serving import Model

# model_data points at the archive uploaded to the default bucket in Step 2
sagemaker_model = Model(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                        role=role,
                        framework_version='1.13',
                        entry_point='inference.py',
                        source_dir='export/code')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


In [45]:
%%time
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-------------!
CPU times: user 15.9 s, sys: 2.48 s, total: 18.4 s
Wall time: 6min 48s


### Step 4. Invoke the endpoint

Grab the newly created endpoint name from the Amazon SageMaker console (https://us-west-1.console.aws.amazon.com/sagemaker/home?region=us-west-1#/endpoints) and plug it in below:

In [46]:
# this endpoint works (TensorFlow 1.12)
# endpoint_name = 'sagemaker-tensorflow-serving-2020-11-08-21-35-31-540'

# this is TensorFlow 1.13
endpoint_name = 'tensorflow-inference-2020-11-08-22-51-57-429'
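
Instead of copying the name from the console, it can also be looked up with the ```list_endpoints``` call on the Boto 3 ```sagemaker``` client. This is a rough sketch rather than part of the original workflow: the helper names are hypothetical, pagination is ignored (only the first page of results is inspected), and the ```tensorflow-inference``` prefix is an assumption based on the endpoint name shown above.

```python
def latest_endpoint_name(list_endpoints_response, prefix=""):
    """Pick the most recently created endpoint whose name starts with `prefix`."""
    candidates = [
        ep for ep in list_endpoints_response.get("Endpoints", [])
        if ep["EndpointName"].startswith(prefix)
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda ep: ep["CreationTime"])
    return newest["EndpointName"]


def fetch_latest_endpoint_name(prefix="tensorflow-inference"):
    """Query SageMaker for in-service endpoints (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helper above works without AWS access

    client = boto3.client("sagemaker")
    response = client.list_endpoints(
        SortBy="CreationTime", SortOrder="Descending", StatusEquals="InService")
    return latest_endpoint_name(response, prefix)
```

If the predictor from Step 3 is still in memory, it already carries the endpoint name as an attribute, so this lookup is mainly useful in a fresh session.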

#### Create a predictor from the endpoint
This is only necessary if you didn't just deploy an endpoint and create a predictor in the step above.

In [13]:
from sagemaker.tensorflow.model import TensorFlowPredictor

# Attach to the existing endpoint; sagemaker_session comes from Step 2
predictor = TensorFlowPredictor(endpoint_name, sagemaker_session)

#### Invoke the SageMaker endpoint using a boto3 client

In [47]:
import json
import boto3

TEST_IMG = 'input/sample-img.jpg'
# RENDER_THRESHOLD = 0.8
# MODEL_NAME = "megadetector"

client = boto3.client('runtime.sagemaker')

with open(TEST_IMG, 'rb') as fd:
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/x-image', 
        Body=fd
    )

    response_body = response['Body']
    print(response_body.read())

b'{\n    "predictions": [\n        {\n            "detection_classes": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0],\n            "num_detections": 100.0,\n            "detection_boxes": [[0.476009, 0.27858, 0.552951, 0.409597], [0.445225, 0.241919, 0.549637, 0.410256], [0.00768748, 0.0, 0.306559, 0.885349], [0.0226511, 0.0389927, 0.59295, 0.970974], [0.46062, 0.2563, 0.539868, 0.377574], [0.50386, 0.188327, 0.5396, 0.257556], [0.442832, 0.252802, 0.521171, 0.336108], [0.494258, 0.1934, 0.541704, 0.309821], [0.450192, 0.255344, 0.502189, 