# Hosting a Detectron2 model on a SageMaker inference endpoint

In this notebook we'll package a previously trained model into a PyTorch serving container and deploy it on SageMaker. First, let's review the serving container. There are two key differences compared to the training container:
- we use a different base container provided by SageMaker;
- we need to start a web server (refer to the ENTRYPOINT command).

## Compiling Serving Container

In [None]:
! pygmentize -l docker Dockerfile.serving

As with the training image, we need to build the container and push it to AWS ECR. Before that, we need to log in to the shared SageMaker ECR and to your private ECR.

In [None]:
# log in to the SageMaker ECR that hosts the Deep Learning Containers
!aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-2.amazonaws.com
# log in to your private ECR
!aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 553020858742.dkr.ecr.us-east-2.amazonaws.com
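
The two login commands differ only in the account ID. As a minimal sketch, the hypothetical helpers `ecr_registry` and `ecr_login_command` (not part of the AWS CLI or any SDK) show how the registry host name is composed, which may help when adapting the commands to another account or region:

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the ECR registry host name for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_login_command(account_id: str, region: str) -> str:
    """Build the shell command that logs Docker in to an ECR registry."""
    registry = ecr_registry(account_id, region)
    return (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}"
    )


# 763104351884 is the shared Deep Learning Containers account used above.
print(ecr_login_command("763104351884", "us-east-2"))
```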

Now, let's build and push the container using the following command. Note that we supply a non-default Dockerfile here.

In [None]:
! ./build_and_push.sh d2-sm-coco-serving latest Dockerfile.serving

## Preparing test data

We'll use the COCO 2017 validation dataset. To simplify working with it, let's install the `pycocotools` package locally.

In [None]:
!pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

In [None]:
!pip install -U scikit-image

Now, let's download the COCO 2017 validation dataset.

In [None]:
data_dir = "../datasets/coco/" # folder where data will be saved
dataset  = "val2017"

In [None]:
! mkdir -p {data_dir}{dataset}
! wget http://images.cocodataset.org/zips/val2017.zip -P {data_dir}
! unzip {data_dir}/val2017.zip -d {data_dir}

In [None]:
! wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -P {data_dir}
! unzip {data_dir}/annotations_trainval2017.zip -d {data_dir}{dataset}
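
Re-running the cells above downloads the archives again. A minimal sketch in pure Python, assuming a hypothetical helper named `fetch_and_extract`, skips the download when the archive is already on disk:

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(url: str, download_dir: str, extract_dir: str) -> str:
    """Download `url` into `download_dir` unless already present, then unzip it."""
    os.makedirs(download_dir, exist_ok=True)
    os.makedirs(extract_dir, exist_ok=True)
    zip_path = os.path.join(download_dir, os.path.basename(url))
    if not os.path.exists(zip_path):
        # Only hit the network when the archive is missing.
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
    return zip_path


# Example call mirroring the cells above (large download, so left commented out):
# fetch_and_extract("http://images.cocodataset.org/zips/val2017.zip",
#                   "../datasets/coco/", "../datasets/coco/")
```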

Let's pick a random image that contains both a person and a dog:

In [None]:
%matplotlib inline
from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (8.0, 10.0)

annFile = '{}{}/annotations/instances_{}.json'.format(data_dir, dataset, dataset)
coco = COCO(annFile)

# get all images containing given categories, select one at random
catIds = coco.getCatIds(catNms=['person', 'dog'])
imgIds = coco.getImgIds(catIds=catIds)
imgId = imgIds[np.random.randint(len(imgIds))]
image_instance = coco.loadImgs(imgId)[0]
image_np = io.imread(image_instance['coco_url'])

In [None]:
plt.axis('off')
plt.imshow(image_np)
plt.show()
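
The random selection above picks a different image on every run. If you want to reproduce a result later, one option is to seed the random generator. The sketch below uses the standard library and a hypothetical helper `pick_image_id`; the IDs in the example are arbitrary placeholders, not the output of `coco.getImgIds`:

```python
import random


def pick_image_id(image_ids, seed=None):
    """Pick one image ID; passing a fixed seed makes the choice repeatable."""
    rng = random.Random(seed)
    return rng.choice(list(image_ids))


placeholder_ids = [101, 202, 303, 404, 505]  # placeholders for illustration only
first = pick_image_id(placeholder_ids, seed=0)
second = pick_image_id(placeholder_ids, seed=0)
print(first == second)  # the same seed yields the same pick
```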

# Testing inference script locally

Let's first review the inference script we'll deploy:

In [None]:
!pip install sagemaker-inference

In [None]:
!pygmentize container_serving/predict_coco.py

To test the inference pipeline locally, you can run `container_serving/predict_coco.py` directly (only the code under the `__main__` guard will be executed). You'll need Detectron2 and a number of other packages installed locally to test it.

Make sure that you pass the correct `--model-dir` argument.

In [None]:
!python container_serving/predict_coco.py --image container_serving/coco_sample.jpg --model-dir ../trained_model
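
For readers unfamiliar with the pattern, here is a minimal sketch of what a `__main__` guard with the two options used above could look like. It is an illustration only: the actual contents of `predict_coco.py` are shown by the `pygmentize` cell, and the function names `parse_args` and `main` are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two command-line options used in the cell above."""
    parser = argparse.ArgumentParser(description="Local smoke test for an inference script")
    parser.add_argument("--image", required=True, help="path to a test image")
    parser.add_argument("--model-dir", required=True, help="directory with the trained model artifacts")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real script would load the model from args.model_dir and run it on args.image here.
    print(f"image={args.image} model_dir={args.model_dir}")


if __name__ == "__main__":
    # Executed only when the file is run directly, not when it is imported.
    # Arguments are hard-coded so the sketch runs as-is; a real script would call main().
    main(["--image", "container_serving/coco_sample.jpg", "--model-dir", "../trained_model"])
```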

# Deploying Inference Endpoint

Below are some initial imports and configuration.

In [None]:
import boto3
import re

import os
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()

In [None]:
import sagemaker
from time import gmtime, strftime

sess = sagemaker.Session() # can use LocalSession() to run container locally

bucket = sess.default_bucket()
region = "us-east-2"
account = sess.boto_session.client('sts').get_caller_identity()['Account']
prefix_input = 'detectron2-input'
prefix_output = 'detectron2-output'

## Define parameters of your container

In [None]:
container_serving = "d2-sm-coco-serving" # your container name
tag = "latest" # several versions of the container can be available under different tags
image = '{}.dkr.ecr.{}.amazonaws.com/{}:{}'.format(account, region, container_serving, tag)

print("The following container will be used for hosting:", image)

## Deploy endpoint locally

Because training on COCO 2017 can be quite lengthy, we'll deploy our endpoint using model artifacts from an already completed training job. Review your training jobs and find one that completed successfully. Then copy its model artifact S3 URI and pass it to the `model_data` argument below.
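
One way to find the artifact URI programmatically is the SageMaker `describe_training_job` API. The sketch below assumes a hypothetical helper named `model_artifact_uri`; it takes the boto3 client as an argument so that it can be exercised without AWS access:

```python
def model_artifact_uri(sm_client, training_job_name: str) -> str:
    """Return the S3 URI of the model artifact of a completed training job."""
    description = sm_client.describe_training_job(TrainingJobName=training_job_name)
    status = description["TrainingJobStatus"]
    if status != "Completed":
        raise ValueError(f"Training job {training_job_name} is {status}, not Completed")
    return description["ModelArtifacts"]["S3ModelArtifacts"]


# Usage (requires AWS credentials; the job name is a placeholder):
# import boto3
# uri = model_artifact_uri(boto3.client("sagemaker"), "<your-training-job-name>")
```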

In [None]:
from sagemaker.pytorch import PyTorchModel, PyTorch, PyTorchPredictor
from sagemaker import Model

# model = Model(model_data="s3://sagemaker-us-east-2-553020858742/detectron2-model/model_R_50_FPN_1x.tar.gz",
#               role=role,
#               image=image)

model = PyTorchModel(
#                     model_data="s3://sagemaker-us-east-2-553020858742/detectron2-model/model.tar.gz", #default D2 model
                     model_data="s3://sagemaker-us-east-2-553020858742/detectron2-model/model_R_50_FPN_1x.tar.gz", # from training job
                     role=role,
                     entry_point="predict_coco.py", source_dir="container_serving",
                     framework_version="1.4", py_version="3.6",
                     image=image)

In [None]:
predictor = model.deploy(
                         instance_type = 'local_gpu',
                         initial_instance_count=1,
                         endpoint_name=f"{container_serving}-{tag}", # define a unique endpoint name; if omitted, SageMaker generates one based on the container used
                         tags=[{"Key":"image", "Value":f"{container_serving}:{tag}"}], 
                         wait=False
                         )

In [None]:
from sagemaker_inference import content_types, decoder, default_inference_handler, encoder
from sagemaker.content_types import CONTENT_TYPE_JSON, CONTENT_TYPE_CSV, CONTENT_TYPE_NPY

# Serializer and deserializer for communication between the client (e.g. this notebook) and the SageMaker endpoint.

def np_to_npy(request_body):
    body = encoder.encode(request_body, CONTENT_TYPE_NPY)
    return body

def bytes_to_pickle(response_body, content_type):
    try:
        return response_body.read()
    finally:
        response_body.close()

payload = np_to_npy(image_np)
print(type(image_np))
print(type(payload))
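
Note that `bytes_to_pickle` above returns the raw response bytes without decoding them. If the inference script pickles its predictions (an assumption suggested by the function name; check `predict_coco.py` for the actual response format), a deserializer could decode them roughly as follows. `unpickle_response` is a hypothetical name, and unpickling should only be done on data from an endpoint you trust.

```python
import io
import pickle


def unpickle_response(response_body, content_type=None):
    """Read a streaming response body, close it, and unpickle its bytes."""
    try:
        return pickle.loads(response_body.read())
    finally:
        response_body.close()


# Simulate an endpoint response with an in-memory stream (placeholder data).
fake_body = io.BytesIO(pickle.dumps({"scores": [0.9, 0.8]}))
print(unpickle_response(fake_body))
```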

In [None]:
predictor.predict(payload)

In [None]:
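with open("payload.npy", "wb") as f: f.write(payload)  # save the binary payload so curl can send it unmodified
!curl -X POST 172.18.0.2:8080/invocations -v --data-binary @payload.npy -H "Content-Type: application/x-npy"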

## Deploy remote endpoint

To process inference data sent over the network, we need two custom methods: a serializer and a deserializer (`np_to_npy` and `bytes_to_pickle`, defined above).

In [6]:
from sagemaker.pytorch import PyTorchModel, PyTorch, PyTorchPredictor
from sagemaker.estimator import Estimator, Model

print(image)

remote_model = Model(
                     model_data="s3://sagemaker-us-east-2-553020858742/detectron2-model/model_R_50_FPN_1x.tar.gz",
                     role=role,
                     image=image)

553020858742.dkr.ecr.us-east-2.amazonaws.com/d2-sm-coco-serving:latest


In [7]:
remote_predictor = remote_model.deploy(
                         instance_type='ml.p3.16xlarge', 
                         initial_instance_count=1,
                         endpoint_name=f"{container_serving}-{tag}-v2", # define a unique endpoint name; if omitted, SageMaker generates one based on the container used
                         tags=[{"Key":"image", "Value":f"{container_serving}:{tag}"}], 
                         wait=False
                         )
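
Because `wait=False` returns immediately, the endpoint may still be in the `Creating` state when the next cell runs. A minimal polling sketch based on the `describe_endpoint` API is shown below; the helper name `wait_for_endpoint` and its default intervals are assumptions, and the client is passed in so the logic can be tried without AWS access:

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage (requires AWS credentials):
# import boto3
# wait_for_endpoint(boto3.client("sagemaker"), "d2-sm-coco-serving-latest-v2")
```

boto3 also ships a built-in waiter for this purpose, available via `boto3.client("sagemaker").get_waiter("endpoint_in_service")`, which may be preferable to hand-written polling.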

In [None]:
remote_predictor.predict(image_np)

In [None]:
from sagemaker import RealTimePredictor


remote_rt_predictor  = RealTimePredictor(endpoint=f"{container_serving}-{tag}-v2",
                                sagemaker_session=sess,
                                serializer=np_to_npy,
                                deserializer=bytes_to_pickle
                               )


remote_rt_predictor.predict(image_np)

In [None]:
import boto3

client = boto3.client('sagemaker-runtime')

client.invoke_endpoint(
    EndpointName='d2-sm-coco-serving-latest',
    Body=np_to_npy(image_np),
    ContentType='application/x-npy',
    Accept='string', # TODO
    TargetModel='d2-sm-coco-serving-2020-04-21-01-10-43-766'
)
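
`invoke_endpoint` returns a dictionary whose `Body` entry is a stream that has to be read to obtain the prediction. The sketch below wraps the call in a hypothetical helper `invoke_npy`; it omits the `Accept` and `TargetModel` arguments used above, so treat it as a starting point rather than a drop-in replacement:

```python
def invoke_npy(runtime_client, endpoint_name: str, payload: bytes) -> bytes:
    """Send an NPY payload to an endpoint and return the raw response bytes."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=payload,
        ContentType="application/x-npy",
    )
    body = response["Body"]
    try:
        return body.read()
    finally:
        body.close()


# Usage (requires AWS credentials and a running endpoint):
# import boto3
# raw = invoke_npy(boto3.client("sagemaker-runtime"), "d2-sm-coco-serving-latest", np_to_npy(image_np))
```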

In [None]:
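predictor.delete_endpoint()  # local endpoint
remote_rt_predictor.delete_endpoint()  # remote endpoint; avoids ongoing instance charges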