# Hosting a Detectron2 model on a SageMaker inference endpoint

In this notebook we'll package a previously trained model into a PyTorch serving container and deploy it on SageMaker. First, let's review the serving container. There are two key differences compared to the training container:
- we use a different base container provided by SageMaker;
- we need to start a web server (refer to the ENTRYPOINT command).

In [None]:
! pygmentize -l docker Dockerfile.serving

As with the training image, we need to build the container and push it to AWS ECR. Before that, we need to log in to the shared SageMaker ECR and to your private ECR.

In [None]:
# log in to the SageMaker ECR that hosts the Deep Learning Containers
!aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-2.amazonaws.com
# log in to your private ECR
!aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin 553020858742.dkr.ecr.us-east-2.amazonaws.com
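For reference, `aws ecr get-login-password` fetches a temporary password from the ECR authorization API. Below is a minimal sketch of the same step in Python, assuming boto3 is configured with credentials that are allowed to call ECR. The helper name `decode_ecr_token` is hypothetical and not part of this repository. The `authorizationToken` field returned by `get_authorization_token` is a base64-encoded `username:password` pair.

```python
import base64

def decode_ecr_token(token_b64):
    """Split a base64-encoded ECR authorization token into (username, password)."""
    decoded = base64.b64decode(token_b64).decode("utf-8")
    # Split on the first colon only, in case the password itself contains colons.
    username, _, password = decoded.partition(":")
    return username, password

# Usage sketch (not executed here; requires AWS credentials):
# import boto3
# ecr = boto3.client("ecr", region_name="us-east-2")
# auth = ecr.get_authorization_token()["authorizationData"][0]
# username, password = decode_ecr_token(auth["authorizationToken"])
# registry = auth["proxyEndpoint"]  # registry URL to pass to `docker login`
```

The shell commands above remain the simplest route inside a notebook; the Python variant is mainly useful when the login has to happen inside a script.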

Now let's build and push the container using the following command. Note that we supply a non-default Dockerfile here.

In [57]:
! ./build_and_push.sh d2-sm-test mnist3 Dockerfile.test

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Sending build context to Docker daemon  1.523MB
Step 1/2 : FROM 763104351884.dkr.ecr.us-east-2.amazonaws.com/pytorch-inference:1.4.0-gpu-py36-cu101-ubuntu16.04
 ---> b92e9aa5a4ae
Step 2/2 : WORKDIR /
 ---> Running in 19f5ba3d69af
Removing intermediate container 19f5ba3d69af
 ---> bd58d727051d
Successfully built bd58d727051d
Successfully tagged d2-sm-test:latest
The push refers to repository [553020858742.dkr.ecr.us-east-2.amazonaws.com/d2-sm-test]


# Deploying endpoint

Below are some initial imports and configuration.

In [58]:
import boto3
import re

import os
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()

In [59]:
import sagemaker
from time import gmtime, strftime

sess = sagemaker.Session() # can use LocalSession() to run container locally

bucket = sess.default_bucket()
region = "us-east-2"
account = sess.boto_session.client('sts').get_caller_identity()['Account']
prefix_input = 'detectron2-input'
prefix_output = 'detectron2-output'

## Define parameters of your container

In [60]:
container_test = "d2-sm-test" # your container name
tag = "mnist3" # you can have several versions of the container available
image = '{}.dkr.ecr.{}.amazonaws.com/{}:{}'.format(account, region, container_test, tag)

print("Following container will be used for hosting: ",image)

Following container will be used for hosting:  553020858742.dkr.ecr.us-east-2.amazonaws.com/d2-sm-test:mnist3


## Debug local endpoint

As training on COCO2017 can be quite lengthy, we'll deploy our endpoint from the model artifacts of an already completed training job. Please review your training jobs and find one that completed successfully. Then copy the model artifact S3 URI and pass it to the `model_data` argument below.
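If you prefer to look up the artifact programmatically rather than in the console, the following is a minimal sketch. It assumes the most recent completed training job in your account is the one you want, which may not hold if you run unrelated jobs. The function name `latest_completed_model_artifact` is hypothetical; the `list_training_jobs` and `describe_training_job` calls are the standard boto3 SageMaker client operations.

```python
def latest_completed_model_artifact(sm_client):
    """Return the model artifact S3 URI of the newest completed training job.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    Returns None when no completed job is found.
    """
    summaries = sm_client.list_training_jobs(
        StatusEquals="Completed",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["TrainingJobSummaries"]
    if not summaries:
        return None
    job_name = summaries[0]["TrainingJobName"]
    details = sm_client.describe_training_job(TrainingJobName=job_name)
    return details["ModelArtifacts"]["S3ModelArtifacts"]

# Usage sketch (not executed here; requires AWS credentials):
# import boto3
# model_data = latest_completed_model_artifact(boto3.client("sagemaker"))
```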

In [63]:
from sagemaker.pytorch import PyTorchModel, PyTorch, PyTorchPredictor

model = PyTorchModel(model_data="s3://sagemaker-us-east-2-553020858742/mnist/model_mnnist.tar.gz",
                     role=role,
                     entry_point="predict.py", source_dir="container_test",
                     framework_version="1.4", py_version="3.6",
                     image=image)

In [64]:
predictor = model.deploy(
                         instance_type = 'local_gpu',
                         initial_instance_count=1,
                         endpoint_name=f"{container_test}-{tag}", # define a unique endpoint name; if omitted, SageMaker will generate one based on the container used
                         tags=[{"Key":"image", "Value":f"{container_test}:{tag}"}], 
                         wait=True
                         )

Attaching to tmp4z65u5f6_algo-1-surj4_1
algo-1-surj4_1  | 2020-04-22 03:08:40,678 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
algo-1-surj4_1  | MMS Home: /opt/conda/lib/python3.6/site-packages
algo-1-surj4_1  | Current directory: /
algo-1-surj4_1  | Temp directory: /home/model-server/tmp
algo-1-surj4_1  | Number of GPUs: 8
algo-1-surj4_1  | Number of CPUs: 64
algo-1-surj4_1  | Max heap size: 27305 M
algo-1-surj4_1  | Python executable: /opt/conda/bin/python
algo-1-surj4_1  | Config file: /etc/sagemaker-mms.properties
algo-1-surj4_1  | Inference address: http://0.0.0.0:8080
algo-1-surj4_1  | Management address: http://0.0.0.0:8080
algo-1-surj4_1  | Model Store: /.sagemaker/mms/models
algo-1-surj4_1  | Initial Models: ALL
algo-1-surj4_1  | Log dir: /logs
algo-1-surj4_1  | Metrics dir: /logs
algo-1-surj4_1  | Netty threads: 0
algo-1-surj

In [65]:
# Let's send a prediction request using a random batch of MNIST-shaped inputs.
# print(image_np)
# print(image)

batch_size = 100
data = np.random.rand(batch_size, 1, 28, 28).astype(np.float32)
response = predictor.predict(data)

algo-1-surj4_1  | 2020-04-22 03:09:08,852 [INFO ] W-9007-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 23286
algo-1-surj4_1  | 2020-04-22 03:09:08,852 [INFO ] W-9007-model ACCESS_LOG - /172.18.0.1:52602 "POST /invocations HTTP/1.1" 200 23290


In [66]:
print(response)

[[-2.39590716 -2.1643219  -2.39140368 -2.16797209 -2.40198469 -1.98820829
  -2.71318316 -2.44703007 -2.09107852 -2.47380447]
 [-2.23166275 -2.13188481 -2.2412107  -2.36000657 -2.45573545 -2.16299629
  -2.61025691 -2.40163064 -2.05093837 -2.5281415 ]
 [-2.26306486 -2.09808373 -2.26002908 -2.50288534 -2.28699636 -2.16186023
  -2.51280785 -2.4423337  -2.09865236 -2.52848029]
 [-2.26229572 -2.1461556  -2.49545383 -2.33746934 -2.42985606 -1.93201149
  -2.56041765 -2.43778753 -2.11441994 -2.50482678]
 [-2.18351769 -2.11946774 -2.19423747 -2.31780171 -2.52236485 -2.15039897
  -2.6083107  -2.41858459 -2.08717918 -2.60536385]
 [-2.2159543  -2.19677019 -2.42866945 -2.41303658 -2.39587641 -2.05938435
  -2.67321038 -2.33501482 -2.08948755 -2.36577129]
 [-2.41076899 -2.0008688  -2.21199369 -2.20018625 -2.47376537 -2.13901424
  -2.66241145 -2.40784693 -2.08913946 -2.67145729]
 [-2.16761661 -2.19551849 -2.37393546 -2.46108294 -2.36844134 -2.11144638
  -2.61162949 -2.37674904 -2.07832623 -2.41131234]
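
The values in each row are close to ln(1/10) ≈ -2.3, and the exponentials of the first row sum to approximately 1. This is consistent with log-softmax scores over 10 classes, although the model's output layer is not shown here, so treat that as an assumption. Under that assumption, a minimal sketch for turning one row into a predicted class and its probability follows; the helper name `top_class` is hypothetical.

```python
import math

def top_class(log_probs):
    """Return (class_index, probability) for one row of log-probabilities."""
    best = max(range(len(log_probs)), key=lambda i: log_probs[i])
    return best, math.exp(log_probs[best])

# First row of the response printed above.
row = [-2.39590716, -2.1643219, -2.39140368, -2.16797209, -2.40198469,
       -1.98820829, -2.71318316, -2.44703007, -2.09107852, -2.47380447]
label, prob = top_class(row)
print(label, round(prob, 3))
```

For the whole batch, `[top_class(r) for r in response]` applies the same helper to every row. Because the input here is random noise, the predicted classes carry no meaning beyond confirming that the endpoint responds.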
