# Serving the sdzwa-andesv1 classifier with SageMaker Serverless

This notebook is adapted from https://github.com/aws-samples/amazon-sagemaker-endpoint-deployment-of-fastai-model-with-torchserve

It takes an existing `.mar` TorchServe package from the `animl-model-zoo` bucket, copies it to a production bucket, and serves it from a SageMaker Serverless Inference endpoint.

In [None]:
%reload_ext autoreload
%autoreload 2


%matplotlib inline

## Boilerplate

### Session

In [None]:
import boto3, time, json
from PIL import Image
import sagemaker

sess = boto3.Session()
sm = sess.client("sagemaker")
region = sess.region_name
account = boto3.client("sts").get_caller_identity().get("Account")

### IAM Role

**Note**: make sure the IAM role has these managed policies attached:  
- `AmazonS3FullAccess`  
- `AmazonEC2ContainerRegistryFullAccess`  
- `AmazonSageMakerFullAccess`  

In [None]:
role = sagemaker.get_execution_role()
role
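A minimal sketch for checking that the execution role actually has the managed policies listed above. It assumes the IAM client's `list_attached_role_policies` paginator and only sees *managed* policies; permissions granted through inline policies won't show up here. The helper names are hypothetical, and the client is passed in so the logic can be exercised without AWS credentials.

```python
# Managed policies the note above says the execution role needs.
REQUIRED_POLICIES = {
    "AmazonS3FullAccess",
    "AmazonEC2ContainerRegistryFullAccess",
    "AmazonSageMakerFullAccess",
}


def role_name_from_arn(role_arn):
    """Return the role name from an IAM role ARN, dropping any path prefix."""
    # e.g. arn:aws:iam::123456789012:role/service-role/MyRole -> MyRole
    return role_arn.split(":", 5)[-1].split("/")[-1]


def missing_policies(iam_client, role_arn, required=REQUIRED_POLICIES):
    """Return the required managed policy names that are not attached to the role."""
    attached = set()
    paginator = iam_client.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name_from_arn(role_arn)):
        for policy in page["AttachedPolicies"]:
            attached.add(policy["PolicyName"])
    return sorted(set(required) - attached)
```

In the notebook this would be called as `missing_policies(boto3.client("iam"), role)`; an empty list means every policy in the note is attached.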

### Amazon Elastic Container Registry (ECR)

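**Note**: create the ECR repository if it doesn't already exist. If it does, the command below fails with a `RepositoryAlreadyExistsException`, which is safe to ignore.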

In [None]:
registry_name = "torchserve-sdzwa-andesv1-sagemaker"

In [None]:
!aws ecr create-repository --repository-name {registry_name}
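If you'd rather make this step idempotent in Python, here is a sketch using the boto3 ECR client's `create_repository` and `describe_repositories` calls. The helper name `ensure_ecr_repository` is hypothetical; it creates the repository when it is missing and otherwise looks up the existing one, returning the repository URI in both cases.

```python
def ensure_ecr_repository(ecr_client, repository_name):
    """Create the ECR repository if needed and return its URI."""
    try:
        response = ecr_client.create_repository(repositoryName=repository_name)
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository already exists, so look it up instead of failing.
        response = ecr_client.describe_repositories(repositoryNames=[repository_name])
        return response["repositories"][0]["repositoryUri"]
    return response["repository"]["repositoryUri"]
```

Usage: `ensure_ecr_repository(boto3.client("ecr"), registry_name)`. The returned URI, plus a `:latest` tag, should match the `image` string built in the next cell.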

In [None]:
image = f"{account}.dkr.ecr.{region}.amazonaws.com/{registry_name}:latest"
image

### Pytorch Model Artifact

Amazon SageMaker expects model artifacts as a compressed `*.tar.gz` file, so wrap the `*.mar` file in one and upload it to your Amazon S3 bucket.

In [None]:
model_prefix = 'sdzwa-andesv1'
model_uri = f's3://animl-model-zoo/{model_prefix}/{model_prefix}.mar'
sagemaker_session = sagemaker.Session(boto_session=sess)
bucket_name = sagemaker_session.default_bucket()
prefix = 'torchserve'
prod_model_uri = f"s3://{bucket_name}/{prefix}/models/"

In [None]:
model_uri

In [None]:
prod_model_uri

In [None]:
!aws s3 cp {model_uri} ./

!tar cvfz {model_prefix}.tar.gz {model_prefix}.mar

!aws s3 cp {model_prefix}.tar.gz {prod_model_uri}
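The three shell commands above can also be written in plain Python. The sketch below is an assumption-laden equivalent, not the notebook's required path: it uses the standard-library `tarfile` module for packaging and the boto3 S3 client's `download_file(Bucket, Key, Filename)` and `upload_file(Filename, Bucket, Key)` for transfers. The helper names are hypothetical.

```python
import os
import posixpath
import tarfile
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def package_mar(mar_path, out_path):
    """Wrap a TorchServe .mar file in a gzipped tarball, at the archive root."""
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps the .mar at the top level instead of its local directory path.
        tar.add(mar_path, arcname=os.path.basename(mar_path))
    return out_path


def stage_model(s3_client, model_uri, prod_model_uri, workdir="."):
    """Download the .mar, package it, upload the .tar.gz, and return its S3 URI."""
    src_bucket, src_key = parse_s3_uri(model_uri)
    mar_path = os.path.join(workdir, os.path.basename(src_key))
    s3_client.download_file(src_bucket, src_key, mar_path)

    tar_path = os.path.splitext(mar_path)[0] + ".tar.gz"
    package_mar(mar_path, tar_path)

    dst_bucket, dst_prefix = parse_s3_uri(prod_model_uri)
    dst_key = posixpath.join(dst_prefix, os.path.basename(tar_path))
    s3_client.upload_file(tar_path, dst_bucket, dst_key)
    return f"s3://{dst_bucket}/{dst_key}"
```

With the notebook's variables, `stage_model(boto3.client("s3"), model_uri, prod_model_uri)` should return the same URI that the Model section later builds as `model_data`.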

### Build a TorchServe Docker container and push it to Amazon ECR

**Skip this step if the repository already exists and the custom TorchServe image tagged `latest` has already been pushed; building and pushing takes a few minutes.**

In [None]:
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account}.dkr.ecr.{region}.amazonaws.com
!docker build -t {registry_name} .
!docker tag {registry_name} {image}
!docker push {image}
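For reference, `aws ecr get-login-password` wraps the ECR `GetAuthorizationToken` API. The sketch below shows the same exchange via boto3's `get_authorization_token`: the token is base64-encoded `AWS:<password>`, and `proxyEndpoint` is the registry URL. The helper names are hypothetical.

```python
import base64


def decode_ecr_token(authorization_data):
    """Turn one ECR authorizationData entry into (username, password, registry)."""
    token = base64.b64decode(authorization_data["authorizationToken"]).decode("utf-8")
    # The decoded token has the form "AWS:<password>"; split only on the first colon.
    username, password = token.split(":", 1)
    return username, password, authorization_data["proxyEndpoint"]


def ecr_login_args(ecr_client):
    """Return credentials for `docker login`, like `aws ecr get-login-password`."""
    response = ecr_client.get_authorization_token()
    return decode_ecr_token(response["authorizationData"][0])
```

As with the shell version, pipe the password to `docker login --password-stdin` instead of passing it as a command-line argument.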

### Model

In [None]:
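model_data = f"{prod_model_uri}{model_prefix}.tar.gz"
# list_models is paginated, so filter by name and walk every page
# instead of checking only the first page of results.
model_already_created = False
for page in sm.get_paginator("list_models").paginate(NameContains=model_prefix):
    for model_def in page["Models"]:
        if model_def["ModelName"] == model_prefix:
            create_model_response = model_def
            model_already_created = True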

In [None]:
container = {"Image": image, "ModelDataUrl": model_data}

if not model_already_created:
    create_model_response = sm.create_model(
        ModelName=model_prefix, ExecutionRoleArn=role, PrimaryContainer=container
    )

print(create_model_response["ModelArn"])

## Inference Endpoint

### Endpoint configuration

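**Note**: serverless endpoints don't take an `InstanceType`; you configure memory size and maximum concurrency instead. Pricing for both options is at https://aws.amazon.com/sagemaker/pricing/

### Serverless Config

This replaces the instance type and instance count settings from the original notebook with a `ServerlessConfig` section.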

In [None]:
import time

# Use a concurrency of 80 for batch endpoints and 20 for real-time endpoints.
# https://github.com/tnc-ca-geo/animl-api/issues/101
concurrency = 80

endpoint_config_name = f"{model_prefix}-config-concurrency-{concurrency}"
# Useful for testing, not production: append a timestamp to make the name unique.
# + time.strftime(
#     "%Y-%m-%d-%H-%M-%S", time.gmtime()
# )
print(endpoint_config_name)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "ModelName": model_prefix,
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": 6144,
                "MaxConcurrency": concurrency,
            },
        }
    ],
)

print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])
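A small builder can catch out-of-range serverless settings before the API call fails. This is a sketch: the limits below (memory in 1 GB steps from 1024 to 6144 MB, and `MaxConcurrency` between 1 and 200) are what the SageMaker docs listed at the time of writing and are assumptions worth re-checking. `serverless_variant` is a hypothetical helper.

```python
# Limits assumed from the SageMaker Serverless Inference docs; verify before relying on them.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)
MAX_CONCURRENCY_LIMIT = 200


def serverless_variant(model_name, memory_mb=6144, max_concurrency=20,
                       variant_name="AllTraffic"):
    """Build one ProductionVariants entry for a serverless endpoint config."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if not 1 <= max_concurrency <= MAX_CONCURRENCY_LIMIT:
        raise ValueError(f"max_concurrency must be in 1..{MAX_CONCURRENCY_LIMIT}")
    return {
        "ModelName": model_name,
        "VariantName": variant_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }
```

The dictionary has the same shape as the one passed above, so `ProductionVariants=[serverless_variant(model_prefix, 6144, concurrency)]` would be a drop-in replacement.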

### Endpoint

In [None]:
endpoint_name = f"{model_prefix}-concurrency-{concurrency}"
# Useful for testing, not production: append a timestamp to make the name unique.
# + time.strftime(
#     "%Y-%m-%d-%H-%M-%S", time.gmtime()
# )

print(endpoint_name)

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print(create_endpoint_response["EndpointArn"])
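The commented-out timestamp suffix in the last two cells can be folded into one naming helper. This sketch is hypothetical and assumes the name rule documented for SageMaker endpoint and endpoint-config names (at most 63 characters, alphanumerics and hyphens, starting and ending with an alphanumeric); confirm that pattern against the current API reference.

```python
import re
import time

# Assumed SageMaker name rule: alphanumerics and hyphens, no leading/trailing hyphen.
_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")
_MAX_NAME_LEN = 63


def resource_name(prefix, concurrency, timestamp=False, now=None):
    """Build e.g. 'sdzwa-andesv1-concurrency-80', optionally with a UTC timestamp."""
    name = f"{prefix}-concurrency-{concurrency}"
    if timestamp:
        # Same format as the commented-out suffix: unique names for throwaway tests.
        stamp = time.strftime("%Y-%m-%d-%H-%M-%S", now if now is not None else time.gmtime())
        name = f"{name}-{stamp}"
    if len(name) > _MAX_NAME_LEN or not _NAME_RE.match(name):
        raise ValueError(f"invalid SageMaker resource name: {name!r}")
    return name
```

`resource_name(model_prefix, concurrency)` reproduces `endpoint_name`, and `resource_name(f"{model_prefix}-config", concurrency)` reproduces `endpoint_config_name`.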

In [None]:
%%time
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)
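The loop above polls forever and treats any non-`Creating` status as done, so a `Failed` endpoint looks like success until you read the output. boto3 also ships an `endpoint_in_service` waiter (`sm.get_waiter("endpoint_in_service")`); the sketch below is a hand-rolled alternative that adds a timeout and surfaces `FailureReason`. `wait_for_endpoint` is a hypothetical helper; it takes the describe function as a parameter so it can be tested offline.

```python
import time

# Statuses treated as "still in progress"; anything else that isn't InService is an error.
_TRANSIENT = ("Creating", "Updating", "SystemUpdating")


def wait_for_endpoint(describe, endpoint_name, delay=60, max_attempts=30,
                      sleep=time.sleep):
    """Poll until the endpoint is InService and return the final description.

    `describe` is any callable accepting EndpointName=..., e.g. sm.describe_endpoint.
    Raises RuntimeError on a failed state and TimeoutError when attempts run out.
    """
    for attempt in range(max_attempts):
        resp = describe(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        print("Status: " + status)
        if status == "InService":
            return resp
        if status not in _TRANSIENT:
            reason = resp.get("FailureReason", "no FailureReason given")
            raise RuntimeError(f"{endpoint_name} is {status}: {reason}")
        if attempt < max_attempts - 1:
            sleep(delay)
    raise TimeoutError(f"{endpoint_name} not InService after {max_attempts} checks")
```

Usage: `resp = wait_for_endpoint(sm.describe_endpoint, endpoint_name)`.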

In [None]:
resp

### Testing

NOTE: newly deployed endpoints can be a little finicky. I'm not sure of the cause, but invoking an endpoint immediately after deploying it sometimes produces strange timeouts and opaque errors. If the first request fails, retry a few times, let the endpoint scale down and stop running, and try both cold and warm starts before concluding that something is broken.
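Retrying a few times, as suggested above, can be automated with capped exponential backoff plus random jitter. This is a generic sketch, not part of the SageMaker SDK: `invoke_with_retries` is hypothetical, and the default `retry_on=(Exception,)` is deliberately broad; in practice you would narrow it to the errors you actually see, for example the runtime client's `ModelError` or botocore's read-timeout error.

```python
import random
import time


def invoke_with_retries(invoke, max_attempts=4, base_delay=2.0, max_delay=30.0,
                        retry_on=(Exception,), sleep=time.sleep):
    """Call invoke() and retry failures with capped exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so concurrent callers don't retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

Usage: wrap the call in a lambda, e.g. `invoke_with_retries(lambda: client.invoke_endpoint(EndpointName=endpoint_name, ContentType="multipart/form-data", Body=payload_encoded))`.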

In [None]:
from io import BytesIO
import boto3
from PIL import Image
import json
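# To test a different deployed endpoint, override the name, e.g.:
# endpoint_name = "nzdoc"

image_key = "brazilian-tapir.jpg"
image_bbox = [0.0005858103395439684, 0.3138456642627716, 0.8546993732452393, 0.9790124893188477]
# Named image_bytes so it doesn't overwrite `image`, the ECR image URI used by the Model cells.
image_bytes = boto3.client("s3").get_object(Bucket="animl-sample-images", Key=image_key)["Body"].read()
Image.open(BytesIO(image_bytes))

In [None]:
%%time
import base64
image_string = base64.b64encode(image_bytes).decode('ascii')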
payload = { 'image': image_string, 'bbox': image_bbox } # bbox can either be list or string (e.g. "[0,0,1,1]")
payload_encoded = json.dumps(payload).encode('utf-8')
client = boto3.client("runtime.sagemaker")
response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="multipart/form-data", Body=payload_encoded
)
response = json.loads(response["Body"].read())
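The payload construction in the cell above can be wrapped in a small helper that validates the bounding box first. This is a sketch under assumptions: `build_payload` is hypothetical, and it assumes the bbox is four normalized coordinates in [0, 1], as the sample values and the `"[0,0,1,1]"` example suggest. It checks only the count and range, not the coordinate order the model expects.

```python
import base64
import json


def build_payload(image_bytes, bbox):
    """JSON-encode an image plus a normalized bbox, matching the test cell's payload."""
    # The comment in the test cell says bbox may be a list or a string like "[0,0,1,1]".
    coords = json.loads(bbox) if isinstance(bbox, str) else list(bbox)
    if len(coords) != 4 or not all(0.0 <= float(c) <= 1.0 for c in coords):
        raise ValueError("bbox must be four normalized coordinates in [0, 1]")
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "bbox": [float(c) for c in coords],
    }
    return json.dumps(payload).encode("utf-8")
```

Its output can be passed as `Body` to `invoke_endpoint` in place of `payload_encoded`.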

In [None]:
response

### Cleanup

In [None]:
client = boto3.client("sagemaker")
# Delete the endpoint first, then its config, then the model they reference.
client.delete_endpoint(EndpointName=endpoint_name)
client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
client.delete_model(ModelName=model_prefix)

In [None]:
# Known issue: create_inference_recommendations_job expects an ARN matching the
# "model-package" pattern, but create_model returns an ARN matching the "model"
# pattern, so this call fails as written.

# job_name = "mdv5-recommender"
# job_type = "Default"
# sm.create_inference_recommendations_job(
#     JobName = job_name,
#     JobType = job_type,
#     RoleArn = role,
#     InputConfig = {
#         'ModelPackageVersionArn': create_model_response["ModelArn"]
#     }
# )