# Amazon SageMaker Workshop
## _**Deployment**_

---

In this part of the workshop, we deploy the model created in the previous lab to an endpoint that serves real-time inferences to predict mobile customer departure (churn).

---

## Contents

1. [Model hosting](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)
  * Set up a persistent endpoint to get predictions from your model
 
2. [Exercise - Your turn to deploy an endpoint and customize inference](#Exercise)
  
---

## Background

In the previous labs, [Modeling](../../2-Modeling/modeling.ipynb) and [Evaluation](../../3-Evaluation/evaluation.ipynb), we trained multiple models with multiple SageMaker training jobs and evaluated them.

In SageMaker, there are multiple ways to deploy a trained model to a real-time inference endpoint: the SageMaker SDK, the AWS SDK (Boto3), and the SageMaker console. For more information, see Deploy Models for Inference in the Amazon SageMaker Developer Guide. The SageMaker SDK offers higher-level abstractions, while Boto3 exposes lower-level APIs that give greater control over model deployment. In this tutorial, you deploy the model using Boto3. There are three steps you need to follow in sequence to deploy a model:

    1. Create a SageMaker model from the model artifact
    2. Create an endpoint configuration to specify properties, including instance type and count
    3. Create the endpoint using the endpoint configuration
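The three steps above can be sketched as a single helper. This is a minimal sketch, not the workshop's own code: the function name `deploy_model` and its parameters are hypothetical, and `sm_client` is assumed to be a Boto3 SageMaker client (for example `boto3.client("sagemaker")`). Only the `create_model`, `create_endpoint_config`, and `create_endpoint` calls are real Boto3 operations.

```python
def deploy_model(sm_client, model_name, image_uri, model_data_url, role_arn,
                 instance_type="ml.m5.xlarge", instance_count=1):
    """Run the three deployment steps in order and return the endpoint name.

    `deploy_model` is a hypothetical helper; `sm_client` is assumed to be a
    Boto3 SageMaker client.
    """
    # Step 1: create a SageMaker model from the model artifact and its image
    sm_client.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )

    # Step 2: create an endpoint configuration (instance type and count)
    endpoint_config_name = f"{model_name}-ep-config"
    sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[{
            "VariantName": "Alltraffic",
            "ModelName": model_name,  # must match the name used in step 1
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
            "InitialVariantWeight": 1,
        }],
    )

    # Step 3: create the endpoint from the endpoint configuration
    endpoint_name = f"{model_name}-ep"
    sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )
    return endpoint_name
```

The order matters because each step refers to the resource created by the previous one by name. The rest of this lab performs the same steps cell by cell and adds data capture to the endpoint configuration.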

Let's import the libraries for this lab:


In [56]:
# Suppress default INFO logging
import logging
logger = logging.getLogger()
logger.setLevel(logging.ERROR)

In [57]:
import time
import json
from time import strftime, gmtime

import boto3

import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import csv_serializer
from sagemaker.model_monitor import DataCaptureConfig, DatasetFormat, DefaultModelMonitor
from sagemaker.s3 import S3Uploader, S3Downloader

In [58]:
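sess = boto3.Session()
# `region` is needed by the clients below; take it from the session here
# (the value stored by the previous lab is restored in the next cell)
region = sess.region_name
sm = sess.client('sagemaker')
role = sagemaker.get_execution_role()
s3_client = boto3.client("s3", region_name=region)
sm_client = boto3.client("sagemaker", region_name=region)
sm_runtime_client = boto3.client("sagemaker-runtime")
sm_autoscaling_client = boto3.client("application-autoscaling")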

In [59]:
%store -r bucket
%store -r prefix
%store -r region
%store -r docker_image_name
%store -r framework_version
%store -r s3uri_test
%store -r training_job_name

In [60]:
bucket, prefix, region, docker_image_name, framework_version,training_job_name,s3uri_test

('sagemaker-studio-us-west-2-917049230680',
 'xgboost-churn',
 'us-west-2',
 '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.3-1',
 '1.3-1',
 'workshop-framework-xgboost-customer-chu-2022-09-16-07-12-01-702',
 's3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/data/test/test.csv')

In [61]:
# Create a model s3 path URI
s3_model_uri = f's3://{bucket}/{prefix}/output/{training_job_name}/output/model.tar.gz'
s3_model_uri

# Create the data capture URI
# S3 path where data captured at endpoint will be stored
data_capture_uri = f"s3://{bucket}/{prefix}/datacapture"

# S3 location of test data
test_data_uri = s3uri_test
s3_model_uri, data_capture_uri,test_data_uri

('s3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/output/workshop-framework-xgboost-customer-chu-2022-09-16-07-12-01-702/output/model.tar.gz',
 's3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/datacapture',
 's3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/data/test/test.csv')

---
## Host the model

Now that we've trained the model, let's deploy it to a hosted endpoint. To monitor the model after it's hosted and serving requests, we'll also add configurations to capture data that is being sent to the endpoint.

In [64]:
from sagemaker.image_uris import retrieve
# Retrieve the SageMaker managed XGBoost image
training_image = retrieve(framework="xgboost", region=region, version="1.3-1")

# Specify a unique model name that does not exist yet (the training job name is truncated so that
# the derived endpoint and endpoint config names stay within the 63-character name limit)
model_name = training_job_name[19:] 
primary_container = {
                     "Image": docker_image_name,
                     "ModelDataUrl": s3_model_uri
                    }

model_matches = sm_client.list_models(NameContains=model_name)["Models"]
if not model_matches:
    model = sm_client.create_model(ModelName=model_name,
                                   PrimaryContainer=primary_container,
                                   ExecutionRoleArn=role)
else:
    print(f"Model with name {model_name} already exists! Change model name to create new")


Model with name xgboost-customer-chu-2022-09-16-07-12-01-702 already exists! Change model name to create new
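Note that `NameContains` performs a substring match, so the check above also reports a match when a different model whose name merely contains `model_name` exists. A minimal sketch of an exact-name check follows; the helper `has_exact_name` is hypothetical, and it only inspects the summaries it is given (it does not handle pagination of the list call).

```python
def has_exact_name(summaries, name, key):
    """Return True if any summary dict has exactly `name` under `key`.

    `summaries` is the list returned under "Models", "EndpointConfigs", or
    "Endpoints" by the corresponding Boto3 list call. Hypothetical helper.
    """
    return any(item.get(key) == name for item in summaries)


# Example with hand-written summaries standing in for a list_models response
models = [
    {"ModelName": "xgboost-customer-chu-2022-09-16-07-12-01-702-v2"},
    {"ModelName": "another-model"},
]
print(has_exact_name(models, "xgboost-customer-chu-2022-09-16-07-12-01-702", "ModelName"))
```

In the notebook this could be called as `has_exact_name(model_matches, model_name, "ModelName")`, and likewise with the keys `"EndpointConfigName"` and `"EndpointName"` for the other two resources.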


In [65]:
# Endpoint Config name
endpoint_config_name = f"{model_name}-ep-config"

# Endpoint config parameters
production_variant_dict = {
                           "VariantName": "Alltraffic",
                           # Must reference the model created above, not the training job name
                           "ModelName": model_name,
                           "InitialInstanceCount": 1,
                           "InstanceType": "ml.m5.xlarge",
                           "InitialVariantWeight": 1
                          }

# Data capture config parameters
data_capture_config_dict = {
                            "EnableCapture": True,
                            "InitialSamplingPercentage": 100,
                            "DestinationS3Uri": data_capture_uri,
                            "CaptureOptions": [{"CaptureMode" : "Input"}, {"CaptureMode" : "Output"}]
                           }


# Create endpoint config if one with the same name does not exist
endpoint_config_matches = sm_client.list_endpoint_configs(NameContains=endpoint_config_name)["EndpointConfigs"]
if not endpoint_config_matches:
    endpoint_config_response = sm_client.create_endpoint_config(
                                                                EndpointConfigName=endpoint_config_name,
                                                                ProductionVariants=[production_variant_dict],
                                                                DataCaptureConfig=data_capture_config_dict
                                                               )
else:
    print(f"Endpoint config with name {endpoint_config_name} already exists! Change endpoint config name to create new")

Endpoint config with name xgboost-customer-chu-2022-09-16-07-12-01-702-ep-config already exists! Change endpoint config name to create new


In [66]:
# Create the endpoint if one with the same name does not exist
endpoint_name = f"{model_name}-ep"

endpoint_matches = sm_client.list_endpoints(NameContains=endpoint_name)["Endpoints"]
if not endpoint_matches:
    endpoint_response = sm_client.create_endpoint(
                                                  EndpointName=endpoint_name,
                                                  EndpointConfigName=endpoint_config_name
                                                 )
else:
    print(f"Endpoint with name {endpoint_name} already exists! Change endpoint name to create new")

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
while status == "Creating":
    print(f"Endpoint Status: {status}...")
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
print(f"Endpoint Status: {status}")

Endpoint with name xgboost-customer-chu-2022-09-16-07-12-01-702-ep already exists! Change endpoint name to create new
Endpoint Status: InService
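The polling loop above stops as soon as the status is no longer `Creating`, which also covers a failed deployment. A slightly more defensive variant is sketched below: it keeps polling until the endpoint is `InService`, raises on `Failed`, and gives up after a timeout. The helper name `wait_for_endpoint`, its defaults, and the injectable `sleep` argument are assumptions made for illustration; `describe_endpoint`, `EndpointStatus`, and `FailureReason` come from the Boto3 SageMaker API.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=60,
                      timeout_seconds=1800, sleep=time.sleep):
    """Poll until the endpoint is InService; raise on failure or timeout.

    Hypothetical helper. `sleep` is injectable so the loop can be exercised
    without actually waiting.
    """
    waited = 0
    while True:
        resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = resp.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {waited}s")
        print(f"Endpoint Status: {status}...")
        sleep(poll_seconds)
        waited += poll_seconds
```

Boto3 also ships a built-in waiter for this purpose, `sm_client.get_waiter("endpoint_in_service")`, which can replace a hand-written loop when the status messages are not needed.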


OK, we trained a model with SageMaker and then deployed it to a managed SageMaker endpoint.

In [67]:
from IPython.display import display, HTML
# Use the lab's region for both the console host name and the query string
sm_ep_placeholder = "https://{0}.console.aws.amazon.com/sagemaker/home?region={0}#/endpoints"

display(HTML(f'<a href="{sm_ep_placeholder.format(region)}">Look at your endpoints here</a>'))

Or go to the left tab here, inside the Studio UI, and select "Endpoints":

![endpoints.png](media/endpoints.png)

#### Let's save the endpoint name for later (Monitoring lab)

In [68]:
endpoint_name_v2 = endpoint_name
%store endpoint_name_v2

Stored 'endpoint_name_v2' (str)


### Invoke the inference endpoint

Now that we have a hosted endpoint running, we can make real-time predictions from our model by making an HTTP POST request. But first, we need to serialize the test data into the CSV format that the endpoint expects, and deserialize the response that it returns.
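The serialization and deserialization steps can be sketched in isolation as two small pure functions. This is an illustration only: the helper names `rows_to_csv_payload` and `parse_csv_predictions` are hypothetical, and the parser assumes that the response separates scores with commas or newlines.

```python
import csv
import io


def rows_to_csv_payload(rows):
    """Serialize feature rows to a text/csv request body (no header, no index)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def parse_csv_predictions(body_text):
    """Deserialize a text/csv response body into a list of float scores.

    Assumes scores are separated by commas or newlines; empty tokens
    (for example from a trailing newline) are skipped.
    """
    tokens = body_text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


payload = rows_to_csv_payload([[132, 25, 113.2], [112, 17, 183.2]])
scores = parse_csv_predictions("0.0098,0.0061\n0.3982\n")
```

With these helpers, `payload` would be passed as `Body` to `invoke_endpoint`, and the decoded response body would be passed to `parse_csv_predictions`.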

In [69]:
# Fetch test data to run predictions with the endpoint
import pandas as pd
import numpy as np
import boto3
import sagemaker
import time
import json
import io
from io import StringIO
import base64
import pprint
import re
test_df = pd.read_csv(test_data_uri)
# test_df.head(5)
# For content type text/csv, payload should be a string with commas separating the values for each feature
# This is the inference request serialization step
# CSV serialization
csv_file = io.StringIO()
test_sample = test_df.drop(test_df.columns[0], axis=1).iloc[:5]
test_sample.to_csv(csv_file, sep=",", header=False, index=False)
payload = csv_file.getvalue()
response = sm_runtime_client.invoke_endpoint(
                                             EndpointName=endpoint_name,
                                             Body=payload,
                                             ContentType="text/csv",
                                             Accept="text/csv"
                                            )

# This is the inference response deserialization step
# This is a bytes object
result = response["Body"].read()
# Decoding bytes to a string
result = result.decode("utf-8")
# Converting to list of predictions
result = re.split(",|\n",result)

prediction_df = pd.DataFrame()
prediction_df["Prediction"] = result[:5]
prediction_df["Label"] = test_df[test_df.columns[0]].iloc[:5].values
prediction_df

|   | Prediction | Label |
|---|---|---|
| 0 | 0.0098030250519514 | 0 |
| 1 | 0.0061666797846555 | 0 |
| 2 | 0.3982780873775482 | 0 |
| 3 | 0.0223251190036535 | 0 |
| 4 | 0.0261822752654552 | 0 |


Because data capture was set up in the endpoint configuration, you have a way to inspect what payload was sent to the endpoint alongside its response. The captured data takes some time to get fully uploaded to S3.

In [70]:
from sagemaker.s3 import S3Downloader
print("Waiting for captures to show up", end="")
for _ in range(90):
    capture_files = sorted(S3Downloader.list(f"{data_capture_uri}/{endpoint_name}"))
    if capture_files:
        capture_file = S3Downloader.read_file(capture_files[-1]).split("\n")
        capture_record = json.loads(capture_file[0])
        if "inferenceId" in capture_record["eventMetadata"]:
            break
    print(".", end="", flush=True)
    time.sleep(1)
print()
print(f"Found {len(capture_files)} Data Capture Files:")

Waiting for captures to show up..........................................................................................
Found 3 Data Capture Files:


### Verify that data is captured in Amazon S3

When we made some real-time predictions by sending data to our endpoint, we should have also captured that data for monitoring purposes. 

Let's list the data capture files stored in Amazon S3. Expect to see different files from different time periods organized based on the hour in which the invocation occurred. The format of the Amazon S3 path is:

`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`
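To make the path layout concrete, the sketch below splits a capture file URI into the components named in the format string. The function `parse_capture_uri` and the regular expression are written for this illustration and are not part of the SageMaker SDK.

```python
import re

# Pattern follows the documented layout:
# s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl
_CAPTURE_URI = re.compile(
    r"^s3://(?P<bucket>[^/]+)/(?:(?P<prefix>.+)/)?"
    r"(?P<endpoint>[^/]+)/(?P<variant>[^/]+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hour>\d{2})/"
    r"(?P<filename>[^/]+\.jsonl)$"
)


def parse_capture_uri(uri):
    """Return the path components of a data capture file URI as a dict of strings."""
    match = _CAPTURE_URI.match(uri)
    if match is None:
        raise ValueError(f"Not a data capture file URI: {uri}")
    return match.groupdict()


parts = parse_capture_uri(
    "s3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/datacapture/"
    "xgboost-customer-chu-2022-09-16-07-12-01-702-ep/Alltraffic/2022/10/07/04/"
    "03-10-985-618dc692-6f89-40b7-b295-050a8d25b8a7.jsonl"
)
print(parts["endpoint"], parts["variant"], parts["year"], parts["month"], parts["day"], parts["hour"])
```

Grouping capture files by the `year`, `month`, `day`, and `hour` components is one way to select the invocations from a given time window.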

In [71]:
from time import sleep

# Captures for this endpoint are stored under {data_capture_uri}/{endpoint_name}
current_endpoint_capture_uri = f"{data_capture_uri}/{endpoint_name}"
for _ in range(12): # wait up to a minute to see captures in S3
    capture_files = S3Downloader.list(current_endpoint_capture_uri)
    if capture_files:
        break
    sleep(5)

print("Found Data Capture Files:")
print(capture_files)

Found Data Capture Files:
['s3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/datacapture/xgboost-customer-chu-2022-09-16-07-12-01-702-ep/Alltraffic/2022/10/07/04/03-10-985-618dc692-6f89-40b7-b295-050a8d25b8a7.jsonl', 's3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/datacapture/xgboost-customer-chu-2022-09-16-07-12-01-702-ep/Alltraffic/2022/10/07/04/04-21-279-568647ac-7405-453e-901a-5947ddd9866b.jsonl', 's3://sagemaker-studio-us-west-2-917049230680/xgboost-churn/datacapture/xgboost-customer-chu-2022-09-16-07-12-01-702-ep/Alltraffic/2022/10/08/05/03-30-619-da332f99-d0cf-4654-aa38-ff7ba72d4459.jsonl']


All the captured data is stored in a SageMaker-specific JSON Lines file. Next, let's take a quick peek at the contents of a single line, pretty-printed as JSON, so that we can observe the format a little better.

In [72]:
capture_file = S3Downloader.read_file(capture_files[-1])

print("=====Single Data Capture====")
print(json.dumps(json.loads(capture_file.split('\n')[0]), indent=2)[:2000])

=====Single Data Capture====
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "MTMyLDI1LDExMy4yLDk2LDI2OS45LDEwNywyMjkuMSw4Nyw3LjEsNywyLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMSwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMSwwLDEsMCwwLDEKMTEyLDE3LDE4My4yLDk1LDI1Mi44LDEyNSwxNTYuNyw5NSw5LjcsMywwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMSwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMSwwLDEsMCwwLDEKOTEsMjQsOTMuNSwxMTIsMTgzLjQsMTI4LDI0MC43LDEzMyw5LjksMywwLDAsMCwwLDAsMCwwLDAsMCwwLDEsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwxLDAsMSwwLDEKMjIsMCwxMTAuMywxMDcsMTY2LjUsOTMsMjAyLjMsOTYsOS41LDUsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDAsMCwwLDEsMCwwLDAsMCwwLDAsMCwwLDAsMCwxLDAsMCwxLDAsMSwwCjEw

As you can see, each inference request is captured in one line of the jsonl file. The line contains both the input and the output, merged together. In our example, we provided the ContentType as `text/csv`, which is reflected in the `observedContentType` value. The capture format also exposes the encoding that was used for the input and output payloads in the `encoding` value.
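A captured line can be turned back into readable payloads with a few lines of Python. The sketch below is hedged: `decode_capture_record` is a hypothetical helper, and it assumes that the `encoding` value is `"BASE64"` when the `data` field is base64-encoded, as the `data` string in the output above appears to be. The sample record is hand-built from the first few characters of that output, not a complete capture.

```python
import base64
import json


def decode_capture_record(line):
    """Return the decoded input/output payloads of one captured JSON line.

    Assumes each section under "captureData" carries a "data" field and an
    "encoding" field whose value is "BASE64" for base64-encoded payloads.
    """
    record = json.loads(line)
    decoded = {}
    for section in ("endpointInput", "endpointOutput"):
        payload = record.get("captureData", {}).get(section)
        if payload is None:
            continue  # the section is absent if that mode was not captured
        data = payload["data"]
        if payload.get("encoding") == "BASE64":
            data = base64.b64decode(data).decode("utf-8")
        decoded[section] = data
    return decoded


# Hand-built sample using the start of the captured input shown above
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "MTMyLDI1LDExMy4y",
            "encoding": "BASE64",
        }
    }
})
decoded = decode_capture_record(sample_line)
print(decoded["endpointInput"])
```

Applied to a real capture file, the same function could be called on each non-empty line of `capture_file.split("\n")`.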

To recap, we saw how to capture the input and/or output payloads of an endpoint by adding the `DataCaptureConfig` parameter to the endpoint configuration, and what the captured format looks like in S3. Let's continue to explore how SageMaker helps with monitoring the data collected in S3.

---
## _Alternative deployment_

Ok, nice! We can train with SageMaker and then deploy in a managed endpoint with monitoring enabled.

But:

#### - What if I already have a model that was trained outside of SageMaker? How do I deploy it in SageMaker without training it previously?

#### - What if I need to preprocess the request before performing inference and then post-process what my model just predicted? How can I customize the inference logic with a custom inference script?

# Exercise
### _[Challenge] Your turn!_

Deploy another model in SageMaker. Remember that the output of each training job was an artifact (tar.gz file with the model and other configurations) that was saved in S3.

1. Pick one of these models in S3, or upload another one from your laptop to S3. Then deploy it.
(If you haven't trained a model, pick the `model.tar.gz` in the `config` directory).

2. Add a custom inference script to your endpoint.

To make things easier, you can add a simple post-processing function that appends a new value, `"hello from post-processing function!!!"`, to the response.

So, if we send to our endpoint: 
```
186,0.1,137.8,97,187.7,118,146.4,85,8.7,6,1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.10,0.11,0.12,0.13,0.14,0.15,0.16,0.17,1.1,0.18,0.19,0.20,0.21,0.22,0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.30,0.31,0.32,0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.40,0.41,0.42,0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.50,0.51,0.52,0.53,1.2,1.3,0.54,1.4,0.55
``` 

The output will be something like:
```
0.014719205908477306,"hello from post-processing"
```

Want a hint? [Look here](./solutions/b-hint1.md)

In [None]:
# YOUR SOLUTION HERE


---
# [You can now go to the lab 5-Monitoring](../../5-Monitoring/monitoring.ipynb)