# Serving a TensorFlow Model as a REST Endpoint with TensorFlow Serving and SageMaker

We need to understand the application and business context to choose between real-time and batch predictions. Are we trying to optimize for latency or throughput? Does the application require our models to scale automatically throughout the day to handle cyclic traffic patterns? Do we plan to compare models in production through A/B tests?

If our application requires low latency, we should deploy the model as a real-time API that serves fast predictions for individual requests over HTTPS. SageMaker Endpoints let us deploy, scale, and compare our model prediction servers.

<img src="img/sagemaker-architecture.png" width="80%" align="left">

In [1]:
import boto3
import sagemaker
import pandas as pd

sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name="sagemaker", region_name=region)

In [2]:
%store -r training_job_name

In [3]:
try:
    training_job_name
    print("[OK]")
except NameError:
    print("+++++++++++++++++++++++++++++++")
    print("[ERROR] Please run the notebooks in the previous TRAIN section before you continue.")
    print("+++++++++++++++++++++++++++++++")

[OK]


# Copy the Model to the Notebook

In [4]:
!aws s3 cp s3://$bucket/$training_job_name/output/model.tar.gz ./model.tar.gz

download: s3://sagemaker-us-east-1-117859797117/tensorflow-training-2021-04-05-11-23-57-968/output/model.tar.gz to ./model.tar.gz


In [5]:
!rm -rf ./model/

In [6]:
!mkdir -p ./model/
!tar -xvzf ./model.tar.gz -C ./model/

transformers/
transformers/fine-tuned/
transformers/fine-tuned/config.json
transformers/fine-tuned/tf_model.h5
test_data/
test_data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz
metrics/
metrics/confusion_matrix.png
metrics/evaluation.json
tensorflow/
tensorflow/saved_model/
tensorflow/saved_model/0/
tensorflow/saved_model/0/assets/
tensorflow/saved_model/0/saved_model.pb
tensorflow/saved_model/0/variables/
tensorflow/saved_model/0/variables/variables.index
tensorflow/saved_model/0/variables/variables.data-00000-of-00001
tensorboard/
tensorboard/train/
tensorboard/train/plugins/
tensorboard/train/plugins/profile/
tensorboard/train/plugins/profile/2021_04_05_11_28_55/
tensorboard/train/plugins/profile/2021_04_05_11_28_55/ip-10-0-249-156.ec2.internal.trace.json.gz
tensorboard/train/plugins/profile/2021_04_05_11_28_55/ip-10-0-249-156.ec2.internal.kernel_stats.pb
tensorboard/train/plugins/profile/2021_04_05_11_28_55/ip-10-0-249-156.ec2.internal.input_pipeline.pb
tensorboard/train/plugins
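The two shell commands above can also be written with Python's standard `tarfile` module, which helps when the notebook should not depend on a shell. This is a minimal sketch. The helper name `extract_model_archive` is our own, and the paths in the usage comment mirror the ones used above.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive_path, target_dir):
    """Extract a SageMaker model.tar.gz into target_dir and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        # Refuse members that would land outside target_dir (e.g. "../x" or absolute paths).
        root = target.resolve()
        for member in tar.getmembers():
            dest = (target / member.name).resolve()
            if dest != root and root not in dest.parents:
                raise ValueError("unsafe path in archive: {}".format(member.name))
        tar.extractall(path=target)
    return names


# Usage, mirroring the cells above:
# names = extract_model_archive("./model.tar.gz", "./model/")
# print([n for n in names if n.endswith("saved_model.pb")])
```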

In [7]:
!saved_model_cli show --all --dir './model/tensorflow/saved_model/0/'

2021-04-06 08:03:14.214350: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-04-06 08:03:14.214396: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_ids'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 64)
        name: serving_defau
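TensorFlow Serving loads a model from numbered version subdirectories (here `saved_model/0/`). The predictor later passes `model_version=0` for this reason. The following is a minimal sketch for finding the highest numeric version under a model base path. `latest_model_version` is a hypothetical helper and is not part of any SDK.

```python
from pathlib import Path


def latest_model_version(base_dir):
    """Return the highest numeric version directory that contains a saved_model.pb."""
    versions = []
    for child in Path(base_dir).iterdir():
        # Only purely numeric directory names are treated as model versions.
        if child.is_dir() and child.name.isdigit() and (child / "saved_model.pb").exists():
            versions.append(int(child.name))
    if not versions:
        raise FileNotFoundError("no SavedModel versions under {}".format(base_dir))
    return max(versions)


# Usage with the extracted archive (the listing above shows only version 0):
# print(latest_model_version("./model/tensorflow/saved_model/"))
```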

In [8]:
!saved_model_cli run --dir './model/tensorflow/saved_model/0/' --tag_set serve --signature_def serving_default \
    --input_exprs 'input_ids=np.zeros((1,64));input_mask=np.zeros((1,64))'

2021-04-06 08:03:25.980925: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-04-06 08:03:25.980972: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-04-06 08:03:27.440331: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-04-06 08:03:27.440379: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-04-06 08:03:27.440417: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (datascience-1-0-ml-t3-medium-1abf3407f667f989be9d86559395): /proc/driver/nvidia/version does not exist
2021-04-06 08:03:27
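The signature above expects `input_ids` as `DT_INT32` with shape `(-1, 64)`. The `saved_model_cli run` call also feeds an `input_mask`. The following is a minimal sketch of turning already-tokenized ids into that fixed-length layout. It makes several assumptions: the padding id is `0`, and the mask is `1` for real tokens and `0` for padding (the usual BERT-style convention). `pad_batch` is a hypothetical helper. The real tokenization is presumably handled in `inference.py`, which installs `transformers`.

```python
MAX_SEQ_LENGTH = 64  # matches shape (-1, 64) in the serving_default signature
PAD_ID = 0  # assumption: the tokenizer's padding id is 0


def pad_batch(token_id_lists, max_len=MAX_SEQ_LENGTH, pad_id=PAD_ID):
    """Truncate/pad each id list to max_len and build the matching attention mask."""
    input_ids, input_mask = [], []
    for ids in token_id_lists:
        ids = [int(i) for i in ids[:max_len]]  # truncate overly long sequences
        padding = max_len - len(ids)
        input_ids.append(ids + [pad_id] * padding)
        input_mask.append([1] * len(ids) + [0] * padding)
    return {"input_ids": input_ids, "input_mask": input_mask}


# Illustrative ids only, not real tokenizer output.
batch = pad_batch([[5, 6, 7]])
print(len(batch["input_ids"][0]), sum(batch["input_mask"][0]))  # prints: 64 3
```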

# Show `inference.py`

In [9]:
!pygmentize ./code/inference.py

import json
import subprocess
import sys

# subprocess.check_call([sys.executable, "-m", "pip", "uninstall","-y", "tensorflow"])
subprocess.check_call([sys.executable, "-m", "pip", "install", "tensorflow"])
subprocess.check_call([sys.executable, "-m", "pip", "install", "transformers==4.1.1"])
# Workaround for https://github.com/huggingface/tokenizers/issues/120 and
#                https://github.com/kaushaltrivedi/fast-bert/issues/174
# subprocess
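The listing shows only the dependency setup at the top of `inference.py`. For orientation, the SageMaker TensorFlow Serving container lets `inference.py` define `input_handler(data, context)` and `output_handler(response, context)` hooks around the TensorFlow Serving REST call. The following is a minimal sketch of that shape for this JSON Lines protocol. It is not the notebook's actual `inference.py`, and it assumes three things: a hypothetical toy `tokenize()` stands in for the real `transformers` tokenizer, the inputs are `input_ids`/`input_mask` as seen in the signature, and class index `i` maps to `star_rating` `i + 1`.

```python
import json


def tokenize(text, max_len=64):
    """Hypothetical toy stand-in for the real tokenizer: maps characters to ids and pads."""
    ids = [ord(c) % 1000 + 1 for c in text][:max_len]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    return ids + [0] * (max_len - len(ids)), mask


def input_handler(data, context):
    """Convert JSON Lines ({"features": [text]} per line) into a TF Serving request."""
    if context.request_content_type != "application/jsonlines":
        raise ValueError("unsupported content type {}".format(context.request_content_type))
    instances = []
    for line in data.read().decode("utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        text = json.loads(line)["features"][0]
        input_ids, input_mask = tokenize(text)
        instances.append({"input_ids": input_ids, "input_mask": input_mask})
    return json.dumps({"signature_name": "serving_default", "instances": instances})


def output_handler(response, context):
    """Turn TF Serving logits into one {"predicted_label": n} JSON object per line."""
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    predictions = json.loads(response.content.decode("utf-8"))["predictions"]
    lines = []
    for logits in predictions:
        best = max(range(len(logits)), key=lambda i: logits[i])  # argmax over classes
        lines.append(json.dumps({"predicted_label": best + 1}))  # assumed index -> star mapping
    return "\n".join(lines), context.accept_header
```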

# Deploy the Model
This will create a default `EndpointConfig` with a single model.  

The next notebook will demonstrate how to perform more advanced `EndpointConfig` strategies to support canary rollouts and A/B testing.
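As a preview of those strategies, an `EndpointConfig` can list several production variants. SageMaker splits traffic in proportion to each variant's `InitialVariantWeight`. The following is a minimal sketch of building that list for boto3's `create_endpoint_config`. The variant and model names are placeholders, and the 90/10 split in the usage comment is only an example.

```python
def ab_production_variants(model_a, model_b, instance_type, weight_a=0.5, instance_count=1):
    """Build a ProductionVariants list that splits traffic between two models."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    return [
        {
            "VariantName": "VariantA",
            "ModelName": model_a,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": weight_a,
        },
        {
            "VariantName": "VariantB",
            "ModelName": model_b,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1.0 - weight_a,
        },
    ]


def traffic_share(variants):
    """Fraction of requests each variant receives (weight / sum of weights)."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Usage (not executed here; names are placeholders):
# variants = ab_production_variants("model-a", "model-b", "ml.m4.xlarge", weight_a=0.9)
# sm.create_endpoint_config(EndpointConfigName="ab-config", ProductionVariants=variants)
```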

_Note: If you are not using a US-based region, you may need to adapt the container image to your current region using the following table:_

https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html

In [10]:
import time

timestamp = int(time.time())

tensorflow_model_name = "{}-{}-{}".format(training_job_name, "tf", timestamp)

print(tensorflow_model_name)

tensorflow-training-2021-04-05-11-23-57-968-tf-1617696213
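SageMaker model and endpoint names are limited to 63 characters of letters, digits, and hyphens. The name above is 57 characters long. A longer training job name plus the `-tf-<timestamp>` suffix could exceed the limit. The following is a minimal sketch of a guard that keeps the suffix and trims the prefix. `safe_resource_name` is a hypothetical helper.

```python
import re

MAX_NAME_LENGTH = 63
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def safe_resource_name(prefix, suffix, max_len=MAX_NAME_LENGTH):
    """Join prefix and suffix with '-', trimming the prefix so the result fits max_len."""
    room = max_len - len(suffix) - 1  # 1 character for the joining hyphen
    if room < 1:
        raise ValueError("suffix too long: {}".format(suffix))
    trimmed = prefix[:room].rstrip("-")  # avoid a doubled hyphen at the join
    name = "{}-{}".format(trimmed, suffix)
    if not NAME_PATTERN.match(name):
        raise ValueError("invalid SageMaker name: {}".format(name))
    return name


# Usage: safe_resource_name(training_job_name, "tf-{}".format(timestamp))
```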


In [11]:
from sagemaker.tensorflow.estimator import TensorFlow

estimator = TensorFlow.attach(training_job_name=training_job_name)


2021-04-05 12:07:37 Starting - Preparing the instances for training
2021-04-05 12:07:37 Downloading - Downloading input data
2021-04-05 12:07:37 Training - Training image download completed. Training in progress.
2021-04-05 12:07:37 Uploading - Uploading generated training model
2021-04-05 12:07:37 Completed - Training job completed


In [12]:
# requires enough disk space for tensorflow, transformers, and bert downloads
instance_type = "ml.m4.xlarge"

In [2]:
from sagemaker.tensorflow.model import TensorFlowModel

tensorflow_model = TensorFlowModel(
    name=tensorflow_model_name,
    source_dir="code",
    entry_point="inference.py",
    model_data="s3://{}/{}/output/model.tar.gz".format(bucket, training_job_name),
    role=role,
    framework_version="2.3.1",
)

In [14]:
tensorflow_endpoint_name = "{}-{}-{}".format(training_job_name, "tf", timestamp)

print(tensorflow_endpoint_name)

tensorflow-training-2021-04-05-11-23-57-968-tf-1617696213


In [15]:
tensorflow_model.deploy(
    endpoint_name=tensorflow_endpoint_name,
    initial_instance_count=1,  # Should use >=2 for high(er) availability
    instance_type=instance_type,
    wait=False,
)

update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


<sagemaker.tensorflow.model.TensorFlowPredictor at 0x7f864822d710>

In [16]:
from IPython.display import display, HTML

display(
    HTML(
        '<b>Review <a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={}#/endpoints/{}">SageMaker REST Endpoint</a></b>'.format(
            region, tensorflow_endpoint_name
        )
    )
)

# _Wait Until the Endpoint is Deployed_

In [None]:
%%time

waiter = sm.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=tensorflow_endpoint_name)
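The boto3 waiter polls `DescribeEndpoint` until the endpoint reports `InService` and gives up if it reports `Failed`. The following is a minimal sketch of the same loop written out by hand, which is useful for custom logging or a different timeout. `describe` and `sleep` are injected so the logic can be exercised without AWS. The 30-second interval and one-hour timeout are arbitrary choices.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll until EndpointStatus is InService; raise on Failed or timeout."""
    waited = 0
    while True:
        desc = describe(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return desc
        if status == "Failed":
            raise RuntimeError("endpoint failed: {}".format(desc.get("FailureReason", "unknown")))
        if waited >= timeout_seconds:
            raise TimeoutError("endpoint still {} after {}s".format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the client from the first cell:
# wait_for_endpoint(sm.describe_endpoint, tensorflow_endpoint_name)
```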

# _Wait Until the Endpoint Above Is Deployed Before Continuing_

In [18]:
tensorflow_endpoint_arn = sm.describe_endpoint(EndpointName=tensorflow_endpoint_name)["EndpointArn"]
print(tensorflow_endpoint_arn)

arn:aws:sagemaker:us-east-1:117859797117:endpoint/tensorflow-training-2021-04-05-11-23-57-968-tf-1617696213


# Show the Experiment Tracking Lineage

In [19]:
from sagemaker.lineage.visualizer import LineageTableVisualizer

lineage_table_viz = LineageTableVisualizer(sess)
lineage_table_viz_df = lineage_table_viz.show(endpoint_arn=tensorflow_endpoint_arn)
lineage_table_viz_df

| | Name/Source | Direction | Type | Association Type | Lineage Type |
|---|---|---|---|---|---|
| 0 | tensorflow-training-2021-04-05-11-23-57-968-tf... | Input | ModelDeployment | AssociatedWith | action |


# Test the Deployed Model

In [20]:
from sagemaker.tensorflow.model import TensorFlowPredictor
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.deserializers import JSONLinesDeserializer

# In sagemaker>=2 the serializer and deserializer set the Content-Type and Accept
# headers (application/jsonlines for both), so content_type/accept_type are not passed.
predictor = TensorFlowPredictor(
    endpoint_name=tensorflow_endpoint_name,
    sagemaker_session=sess,
    model_name="saved_model",
    model_version=0,
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer(),
)
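Under the hood, the predictor sends the inputs as JSON Lines (one JSON object per line) to the endpoint, with `application/jsonlines` as both the content type and the accepted type. This is the same request that the commented-out `aws sagemaker-runtime invoke-endpoint` call at the end of this notebook makes. The following is a minimal sketch of that request with boto3's `sagemaker-runtime` client. `runtime` is passed in so the function can be tried with a stub, and the helper names are our own.

```python
import json


def to_jsonlines(records):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


def invoke_jsonlines(runtime, endpoint_name, records):
    """Send records to a SageMaker endpoint and parse the JSON Lines response."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/jsonlines",
        Accept="application/jsonlines",
        Body=to_jsonlines(records).encode("utf-8"),
    )
    body = response["Body"].read().decode("utf-8")  # Body is a streaming object
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# Usage against the deployed endpoint (requires AWS credentials):
# runtime = boto3.Session().client("sagemaker-runtime", region_name=region)
# invoke_jsonlines(runtime, tensorflow_endpoint_name, [{"features": ["This is great!"]}])
```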


# Wait for the Endpoint to Settle Down

In [None]:
import time

time.sleep(30)
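A fixed 30-second sleep is a guess. An alternative is to retry the first prediction with exponential backoff until the endpoint answers, as in the following sketch. `predict_fn` is any zero-argument callable, for example `lambda: predictor.predict(inputs)`. The attempt count and delays are arbitrary assumptions. The sketch retries on any exception for brevity; in practice you would narrow this to the errors you expect.

```python
import time


def call_with_backoff(predict_fn, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call predict_fn, retrying on exceptions with delays base, 2*base, 4*base, ..."""
    for attempt in range(attempts):
        try:
            return predict_fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage: call_with_backoff(lambda: predictor.predict([{"features": ["This is great!"]}]))
```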

# Predict the `star_rating` with Ad Hoc `review_body` Samples

In [21]:
inputs = [{"features": ["This is great!"]}, {"features": ["This is bad."]}]

predicted_classes = predictor.predict(inputs)

for predicted_class in predicted_classes:
    print("Predicted star_rating: {}".format(predicted_class))

Predicted star_rating: {'predicted_label': 5}
Predicted star_rating: {'predicted_label': 1}


# Predict the `star_rating` with `review_body` Samples from Our TSV Data

In [22]:
import csv

df_reviews = pd.read_csv(
    "./data/amazon_reviews_us_Digital_Software_v1_00.tsv.gz",
    delimiter="\t",
    quoting=csv.QUOTE_NONE,
    compression="gzip",
)
df_sample_reviews = df_reviews[["review_body", "star_rating"]].sample(n=5)
df_sample_reviews = df_sample_reviews.reset_index()
df_sample_reviews.shape

(5, 3)

In [None]:
import pandas as pd


def predict(review_body):
    inputs = [{"features": [review_body]}]
    predicted_classes = predictor.predict(inputs)
    return predicted_classes[0]["predicted_label"]


df_sample_reviews["predicted_class"] = df_sample_reviews["review_body"].map(predict)
df_sample_reviews.head(5)
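The `map` above issues one HTTPS request per review. The endpoint accepts several JSON Lines per request, as the two-review example earlier shows, so larger samples can be sent in batches. The following is a minimal sketch. `predict_in_batches` is a hypothetical helper, `batch_size=32` is an arbitrary choice, and the sketch assumes the response preserves input order.

```python
def predict_in_batches(texts, predict_batch, batch_size=32):
    """Send texts in chunks and return one predicted label per text, in input order."""
    labels = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start : start + batch_size]
        inputs = [{"features": [text]} for text in chunk]
        outputs = predict_batch(inputs)
        if len(outputs) != len(chunk):
            raise ValueError("expected {} predictions, got {}".format(len(chunk), len(outputs)))
        labels.extend(out["predicted_label"] for out in outputs)
    return labels


# Usage with the predictor from above:
# df_sample_reviews["predicted_class"] = predict_in_batches(
#     df_sample_reviews["review_body"].tolist(), predictor.predict
# )
```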

# Save for Next Notebook(s)

In [None]:
%store tensorflow_model_name

In [None]:
%store tensorflow_endpoint_name

In [None]:
%store tensorflow_endpoint_arn

In [None]:
%store

# Release Resources
To save cost, we should delete the endpoint.
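Deleting the endpoint stops the instance charges, but the endpoint configuration and model records remain. The following is a minimal cleanup sketch using the boto3 SageMaker client's `describe_endpoint`, `describe_endpoint_config`, `delete_endpoint`, `delete_endpoint_config`, and `delete_model` calls. It looks up the config and model names instead of assuming them. Run it only once later notebooks no longer need the endpoint.

```python
def delete_endpoint_and_resources(sm_client, endpoint_name):
    """Delete an endpoint plus its endpoint config and the models it references."""
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names


# Usage (irreversible):
# delete_endpoint_and_resources(sm, tensorflow_endpoint_name)
```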

In [None]:
# sm.delete_endpoint(
#      EndpointName=tensorflow_endpoint_name
# )

In [None]:
%%html

<p><b>Shutting down your kernel for this notebook to release resources.</b></p>
<button class="sm-command-button" data-commandlinker-command="kernelmenu:shutdown" style="display:none;">Shutdown Kernel</button>
        
<script>
try {
    els = document.getElementsByClassName("sm-command-button");
    els[0].click();
}
catch(err) {
    // NoOp
}    
</script>

In [None]:
%%javascript

try {
    Jupyter.notebook.save_checkpoint();
    Jupyter.notebook.session.delete();
}
catch(err) {
    // NoOp
}

# Internal - DO NOT RUN - WILL REMOVE SOON


In [None]:
# %%bash

# # without split:  tensorflow-training-2021-01-27-02-29-07-903-tf-1611724084
# # with split:  tensorflow-training-2021-01-28-01-19-50-987-tf-1611799952

# aws sagemaker-runtime invoke-endpoint \
#     --endpoint-name "tensorflow-training-2021-01-28-01-19-50-987-tf-1611799952" \
#     --content-type application/jsonlines \
#     --accept application/jsonlines \
#     --body $'{"features":["Amazon gift cards are the best"]}\n{"features":["It is the worst"]}' >(cat) 1>/dev/null

In [None]:
# !rm model.tar.gz
# !aws s3 cp s3://sagemaker-us-east-1-835319576252/tensorflow-training-2021-01-28-01-19-50-987/output/model.tar.gz ./

In [None]:
# !rm -rf ./model
# !mkdir -p  ./model
# !tar -xvzf ./model.tar.gz -C model/

In [None]:
# !cp ./code/inference.py model/code/

In [None]:
# !cat model/code/inference.py