# TensorFlow BYOM: Train with Custom Training Script, Compile with Neo, and Deploy on SageMaker

This notebook is functionally comparable to the [TensorFlow MNIST distributed training notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_distributed_mnist.ipynb). We perform the same classification task, but this time we compile the trained model with the Neo API backend to optimize it for our choice of hardware. Finally, we set up a real-time hosted endpoint in SageMaker for the compiled model using the Neo Deep Learning Runtime.

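**Note:** Several cells below emit warnings (shown on a pink background in Jupyter), such as the Python 2 TensorFlow image deprecation notice. They did not stop any of the runs recorded in this notebook.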

### Set up the environment

In [1]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()

### Download the MNIST dataset

In [2]:
import utils
from tensorflow.contrib.learn.python.learn.datasets import mnist
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
data_sets = mnist.read_data_sets('data', dtype=tf.uint8, reshape=False, validation_size=5000)

utils.convert_to(data_sets.train, 'train', 'data')
utils.convert_to(data_sets.validation, 'validation', 'data')
utils.convert_to(data_sets.test, 'test', 'data')

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Writing data/train.tfrecords
Writing data/validation.tfrecords
Writing data/test.tfrecords


### Upload the data
We use the ```sagemaker.Session.upload_data``` function to upload our datasets to an S3 location. The return value, `inputs`, identifies that location; we will use it later when we start the training job.

In [3]:
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/DEMO-mnist')

# Construct a script for distributed training 
Here is the full code for the network model:

In [4]:
!pygmentize 'mnist.py'

import os
import tensorflow as tf
from tensorflow.python.estimator.model_fn import ModeKeys as Modes

INPUT_TENSOR_NAME = 'inputs'
SIGNATURE_NAME = 'predictions'

LEARNING_RATE = 0.001


def model_fn(features, labels, mode, params):
    # Input Layer
    input_layer = tf.reshape(features[INPUT_TENSOR_NAME], [-1, 28, 28, 1])

    # Convolutional Layer #1
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5, 5],
        padding='same

The script here is an adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/master/official/mnist). It provides a ```model_fn(features, labels, mode, params)```, which is used for training, evaluation and inference. See the [TensorFlow MNIST distributed training notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_distributed_mnist.ipynb) for more details about the training script.

At the end of the training script, there are two additional functions, to be used with Neo Deep Learning Runtime:
* `neo_preprocess(payload, content_type)`: Function that takes in the payload and Content-Type of each incoming request and returns a NumPy array
* `neo_postprocess(result)`: Function that takes the prediction results produced by the Deep Learning Runtime and returns the response body
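The full bodies of these two functions are not shown in the truncated listing above, so the following is only a minimal sketch of what such a pair could look like. It assumes that requests arrive as `np.save` bytes with the content type `application/vnd+python.numpy+binary` (the one set on the predictor later in this notebook). Returning a `(body, content_type)` tuple from `neo_postprocess` is our assumption, not something the text above confirms.

```python
import io
import json

import numpy as np

NUMPY_CONTENT_TYPE = 'application/vnd+python.numpy+binary'


def neo_preprocess(payload, content_type):
    """Decode a request body that was produced with np.save into a NumPy array."""
    if content_type != NUMPY_CONTENT_TYPE:
        raise RuntimeError('Unsupported content type: {}'.format(content_type))
    return np.load(io.BytesIO(payload))


def neo_postprocess(result):
    """Serialize raw model output as JSON (the tuple return value is an assumption)."""
    scores = np.squeeze(np.asarray(result))  # drop the batch dimension of size 1
    return json.dumps(scores.tolist()), 'application/json'


# Round trip with stand-in data: a blank "image" in, one-hot scores for digit 7 out.
buf = io.BytesIO()
np.save(buf, np.zeros((1, 784), dtype=np.float32))
array = neo_preprocess(buf.getvalue(), NUMPY_CONTENT_TYPE)
body, content_type = neo_postprocess(np.eye(10)[[7]])
print(array.shape, content_type, int(np.argmax(json.loads(body))))
```

Keeping both functions free of TensorFlow calls matters because, once compiled, the model is served by the Neo runtime rather than by TensorFlow Serving.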

## Create a training job using the sagemaker.TensorFlow estimator

In [13]:
from sagemaker.tensorflow import TensorFlow

mnist_estimator = TensorFlow(entry_point='mnist.py',
                             role=role,
                             framework_version='1.12.0',
                             training_steps=1000, 
                             evaluation_steps=100,
                             train_instance_count=2,
                             train_instance_type='ml.c4.xlarge',
                             base_job_name='neo-compile-test')

mnist_estimator.fit(inputs)

W1017 23:21:27.011585 140475592632128 estimator.py:293] tensorflow py2 container will be deprecated soon.


2019-10-17 23:21:27 Starting - Starting the training job...
2019-10-17 23:21:29 Starting - Launching requested ML instances......
2019-10-17 23:22:35 Starting - Preparing the instances for training......
2019-10-17 23:23:52 Downloading - Downloading input data
2019-10-17 23:23:52 Training - Downloading the training image..
2019-10-17 23:24:04,875 INFO - root - running container entrypoint
2019-10-17 23:24:04,875 INFO - root - starting train task
2019-10-17 23:24:04,888 INFO - container_support.training - Training starting
Downloading s3://sagemaker-us-east-1-497456752804/neo-compile-test-2019-10-17-23-21-27-014/source/sourcedir.tar.gz to /tmp/script.tar.gz
2019-10-17 23:24:07,588 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2019-10-17 23:24:07,589 INFO - tf_container - {"environment": "cloud", "cluster": {"worker": ["algo-2:2222"], "ps": ["algo-1:2223", "algo-2:2223"], "master": ["algo-1:2222"]}, "task"


Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensors(tensor).repeat(num_epochs)`.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
Queue-based input pipeli

2019-10-17 23:25:34,329 INFO - tensorflow - SavedModel written to: s3://sagemaker-us-east-1-497456752804/neo-compile-test-2019-10-17-23-21-27-014/checkpoints/export/Servo/1571354731/saved_model.pb
2019-10-17 23:25:34,443 INFO - tensorflow - Loss for final step: 0.001797853.
2019-10-17 23:25:34,747 INFO - tf_container - Downloaded saved model at /opt/ml/model/export/Servo/1571354731

2019-10-17 23:27:00 Uploading - Uploading generated training model
2019-10-17 23:27:00 Completed - Training job completed
2019-10-17 23:26:51,460 INFO - tf_container - master algo-1 is down, stopping parameter server


The **```fit```** method creates a training job on two **ml.c4.xlarge** instances. The logs above show the instances training, evaluating, and incrementing the number of **training steps**.

At the end of training, the job generates a SavedModel for TensorFlow Serving.

# Deploy the trained model to prepare for predictions (the old way)

The `deploy()` method creates an endpoint that serves prediction requests in real time.

In [15]:
mnist_predictor = mnist_estimator.deploy(initial_instance_count=1,
                                         instance_type='ml.m4.xlarge',endpoint_name='original-mnist-model-endpoint-v1')

W1018 03:04:40.925576 140475592632128 model.py:103] The Python 2 tensorflow images will be soon deprecated and may not be supported for newer upcoming versions of the tensorflow images.
Please set the argument "py_version='py3'" to use the Python 3 tensorflow image.
W1018 03:04:41.245883 140475592632128 session.py:783] Using already existing model: neo-compile-test-2019-10-17-23-21-27-014


--------------------------------------------------------------------------------------!

## Invoking the endpoint

# Compile and Deploy the trained model using Neo

Now the model is ready to be compiled by Neo and optimized for our hardware of choice. We use the estimator's ``compile_model`` method to do this. For this example, our target hardware is ``'ml_m4'``. You can change this to any other supported target hardware if you prefer.

## Compiling the model
The ``input_shape`` is the definition for the model's input tensor and ``output_path`` is where the compiled model will be stored in S3. **Important:** if the following command results in a permission error, scroll up and locate the value of the execution role returned by ``get_execution_role()``. The role must have access to the S3 bucket specified in ``output_path``.
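As a quick sanity check on ``input_shape``, the sketch below verifies that one flattened 28x28 MNIST image fits the ``[1, 784]`` shape declared in the compile cells that follow. The helper `matches_declared_shape` and the all-zeros stand-in image are hypothetical. We also assume the shape dictionary is passed to Neo as a JSON string, which is why it is printed with `json.dumps`.

```python
import json

import numpy as np

# Shape declared to the compiler in this notebook: batch size 1, 784 pixel values.
input_shape = {'data': [1, 784]}


def matches_declared_shape(array, declared):
    """Return True if `array` holds exactly as many values as `declared` requires."""
    return array.size == int(np.prod(declared))


# Stand-in for one MNIST image: 28 * 28 = 784 float32 pixels, flattened.
image = np.zeros(28 * 28, dtype=np.float32)

assert matches_declared_shape(image, input_shape['data'])
batch = image.reshape(input_shape['data'])
print(batch.shape)
print(json.dumps(input_shape))
```

A mismatch here (for example, sending a 28x28x3 image) would surface only at inference time, so checking the element count before compiling is cheap insurance.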

## Compile for an EC2 target

In [16]:
target = 'ml_m4'
output_path = mnist_estimator.output_path + target
optimized_estimator_ml_m4 = mnist_estimator.compile_model(target_instance_family=target, 
                              input_shape={'data':[1, 784]},  # Batch size 1, 784 flattened pixels (one 28x28 MNIST image).
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.12.0')

W1018 03:11:55.960345 140475592632128 model.py:103] The Python 2 tensorflow images will be soon deprecated and may not be supported for newer upcoming versions of the tensorflow images.
Please set the argument "py_version='py3'" to use the Python 3 tensorflow image.


?.....!

## Compile for NVIDIA Jetson nano

In [17]:
target = 'jetson_nano'
output_path = mnist_estimator.output_path + target
optimized_estimator_nano = mnist_estimator.compile_model(target_instance_family=target, 
                              input_shape={'data':[1, 784]},  # Batch size 1, 784 flattened pixels (one 28x28 MNIST image).
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.12.0')

W1018 03:12:27.240407 140475592632128 model.py:103] The Python 2 tensorflow images will be soon deprecated and may not be supported for newer upcoming versions of the tensorflow images.
Please set the argument "py_version='py3'" to use the Python 3 tensorflow image.


?....!

W1018 03:12:53.606170 140475592632128 model.py:369] The instance type jetson_nano is not supported to deploy via SageMaker,please deploy the model manually.


## Compile for Raspberry Pi

In [18]:
target = 'rasp3b'
output_path = mnist_estimator.output_path + target
optimized_estimator_rpi = mnist_estimator.compile_model(target_instance_family=target, 
                              input_shape={'data':[1, 784]},  # Batch size 1, 784 flattened pixels (one 28x28 MNIST image).
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.12.0')

W1018 03:12:53.641491 140475592632128 model.py:103] The Python 2 tensorflow images will be soon deprecated and may not be supported for newer upcoming versions of the tensorflow images.
Please set the argument "py_version='py3'" to use the Python 3 tensorflow image.


?....!

W1018 03:13:19.807258 140475592632128 model.py:369] The instance type rasp3b is not supported to deploy via SageMaker,please deploy the model manually.


### Compiled model summary

In [19]:
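def get_model_size(estimator):
    # Notebook-only syntax: `!` runs the AWS CLI and captures its output lines.
    out = !aws s3 ls {estimator.model_data} --human-readable
    # The size value is the third field from the end of the listing line;
    # the ' MB' label is hard-coded rather than read from the CLI output.
    return out[0].split(' ')[-3] + ' MB'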
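The helper above relies on IPython's `!` syntax and on counting fields from the end of the line. A plain-Python variant of the parsing step could look like the sketch below. The sample line is hypothetical, and the field layout (date, time, size value, size unit, object key) is our assumption about what `aws s3 ls --human-readable` prints.

```python
def parse_s3_ls_size(line):
    """Return (value, unit) from one `aws s3 ls --human-readable` listing line."""
    # split() with no argument collapses the runs of spaces used for alignment.
    fields = line.split()
    # Assumed layout: date, time, size value, size unit, object key.
    return float(fields[2]), fields[3]


# Hypothetical listing line, for illustration only.
sample = '2019-10-17 23:26:55   11.6 MiB model.tar.gz'
value, unit = parse_s3_ls_size(sample)
print('{} {}'.format(value, unit))
```

Reading the unit from the line instead of appending a fixed label keeps the summary table accurate if an artifact is reported in KiB or GiB.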

In [21]:
import pandas as pd

estimators = [mnist_estimator, optimized_estimator_ml_m4, optimized_estimator_rpi, optimized_estimator_nano] 
targets = ['Original','EC2 M4','Raspberry Pi','Jetson Nano']
locations = [e.model_data for e in estimators]
sizes = [get_model_size(e) for e in estimators]
pd.set_option('display.max_colwidth', 0)
pd.DataFrame(list(zip(targets,locations,sizes)), columns =['Targets', 'Locations','Sizes']) 

|   | Targets | Locations | Sizes |
|---|---|---|---|
| 0 | Original | s3://sagemaker-us-east-1-497456752804/neo-compile-test-2019-10-17-23-21-27-014/output/model.tar.gz | 11.6 MB |
| 1 | EC2 M4 | s3://sagemaker-us-east-1-497456752804/ml_m4/model-ml_m4.tar.gz | 11.6 MB |
| 2 | Raspberry Pi | s3://sagemaker-us-east-1-497456752804/rasp3b/model-rasp3b.tar.gz | 11.6 MB |
| 3 | Jetson Nano | s3://sagemaker-us-east-1-497456752804/jetson_nano/model-jetson_nano.tar.gz | 11.6 MB |


## Deploying the compiled model

In [22]:
optimized_predictor = optimized_estimator_ml_m4.deploy(initial_instance_count = 1,
                                                 instance_type = 'ml.m4.xlarge', endpoint_name='compiled-m4-mnist-model-endpoint-v1')

--------------------------------------------------------------------------------------------------!

In [23]:
import io
import numpy as np

def numpy_bytes_serializer(data):
    # Serialize a NumPy array to .npy bytes for the request body
    f = io.BytesIO()
    np.save(f, data)
    f.seek(0)
    return f.read()

optimized_predictor.content_type = 'application/vnd+python.numpy+binary'
optimized_predictor.serializer = numpy_bytes_serializer

## Invoking the endpoints to get average latency stats

### Original model on M4

In [98]:
%%time
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

for i in range(100):
    data = mnist.test.images[i].tolist()
    tensor_proto = tf.make_tensor_proto(values=np.asarray(data), shape=[1, len(data)], dtype=tf.float32)
    predict_response = mnist_predictor.predict(tensor_proto)
    
    print("========================================")
    label = np.argmax(mnist.test.labels[i])
    print("label is {}".format(label))
    prediction = predict_response['outputs']['classes']['int64_val'][0]
    print("prediction is {}".format(prediction))

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
label is 7
prediction is 7
label is 2
prediction is 2
label is 1
prediction is 1
label is 0
prediction is 0
label is 4
prediction is 4
label is 1
prediction is 1
label is 4
prediction is 4
label is 9
prediction is 9
label is 5
prediction is 5
label is 9
prediction is 9
label is 0
prediction is 0
label is 6
prediction is 6
label is 9
prediction is 9
label is 0
prediction is 0
label is 1
prediction is 1
label is 5
prediction is 5
label is 9
prediction is 9
label is 7
prediction is 7
label is 3
prediction is 3
label is 4
prediction is 4
label is 9
prediction is 9
label is 6
prediction is 6
label is 6
prediction is 6
label is 5
prediction is 5
label is 4
prediction is 4
label is 0
prediction is 0
label is 7
prediction is 7
label is 4
prediction is 4
label is 0
prediction is 0
label is 1
prediction is 1


### Compiled M4 model on M4

In [101]:
%%time
from tensorflow.examples.tutorials.mnist import input_data
from IPython import display
import PIL.Image
import io

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

for i in range(100):
    data = mnist.test.images[i]
    # Display image
    #im = PIL.Image.fromarray(data.reshape((28,28))*255).convert('L')
    #display.display(im)
    # Invoke endpoint with image
    predict_response = optimized_predictor.predict(data)
    
    print("========================================")
    label = np.argmax(mnist.test.labels[i])
    print("label is {}".format(label))
    prediction = np.argmax(predict_response)
    print("prediction is {}".format(prediction))

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
label is 7
prediction is 7
label is 2
prediction is 2
label is 1
prediction is 1
label is 0
prediction is 0
label is 4
prediction is 4
label is 1
prediction is 1
label is 4
prediction is 4
label is 9
prediction is 9
label is 5
prediction is 5
label is 9
prediction is 9
label is 0
prediction is 0
label is 6
prediction is 6
label is 9
prediction is 9
label is 0
prediction is 0
label is 1
prediction is 1
label is 5
prediction is 5
label is 9
prediction is 9
label is 7
prediction is 7
label is 3
prediction is 3
label is 4
prediction is 4
label is 9
prediction is 9
label is 6
prediction is 6
label is 6
prediction is 6
label is 5
prediction is 5
label is 4
prediction is 4
label is 0
prediction is 0
label is 7
prediction is 7
label is 4
prediction is 4
label is 0
prediction is 0
label is 1
prediction is 1
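
The two loops above are timed as whole cells with `%%time`, which also counts the dataset loading and the printing. To get a per-request average on the client side, one option is to time each call individually, as in the sketch below. `fake_predict` is a hypothetical stand-in for `mnist_predictor.predict` or `optimized_predictor.predict`, so the numbers it prints say nothing about the real endpoints.

```python
import statistics
import time


def measure_latency(predict, payloads):
    """Call predict(payload) once per payload and return per-call latencies in ms."""
    latencies = []
    for payload in payloads:
        start = time.perf_counter()
        predict(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_predict(payload):
    # Hypothetical stand-in for a real endpoint call; sleeps briefly instead.
    time.sleep(0.001)
    return payload


latencies = measure_latency(fake_predict, range(20))
print('mean {:.2f} ms, max {:.2f} ms'.format(statistics.mean(latencies), max(latencies)))
```

Client-side timings include network round trips and serialization, so they will be higher than the `ModelLatency` metric that CloudWatch reports for the same requests.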


## Go to CloudWatch to view invocation metrics

In [95]:
url = "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2:graph=~(view~'timeSeries~stacked~false~metrics~(~(~'AWS*2fSageMaker~'ModelLatency~'EndpointName~'{}~'VariantName~'AllTraffic)~(~'...~'{}~'.~'.))~region~'us-east-1~start~'-PT15M~end~'P0D~stat~'Maximum~period~1);query=~'*7bAWS*2fSageMaker*2cEndpointName*2cVariantName*7d"
url = url.format(mnist_predictor.endpoint,optimized_predictor.endpoint)
print(url)

https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#metricsV2:graph=~(view~'timeSeries~stacked~false~metrics~(~(~'AWS*2fSageMaker~'ModelLatency~'EndpointName~'original-mnist-model-endpoint-v1~'VariantName~'AllTraffic)~(~'...~'compiled-m4-mnist-model-endpoint-v1~'.~'.))~region~'us-east-1~start~'-PT15M~end~'P0D~stat~'Maximum~period~1);query=~'*7bAWS*2fSageMaker*2cEndpointName*2cVariantName*7d
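
The same `ModelLatency` metric can also be read programmatically instead of through the console. The sketch below builds the query from the namespace and dimensions that appear in the URL above and, separately, shows how it might be sent with boto3's `get_metric_statistics`. The function names, the 60-second period, and the choice of statistics are our assumptions. `fetch_model_latency` needs AWS credentials and is not called here.

```python
from datetime import datetime, timedelta, timezone


def model_latency_query(endpoint_name, minutes=15, variant_name='AllTraffic'):
    """Build keyword arguments for a CloudWatch ModelLatency query."""
    end = datetime.now(timezone.utc)
    return {
        'Namespace': 'AWS/SageMaker',
        'MetricName': 'ModelLatency',
        'Dimensions': [
            {'Name': 'EndpointName', 'Value': endpoint_name},
            {'Name': 'VariantName', 'Value': variant_name},
        ],
        'StartTime': end - timedelta(minutes=minutes),
        'EndTime': end,
        'Period': 60,  # seconds per datapoint (assumed)
        'Statistics': ['Average', 'Maximum'],
    }


def fetch_model_latency(endpoint_name, region='us-east-1'):
    """Return datapoints sorted by time; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder works without AWS access
    cloudwatch = boto3.client('cloudwatch', region_name=region)
    response = cloudwatch.get_metric_statistics(**model_latency_query(endpoint_name))
    return sorted(response['Datapoints'], key=lambda point: point['Timestamp'])


query = model_latency_query('original-mnist-model-endpoint-v1')
print(query['Dimensions'])
```

Calling `fetch_model_latency` once per endpoint name would give two series to compare side by side, mirroring the two lines in the console graph.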


#### Click the link printed above to open the latency graph

## Deleting the endpoints

In [None]:
sagemaker.Session().delete_endpoint(optimized_predictor.endpoint)
sagemaker.Session().delete_endpoint(mnist_predictor.endpoint)

## TFLite (Optional)

In [73]:
!aws s3 cp s3://sagemaker-us-east-1-497456752804/neo-compile-test-2019-10-17-23-21-27-014/output/model.tar.gz ./

download: s3://sagemaker-us-east-1-497456752804/neo-compile-test-2019-10-17-23-21-27-014/output/model.tar.gz to ./model.tar.gz


In [74]:
!rm -rf compiled/

In [75]:
!mkdir compiled && tar -xvzf model.tar.gz --directory compiled

export/
export/Servo/
export/Servo/1571354731/
export/Servo/1571354731/variables/
export/Servo/1571354731/variables/variables.data-00000-of-00001
export/Servo/1571354731/variables/variables.index
export/Servo/1571354731/saved_model.pb


In [77]:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('./compiled/export/Servo/1571354731/')
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)

13101152

In [85]:
!ls -lh compiled/export/

total 4.0K
drwxr-xr-x 3 ec2-user ec2-user 4.0K Oct 17 23:25 Servo


In [86]:
!ls -lh conv*

-rw-rw-r-- 1 ec2-user ec2-user 13M Oct 18 16:10 converted_model.tflite
