# Compare Original and Compiled Models

## Step 1: Download the models

In [None]:
# Retrieve the stored variables (variables stored in Lab1)
#  The variables represent a string format of the S3 location where the model is stored:
#       Example: s3://sagemaker-us-east-1-<accountnumber>/ml_m4/model-ml_m4.tar.gz

%store -r model_optimized
%store -r model_original

print('S3 Location for Optimized Model:', model_optimized)
print('S3 Location for Original Model:', model_original)
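The stored values are plain S3 URIs, so the name of the file that `aws s3 cp` writes locally can be derived from them instead of being hardcoded. The following is a minimal sketch using only the standard library; `split_s3_uri` and `local_filename` are hypothetical helper names, and the account number in the example URI is made up.

```python
import posixpath
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def local_filename(uri):
    """Return the last component of the object key (the name used for a copy into ./)."""
    _, key = split_s3_uri(uri)
    return posixpath.basename(key)


# Made-up URI in the same format as the stored variables, for illustration only.
example_uri = "s3://sagemaker-us-east-1-123456789012/ml_m4/model-ml_m4.tar.gz"
print(split_s3_uri(example_uri))
print(local_filename(example_uri))
```

In this notebook the same helpers could be applied to `model_optimized` and `model_original`.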

In [None]:
#Copy the model objects to storage local to this notebook instance
!aws s3 cp {model_optimized} ./

In [None]:
!aws s3 cp {model_original} ./

In [None]:
# Create directory and extract original model
!mkdir -p original && tar -xzvf model.tar.gz -C original

In [None]:
# Create directory and extract compiled model
!mkdir -p compiled && tar -xzvf model-ml_m4.tar.gz -C compiled
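The same create-then-extract step can also be done from Python, which guarantees the directory exists before extraction starts. This is a minimal sketch with the standard `tarfile` module; `extract_model` is a hypothetical helper name, and the archive names are the ones used in the cells above.

```python
import os
import tarfile


def extract_model(archive_path, dest_dir):
    """Create dest_dir if needed, unpack a .tar.gz archive into it, and list the result."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Only runs when the archives downloaded above are present in the working directory.
for archive, target in [("model.tar.gz", "original"), ("model-ml_m4.tar.gz", "compiled")]:
    if os.path.exists(archive):
        print(target, extract_model(archive, target))
```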

## Step 2: Local inference - original model

**Upgrade TensorFlow on Notebook Instance:** We will upgrade to TensorFlow 2.0 to demonstrate how you can load SavedModels exported by an older TensorFlow 1.x release.

In [None]:
!pip install --upgrade pip

In [None]:
!conda uninstall wrapt -y
!pip install tensorflow==2.0.0
!pip install opencv-python

In [None]:
import tensorflow as tf
import cv2
print(tf.__version__)
tf.get_logger().setLevel('ERROR')
tf.executing_eagerly()

### Load model and serving signature

Find the path to the protobuf (`*.pb`) file for the original model and load it.

In [None]:
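path = !find ./original/ -type f -name "*.pb"
# Drop the trailing 14-character file name ("saved_model.pb") to keep only the model directory
path = path[0][:-14]
print('Notebook Instance Path to model directory:', path)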
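Slicing a fixed number of characters works only while the file is named exactly `saved_model.pb`. A pure-Python alternative, sketched here with `pathlib`, returns the parent directory directly; `find_saved_model_dir` is a hypothetical helper name, not part of TensorFlow.

```python
from pathlib import Path


def find_saved_model_dir(root):
    """Return the directory holding the first saved_model.pb found under root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    for pb_file in sorted(root.rglob("saved_model.pb")):
        return str(pb_file.parent)
    return None


# Prints None if ./original has not been extracted yet.
print(find_saved_model_dir("./original"))
```

The returned directory is what `tf.saved_model.load` expects as its argument.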

Load the saved model back into Python...

In [None]:
loaded = tf.saved_model.load(path)

View the serving signature of the original [saved model](https://www.tensorflow.org/guide/saved_model)...

In [None]:
!saved_model_cli show --dir {path} --tag_set serve --signature_def serving_default

In [None]:
print(list(loaded.signatures.keys())) 

Load the serving signature and print the structure of the outputs it returns.

In [None]:
infer = loaded.signatures["serving_default"]
print(infer.structured_outputs)

Load example image: load and resize an image stored on this notebook instance under the `data/` directory.

In [None]:
image = cv2.imread("data/cat.png", 1)
print(image.shape)
# resize, as our model is expecting images in 32x32.
image = cv2.resize(image, (32, 32))
i = tf.image.convert_image_dtype(image.reshape(-1,32,32,3),tf.float32)
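When converting an integer image to a float dtype, `tf.image.convert_image_dtype` also rescales the values, so the 8-bit pixels end up in the range [0, 1]. The arithmetic can be sketched in plain Python as follows; `to_unit_float` is a hypothetical name used only for illustration.

```python
def to_unit_float(pixel_values):
    """Map 8-bit pixel values (0-255) to floats in [0.0, 1.0]."""
    return [value / 255.0 for value in pixel_values]


# Three sample pixel intensities: black, a dark gray, and white.
print(to_unit_float([0, 51, 255]))
```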

Run a single inference on the image loaded above. The response contains the probabilities of the image belonging to each of the 10 classes, along with the most likely class. The classes can be referenced from the [CIFAR-10 website](https://www.cs.toronto.edu/~kriz/cifar.html). Since we didn't train the model for long, we don't expect very accurate results.

In [None]:
%%time
infer(i)['probabilities']
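To turn the ten probabilities into a readable label, pick the index with the highest value and look it up in the CIFAR-10 class list. The sketch below assumes the model's output indices follow the standard CIFAR-10 label order (0 = airplane through 9 = truck); the probabilities are made up, and `top_class` is a hypothetical helper name.

```python
# Standard CIFAR-10 label order; assumed to match the model's output indices.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def top_class(probabilities):
    """Return (label, probability) for the highest-scoring class."""
    index = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return CIFAR10_CLASSES[index], probabilities[index]


# Made-up probabilities for illustration, not real model output.
example = [0.02, 0.01, 0.05, 0.60, 0.04, 0.20, 0.03, 0.02, 0.02, 0.01]
label, probability = top_class(example)
print(label, probability)
```

With the real model, the input list could be obtained with something like `infer(i)['probabilities'].numpy()[0].tolist()`.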

Measure the mean inference time on the original model with the `%timeit` magic, which is built on Python's [timeit](https://docs.python.org/3/library/timeit.html) module. We will use this value for comparison later.

In [None]:
time_original = %timeit -n25 -r25 -o infer(i)['probabilities']
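Outside IPython, a comparable measurement can be sketched with the standard `timeit` module: `repeat=25` runs of `number=25` calls each, mirroring the `-r25 -n25` flags. `time_call` is a hypothetical helper name, and the timed lambda is a stand-in workload rather than the model call.

```python
import statistics
import timeit


def time_call(fn, number=25, repeat=25):
    """Return (mean, standard deviation) of seconds per call over `repeat` runs."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Stand-in workload; in the notebook this would be a call such as infer(i).
mean_s, stdev_s = time_call(lambda: sum(range(1000)))
print(f"{mean_s * 1e3:.4f} ms +/- {stdev_s * 1e3:.4f} ms per call")
```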

## Step 3: Local inference - compiled model

DLR (Deep Learning Runtime), part of [Neo](https://github.com/neo-ai/neo-ai-dlr), is a compact, common runtime for deep learning models and decision tree models compiled by AWS SageMaker Neo, TVM, or Treelite. DLR uses the TVM runtime, Treelite runtime, NVIDIA TensorRT™, and can include other hardware-specific runtimes. DLR provides unified Python/C++ APIs for loading and running compiled models on various devices. DLR currently supports platforms from Intel, NVIDIA, and ARM, with support for Xilinx, Cadence, and Qualcomm coming soon.

Install Deep Learning Runtime...

In [None]:
!pip install dlr

Load the compiled model from the `compiled` directory ...

In [None]:
from dlr import DLRModel
import numpy

# The compiled model takes the same 32x32 RGB image as the original model
# and returns scores for the 10 CIFAR-10 classes.
model = DLRModel(model_path='compiled')

We will now check a single inference against the compiled model, using the same image we used against the original model above. First, load and resize the image...

In [None]:
image = cv2.imread("data/cat.png", 1)
print(image.shape)
# resize, as our model is expecting images in 32x32.
image = cv2.resize(image, (32, 32))

input_data = {'Placeholder': numpy.asarray(image).astype(float).tolist()}

Now we'll check a single inference against our compiled model...

In [None]:
%%time
model.run(input_data)

We will again get mean value using python timeit module for measuring inference time on the compiled model. We will use this value for comparison against the original model. 

In [None]:
time_compiled = %timeit -n25 -r25 -o model.run(input_data)

In [None]:
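# Mean time per loop in milliseconds (TimeitResult.average is in seconds)
o1 = time_compiled.average * 1000

In [None]:
# Mean time per loop in milliseconds (TimeitResult.average is in seconds)
o2 = time_original.average * 1000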

### Let's compare single-image inference time on the original model and the compiled model

In [None]:
'Original Model: {:.2f} ms vs Compiled Model: {:.2f} ms ... {:.1f}x speedup!'.format(o2, o1, o2 / o1)

# Thank you!