# TF-TRT Dynamic Mode Demo with TF 1.14, 1.15

In this notebook, we demonstrate how to create a TF-TRT optimized model from a TensorFlow *saved model*, in both the dynamic and the static conversion modes.

This notebook was designed to run with TensorFlow versions 1.14 and 1.15, which are included in the NVIDIA NGC TensorFlow containers from `nvcr.io/nvidia/tensorflow:19.07-py3` to `nvcr.io/nvidia/tensorflow:19.12-tf1-py3`. These containers can be downloaded from the [NGC website](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow).
 

## Notebook Content
1. [Pre-requisite: data and model](#1)
1. [Verifying the original model](#2)
1. [Creating TF-TRT FP16 model - Dynamic mode](#3)
1. [Creating TF-TRT FP16 model - Static mode](#4)
 
## Quick start
We will run this demonstration with a saved ResNet-50 v1.5 model, downloaded and stored at `/path/to/saved_model`.

The accuracy checks in this notebook require access to a small but representative sample of real validation data. INT8 calibration, which this notebook does not perform, has the same requirement.

We will use the ImageNet dataset stored in TFRecord format. Google provides an excellent all-in-one script for downloading and preparing the ImageNet dataset:

https://github.com/tensorflow/models/blob/master/research/inception/inception/data/download_and_preprocess_imagenet.sh


To run this notebook, start the NGC TF container, providing the correct paths to the ImageNet validation data (`/path/to/image_net`) and to the folder containing the TF saved model (`/path/to/saved_model`):

```bash
nvidia-docker run --rm -it -p 8888:8888 -v /path/to/image_net:/data  -v /path/to/saved_model:/saved_model --name TFTRT nvcr.io/nvidia/tensorflow:19.12-tf1-py3
```

Within the container, we then start Jupyter notebook with:

```bash
jupyter notebook --ip 0.0.0.0 --port 8888  --allow-root
```

Connect to the Jupyter notebook web interface on your host at http://localhost:8888.


<a id="1"></a>
## 1. Pre-requisite: data and model

We first install some extra packages and external dependencies needed, e.g., for preprocessing ImageNet data.

In [None]:
%%bash 
bash ../object_detection/install_dependencies.sh;

In [2]:
import os
import time
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
print("TensorFlow version: ", tf.__version__)

from tensorflow.python.compiler.tensorrt import trt_convert as trt


import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"

config = tf.ConfigProto()
config.gpu_options.allow_growth=True

# check TensorRT version
print("TensorRT version")
!dpkg -l | grep nvinfer

TensorFlow version:  1.15.0
TensorRT version
ii  libnvinfer-bin                         6.0.1-1+cuda10.2                  amd64        TensorRT binaries
ii  libnvinfer-dev                         6.0.1-1+cuda10.2                  amd64        TensorRT development libraries and headers
ii  libnvinfer-plugin-dev                  6.0.1-1+cuda10.2                  amd64        TensorRT plugin libraries
ii  libnvinfer-plugin6                     6.0.1-1+cuda10.2                  amd64        TensorRT plugin libraries
ii  libnvinfer6                            6.0.1-1+cuda10.2                  amd64        TensorRT runtime libraries
ii  python3-libnvinfer                     6.0.1-1+cuda10.2                  amd64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                 6.0.1-1+cuda10.2                  amd64        Python 3 development package for TensorRT


### Data
We verify that the correct ImageNet data folder has been mounted and validation data files of the form `validation-00xxx-of-00128` are available.

In [3]:
def get_files(data_dir, filename_pattern):
    """Return the files in data_dir matching filename_pattern; fail loudly if none match."""
    if data_dir is None:
        return []
    files = tf.gfile.Glob(os.path.join(data_dir, filename_pattern))
    if not files:
        raise ValueError('Cannot find any files in {} with '
                         'pattern "{}"'.format(data_dir, filename_pattern))
    return files

In [4]:
VALIDATION_DATA_DIR = "/CDR/ImageNet/train-val-tfrecord"  # adjust to your mount point, e.g. "/data" with the docker command above
validation_files = get_files(VALIDATION_DATA_DIR, 'validation*')
print('There are %d validation files. \n%s\n%s\n...'%(len(validation_files), validation_files[0], validation_files[-1]))


There are 128 validation files. 
/CDR/ImageNet/train-val-tfrecord/validation-00041-of-00128
/CDR/ImageNet/train-val-tfrecord/validation-00052-of-00128
...
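
`tf.gfile.Glob` does not return files in sorted order (the first and last names printed above are shards 00041 and 00052), so this printout alone does not show whether all 128 shards are present. A minimal sketch of a gap check using only the standard library, assuming the `validation-NNNNN-of-NNNNN` naming shown above; `missing_validation_shards` is a hypothetical helper, not part of TensorFlow:

```python
import os
import re

# Assumed shard naming, e.g. validation-00041-of-00128
SHARD_RE = re.compile(r"validation-(\d{5})-of-(\d{5})")


def missing_validation_shards(filenames):
    """Return the sorted shard indices absent from `filenames`.

    Raises ValueError if the names do not agree on a single shard total.
    """
    indices, totals = set(), set()
    for name in filenames:
        match = SHARD_RE.fullmatch(os.path.basename(name))
        if match:
            indices.add(int(match.group(1)))
            totals.add(int(match.group(2)))
    if len(totals) != 1:
        raise ValueError("expected exactly one shard total, found %s" % sorted(totals))
    total = totals.pop()
    return sorted(set(range(total)) - indices)
```

For example, `missing_validation_shards(validation_files)` should return an empty list when all validation shards are mounted.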


### TF saved model
We download and unzip a ResNet-50 v1.5 saved model from https://ngc.nvidia.com/catalog/models/nvidia:rntf_fp16/files

In [None]:
%%bash
rm -f resnet_model.zip
wget -nc -q --show-progress -O resnet_model.zip \
"https://api.ngc.nvidia.com/v2/models/nvidia/rntf_fp16/versions/1/zip"
unzip -o ./resnet_model.zip -d resnet_model

### Helper functions
We define a few helper functions to read and preprocess ImageNet data from TFRecord files.

In [6]:
def deserialize_image_record(record):
    feature_map = {
        'image/encoded':          tf.FixedLenFeature([ ], tf.string, ''),
        'image/class/label':      tf.FixedLenFeature([1], tf.int64,  -1),
        'image/class/text':       tf.FixedLenFeature([ ], tf.string, ''),
        'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32)
    }
    with tf.name_scope('deserialize_image_record'):
        obj = tf.parse_single_example(record, feature_map)
        imgdata = obj['image/encoded']
        label   = tf.cast(obj['image/class/label'], tf.int32)
        bbox    = tf.stack([obj['image/object/bbox/%s'%x].values
                            for x in ['ymin', 'xmin', 'ymax', 'xmax']])
        bbox = tf.transpose(tf.expand_dims(bbox, 0), [0,2,1])
        text    = obj['image/class/text']
        return imgdata, label, bbox, text
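
To see the shape that the last few lines of `deserialize_image_record` produce, the same stacking can be reproduced in NumPy, whose `np.stack`, `np.expand_dims`, and `np.transpose` behave like their TensorFlow counterparts for these calls. The coordinate values below are made up for illustration:

```python
import numpy as np

# Made-up values for an image with N = 2 bounding boxes, standing in for
# the four VarLenFeature `.values` tensors.
coords = {
    "ymin": np.array([0.1, 0.5], dtype=np.float32),
    "xmin": np.array([0.2, 0.6], dtype=np.float32),
    "ymax": np.array([0.3, 0.7], dtype=np.float32),
    "xmax": np.array([0.4, 0.8], dtype=np.float32),
}

bbox = np.stack([coords[k] for k in ["ymin", "xmin", "ymax", "xmax"]])  # shape (4, N)
bbox = np.transpose(np.expand_dims(bbox, 0), [0, 2, 1])                # shape (1, N, 4)

# Each row of bbox[0] is now one box in [ymin, xmin, ymax, xmax] order.
print(bbox.shape)  # (1, 2, 4)
```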

In [7]:
import sys

!git clone https://github.com/NVIDIA/DeepLearningExamples
sys.path.insert(0,'./DeepLearningExamples/TensorFlow/Classification/RN50v1.5/utils')

from image_processing import preprocess_image_record
def preprocess(record):
    # Parse the TFRecord and preprocess it into a 224x224x3 evaluation image and its label
    image, label = preprocess_image_record(record, 224, 224, 3, is_training=False)
    return image, label

fatal: destination path 'DeepLearningExamples' already exists and is not an empty directory.


<a id="2"></a>
## 2. Verifying the original model
We demonstrate the conversion process with a ResNet-50 v1.5 model. First, we inspect the original TensorFlow model.

In [8]:
SAVED_MODEL_DIR =  "./resnet_model/SavedModel"

We employ `saved_model_cli` to inspect the inputs and outputs of the model.

In [9]:
!saved_model_cli show --all --dir $SAVED_MODEL_DIR

2020-01-14 04:42:40.895629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT32
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: resnet50_v1.5/output/softmax:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: in

This gives us the names of the input and output tensors: `input_tensor:0` and `resnet50_v1.5/output/softmax:0`, respectively. Also note that the number of output classes here is 1001 instead of the 1000 ImageNet classes, because the network was trained with an extra background class.
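
The benchmark compares `np.argmax` over all 1001 outputs directly with the TFRecord labels, with no offset. This works assuming the labels written by Google's preparation script run from 1 to 1000, with index 0 left for the background class, which matches the model's output layout. A small NumPy sketch with made-up probabilities:

```python
import numpy as np

# Made-up softmax outputs for 2 images over 1001 classes:
# index 0 is the background class, indices 1..1000 are the ImageNet classes.
probs = np.zeros((2, 1001), dtype=np.float32)
probs[0, 1] = 0.9     # image 0 scores highest on class 1
probs[1, 1000] = 0.8  # image 1 scores highest on class 1000

# Labels as assumed to be stored in the TFRecords (1-based, 1..1000).
labels = np.array([1, 1000])

# Same comparison as in the benchmark: argmax and label share the same indexing.
predictions = np.argmax(probs, axis=1)
num_hits = np.sum(predictions == labels)  # both images count as hits

# A 0-based index into a 1000-entry class list would skip the background slot instead.
zero_based = np.argmax(probs[:, 1:], axis=1)  # class 1 -> 0, class 1000 -> 999
```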

In [10]:
INPUT_TENSOR = 'input_tensor:0'
OUTPUT_TENSOR = 'resnet50_v1.5/output/softmax:0'

Next, we define a function that loads a saved model and measures its speed and accuracy on the validation data.

In [11]:
def benchmark_saved_model(SAVED_MODEL_DIR, BATCH_SIZE=64):
    """Load a SavedModel and report top-1 accuracy and throughput on the validation files."""
    with tf.Session(graph=tf.Graph(), config=config) as sess:
        # prepare dataset iterator (tf.data.experimental replaces the deprecated tf.contrib.data)
        dataset = tf.data.TFRecordDataset(validation_files)
        dataset = dataset.apply(tf.data.experimental.map_and_batch(
            map_func=preprocess, batch_size=BATCH_SIZE, num_parallel_calls=20))

        iterator = dataset.make_one_shot_iterator()
        next_element = iterator.get_next()

        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], SAVED_MODEL_DIR)

        # Warm-up batches come from the same iterator, so they are excluded
        # from the accuracy and speed figures below.
        print('Warming up for 50 batches...')
        for _ in range(50):
            sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: sess.run(next_element)[0]})

        print('Benchmarking inference engine at batch size %d...' % BATCH_SIZE)
        num_hits = 0
        num_predict = 0
        start_time = time.time()
        try:
            while True:
                # Fetch a preprocessed batch, then feed it to the model;
                # the measured time covers both steps.
                img, label = sess.run(next_element)
                label = label.squeeze()
                output = sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: img})
                prediction = np.argmax(output, axis=1)
                num_hits += np.sum(prediction == label)
                num_predict += len(prediction)
        except tf.errors.OutOfRangeError:
            pass  # the validation set is exhausted

        print('Accuracy: %.2f%%' % (100 * num_hits / num_predict))
        print('Inference speed: %.2f samples/s' % (num_predict / (time.time() - start_time)))

In [12]:
BATCH_SIZE = 64

benchmark_saved_model(SAVED_MODEL_DIR, BATCH_SIZE=BATCH_SIZE)

Warming up for 50 batches...
Benchmarking inference engine at batch size 64...
Accuracy: 76.17%
Inference speed: 1202.08 samples/s
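
The warm-up loop in `benchmark_saved_model` draws its 50 batches from the same one-shot iterator as the measurement loop, so those images are never scored, and the number of scored images depends on the batch size. This is one plausible reason the reported accuracy shifts slightly between batch sizes (for example 76.17% at 64 versus 76.26% at 128); FP16 numerics may also contribute. Assuming the standard 50,000-image ImageNet validation set, the arithmetic is:

```python
# Assumed size of the ImageNet validation set (spread over the 128 shards).
NUM_VAL_IMAGES = 50000
WARMUP_BATCHES = 50  # matches the warm-up loop in benchmark_saved_model


def images_scored(batch_size, num_images=NUM_VAL_IMAGES, warmup_batches=WARMUP_BATCHES):
    """Number of images that reach the accuracy count after warm-up."""
    return max(num_images - warmup_batches * batch_size, 0)


for bs in (8, 32, 64, 96, 128):
    print(bs, images_scored(bs))
# 8 49600
# 32 48400
# 64 46800
# 96 45200
# 128 43600
```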


<a id="3"></a>
## 3. Creating TF-TRT FP16 model - Dynamic mode

Next, we convert the native TF model to TF-TRT FP16 in dynamic mode (`is_dynamic_op=True`), then verify model accuracy and inference speed.

In [13]:
BATCH_SIZE = 64

FP16_SAVED_MODEL_DIR = SAVED_MODEL_DIR+"_TFTRT_FP16/1"
!rm -rf $FP16_SAVED_MODEL_DIR

# Convert with is_dynamic_op=True: TensorRT engines are built at run time, when the converted graph first executes
converter = trt.TrtGraphConverter(input_saved_model_dir=SAVED_MODEL_DIR,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.FP16,
                                  is_dynamic_op=True)
converter.convert()
converter.save(FP16_SAVED_MODEL_DIR)

benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=BATCH_SIZE)

Warming up for 50 batches...
Benchmarking inference engine at batch size 64...
Accuracy: 76.17%
Inference speed: 1164.21 samples/s


In [14]:

benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=128)

Warming up for 50 batches...
Benchmarking inference engine at batch size 128...
Accuracy: 76.26%
Inference speed: 1210.12 samples/s


In [15]:

benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=32)

Warming up for 50 batches...
Benchmarking inference engine at batch size 32...
Accuracy: 76.19%
Inference speed: 1057.01 samples/s


<a id="4"></a>
## 4. Creating TF-TRT FP16 model - Static mode

Next, we convert the same native TF model to TF-TRT FP16 in static mode (`is_dynamic_op=False`), which builds the TensorRT engines during conversion, then verify model accuracy and inference speed.

In [16]:
BATCH_SIZE = 64

FP16_SAVED_MODEL_DIR = SAVED_MODEL_DIR+"_TFTRT_FP16/2"
!rm -rf $FP16_SAVED_MODEL_DIR

# Convert with is_dynamic_op=False: TensorRT engines are built now, during conversion, for batch sizes up to max_batch_size
converter = trt.TrtGraphConverter(input_saved_model_dir=SAVED_MODEL_DIR,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.FP16,
                                  is_dynamic_op=False)
converter.convert()
converter.save(FP16_SAVED_MODEL_DIR)

benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=BATCH_SIZE)

Warming up for 50 batches...
Benchmarking inference engine at batch size 64...
Accuracy: 76.18%
Inference speed: 3971.00 samples/s
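
To put the batch-size-64 results side by side, the speedups relative to the native TF run can be computed from the throughputs reported in this notebook. These are single runs, so the ratios are indicative rather than precise:

```python
# Throughputs (samples/s) reported in this notebook at batch size 64.
native = 1202.08
tftrt_dynamic = 1164.21
tftrt_static = 3971.00


def speedup(candidate, baseline=native):
    """Throughput ratio relative to the native TF run."""
    return candidate / baseline


print("dynamic: %.2fx" % speedup(tftrt_dynamic))  # dynamic: 0.97x
print("static:  %.2fx" % speedup(tftrt_static))   # static:  3.30x
```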


In [17]:

benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=128)

Warming up for 50 batches...
Benchmarking inference engine at batch size 128...
Accuracy: 76.26%
Inference speed: 1233.73 samples/s


In [18]:

benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=64)

Warming up for 50 batches...
Benchmarking inference engine at batch size 64...
Accuracy: 76.18%
Inference speed: 3929.13 samples/s


In [19]:
benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=8)

Warming up for 50 batches...
Benchmarking inference engine at batch size 8...
Accuracy: 76.27%
Inference speed: 1577.06 samples/s


In [20]:
benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=32)

Warming up for 50 batches...
Benchmarking inference engine at batch size 32...
Accuracy: 76.20%
Inference speed: 3060.27 samples/s


In [21]:
benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=96)

Warming up for 50 batches...
Benchmarking inference engine at batch size 96...
Accuracy: 76.24%
Inference speed: 1222.90 samples/s
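
In the static-mode runs, batch sizes 8, 32, and 64 are faster than native TF, while 96 and 128 (both above `max_batch_size=64`) run at roughly the native speed of about 1200 samples/s. This is consistent with a statically built engine accepting only batches up to `max_batch_size`, with larger batches executed by the native TensorFlow segment instead. The rule below is a toy simplification for illustration, not TF-TRT's actual implementation; `expected_path` is a hypothetical helper that labels each recorded run with the path the rule predicts:

```python
# Static-mode throughputs reported above (batch size -> samples/s);
# batch size 64 uses the first of its two runs.
static_runs = {8: 1577.06, 32: 3060.27, 64: 3971.00, 96: 1222.90, 128: 1233.73}
MAX_BATCH_SIZE = 64  # max_batch_size passed to TrtGraphConverter in this section


def expected_path(batch_size, max_batch_size=MAX_BATCH_SIZE):
    """Toy rule (a simplification): a static engine only serves batches <= max_batch_size."""
    return "tensorrt" if batch_size <= max_batch_size else "native_tf"


for batch_size, speed in sorted(static_runs.items()):
    print("batch %3d: %-9s %8.2f samples/s" % (batch_size, expected_path(batch_size), speed))
```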


## Conclusion
In this notebook, we have demonstrated the process of creating TF-TRT inference models from an original TF FP32 *saved model*, in both dynamic and static mode. In every case, we have also verified the accuracy and speed of the resulting model.
