# TF-TRT Dynamic Mode Demo with TF 1.14, 1.15

In this notebook, we demonstrate how to create a TF-TRT optimized model from a TensorFlow *saved model*.

This notebook was designed to run with TensorFlow versions 1.14 and 1.15, which are included in the NVIDIA NGC TensorFlow containers from `nvcr.io/nvidia/tensorflow:19.07-py3` to `nvcr.io/nvidia/tensorflow:19.12-tf1-py3`. These containers can be downloaded from the [NGC website](https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow).
 

## Notebook Content
1. [Pre-requisite: data and model](#1)
1. [Verifying the original FP16 model](#2)
1. [Creating TF-TRT FP32 model](#3)
1. [Creating TF-TRT FP16 model](#4)
1. [Creating TF-TRT INT8 model](#5)
1. [Calibrating TF-TRT INT8 model with raw JPEG images](#6)
 
## Quick start
We will run this demonstration with a ResNet-50 saved model, which should be downloaded and stored at `/path/to/saved_model`.

The INT8 calibration process requires access to a small but representative sample of real training or validation data.

We will use the ImageNet dataset stored in TFRecord format. Google provides an excellent all-in-one script for downloading and preparing the ImageNet dataset at

https://github.com/tensorflow/models/blob/master/research/inception/inception/data/download_and_preprocess_imagenet.sh


To run this notebook, start the NGC TF container, providing the correct paths to the ImageNet validation data (`/path/to/image_net`) and to the folder containing the TF saved model (`/path/to/saved_model`):

```bash
nvidia-docker run --rm -it -p 8888:8888 -v /path/to/image_net:/data  -v /path/to/saved_model:/saved_model --name TFTRT nvcr.io/nvidia/tensorflow:19.12-tf1-py3
```

Within the container, we then start Jupyter notebook with:

```bash
jupyter notebook --ip 0.0.0.0 --port 8888  --allow-root
```

Connect to the Jupyter notebook web interface from your host at http://localhost:8888.
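
Before running the remaining cells, it can help to confirm that both volumes were mounted as expected. The following is a minimal sketch, assuming the container-side paths `/data` and `/saved_model` from the `docker run` command above; the helper name `missing_mounts` is hypothetical and not part of the original notebook.

```python
import os

def missing_mounts(paths):
    """Return the subset of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]

# Container-side mount points taken from the docker command above.
missing = missing_mounts(["/data", "/saved_model"])
if missing:
    print("Not mounted:", ", ".join(missing))
else:
    print("All expected mounts are present.")
```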


<a id="1"></a>
## 1. Pre-requisite: data and model

We first install some extra packages and external dependencies needed for tasks such as preprocessing ImageNet data.

In [1]:
%%bash 
bash ../object_detection/install_dependencies.sh;

Setup local variables...
Download protobuf...
/data2/tf-tensorrt/tftrt/examples/image-classification/protoc /data2/tf-tensorrt/tftrt/examples/image-classification
Archive:  protoc-3.7.1-linux-x86_64.zip
  inflating: include/google/protobuf/wrappers.proto  
  inflating: include/google/protobuf/field_mask.proto  
  inflating: include/google/protobuf/api.proto  
  inflating: include/google/protobuf/struct.proto  
  inflating: include/google/protobuf/descriptor.proto  
  inflating: include/google/protobuf/timestamp.proto  
  inflating: include/google/protobuf/compiler/plugin.proto  
  inflating: include/google/protobuf/empty.proto  
  inflating: include/google/protobuf/any.proto  
  inflating: include/google/protobuf/source_context.proto  
  inflating: include/google/protobuf/type.proto  
  inflating: include/google/protobuf/duration.proto  
  inflating: bin/protoc              
  inflating: readme.txt              
/data2/tf-tensorrt/tftrt/examples/image-classification
Compile object dete


echo Download protobuf...
mkdir -p $PROTOC_DIR
pushd $PROTOC_DIR
ARCH=$(uname -m)
if [ "$ARCH" == "aarch64" ] ; then
  filename="protoc-3.7.1-linux-aarch_64.zip"
elif [ "$ARCH" == "x86_64" ] ; then
  filename="protoc-3.7.1-linux-x86_64.zip"
elif [ "$ARCH" == "ppc64le" ] ; then
  filename="protoc-3.7.1-linux-ppcle_64.zip"
else
  echo ERROR: $ARCH not supported.
  exit 1;
fi
wget --no-check-certificate ${PROTO_BASE_URL}${filename}
--2020-01-14 01:20:14--  https://github.com/google/protobuf/releases/download/v3.7.1/protoc-3.7.1-linux-x86_64.zip
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protoc-3.7.1-linux-x86_64.zip [following]
--2020-01-14 01:20:15--  https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protoc-3.7.1-linux-x86_64.zip
Reusing existing con

In [2]:
import os
import time
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
print("TensorFlow version: ", tf.__version__)

from tensorflow.python.compiler.tensorrt import trt_convert as trt


import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"

config = tf.ConfigProto()
config.gpu_options.allow_growth=True

# check TensorRT version
print("TensorRT version")
!dpkg -l | grep nvinfer

TensorFlow version:  1.15.0
TensorRT version
ii  libnvinfer-bin                         6.0.1-1+cuda10.2                  amd64        TensorRT binaries
ii  libnvinfer-dev                         6.0.1-1+cuda10.2                  amd64        TensorRT development libraries and headers
ii  libnvinfer-plugin-dev                  6.0.1-1+cuda10.2                  amd64        TensorRT plugin libraries
ii  libnvinfer-plugin6                     6.0.1-1+cuda10.2                  amd64        TensorRT plugin libraries
ii  libnvinfer6                            6.0.1-1+cuda10.2                  amd64        TensorRT runtime libraries
ii  python3-libnvinfer                     6.0.1-1+cuda10.2                  amd64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                 6.0.1-1+cuda10.2                  amd64        Python 3 development package for TensorRT


### Data
We verify that the correct ImageNet data folder has been mounted and validation data files of the form `validation-00xxx-of-00128` are available.

In [3]:
def get_files(data_dir, filename_pattern):
    """Return all files in data_dir whose names match filename_pattern."""
    if data_dir is None:
        return []
    files = tf.gfile.Glob(os.path.join(data_dir, filename_pattern))
    if not files:
        raise ValueError('Cannot find any files in {} with '
                         'pattern "{}"'.format(data_dir, filename_pattern))
    return files

In [4]:
VALIDATION_DATA_DIR = "/CDR/ImageNet/train-val-tfrecord"
validation_files = get_files(VALIDATION_DATA_DIR, 'validation*')
print('There are %d validation files. \n%s\n%s\n...'%(len(validation_files), validation_files[0], validation_files[-1]))


There are 128 validation files. 
/CDR/ImageNet/train-val-tfrecord/validation-00041-of-00128
/CDR/ImageNet/train-val-tfrecord/validation-00052-of-00128
...
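
The listing above is unsorted, so the first and last entries do not show whether every shard is present. One way to check is to parse the shard index out of each file name. This is a sketch that assumes the `validation-00xxx-of-00128` naming scheme with zero-based indices; the helper `missing_shards` is hypothetical.

```python
import os
import re

def missing_shards(paths, prefix="validation"):
    """Return sorted shard indices that are absent from `paths`.

    Expects base names of the form `<prefix>-00041-of-00128`.
    """
    pattern = re.compile(r"^%s-(\d{5})-of-(\d{5})$" % re.escape(prefix))
    found, total = set(), None
    for path in paths:
        match = pattern.match(os.path.basename(path))
        if match is None:
            continue  # ignore files that do not follow the shard naming scheme
        found.add(int(match.group(1)))
        total = int(match.group(2))
    if total is None:
        raise ValueError("no files matching the shard naming scheme")
    return sorted(set(range(total)) - found)

# Example with two of four hypothetical shards present.
example = ["/tmp/validation-00000-of-00004", "/tmp/validation-00002-of-00004"]
print(missing_shards(example))  # [1, 3]
```

Calling `missing_shards(validation_files)` in the notebook should return an empty list when the validation set is complete.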


### TF saved model
We download and work with a ResNet-50 v1.5 saved model from https://ngc.nvidia.com/catalog/models/nvidia:rntf_fp16/files. The cell below removes any previously downloaded archive before fetching a fresh copy.

In [5]:
%%bash
rm -f resnet_model.zip
wget -nc -q --show-progress -O resnet_model.zip \
"https://api.ngc.nvidia.com/v2/models/nvidia/rntf_fp16/versions/1/zip"
unzip -o ./resnet_model.zip -d resnet_model

Archive:  ./resnet_model.zip
  inflating: resnet_model/SavedModel/saved_model.pb  
  inflating: resnet_model/SavedModel/variables/variables.data-00000-of-00002  
  inflating: resnet_model/SavedModel/variables/variables.data-00001-of-00002  
  inflating: resnet_model/SavedModel/variables/variables.index  
  inflating: resnet_model/checkpoint  
  inflating: resnet_model/graph.pbtxt  
  inflating: resnet_model/nvidia_rntf_fp16_190822.ckpt.data-00000-of-00002  
  inflating: resnet_model/nvidia_rntf_fp16_190822.ckpt.data-00001-of-00002  
  inflating: resnet_model/nvidia_rntf_fp16_190822.ckpt.index  
  inflating: resnet_model/nvidia_rntf_fp16_190822.ckpt.meta  




### Helper functions
We define a few helper functions to read and preprocess ImageNet data from TFRecord files.

In [6]:
def deserialize_image_record(record):
    feature_map = {
        'image/encoded':          tf.FixedLenFeature([ ], tf.string, ''),
        'image/class/label':      tf.FixedLenFeature([1], tf.int64,  -1),
        'image/class/text':       tf.FixedLenFeature([ ], tf.string, ''),
        'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32)
    }
    with tf.name_scope('deserialize_image_record'):
        obj = tf.parse_single_example(record, feature_map)
        imgdata = obj['image/encoded']
        label   = tf.cast(obj['image/class/label'], tf.int32)
        bbox    = tf.stack([obj['image/object/bbox/%s'%x].values
                            for x in ['ymin', 'xmin', 'ymax', 'xmax']])
        bbox = tf.transpose(tf.expand_dims(bbox, 0), [0,2,1])
        text    = obj['image/class/text']
        return imgdata, label, bbox, text

In [7]:
from preprocessing import vgg_preprocessing

def preprocess(record):
    # Parse TFRecord
    imgdata, label, bbox, text = deserialize_image_record(record)
    #label -= 1 # Change to 0-based if not using background class
    try:
        image = tf.image.decode_jpeg(imgdata, channels=3, fancy_upscaling=False, dct_method='INTEGER_FAST')
    except:
        image = tf.image.decode_png(imgdata, channels=3)

    image = vgg_preprocessing.preprocess_image(image, 224, 224, is_training=False)
    return image, label

In [8]:
#Define some global variables
BATCH_SIZE = 64


<a id="2"></a>
## 2. Verifying the original FP16 model
We demonstrate the conversion process with a ResNet-50 v1.5 model. First, we inspect the original TensorFlow model.

In [9]:
SAVED_MODEL_DIR =  "./resnet_model/SavedModel"

We employ `saved_model_cli` to inspect the inputs and outputs of the model.

In [10]:
!saved_model_cli show --all --dir $SAVED_MODEL_DIR

2020-01-14 01:20:49.766261: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT32
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: resnet50_v1.5/output/softmax:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: in

This gives us the names of the input and output tensors: `input_tensor:0` and `resnet50_v1.5/output/softmax:0`, respectively. Also note that the number of output classes here is 1001 instead of the 1000 ImageNet classes. This is because the network was trained with an extra background class.
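
To make the effect of the extra class concrete, the sketch below maps a prediction from the 1001-class output to a 0-based ImageNet index. It assumes that the background class occupies index 0, which is consistent with the commented-out `label -= 1` line in `preprocess` but is not stated explicitly by the model; the function name `to_imagenet_index` is hypothetical.

```python
def to_imagenet_index(prediction_1001):
    """Map a 1001-class index (0 = background) to a 0-based ImageNet index.

    Returns None for the background class, which has no ImageNet counterpart.
    """
    if not 0 <= prediction_1001 <= 1000:
        raise ValueError("expected an index in [0, 1000]")
    return None if prediction_1001 == 0 else prediction_1001 - 1

print(to_imagenet_index(1))     # 0
print(to_imagenet_index(1000))  # 999
print(to_imagenet_index(0))     # None
```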

In [11]:
INPUT_TENSOR = 'input_tensor:0'
OUTPUT_TENSOR = 'resnet50_v1.5/output/softmax:0'

Next, we define a function that reads in a saved model and measures its speed and accuracy on the validation data.

In [12]:
def benchmark_saved_model(SAVED_MODEL_DIR, BATCH_SIZE=64):
    """Load a saved model and report its accuracy and throughput on the validation data."""
    with tf.Session(graph=tf.Graph(), config=config) as sess:
        # prepare dataset iterator
        dataset = tf.data.TFRecordDataset(validation_files)
        dataset = dataset.apply(tf.contrib.data.map_and_batch(map_func=preprocess, batch_size=BATCH_SIZE, num_parallel_calls=20))

        iterator = dataset.make_one_shot_iterator()
        next_element = iterator.get_next()

        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], SAVED_MODEL_DIR)

        # Warm-up batches are consumed from the dataset and excluded from the statistics.
        print('Warming up for 50 batches...')
        for _ in range(50):
            sess.run(OUTPUT_TENSOR, feed_dict={INPUT_TENSOR: sess.run(next_element)[0]})

        print('Benchmarking inference engine...')
        num_hits = 0
        num_predict = 0
        start_time = time.time()
        try:
            while True:
                image_data = sess.run(next_element)
                img = image_data[0]
                label = image_data[1].squeeze()
                output = sess.run([OUTPUT_TENSOR], feed_dict={INPUT_TENSOR: img})
                prediction = np.argmax(output[0], axis=1)
                num_hits += np.sum(prediction == label)
                num_predict += len(prediction)
        except tf.errors.OutOfRangeError:
            # Raised once the one-shot iterator has consumed all validation data.
            pass
        # The timed region includes input decoding and preprocessing.
        elapsed = time.time() - start_time

        print('Accuracy: %.2f%%'%(100*num_hits/num_predict))
        print('Inference speed: %.2f samples/s'%(num_predict/elapsed))

In [13]:
benchmark_saved_model(SAVED_MODEL_DIR, BATCH_SIZE=BATCH_SIZE)

Warming up for 50 batches...
Benchmarking inference engine...
Accuracy: 0.44%
Inference speed: 1196.57 samples/s
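
The accounting behind these two numbers can be illustrated without TensorFlow. The sketch below is a framework-free stand-in, not the notebook's implementation: `measure`, `predict_fn`, and the toy batches are hypothetical names introduced only for illustration.

```python
import time

def measure(predict_fn, batches, warmup=0):
    """Return (accuracy_percent, samples_per_second) for `predict_fn`.

    `batches` yields (inputs, labels) pairs; `predict_fn(inputs)` returns one
    predicted label per input. The first `warmup` batches are run but not scored.
    """
    iterator = iter(batches)
    for _ in range(warmup):
        inputs, _labels = next(iterator)
        predict_fn(inputs)

    hits = total = 0
    start = time.perf_counter()
    for inputs, labels in iterator:
        predictions = predict_fn(inputs)
        hits += sum(int(p == l) for p, l in zip(predictions, labels))
        total += len(labels)
    # Guard against a zero interval on very fast toy inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return 100.0 * hits / total, total / elapsed

# Toy usage: a "model" that predicts the parity of each input.
toy_batches = [([1, 2, 3], [1, 0, 1]), ([4, 5], [0, 0])]
accuracy, throughput = measure(lambda xs: [x % 2 for x in xs], toy_batches)
print("Accuracy: %.2f%%" % accuracy)  # Accuracy: 80.00%
```

As in `benchmark_saved_model`, warm-up batches are drawn from the same iterator, so they reduce the number of samples that contribute to the reported accuracy.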


<a id="3"></a>
## 3. Creating TF-TRT FP32 model

Next, we convert the native TF FP16 model to TF-TRT FP32, then verify model accuracy and inference speed.

In [14]:
FP32_SAVED_MODEL_DIR = SAVED_MODEL_DIR+"_TFTRT_FP32/1"
!rm -rf $FP32_SAVED_MODEL_DIR

# Now we create the TFTRT FP32 engine
converter = trt.TrtGraphConverter(input_saved_model_dir=SAVED_MODEL_DIR,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.FP32)
converter.convert()
converter.save(FP32_SAVED_MODEL_DIR)

benchmark_saved_model(FP32_SAVED_MODEL_DIR, BATCH_SIZE=BATCH_SIZE)

Warming up for 50 batches...
Benchmarking inference engine...
Accuracy: 0.44%
Inference speed: 1405.83 samples/s


<a id="4"></a>
## 4. Creating TF-TRT FP16 model

Next, we convert the native TF FP16 model to TF-TRT FP16, then verify model accuracy and inference speed.

In [15]:
FP16_SAVED_MODEL_DIR = SAVED_MODEL_DIR+"_TFTRT_FP16/1"
!rm -rf $FP16_SAVED_MODEL_DIR

# Now we create the TFTRT FP16 engine
converter = trt.TrtGraphConverter(input_saved_model_dir=SAVED_MODEL_DIR,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.FP16)
converter.convert()
converter.save(FP16_SAVED_MODEL_DIR)

benchmark_saved_model(FP16_SAVED_MODEL_DIR, BATCH_SIZE=BATCH_SIZE)

Warming up for 50 batches...
Benchmarking inference engine...
Accuracy: 0.44%
Inference speed: 3667.77 samples/s
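
To compare the three runs, the throughput figures can be expressed relative to the native TensorFlow baseline. The numbers below are copied from the benchmark outputs above; the dictionary keys are labels chosen here for illustration.

```python
# Throughput figures (samples/s) copied from the benchmark outputs above.
throughput = {
    "native TF": 1196.57,
    "TF-TRT FP32": 1405.83,
    "TF-TRT FP16": 3667.77,
}

baseline = throughput["native TF"]
speedups = {name: value / baseline for name, value in throughput.items()}
for name, ratio in speedups.items():
    print("%-12s %.2fx" % (name, ratio))
```

On this run, that works out to roughly 1.17x for TF-TRT FP32 and 3.07x for TF-TRT FP16.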


<a id="5"></a>
## 5. Creating TF-TRT INT8 model

Creating a TF-TRT INT8 inference model requires two steps:

- Step 1: Prepare a calibration dataset

- Step 2: Convert and calibrate the TF-TRT INT8 inference engine

### Step 1: Prepare a calibration dataset

Creating a TF-TRT INT8 model requires a small calibration dataset. Ideally, this dataset should be representative of the data seen in production. It is used to build a histogram of values for each layer in the neural network, which enables effective 8-bit quantization.
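
To see why the range of observed values matters, consider the simplified sketch below of symmetric linear quantization. It is an illustration only and is not TensorRT's calibration algorithm: the `threshold` is picked by hand here, whereas a calibrator derives it from the collected activation statistics. The function names are hypothetical.

```python
def quantize_int8(values, threshold):
    """Symmetric linear quantization of floats to integer codes in [-127, 127]."""
    scale = threshold / 127.0
    codes = []
    for v in values:
        clipped = max(-threshold, min(threshold, v))  # saturate outliers
        codes.append(int(round(clipped * 127.0 / threshold)))
    return codes, scale

def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in codes]

activations = [-0.4, 0.0, 0.25, 1.0, 4.0]  # 4.0 is an outlier
codes, scale = quantize_int8(activations, threshold=1.0)
print(codes)  # [-51, 0, 32, 127, 127]
print(dequantize_int8(codes, scale))
```

A threshold that is too large wastes the available integer codes on values that rarely occur, while one that is too small saturates many activations. This is why the calibration data should resemble production inputs.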

In [16]:
num_calibration_batches = 2
batched_input = np.zeros((BATCH_SIZE * num_calibration_batches, 224, 224, 3), dtype=np.float32)

with tf.Session(graph=tf.Graph(), config=config) as sess:
    # prepare dataset iterator
    dataset = tf.data.TFRecordDataset(validation_files)    
    dataset = dataset.apply(tf.contrib.data.map_and_batch(map_func=preprocess, batch_size=BATCH_SIZE, num_parallel_calls=20))

    iterator = dataset.make_one_shot_iterator()
    next_element = iterator.get_next()
    for i in range(num_calibration_batches):
        batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :] = sess.run(next_element)[0]

print('Calibration data shape: ', batched_input.shape)

def calibration_input_fn_gen():
    for i in range(num_calibration_batches):
        yield batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :]
        
calibration_input_fn = calibration_input_fn_gen()        

Calibration data shape:  (128, 224, 224, 3)
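
Note that `calibration_input_fn` is a generator object, so it can supply exactly `num_calibration_batches` batches before it is exhausted. The sketch below illustrates this behavior with placeholder strings instead of image arrays; `batch_generator` is a hypothetical stand-in for `calibration_input_fn_gen`.

```python
def batch_generator(num_batches):
    """Yield placeholder batches; stands in for calibration_input_fn_gen."""
    for i in range(num_batches):
        yield "batch-%d" % i

feed = batch_generator(2)
consumed = [next(feed) for _ in range(2)]  # one next() call per calibration run
print(consumed)  # ['batch-0', 'batch-1']

try:
    next(feed)  # a third call finds the generator exhausted
except StopIteration:
    print("generator exhausted; create a new one before calibrating again")
```

In practice this means the generator has to be recreated by calling `calibration_input_fn_gen()` again before calibration is repeated, and the number of calibration runs should not exceed `num_calibration_batches`.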



### Step 2: Convert and calibrate the TF-TRT INT8 inference engine

The calibration step may take a while to complete.

In [None]:
# set a directory to write the saved model
INT8_SAVED_MODEL_DIR =  SAVED_MODEL_DIR + "_TFTRT_INT8/1"
!rm -rf $INT8_SAVED_MODEL_DIR
    
converter = trt.TrtGraphConverter(
      input_saved_model_dir=SAVED_MODEL_DIR,
      precision_mode=trt.TrtPrecisionMode.INT8)
  
converter.convert()

# Run calibration for num_calibration_batches times.
converted_graph_def = converter.calibrate(
      fetch_names=[OUTPUT_TENSOR],
      num_runs=num_calibration_batches,
      feed_dict_fn=lambda: {INPUT_TENSOR: next(calibration_input_fn)})

converter.save(INT8_SAVED_MODEL_DIR)
    

### Benchmarking INT8 saved model

Finally, we reload the INT8 saved model from disk and verify its accuracy and performance.

In [None]:
benchmark_saved_model(INT8_SAVED_MODEL_DIR, BATCH_SIZE=BATCH_SIZE)

In [None]:
!saved_model_cli show --all --dir $INT8_SAVED_MODEL_DIR

<a id="6"></a>
## 6. Calibrating TF-TRT INT8 model with raw JPEG images

As an alternative to reading data in TFRecord format, this section demonstrates how to calibrate a TF-TRT INT8 model from a directory of raw JPEG images. We assume that the raw images have been mounted at the directory `/data/Calibration_data`.

As a rule of thumb, calibration data should be a small but representative set of images similar to what is expected in deployment. Empirically, for common network architectures trained on ImageNet data, a calibration set of 500-1000 images provides good accuracy. A good strategy for a dataset such as ImageNet is therefore to choose one sample from each class.
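
The one-sample-per-class strategy could be sketched as follows, assuming the images are organized as `<root>/<class_id>/<image>` so that the parent directory name identifies the class. That layout, the example paths, and the helper `one_file_per_class` are assumptions made for illustration.

```python
import os

def one_file_per_class(file_paths):
    """Keep the first file (in sorted order) from each parent directory."""
    selected = {}
    for path in sorted(file_paths):
        class_name = os.path.basename(os.path.dirname(path))
        selected.setdefault(class_name, path)
    return [selected[name] for name in sorted(selected)]

# Hypothetical layout: <root>/<class_id>/<image>.JPEG
example = [
    "/data/Calibration_data/n01440764/b.JPEG",
    "/data/Calibration_data/n01440764/a.JPEG",
    "/data/Calibration_data/n01443537/c.JPEG",
]
print(one_file_per_class(example))
```

For the example above, the result contains `a.JPEG` from the first class directory and `c.JPEG` from the second.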

In [None]:
data_directory = "/data/Calibration_data"
calibration_files = [os.path.join(path, name) for path, _, files in os.walk(data_directory) for name in files]
print('There are %d calibration files. \n%s\n%s\n...'%(len(calibration_files), calibration_files[0], calibration_files[-1]))

We define a helper function to read and preprocess an image from a JPEG file.

In [None]:
def parse_file(filepath):
    image = tf.read_file(filepath)
    image = tf.image.decode_jpeg(image, channels=3)
    image = vgg_preprocessing.preprocess_image(image, 224, 224, is_training=False)
    return image

In [None]:
num_calibration_batches = 2
batched_input = np.zeros((BATCH_SIZE * num_calibration_batches, 224, 224, 3), dtype=np.float32)

with tf.Session(graph=tf.Graph(), config=config) as sess:
    # prepare dataset iterator
    dataset = tf.data.Dataset.from_tensor_slices(calibration_files)
    dataset = dataset.apply(tf.contrib.data.map_and_batch(map_func=parse_file, batch_size=BATCH_SIZE, num_parallel_calls=20))
    dataset = dataset.repeat(None)

    iterator = dataset.make_one_shot_iterator()
    next_element = iterator.get_next()
    for i in range(num_calibration_batches):
        # parse_file returns only the image, so each element is already a full batch
        batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :] = sess.run(next_element)

print('Calibration data shape: ', batched_input.shape)

def calibration_input_fn_gen():
    for i in range(num_calibration_batches):
        yield batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :]
        
calibration_input_fn = calibration_input_fn_gen()       

Next, we proceed with the two-stage process of creating and calibrating the TF-TRT INT8 model.

### Convert and calibrate the TF-TRT INT8 inference engine

In [None]:
# set a directory to write the saved model
INT8_SAVED_MODEL_DIR =  SAVED_MODEL_DIR + "_TFTRT_INT8/1"
!rm -rf $INT8_SAVED_MODEL_DIR
    
converter = trt.TrtGraphConverter(
      input_saved_model_dir=SAVED_MODEL_DIR,
      precision_mode=trt.TrtPrecisionMode.INT8)
  
converter.convert()

# Run calibration for num_calibration_batches times.
converted_graph_def = converter.calibrate(
      fetch_names=[OUTPUT_TENSOR],
      num_runs=num_calibration_batches,
      feed_dict_fn=lambda: {INPUT_TENSOR: next(calibration_input_fn)})

converter.save(INT8_SAVED_MODEL_DIR)

As before, we can benchmark the speed and accuracy of the resulting model.

In [None]:
benchmark_saved_model(INT8_SAVED_MODEL_DIR)

## Conclusion
In this notebook, we have demonstrated the process of creating TF-TRT inference models from an original TensorFlow *saved model*. In every case, we have also verified the accuracy and speed of the resulting model.
