# TensorFlow-TensorRT (TF-TRT) Inference from Checkpoint with TensorFlow 1.14, 1.15
In this notebook, we demonstrate how to create a TF-TRT optimized model for inference from a TensorFlow *checkpoint*. We also validate the resulting models for both accuracy and speed.

This notebook was designed to run with TensorFlow versions 1.14 and 1.15, which are included in the NVIDIA NGC TensorFlow containers from `nvcr.io/nvidia/tensorflow:19.07-py3` to `nvcr.io/nvidia/tensorflow:19.12-tf1-py3`. The containers can be downloaded from http://ngc.nvidia.com.
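
Because the converter API used below is specific to these releases, it can help to check the installed version before running the remaining cells. The following is a minimal sketch; the helper name `is_supported_tf_version` is hypothetical, and inside the notebook you would pass it `tf.__version__`.

```python
def is_supported_tf_version(version, supported=((1, 14), (1, 15))):
    """Return True if `version` (e.g. "1.15.2") has a supported major.minor."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Strings that do not look like "major.minor[.patch]" are rejected.
        return False
    return major_minor in supported

# In the notebook: is_supported_tf_version(tf.__version__)
print(is_supported_tf_version("1.15.0"))
```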

## Notebook Content
1. [Pre-requisite: data and model](#1)
1. [Verifying the original FP32 model](#2)
1. [Creating TF-TRT FP32 model](#3)
1. [Creating TF-TRT FP16 model](#4)
1. [Creating TF-TRT INT8 model](#5)
1. [Calibrating TF-TRT INT8 model with raw JPEG images](#6)


## Quickstart

We will be using the ImageNet dataset in TFRecords format. Google provides an excellent all-in-one script for downloading and preparing the ImageNet dataset at

https://github.com/tensorflow/models/blob/master/research/inception/inception/data/download_and_preprocess_imagenet.sh.

We will run this demonstration with a checkpoint from the TensorFlow Slim model zoo

https://github.com/tensorflow/models/tree/master/research/slim

To run this notebook, start the NGC TF container, providing the correct paths to the ImageNet validation data and to a TF Slim saved checkpoint:

```bash
nvidia-docker run -it -p 8888:8888 -v /path/to/image_net/:/data  -v /path/to/saved_model:/saved_model --name TFTRT nvcr.io/nvidia/tensorflow:19.12-tf1-py3
```
Then start Jupyter notebook within the container with:

```bash
jupyter notebook --ip 0.0.0.0 --port 8888  --allow-root
```

Connect to the Jupyter notebook web interface from your local host at http://localhost:8888.

<a id="1"></a>
## 1. Pre-requisite: data and model

We first install some extra packages and external dependencies. 

In [1]:
%%bash
pushd /workspace/nvidia-examples/tensorrt/tftrt/examples/object_detection
bash install_dependencies.sh;
popd

/workspace/nvidia-examples/tensorrt/tftrt/examples/object_detection /home/vinhngx/vinh-all-data/TME_projects/tensorrt/tftrt/examples/image-classification
Setup local variables...
Download protobuf...
/workspace/nvidia-examples/tensorrt/tftrt/examples/object_detection/protoc /workspace/nvidia-examples/tensorrt/tftrt/examples/object_detection
Archive:  protoc-3.7.1-linux-x86_64.zip
  inflating: include/google/protobuf/wrappers.proto  
  inflating: include/google/protobuf/field_mask.proto  
  inflating: include/google/protobuf/api.proto  
  inflating: include/google/protobuf/struct.proto  
  inflating: include/google/protobuf/descriptor.proto  
  inflating: include/google/protobuf/timestamp.proto  
  inflating: include/google/protobuf/compiler/plugin.proto  
  inflating: include/google/protobuf/empty.proto  
  inflating: include/google/protobuf/any.proto  
  inflating: include/google/protobuf/source_context.proto  
  inflating: include/google/protobuf/type.proto  
  inflating: include/goo


echo Download protobuf...
mkdir -p $PROTOC_DIR
pushd $PROTOC_DIR
ARCH=$(uname -m)
if [ "$ARCH" == "aarch64" ] ; then
  filename="protoc-3.7.1-linux-aarch_64.zip"
elif [ "$ARCH" == "x86_64" ] ; then
  filename="protoc-3.7.1-linux-x86_64.zip"
elif [ "$ARCH" == "ppc64le" ] ; then
  filename="protoc-3.7.1-linux-ppcle_64.zip"
else
  echo ERROR: $ARCH not supported.
  exit 1;
fi
wget --no-check-certificate ${PROTO_BASE_URL}${filename}
--2020-01-09 07:00:21--  https://github.com/google/protobuf/releases/download/v3.7.1/protoc-3.7.1-linux-x86_64.zip
Resolving github.com (github.com)... 13.237.44.5
Connecting to github.com (github.com)|13.237.44.5|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protoc-3.7.1-linux-x86_64.zip [following]
--2020-01-09 07:00:22--  https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protoc-3.7.1-linux-x86_64.zip
Reusing existing connectio

In [2]:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import matplotlib.pyplot as plt
import os
import time
import logging

logging.getLogger("tensorflow").setLevel(logging.ERROR)

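# Restrict TensorFlow to the first GPU
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Allocate GPU memory on demand instead of reserving it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Check the installed TensorRT version
!dpkg -l | grep nvinfer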

ii  libnvinfer-bin                         6.0.1-1+cuda10.2                  amd64        TensorRT binaries
ii  libnvinfer-dev                         6.0.1-1+cuda10.2                  amd64        TensorRT development libraries and headers
ii  libnvinfer-plugin-dev                  6.0.1-1+cuda10.2                  amd64        TensorRT plugin libraries
ii  libnvinfer-plugin6                     6.0.1-1+cuda10.2                  amd64        TensorRT plugin libraries
ii  libnvinfer6                            6.0.1-1+cuda10.2                  amd64        TensorRT runtime libraries
ii  python3-libnvinfer                     6.0.1-1+cuda10.2                  amd64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                 6.0.1-1+cuda10.2                  amd64        Python 3 development package for TensorRT


### Data
We first check that the correct ImageNet validation data folder has been mounted. In this experiment, we use the ImageNet validation data to verify the accuracy and inference speed of the models.

In [3]:
def get_files(data_dir, filename_pattern):
    """Return the files in data_dir that match filename_pattern."""
    if data_dir is None:
        return []
    files = tf.gfile.Glob(os.path.join(data_dir, filename_pattern))
    if not files:
        raise ValueError('Cannot find any files in {} with '
                         'pattern "{}"'.format(data_dir, filename_pattern))
    return files

In [4]:
VALIDATION_DATA_DIR = "/data"
validation_files = get_files(VALIDATION_DATA_DIR, 'validation*')
print('There are %d validation files. \n%s\n%s\n...'%(len(validation_files), validation_files[0], validation_files[-1]))

There are 128 validation files. 
/data/validation-00011-of-00128
/data/validation-00086-of-00128
...
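
Since the benchmark results depend on the full validation set being present, it may be worth confirming that no shard is missing. The sketch below is one possible check, assuming shard names follow the `name-XXXXX-of-YYYYY` pattern shown above with zero-based indices; the helper `missing_shards` is hypothetical and not part of the notebook.

```python
import re

SHARD_RE = re.compile(r"-(\d+)-of-(\d+)$")

def missing_shards(paths):
    """Return the sorted shard indices that are absent from `paths`."""
    found, total = set(), None
    for path in paths:
        match = SHARD_RE.search(path)
        if match is None:
            continue  # ignore files that do not follow the shard naming
        found.add(int(match.group(1)))
        total = int(match.group(2))
    if total is None:
        return []
    return sorted(set(range(total)) - found)

example = ["/data/validation-00000-of-00004", "/data/validation-00002-of-00004"]
print(missing_shards(example))  # shards 1 and 3 are missing
```

In the notebook, `missing_shards(validation_files)` should return an empty list when all 128 shards are mounted.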


### TF model checkpoint
We will work with a ResNet-50 v1 checkpoint from https://github.com/tensorflow/models/tree/master/research/slim, downloading it if it is not already present.

In [5]:
%%bash
FILE=/saved_model/resnet_v1_50_2016_08_28.tar.gz
if [ -f $FILE ]; then
   echo "The file '$FILE' exists."
else
   echo "The file '$FILE' is not found. Downloading..."
   wget -P /saved_model/ http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
fi


The file '/saved_model/resnet_v1_50_2016_08_28.tar.gz' exists.


In [6]:
!tar -xzvf /saved_model/resnet_v1_50_2016_08_28.tar.gz -C /saved_model 

resnet_v1_50.ckpt


### Helper functions
We define a few helper functions to read and preprocess ImageNet data from TFRecord files.

In [7]:
#Define some global variables
BATCH_SIZE = 8
SAVED_MODEL_DIR = "/saved_model/"

In [8]:
def deserialize_image_record(record):
    feature_map = {
        'image/encoded':          tf.FixedLenFeature([ ], tf.string, ''),
        'image/class/label':      tf.FixedLenFeature([1], tf.int64,  -1),
        'image/class/text':       tf.FixedLenFeature([ ], tf.string, ''),
        'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
        'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32)
    }
    with tf.name_scope('deserialize_image_record'):
        obj = tf.parse_single_example(record, feature_map)
        imgdata = obj['image/encoded']
        label   = tf.cast(obj['image/class/label'], tf.int32)
        bbox    = tf.stack([obj['image/object/bbox/%s'%x].values
                            for x in ['ymin', 'xmin', 'ymax', 'xmax']])
        bbox = tf.transpose(tf.expand_dims(bbox, 0), [0,2,1])
        text    = obj['image/class/text']
        return imgdata, label, bbox, text

from preprocessing import vgg_preprocessing
def preprocess(record):
    # Parse TFRecord
    imgdata, label, bbox, text = deserialize_image_record(record)
    label -= 1 # Change to 0-based (don't use background class)
    try:    image = tf.image.decode_jpeg(imgdata, channels=3, fancy_upscaling=False, dct_method='INTEGER_FAST')
    except: image = tf.image.decode_png(imgdata, channels=3)

    image = vgg_preprocessing.preprocess_image(image, 224, 224, is_training=False)
    return image, label
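
To make the evaluation-time preprocessing more concrete, the sketch below reproduces its geometry and normalization in plain Python. It is an approximation written for illustration, under the assumption that TF Slim's `vgg_preprocessing` resizes the shorter image side to 256 pixels, takes a central 224x224 crop, and subtracts per-channel RGB means; the constants and helper names here are assumptions and should be checked against the Slim source.

```python
# Per-channel RGB means; assumed to match the constants in vgg_preprocessing.
RGB_MEANS = (123.68, 116.78, 103.94)

def resized_shape(height, width, smallest_side=256):
    """Scale so the shorter side equals `smallest_side`, keeping aspect ratio."""
    scale = smallest_side / min(height, width)
    return int(round(height * scale)), int(round(width * scale))

def central_crop_offsets(height, width, crop=224):
    """Top-left corner (row, column) of a centered crop x crop window."""
    return (height - crop) // 2, (width - crop) // 2

def subtract_means(pixel):
    """Subtract the per-channel mean from one (R, G, B) pixel."""
    return tuple(value - mean for value, mean in zip(pixel, RGB_MEANS))

h, w = resized_shape(375, 500)
print((h, w), central_crop_offsets(h, w))
```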


Next, we define two functions to benchmark model speed and accuracy, for a model in either `graph_def` form or `saved model` form.

In [10]:
def benchmark_frozen_graph(frozen_graph, SAVED_MODEL_DIR=None, BATCH_SIZE=8):
    with tf.Session(graph=tf.Graph(), config=config) as sess:
        # prepare dataset iterator
        dataset = tf.data.TFRecordDataset(validation_files)    
        dataset = dataset.apply(tf.contrib.data.map_and_batch(map_func=preprocess, batch_size=BATCH_SIZE, num_parallel_calls=20))

        iterator = dataset.make_one_shot_iterator()
        next_element = iterator.get_next()

        output_node = tf.import_graph_def(
            frozen_graph,
            return_elements=['classes'],
            name="")
        
        print('Warming up for 50 batches...')
        for _ in range (50):
            sess.run(['classes:0'], feed_dict={"input:0": sess.run(next_element)[0]})

        num_hits = 0
        num_predict = 0
        start_time = time.time()
        try:
            while True:        
                image_data = sess.run(next_element)    
                img = image_data[0]
                label = image_data[1].squeeze()
                output = sess.run(['classes:0'], feed_dict={"input:0": img})
                prediction = output[0]
                num_hits += np.sum(prediction == label)
                num_predict += len(prediction)
        except tf.errors.OutOfRangeError as e:
            pass

        print('Accuracy: %.2f%%'%(100*num_hits/num_predict)) 
        print('Inference speed: %.2f samples/s'%(num_predict/(time.time()-start_time)))
        
        # Optionally, save the model for serving if an output directory is provided
        if SAVED_MODEL_DIR:
            print('Saving model to %s'%SAVED_MODEL_DIR)
            tf.saved_model.simple_save(
                session=sess,
                export_dir=SAVED_MODEL_DIR,
                inputs={"input":tf.get_default_graph().get_tensor_by_name("input:0")},
                outputs={"classes":tf.get_default_graph().get_tensor_by_name("classes:0")},
                legacy_init_op=None
             )

In [16]:
def benchmark_saved_model(SAVED_MODEL_DIR, BATCH_SIZE=8):
    with tf.Session(graph=tf.Graph(), config=config) as sess:
        # prepare dataset iterator
        dataset = tf.data.TFRecordDataset(validation_files)    
        dataset = dataset.apply(tf.contrib.data.map_and_batch(map_func=preprocess, batch_size=BATCH_SIZE, num_parallel_calls=20))

        iterator = dataset.make_one_shot_iterator()
        next_element = iterator.get_next()

        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], SAVED_MODEL_DIR)

        print('Warming up for 50 batches...')
        for _ in range (50):
            sess.run(['classes:0'], feed_dict={"input:0": sess.run(next_element)[0]})

        print('Benchmarking inference engine...')
        num_hits = 0
        num_predict = 0
        start_time = time.time()
        try:
            while True:        
                image_data = sess.run(next_element)    
                img = image_data[0]
                label = image_data[1].squeeze()
                output = sess.run(['classes:0'], feed_dict={"input:0": img})            
                prediction = output[0]
                num_hits += np.sum(prediction == label)
                num_predict += len(prediction)
        except tf.errors.OutOfRangeError as e:
            pass

        print('Accuracy: %.2f%%'%(100*num_hits/num_predict))
        print('Inference speed: %.2f samples/s'%(num_predict/(time.time()-start_time)))
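
The two metrics reported by both helpers can be reproduced without TensorFlow. The sketch below is a simplified illustration (the names `run_benchmark` and `predict_fn` are hypothetical): it runs a warm-up phase that is excluded from timing, then accumulates top-1 hits and measures throughput in samples per second. As in the helpers above, the timed region includes fetching each batch, so the reported speed reflects the input pipeline as well as the model.

```python
import time

def run_benchmark(batches, predict_fn, warmup=2):
    """Return (accuracy_percent, samples_per_second) over `batches`.

    `batches` is an iterable of (inputs, labels) pairs; `predict_fn` maps
    a list of inputs to a list of predicted labels.
    """
    batches = iter(batches)
    for _ in range(warmup):
        inputs, _ = next(batches)
        predict_fn(inputs)  # warm-up results are discarded

    hits = total = 0
    start = time.perf_counter()
    for inputs, labels in batches:
        predictions = predict_fn(inputs)
        hits += sum(p == l for p, l in zip(predictions, labels))
        total += len(predictions)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return 100.0 * hits / total, total / elapsed

# Toy data: the "model" predicts the parity of each input.
data = [([1, 2, 3, 4], [1, 0, 1, 1])] * 5
accuracy, speed = run_benchmark(data, lambda xs: [x % 2 for x in xs])
print("Accuracy: %.2f%%" % accuracy)
```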

<a id="2"></a>
## 2. Verifying the original FP32 model

We first load and benchmark the ResNet-50 v1 model from TF Slim. Note that the checkpoint downloaded from http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz does not come with any metadata, so we need to use the TF Slim nets factory to get the model definition. Newer checkpoints generated by TensorFlow generally come with enough meta information to load the network from the archive.

In [12]:
import nets.nets_factory

graph = tf.Graph()
with graph.as_default():
    with tf.Session(config=config) as sess:
        tf_input = tf.placeholder(tf.float32, [None, 224, 224, 3], name='input')
        network_fn = nets.nets_factory.get_network_fn('resnet_v1_50', 1000,
                                                      is_training=False)
        tf_net, tf_end_points = network_fn(tf_input)
                
        saver = tf.train.Saver()
        saver.restore(sess, SAVED_MODEL_DIR+"resnet_v1_50.ckpt")
        
        tf_output = tf.identity(tf_net, name='logits')
        tf_output_classes = tf.argmax(tf_output, axis=1, name='classes')        
        #tf_output_classes = tf.reshape(tf_output_classes, (BATCH_SIZE,), name='classes')
        
        # freeze graph
        fp32_frozen_graph = tf.graph_util.convert_variables_to_constants(
            sess,
            sess.graph_def,
            output_node_names=['logits', 'classes']
        ) 

In [13]:
FP32_SAVED_MODEL_DIR = SAVED_MODEL_DIR+"model/Resnet_FP32/1"
!rm -rf $FP32_SAVED_MODEL_DIR

benchmark_frozen_graph(fp32_frozen_graph, FP32_SAVED_MODEL_DIR)

Warming up for 50 batches...
Accuracy: 75.18%
Inference speed: 668.93 samples/s
Saving model to /saved_model/model/Resnet_FP32/1


In [14]:
!saved_model_cli show --all --dir $FP32_SAVED_MODEL_DIR

2020-01-09 07:04:03.274276: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: classes:0
  Method name is: tensorflow/serving/predict


In [17]:
benchmark_saved_model(FP32_SAVED_MODEL_DIR)

Warming up for 50 batches...
Benchmarking inference engine...
Accuracy: 75.18%
Inference speed: 739.00 samples/s


<a id="3"></a>
## 3. Creating TF-TRT FP32 model

Next, we convert the native TF FP32 model to TF-TRT FP32, then verify model accuracy and inference speed.

In [20]:
#Now we create the TFTRT FP32 engine
converter = trt.TrtGraphConverter(input_graph_def=fp32_frozen_graph,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.FP32,
                                  nodes_blacklist=['classes', 'logits'])
trt_fp32_graph = converter.convert()

"""
# Alternatively, old API:
trt_fp32_graph = trt.create_inference_graph(
    input_graph_def=fp32_frozen_graph,
    outputs=['classes'],
    max_batch_size=BATCH_SIZE,
    precision_mode="FP32")
"""

'\n# Alternatively, old API:\ntrt_fp32_graph = trt.create_inference_graph(\n    input_graph_def=fp32_frozen_graph,\n    outputs=[\'classes\'],\n    max_batch_size=BATCH_SIZE,\n    precision_mode="FP32")\n'
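
A quick way to confirm that the conversion had an effect is to count how many nodes in the returned graph are TensorRT engine nodes. The sketch below assumes that converted subgraphs appear as nodes with op type `TRTEngineOp`; the helper `count_ops` is hypothetical and only relies on the graph exposing a `node` list whose items have an `op` field, so it is demonstrated here on a stand-in object rather than a real `GraphDef`.

```python
from collections import Counter, namedtuple

def count_ops(graph_def):
    """Count nodes per op type in a GraphDef-like object."""
    return Counter(node.op for node in graph_def.node)

# Stand-in for a converted GraphDef, for illustration only.
Node = namedtuple("Node", ["name", "op"])
Graph = namedtuple("Graph", ["node"])
fake_graph = Graph(node=[
    Node("input", "Placeholder"),
    Node("TRTEngineOp_0", "TRTEngineOp"),
    Node("classes", "ArgMax"),
])

counts = count_ops(fake_graph)
print("TRT engine nodes:", counts["TRTEngineOp"])
print("Total nodes:", sum(counts.values()))
```

In the notebook, the same helper could be applied as `count_ops(trt_fp32_graph)["TRTEngineOp"]`; a count of zero would suggest that no subgraph was converted.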

In [21]:
TRT_FP32_SAVED_MODEL_DIR = SAVED_MODEL_DIR+"model/Resnet_TRT_FP32/1"
!rm -rf $TRT_FP32_SAVED_MODEL_DIR

benchmark_frozen_graph(trt_fp32_graph, TRT_FP32_SAVED_MODEL_DIR)

Warming up for 50 batches...
Accuracy: 75.18%
Inference speed: 968.44 samples/s
Saving model to /saved_model/model/Resnet_TRT_FP32/1


In [22]:
!saved_model_cli show --all --dir $TRT_FP32_SAVED_MODEL_DIR

2020-01-09 07:11:09.291882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: unknown_rank
        name: classes:0
  Method name is: tensorflow/serving/predict


In [23]:
benchmark_saved_model(TRT_FP32_SAVED_MODEL_DIR)

Warming up for 50 batches...
Benchmarking inference engine...
Accuracy: 75.18%
Inference speed: 618.83 samples/s


<a id="4"></a>
## 4. Creating TF-TRT FP16 model


Next, we convert the native TF FP32 model to TF-TRT FP16, then verify model accuracy and inference speed.

In [24]:
#Now we create the TFTRT FP16 engine
converter = trt.TrtGraphConverter(input_graph_def=fp32_frozen_graph,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.FP16,
                                  nodes_blacklist=['classes', 'logits'])
trt_fp16_graph = converter.convert()

"""
# Alternatively, old API:
trt_fp16_graph = trt.create_inference_graph(
    input_graph_def=fp32_frozen_graph,
    outputs=['classes'],
    max_batch_size=BATCH_SIZE,
    precision_mode="FP16")
"""

'\n# Alternatively, old API:\ntrt_fp16_graph = trt.create_inference_graph(\n    input_graph_def=fp32_frozen_graph,\n    outputs=[\'classes\'],\n    max_batch_size=BATCH_SIZE,\n    precision_mode="FP16")\n'

In [25]:
TRT_FP16_SAVED_MODEL_DIR = SAVED_MODEL_DIR+"/model/Resnet_TRT_FP16/1"
!rm -rf $TRT_FP16_SAVED_MODEL_DIR

benchmark_frozen_graph(trt_fp16_graph, TRT_FP16_SAVED_MODEL_DIR)

Warming up for 50 batches...
Accuracy: 75.18%
Inference speed: 726.11 samples/s
Saving model to /saved_model//model/Resnet_TRT_FP16/1


In [26]:
benchmark_saved_model(TRT_FP16_SAVED_MODEL_DIR)

Warming up for 50 batches...
Benchmarking inference engine...
Accuracy: 75.18%
Inference speed: 782.17 samples/s


<a id="5"></a>
## 5. Creating TF-TRT INT8 model

Creating a TF-TRT INT8 inference model requires two steps:

- Step 1: Prepare a calibration dataset

- Step 2: Convert and calibrate the TF-TRT INT8 inference engine

### Step 1: Prepare a calibration dataset

Creating a TF-TRT INT8 model requires a small calibration dataset. Ideally, this dataset should closely represent the data seen in production. It is used to build a histogram of values for each layer in the neural network, which enables effective 8-bit quantization.
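
To see why the observed value range matters, consider symmetric 8-bit quantization, where a tensor's values are mapped to integers in [-127, 127] using a single scale. The sketch below is a simplified illustration and not TensorRT's calibration algorithm: it takes the range directly from the maximum absolute value of the calibration samples, whereas TensorRT derives the range from the calibration histograms. All function names are hypothetical.

```python
def int8_scale(calibration_values):
    """Scale that maps the largest observed magnitude to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize(values, scale):
    """Round to the nearest integer step and clamp to [-127, 127]."""
    return [max(-127, min(127, int(round(v / scale)))) for v in values]

def dequantize(quantized, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in quantized]

calibration = [-2.0, -0.5, 0.0, 0.75, 1.27]
scale = int8_scale(calibration)
# A value outside the calibrated range (5.0) saturates at 127.
q = quantize([0.5, -2.0, 5.0], scale)
print(q, dequantize(q, scale))
```

Values that fall outside the calibrated range are clipped, which is why calibration data that misrepresents production inputs can hurt INT8 accuracy.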

In [27]:
num_calibration_batches = 2
batched_input = np.zeros((BATCH_SIZE * num_calibration_batches, 224, 224, 3), dtype=np.float32)

with tf.Session(graph=tf.Graph(), config=config) as sess:
    # prepare dataset iterator
    dataset = tf.data.TFRecordDataset(validation_files)    
    dataset = dataset.apply(tf.contrib.data.map_and_batch(map_func=preprocess, batch_size=BATCH_SIZE, num_parallel_calls=20))

    iterator = dataset.make_one_shot_iterator()
    next_element = iterator.get_next()
    for i in range(num_calibration_batches):
        batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :] = sess.run(next_element)[0]

#batched_input = tf.constant(batched_input)
print('Calibration data shape: ', batched_input.shape)

def calibration_input_fn_gen():
    for i in range(num_calibration_batches):
        yield batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :]
        
calibration_input_fn = calibration_input_fn_gen()        

Calibration data shape:  (16, 224, 224, 3)
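
Note that `calibration_input_fn` is a generator object, so it yields each calibration batch once and is then exhausted. If the feed function is called once per calibration run, the number of runs must not exceed the number of batches, and the generator must be re-created before calibrating again. A minimal sketch of this behaviour, using small lists as stand-ins for image batches (the name `batch_generator` is hypothetical):

```python
def batch_generator(data, batch_size):
    """Yield consecutive slices of `data`, each of length `batch_size`."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

data = list(range(4))          # stand-in for 4 preprocessed images
batches = batch_generator(data, batch_size=2)
feed_dict_fn = lambda: {"input:0": next(batches)}

feeds = [feed_dict_fn() for _ in range(2)]   # two calibration runs succeed
print(feeds)

try:
    feed_dict_fn()             # a third call finds the generator exhausted
    exhausted = False
except StopIteration:
    exhausted = True
print("Generator exhausted:", exhausted)
```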



### Step 2: Convert and calibrate the TF-TRT INT8 inference engine

The calibration step may take a while to complete.

In [30]:
#set a directory to write the saved model
INT8_SAVED_MODEL_DIR =  SAVED_MODEL_DIR + "_TFTRT_INT8/1"
!rm -rf $INT8_SAVED_MODEL_DIR

# Now we create the TF-TRT INT8 engine
converter = trt.TrtGraphConverter(input_graph_def=fp32_frozen_graph,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.INT8,
                                  nodes_blacklist=['classes', 'logits'])
trt_int8_graph = converter.convert()


# Run calibration for num_calibration_batches times.
trt_int8_calibrated_graph = converter.calibrate(
      fetch_names=['classes:0'],
      num_runs=num_calibration_batches,
      feed_dict_fn=lambda: {"input:0": next(calibration_input_fn)})


In [31]:
INT8_SAVED_MODEL_DIR = SAVED_MODEL_DIR + '/model/Resnet_TRT_INT8/1'
!rm -rf $INT8_SAVED_MODEL_DIR

benchmark_frozen_graph(trt_int8_calibrated_graph, INT8_SAVED_MODEL_DIR)

Warming up for 50 batches...
Accuracy: 75.16%
Inference speed: 1336.62 samples/s
Saving model to /saved_model//model/Resnet_TRT_INT8/1


In [32]:
!saved_model_cli show --all --dir $INT8_SAVED_MODEL_DIR

2020-01-09 07:22:09.282118: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: unknown_rank
        name: classes:0
  Method name is: tensorflow/serving/predict


Finally we reload and verify the performance of the INT8 saved model.

In [33]:
benchmark_saved_model(INT8_SAVED_MODEL_DIR)

Warming up for 50 batches...
Benchmarking inference engine...
Accuracy: 75.15%
Inference speed: 1641.23 samples/s


<a id="6"></a>
## 6. Calibrating TF-TRT INT8 model with raw JPEG images

As an alternative to reading data in TFRecords format, this section demonstrates how to calibrate a TF-TRT INT8 model from a directory of raw JPEG images. We assume that the raw images have been mounted to the directory `/data/Calibration_data`.

As a rule of thumb, calibration data should be a small but representative set of images similar to what is expected in deployment. Empirically, for common network architectures trained on ImageNet data, a calibration set of 500-1000 images provides good accuracy. A good strategy for a dataset such as ImageNet is therefore to choose one sample from each class.
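
One way to follow this strategy is to group the candidate images by class and draw a single image from each group. The sketch below assumes a hypothetical list of `(path, label)` pairs; the notebook itself does not define such a list, and the paths shown are placeholders.

```python
import random
from collections import defaultdict

def one_sample_per_class(samples, seed=0):
    """Pick one path per label from an iterable of (path, label) pairs."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    return {label: rng.choice(paths) for label, paths in sorted(by_label.items())}

samples = [
    ("img_a.JPEG", 0), ("img_b.JPEG", 0),
    ("img_c.JPEG", 1),
    ("img_d.JPEG", 2), ("img_e.JPEG", 2),
]
selection = one_sample_per_class(samples)
print(len(selection), "calibration images selected")
```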

In [34]:
data_directory = "/data/Calibration_data"
calibration_files = [os.path.join(path, name) for path, _, files in os.walk(data_directory) for name in files]
print('There are %d calibration files. \n%s\n%s\n...'%(len(calibration_files), calibration_files[0], calibration_files[-1]))

There are 45 calibration files. 
/data/Calibration_data/ILSVRC2012_val_00028553.JPEG
/data/Calibration_data/ILSVRC2012_val_00030154.JPEG
...


In [35]:
def parse_file(filepath):
    image = tf.read_file(filepath)
    image = tf.image.decode_jpeg(image, channels=3)
    image = vgg_preprocessing.preprocess_image(image, 224, 224, is_training=False)
    return image

In [36]:
num_calibration_batches = 2
batched_input = np.zeros((BATCH_SIZE * num_calibration_batches, 224, 224, 3), dtype=np.float32)

with tf.Session(graph=tf.Graph(), config=config) as sess:
    # prepare dataset iterator
    dataset = tf.data.Dataset.from_tensor_slices(calibration_files)
    dataset = dataset.apply(tf.contrib.data.map_and_batch(map_func=parse_file, batch_size=BATCH_SIZE, num_parallel_calls=20))
    dataset = dataset.repeat(None)

    iterator = dataset.make_one_shot_iterator()
    next_element = iterator.get_next()
    for i in range(num_calibration_batches):
        # parse_file returns only the image, so each element is already a full batch
        batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :] = sess.run(next_element)

#batched_input = tf.constant(batched_input)
print('Calibration data shape: ', batched_input.shape)

def calibration_input_fn_gen():
    for i in range(num_calibration_batches):
        yield batched_input[i*BATCH_SIZE:(i+1)*BATCH_SIZE, :]
        
calibration_input_fn = calibration_input_fn_gen()       
    

Calibration data shape:  (16, 224, 224, 3)


Next, we proceed with the two-stage process of creating and calibrating a TF-TRT INT8 model.

### Convert and calibrate the TF-TRT INT8 inference engine

In [None]:
#set a directory to write the saved model
INT8_SAVED_MODEL_DIR =  SAVED_MODEL_DIR + "_TFTRT_INT8/1"
!rm -rf $INT8_SAVED_MODEL_DIR

# Now we create the TF-TRT INT8 engine
converter = trt.TrtGraphConverter(input_graph_def=fp32_frozen_graph,
                                  max_batch_size=BATCH_SIZE,
                                  precision_mode=trt.TrtPrecisionMode.INT8,
                                  nodes_blacklist=['classes', 'logits'])
trt_int8_graph = converter.convert()


# Run calibration for num_calibration_batches times.
trt_int8_calibrated_graph = converter.calibrate(
      fetch_names=['classes:0'],
      num_runs=num_calibration_batches,
      feed_dict_fn=lambda: {"input:0": next(calibration_input_fn)})


As before, we can benchmark the speed and accuracy of the resulting model.

In [None]:
benchmark_frozen_graph(trt_int8_calibrated_graph, INT8_SAVED_MODEL_DIR)

## Conclusion
In this notebook, we have demonstrated the process of creating TF-TRT FP32, FP16, and INT8 inference models from an original TF FP32 checkpoint. In every case, we have also verified the accuracy and speed of the resulting models.
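
For reference, the throughput figures printed by the `benchmark_saved_model` calls above can be compared against the native FP32 baseline. The snippet below only restates numbers from this notebook's recorded outputs; they depend on the GPU and container used and should not be read as general performance claims.

```python
# Samples/s reported by benchmark_saved_model in the recorded cell outputs.
saved_model_speed = {
    "TF FP32": 739.00,
    "TF-TRT FP32": 618.83,
    "TF-TRT FP16": 782.17,
    "TF-TRT INT8": 1641.23,
}

baseline = saved_model_speed["TF FP32"]
speedups = {name: speed / baseline for name, speed in saved_model_speed.items()}
for name, ratio in speedups.items():
    print("%-12s %.2fx" % (name, ratio))
```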
