<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# Using Dynamic Shapes with TensorFlow-TensorRT

NVIDIA TensorRT is a library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine that performs inference for that network.

TensorFlow™ integration with TensorRT™ (TF-TRT) optimizes and executes compatible subgraphs, allowing TensorFlow to execute the remaining graph. While you can still use TensorFlow's wide and flexible feature set, TensorRT will parse the model and apply optimizations to the portions of the graph wherever possible.

In this notebook, we demonstrate the use of dynamic shape tensors with TensorFlow-TensorRT.


## Introduction

If you are unfamiliar with how TensorFlow-TensorRT works, you can refer to this [video](https://www.youtube.com/watch?v=w7871kMiAs8) for a quick overview. Some understanding of how TF-TRT works is required to follow the next section. A quick and dirty explanation is as follows: TF-TRT partitions the network graph into supported and unsupported subgraphs. For each supported subgraph, a `TRTEngineOp` builds a TensorRT engine. With this in mind, let's proceed to the task at hand.

TensorFlow-TensorRT has two concepts relevant to this discussion:
* Dynamic Ops
* Dynamic Shape

#### Explaining Dynamic Ops

Dynamic Ops can be treated as a mode that lets users run the optimized model in "implicit shape" mode, i.e., when the model's input tensor shape is only partially defined, for example `[?, ?, ?, 3]`. How does this work? The `TRTEngineOp` creates the TensorRT engine at inference time, using the shape of the input tensor it receives (say, `[8, 224, 224, 3]`). If we later supply a tensor with a different shape (say, `[16, 224, 224, 3]`), another engine is created. While this provides flexibility, the downside is that each TRT engine consumes memory (a set of model weights for each "profile").
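
To make the memory trade-off concrete, here is a minimal toy sketch of the caching behavior described above. It is not TF-TRT code: `ToyEngineCache` and its fields are hypothetical names, and keying the cache on the exact input shape is a deliberate simplification of what `TRTEngineOp` does internally.

```python
# Toy model of Dynamic Ops: one "engine" is kept per distinct input shape.
# ToyEngineCache is a hypothetical illustration, not a TF-TRT API.

class ToyEngineCache:
    def __init__(self):
        self.engines = {}   # maps input shape -> placeholder engine
        self.builds = 0     # number of (expensive) engine builds so far

    def get_engine(self, shape):
        key = tuple(shape)
        if key not in self.engines:
            # In TF-TRT, this is the point where a TensorRT engine would be built.
            self.builds += 1
            self.engines[key] = "engine_for_{}".format(key)
        return self.engines[key]


cache = ToyEngineCache()
cache.get_engine([8, 224, 224, 3])    # first shape: builds an engine
cache.get_engine([8, 224, 224, 3])    # same shape: reuses the cached engine
cache.get_engine([16, 224, 224, 3])   # new shape: builds a second engine
print(cache.builds)
```

Each entry in `cache.engines` stands for a separately built engine, which is why serving many distinct shapes in this mode costs both build time and memory.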

#### Explaining Dynamic Shapes

Dynamic Shape mode requires the user to define `minimum`, `optimal`, and `maximum` shapes for the input tensor. This shifts the task from supporting implicit tensor shapes to supporting a set of explicit batch shapes. The engine built in this case can handle any shape between the `minimum` and `maximum` shapes, without the need to build separate engines.
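
As a rough illustration of what "any shape between the minimum and maximum" means, the sketch below checks a shape dimension by dimension against a pair of bounds. The helper `shape_in_profile` and the bounds used here are assumptions made for this example; TensorRT performs its own validation internally.

```python
def shape_in_profile(shape, min_shape, max_shape):
    """Return True if every dimension lies within [min, max], inclusive."""
    if not (len(shape) == len(min_shape) == len(max_shape)):
        return False  # shapes of different rank can never match the profile
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


# Illustrative bounds: only the batch dimension is allowed to vary.
min_shape = (16, 224, 224, 3)
max_shape = (64, 224, 224, 3)

print(shape_in_profile((32, 224, 224, 3), min_shape, max_shape))    # inside the range
print(shape_in_profile((128, 224, 224, 3), min_shape, max_shape))   # batch too large
```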



## Setting up the environment

TensorFlow-TensorRT comes packaged with TensorFlow, so if you have TensorFlow set up and are running this on an NVIDIA GPU (preferably a Volta, Turing, Ampere, or newer generation GPU with Tensor Cores), we can proceed. You can also run this inside our [TensorFlow container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow), which comes packaged with a host of software that can help accelerate your workloads.

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals
import os
import time

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import tag_constants
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

#### Model
For this demonstration, we will use a ResNet-50 model trained on the ImageNet dataset.

In [2]:
model = ResNet50(weights='imagenet')
model.save('resnet50_saved_model') 

INFO:tensorflow:Assets written to: resnet50_saved_model/assets


Let's take a look at the shapes of the input and output tensors. The leading `-1` in each shape marks the batch dimension as unknown.

In [3]:
!saved_model_cli show --all --dir resnet50_saved_model


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: serving_default_input_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['predictions'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1000)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

Concrete Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          input_1: 

### TF-TRT using dynamic shape

Dynamic shape mode requires TRT optimization profiles: “A TensorRT optimization profile describes the possible min/max values of each dynamic input shape along with an optimum value. These values are used by the TensorRT builder to select the best kernel for the optimum value among those kernels that are valid for all input tensors in the [min, max] range.”

In dynamic shape mode, we need to collect optimization profiles before we can create the TensorRT engine. Therefore, users must provide input tensors through an `input_fn` when `converter.build` is called, so that TF-TRT can automatically generate the needed optimization profiles (guided by the shapes of these input tensors) and build a single engine that can take different input shapes.

In the function below, we supply input tensors with three batch sizes (16, 32, and 64), which serve as the `minimum`, `optimal`, and `maximum` batch shapes.

In [4]:
def input_fn():
  input_shapes = [[(16, 224, 224, 3)],
                  [(32, 224, 224, 3)],
                  [(64, 224, 224, 3)]]
  for shapes in input_shapes:
    yield [np.zeros(x, dtype=np.float32) for x in shapes]
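
The sketch below illustrates how a shape range could be derived from sample inputs like the ones above. It assumes a range-style profile in which the bounds are the element-wise minimum and maximum of the provided shapes; `derive_range` is a hypothetical helper for illustration, not part of TF-TRT, and the actual profile generation may differ in detail.

```python
def derive_range(shapes):
    """Element-wise min and max over a list of same-rank shapes."""
    min_dims = tuple(min(dims) for dims in zip(*shapes))
    max_dims = tuple(max(dims) for dims in zip(*shapes))
    return min_dims, max_dims


# The same three shapes that input_fn yields.
sample_shapes = [(16, 224, 224, 3), (32, 224, 224, 3), (64, 224, 224, 3)]
min_dims, max_dims = derive_range(sample_shapes)
print(min_dims)
print(max_dims)
```

With the notebook's own generator, the shapes could be gathered as `[x.shape for batch in input_fn() for x in batch]`. Only the batch dimension varies here, so the other dimensions have identical lower and upper bounds.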

In [5]:
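print('Converting to TF-TRT FP32...')

# use_dynamic_shape=True explicitly requests dynamic shape mode; if it is left
# unset, TF-TRT may fall back to its default (non-dynamic-shape) behavior.
converter = trt.TrtGraphConverterV2(input_saved_model_dir='resnet50_saved_model',
                                    precision_mode=trt.TrtPrecisionMode.FP32,
                                    max_workspace_size_bytes=1<<32,
                                    use_dynamic_shape=True)
converter.convert()
# Build the engine ahead of time, guided by the shapes yielded by input_fn.
converter.build(input_fn)
converter.save(output_saved_model_dir='resnet50_saved_model_TFTRT_FP32')
print('Done Converting to TF-TRT FP32')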

Converting to TF-TRT FP32...
INFO:tensorflow:Linked TensorRT version: (8, 2, 4)
INFO:tensorflow:Loaded TensorRT version: (8, 2, 4)


INFO:tensorflow:Assets written to: resnet50_saved_model_TFTRT_FP32/assets
Done Converting to TF-TRT FP32


In [6]:
!saved_model_cli show --all --dir resnet50_saved_model_TFTRT_FP32


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 224, 224, 3)
        name: serving_default_input_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['predictions'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1000)
        name: PartitionedCall:0
  Method name is: tensorflow/serving/predict

In [7]:
# We use random tensors as dummy input to make a quick and dirty benchmarking utility.
# Feel free to add your own data loader if you choose to.

def benchmark_tftrt(input_saved_model, batch_size):
    # Random data in the model's input layout: [batch, height, width, channels].
    batched_input = tf.random.uniform(shape=[batch_size, 224, 224, 3])

    saved_model_loaded = tf.saved_model.load(input_saved_model, tags=[tag_constants.SERVING])
    infer = saved_model_loaded.signatures['serving_default']

    N_warmup_run = 50
    N_run = 1000
    elapsed_time = []

    # Warm-up runs are not timed.
    for i in range(N_warmup_run):
        labeling = infer(batched_input)

    for i in range(N_run):
        start_time = time.time()
        labeling = infer(batched_input)
        end_time = time.time()
        elapsed_time.append(end_time - start_time)
        if i % 50 == 0:
            # Mean latency over the most recent (up to) 50 timed runs.
            print('Step {}: {:4.1f}ms'.format(i, np.mean(elapsed_time[-50:]) * 1000))

    print('Throughput: {:.0f} images/s'.format(N_run * batch_size / np.sum(elapsed_time)))
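
If you want more than a mean latency, the pattern above generalizes to any callable. The sketch below is one possible variant, not part of the original utility: `time_callable` and `summarize` are hypothetical helpers, and the workload is a stand-in so the example runs without a GPU or a saved model.

```python
import statistics
import time


def time_callable(fn, n_warmup=5, n_run=20):
    """Call fn() n_warmup times untimed, then return n_run latencies in seconds."""
    for _ in range(n_warmup):
        fn()
    latencies = []
    for _ in range(n_run):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, batch_size):
    """Summarize latencies (seconds) as milliseconds plus overall throughput."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "mean_ms": 1000 * statistics.mean(ordered),
        "median_ms": 1000 * statistics.median(ordered),
        "p99_ms": 1000 * ordered[p99_index],
        "images_per_s": batch_size * len(ordered) / sum(ordered),
    }


# Stand-in workload; in this notebook it could be lambda: infer(batched_input).
latencies = time_callable(lambda: sum(range(10000)))
stats = summarize(latencies, batch_size=16)
print(sorted(stats))
```

Reporting the median and a high percentile alongside the mean makes occasional slow iterations visible instead of averaging them away.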

In [8]:
benchmark_tftrt('resnet50_saved_model_TFTRT_FP32', 16)



Step 0:  7.6ms
Step 50:  7.5ms
Step 100:  7.5ms
Step 150:  7.6ms
Step 200:  7.6ms
Step 250:  7.6ms
Step 300:  7.6ms
Step 350:  7.6ms
Step 400:  7.6ms
Step 450:  7.6ms
Step 500:  7.6ms
Step 550:  7.6ms
Step 600:  7.6ms
Step 650:  7.6ms
Step 700:  7.6ms
Step 750:  7.6ms
Step 800:  7.6ms
Step 850:  7.6ms
Step 900:  7.6ms
Step 950:  7.6ms
Throughput: 2108 images/s


In [9]:
benchmark_tftrt('resnet50_saved_model_TFTRT_FP32', 32)



Step 0: 13.9ms
Step 50: 13.9ms
Step 100: 13.8ms
Step 150: 13.9ms
Step 200: 13.9ms
Step 250: 13.9ms
Step 300: 13.9ms
Step 350: 13.9ms
Step 400: 13.9ms
Step 450: 13.9ms
Step 500: 13.9ms
Step 550: 13.9ms
Step 600: 13.9ms
Step 650: 13.9ms
Step 700: 13.9ms
Step 750: 13.9ms
Step 800: 13.9ms
Step 850: 13.9ms
Step 900: 13.9ms
Step 950: 13.9ms
Throughput: 2303 images/s
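
As a sanity check, the reported throughput can be approximately reproduced from the per-step latencies, since throughput is the batch size divided by the time per batch. The helper name `throughput` is introduced here only for illustration.

```python
def throughput(batch_size, latency_ms):
    """Images per second when one batch completes every latency_ms milliseconds."""
    return batch_size / (latency_ms / 1000.0)


print(round(throughput(16, 7.6)))    # 2105, close to the reported 2108 images/s
print(round(throughput(32, 13.9)))   # 2302, close to the reported 2303 images/s
```

The small differences come from the step latencies being rounded to 0.1 ms. Doubling the batch size from 16 to 32 raises latency by about 1.8x, so throughput improves by roughly 9%, and both batch sizes are served by the same converted model.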
