# Image classification example

This example script runs inference using a number of popular image classification models.  This script is included in the NVIDIA TensorFlow Docker containers under `/workspace/nvidia-examples`.  See [Preparing To Use NVIDIA Containers](https://docs.nvidia.com/deeplearning/dgx/preparing-containers/index.html) for more information.

You can enable TF-TRT integration by passing the `--use_trt` flag to the script.  This causes the script to apply TensorRT inference optimization to speed up execution for portions of the model's graph where supported, and to fall back on native TensorFlow for layers and operations which are not supported.  See [Accelerating Inference In TensorFlow With TensorRT User Guide](https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html) for more information.

When using TF-TRT, use the `--precision` option to set the precision mode.  float32 is the default (`--precision fp32`); float16 (`--precision fp16`) and int8 (`--precision int8`) allow further performance improvements.

int8 mode requires a calibration step (which is done automatically), but you must also specify the directory in which the calibration dataset is stored with `--calib_data_dir /imagenet_validation_data`.  You can use the same data for both calibration and validation.
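The flag rules above can be captured in a small helper.  This is a minimal sketch, not part of the example script: `build_command` is a hypothetical function that only assembles flags named in this document (`--model`, `--data_dir`, `--use_trt`, `--precision`, `--calib_data_dir`).  It assumes that a reduced precision is only meaningful together with `--use_trt`, since the text describes `--precision` only in the TF-TRT context.

```python
def build_command(model, data_dir, use_trt=False, precision="fp32", calib_data_dir=None):
    """Assemble an argument list for image_classification.py (hypothetical helper)."""
    if precision not in ("fp32", "fp16", "int8"):
        raise ValueError(f"unknown precision: {precision!r}")
    if precision != "fp32" and not use_trt:
        # Assumption: reduced precision is only described for TF-TRT runs.
        raise ValueError("fp16 and int8 are described only for TF-TRT (--use_trt)")

    cmd = ["python", "image_classification.py", "--model", model, "--data_dir", data_dir]
    if use_trt:
        cmd += ["--use_trt", "--precision", precision]
        if precision == "int8":
            # int8 needs a calibration dataset directory.
            if calib_data_dir is None:
                raise ValueError("int8 requires --calib_data_dir")
            cmd += ["--calib_data_dir", calib_data_dir]
    return cmd


print(" ".join(build_command("resnet_v1_50", "/path/to/imagenet/tfrecord/files",
                             use_trt=True, precision="fp16")))
```

Returning a list rather than a string lets you pass the result straight to `subprocess.run` without shell quoting concerns.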

## Models

We have verified the following models.

* MobileNet v1
* MobileNet v2
* NASNet - Large
* NASNet - Mobile
* ResNet50 v1
* ResNet50 v2
* VGG16
* VGG19
* Inception v3
* Inception v4

For the accuracy numbers of these models on the ImageNet validation dataset, see [Verified Models](https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#verified-models).

## Usage

The example Python script is `image_classification.py`.  You can evaluate inference with TF-TRT integration using the pre-trained ResNet V1 50 model by calling the script with the following arguments:

```
python image_classification.py --model resnet_v1_50 \
    --data_dir /path/to/imagenet/tfrecord/files \
    --use_trt \
    --precision fp16
```

Where:

`--model`: Which model to use to run inference, in this case ResNet V1 50.

`--data_dir`: Path to the ImageNet TFRecord validation files.

`--use_trt`: Convert the graph to a TensorRT graph.

`--precision`: Precision mode to use, in this case FP16.

Run with `--help` to see all available options.

Note: In this notebook, we run the script inside IPython using the `%run` built-in command, so that realtime output and tracebacks are displayed.

In [None]:
%run image_classification --help

Also see [General Script Usage](https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#image-class-usage) for more information.

## Output

The script first loads the pre-trained model.  If given the flag `--use_trt`, the model is converted to a TensorRT graph, and the script displays (in addition to its initial configuration options):

- the number of nodes before conversion (`num_nodes(native_tf)`)

- the number of nodes after conversion (`num_nodes(tftrt_total)`)

- the number of separate TensorRT nodes (`num_nodes(trt_only)`)

- the size of the graph before conversion (`graph_size(MB)(native_tf)`)

- the size of the graph after conversion (`graph_size(MB)(trt)`)

- how long the conversion took (`time(s)(trt_conversion)`)

For example:

```
num_nodes(native_tf): 741
num_nodes(tftrt_total): 10
num_nodes(trt_only): 1
graph_size(MB)(native_tf): ***
graph_size(MB)(trt): ***
time(s)(trt_conversion): ***
```
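If you capture the script's output, the conversion summary is easy to read back into a dictionary.  The following is a minimal sketch that assumes the `name: value` line format shown above; `parse_summary` is a hypothetical helper, and masked (`***`) or non-numeric values are skipped rather than guessed.

```python
def parse_summary(log_text):
    """Return {metric_name: float} for every 'name: number' line in log_text."""
    metrics = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, so not a metric line
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            # Masked ('***') or non-numeric values are ignored.
            continue
    return metrics


sample = """\
num_nodes(native_tf): 741
num_nodes(trt_only): 1
graph_size(MB)(native_tf): ***
"""
print(parse_summary(sample))
```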

Note: For a list of supported operations that can be converted to a TensorRT graph, see the [Supported Ops](https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops) section of the [Accelerating Inference In TensorFlow With TensorRT User Guide](https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html).

The script then runs inference on the ImageNet validation set, printing the iteration time and throughput at the interval set by the `--display_every` option (default: every `100` iterations):

```
running inference...
    step 100/6202, iter_time(ms)=**.****, images/sec=***
    step 200/6202, iter_time(ms)=**.****, images/sec=***
    step 300/6202, iter_time(ms)=**.****, images/sec=***
    ...
```
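To track throughput over a run, you can extract these progress lines with a regular expression.  This sketch assumes the line layout shown above; `parse_progress` is a hypothetical helper, and the numbers in `sample` are made-up placeholders, not measurements.

```python
import re

# Matches lines such as "step 100/6202, iter_time(ms)=12.3456, images/sec=648".
STEP_RE = re.compile(
    r"step\s+(?P<step>\d+)/(?P<total>\d+),\s*"
    r"iter_time\(ms\)=(?P<iter_ms>\d+(?:\.\d+)?),\s*"
    r"images/sec=(?P<ips>\d+(?:\.\d+)?)"
)


def parse_progress(log_text):
    """Return one dict per progress line found in log_text."""
    return [
        {
            "step": int(m["step"]),
            "total": int(m["total"]),
            "iter_ms": float(m["iter_ms"]),
            "images_per_sec": float(m["ips"]),
        }
        for m in STEP_RE.finditer(log_text)
    ]


# Placeholder values for illustration only.
sample = """\
running inference...
    step 100/6202, iter_time(ms)=12.3456, images/sec=648
    step 200/6202, iter_time(ms)=12.1000, images/sec=661
"""
for row in parse_progress(sample):
    print(row)
```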

On completion, the script prints overall accuracy and timing information over the inference session:

```
results of resnet_v1_50:
    accuracy: 75.95
    images/sec: ***
    99th_percentile(ms): ***
    total_time(s): ***
    latency_mean(ms): ***
```

The accuracy metric measures the percentage of predictions from inference that match the labels on the ImageNet validation set.  The remaining metrics capture various performance measurements:

- number of images processed per second (`images/sec`)

- total time of the inference session (`total_time(s)`)

- the mean duration for each iteration (`latency_mean(ms)`)

- the 99th-percentile iteration duration, that is, the time within which 99% of iterations completed (`99th_percentile(ms)`)
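To see how metrics like these relate to one another, here is a minimal sketch that derives them from a list of per-iteration times.  It is not the script's own code: `summarize` and its `batch_size` parameter are assumptions made for illustration, the percentile uses the simple nearest-rank definition, and the timings are made up.

```python
def summarize(iter_times_s, batch_size):
    """Summarize per-iteration wall-clock times (seconds) for a fixed batch size."""
    if not iter_times_s:
        raise ValueError("need at least one iteration time")
    ordered = sorted(iter_times_s)
    n = len(ordered)
    # Nearest-rank 99th percentile: ceil(0.99 * n), computed with integers.
    rank = (99 * n + 99) // 100
    total = sum(ordered)
    return {
        "images/sec": batch_size * n / total,
        "99th_percentile(ms)": ordered[rank - 1] * 1000,
        "total_time(s)": total,
        "latency_mean(ms)": total / n * 1000,
    }


# Made-up timings: 99 iterations of 10 ms and a single 100 ms outlier.
stats = summarize([0.010] * 99 + [0.100], batch_size=8)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

In this sample the single 100 ms outlier raises the mean latency but lies above the nearest-rank 99th percentile, which stays at 10 ms.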


## Using TF-TRT With ResNet V1 50

Here we walk through how to use the example Python script with the ResNet V1 50 model.

Using TF-TRT with precision modes lower than FP32, that is, FP16 and INT8, improves the performance of inference.  The FP16 precision mode uses Tensor Cores or half-precision hardware instructions, if possible, while the INT8 precision mode uses Tensor Cores or integer hardware instructions.  INT8 mode also requires running a calibration step, which the script does automatically.

Below we use the example script to compare the accuracy and timing performance of all the precision modes when running inference using the ResNet V1 50 model.

### Native TensorFlow Using FP32

This is our baseline session running inference using native TensorFlow without TensorRT integration/conversion.

First, set `DATA_DIR` to where you stored the ImageNet TFRecord validation files:

In [None]:
DATA_DIR = "/path/to/imagenet/tfrecord/files"

Now we can run the baseline session with native TensorFlow.

Note: We use the `--cache` flag to allow the script to cache checkpoint and frozen graph files to use with future sessions.

In [None]:
%run image_classification --model resnet_v1_50 \
    --data_dir $DATA_DIR \
    --cache

Look for the accuracy and timing information under:

```
results of resnet_v1_50:
    ...
```

You can compare the accuracy metrics for the ResNet 50 models with the metrics listed at: [Pre-trained model](https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model).

### TF-TRT Using FP32

In this session, we use the same precision mode as in our native TensorFlow session (FP32), but this time we use the `--use_trt` flag to convert the graph to a TensorRT optimized graph.

In [None]:
%run image_classification --model resnet_v1_50 \
    --data_dir $DATA_DIR \
    --use_trt \
    --cache

Before the script starts running inference, it converts the TensorFlow graph to a TensorRT optimized graph with fewer nodes.  Look for the following metrics in the log:

```
num_nodes(native_tf): ***
num_nodes(tftrt_total): ***
num_nodes(trt_only): ***
graph_size(MB)(native_tf): ***
graph_size(MB)(trt): ***
...
time(s)(trt_conversion): ***
```

Note: For a list of supported operations that can be converted to a TensorRT graph, see [Supported Ops](https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).

Again, note the accuracy and timing information under:

```
results of resnet_v1_50:
    ...
```

### TF-TRT Using FP16

In this session, we continue to use TF-TRT conversion, but we reduce the precision mode to FP16, allowing the use of Tensor Cores for performance improvements during inference, while preserving accuracy within the acceptable tolerance level (0.1%).

In [None]:
%run image_classification --model resnet_v1_50 \
    --data_dir $DATA_DIR \
    --use_trt \
    --precision fp16 \
    --cache

Again, we see that the native TensorFlow graph gets converted to a TensorRT graph.  Look again for the following in the log to confirm:

```
num_nodes(native_tf): ***
num_nodes(tftrt_total): ***
num_nodes(trt_only): ***
graph_size(MB)(native_tf): ***
graph_size(MB)(trt): ***
...
time(s)(trt_conversion): ***
```

Compare the results with the previous sessions:

```
results of resnet_v1_50:
    ...
```

### TF-TRT Using INT8

For this session we continue to use TF-TRT conversion, and we reduce the precision further to INT8 for faster computation.  Because INT8 has significantly lower precision and dynamic range than FP32, the INT8 precision mode requires an additional calibration step before performing the type conversion.  In this calibration step, inference is first run with FP32 precision on a calibration dataset to generate many candidate INT8 quantizations of the weights and activations in the trained TensorFlow graph; the quantizations that minimize information loss are then chosen.  For more details on the calibration process, see the [8-bit Inference with TensorRT presentation](http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf).
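To make the idea of choosing among candidate quantizations concrete, here is a toy sketch in plain Python.  It is not TensorRT's calibrator: the function names are hypothetical, the quantizer is a simple symmetric linear mapping onto [-127, 127], and mean squared reconstruction error stands in for the information-loss measure, which this walkthrough does not specify.

```python
def quantize_int8(values, threshold):
    """Map [-threshold, threshold] linearly onto [-127, 127], saturating outliers."""
    scale = threshold / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from INT8 codes."""
    return [q * scale for q in quantized]


def pick_threshold(values, candidates):
    """Return the candidate threshold with the lowest mean squared error."""
    def error(threshold):
        quantized, scale = quantize_int8(values, threshold)
        restored = dequantize(quantized, scale)
        return sum((a - b) ** 2 for a, b in zip(values, restored)) / len(values)

    return min(candidates, key=error)


# Stand-in "activations": 201 evenly spaced values in [-1, 1].
activations = [i / 100 for i in range(-100, 101)]
print(pick_threshold(activations, candidates=[0.5, 1.0, 4.0]))
```

Here the threshold of 1.0 wins: 0.5 clips roughly half of the values, while 4.0 spends most of the INT8 range on values that never occur.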

The calibration dataset should closely reflect the distribution of the problem dataset.  In this walkthrough, we use the same ImageNet validation set for the calibration data, with `--calib_data_dir $DATA_DIR`.

In [None]:
%run image_classification --model resnet_v1_50 \
    --data_dir $DATA_DIR \
    --use_trt \
    --precision int8 \
    --calib_data_dir $DATA_DIR \
    --cache

This time, we see the script performing the calibration step:

```
Calibrating INT8...
...
INFO:tensorflow:Evaluation [6/62]
INFO:tensorflow:Evaluation [12/62]
INFO:tensorflow:Evaluation [18/62]
...
```

The process completes with the message:

```
INT8 graph created.
```

When the calibration step completes -- it may take some time -- we again see that the native TensorFlow graph gets converted to a TensorRT graph.  Look again for the following in the log to confirm:

```
num_nodes(native_tf): ***
num_nodes(tftrt_total): ***
num_nodes(trt_only): ***
graph_size(MB)(native_tf): ***
graph_size(MB)(trt): ***
...
time(s)(trt_conversion): ***
```

Also notice the following INT8-specific timing information:

```
time(s)(trt_calibration): ***
...
time(s)(trt_int8_conversion): ***
```

Compare the results with the previous sessions:

```
results of resnet_v1_50:
    ...
```
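Once you have noted the results of each session, a short helper makes the comparison explicit.  This is a sketch under stated assumptions: `compare_to_baseline` is a hypothetical function, the 0.1% tolerance is interpreted as percentage points of accuracy, and the throughput figures below are placeholders, not measurements.

```python
def compare_to_baseline(baseline, candidate, accuracy_tolerance=0.1):
    """Compare two result dicts that hold 'accuracy' (percent) and 'images/sec'."""
    accuracy_drop = baseline["accuracy"] - candidate["accuracy"]
    return {
        "accuracy_drop": accuracy_drop,
        # Assumption: tolerance is measured in percentage points.
        "within_tolerance": accuracy_drop <= accuracy_tolerance,
        "speedup": candidate["images/sec"] / baseline["images/sec"],
    }


# Placeholder numbers for illustration only.
native_fp32 = {"accuracy": 75.95, "images/sec": 1000.0}
tftrt_fp16 = {"accuracy": 75.90, "images/sec": 2500.0}
print(compare_to_baseline(native_fp32, tftrt_fp16))
```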

## Summary

Congratulations!  You have run inference with an image classification model using various precision modes, taking advantage of TensorRT inference optimization where possible.