<a href="https://colab.research.google.com/github/vinhngx/TRTorch/blob/colab-lenet/notebooks/Colab-LeNet-example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Copyright 2019 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# TRTorch LeNet Demo on Google Colab

## Overview

Few tools are as approachable as PyTorch for developing and experimenting with machine learning models. The power of PyTorch comes from its deep integration into Python, its flexibility, and its approach to automatic differentiation and execution (eager execution). However, when moving from research into production, the requirements change: we may no longer want that deep Python integration, and we want optimization to get the best performance we can on our deployment platform. In PyTorch 1.0, TorchScript was introduced as a way to separate your PyTorch model from Python and make it portable and optimizable. TorchScript uses PyTorch's JIT compiler to transform normal PyTorch code, which is interpreted by the Python interpreter, into an intermediate representation (IR) that can have optimizations run on it and that is interpreted at runtime by the PyTorch JIT interpreter. For PyTorch this has opened up a whole new world of possibilities, including deployment in other languages such as C++. It also introduces a structured, graph-based format that we can use to optimize models for inference down to the kernel level.

When deploying on NVIDIA GPUs, TensorRT, NVIDIA's deep learning optimization SDK and runtime, is able to take models from any major framework and tune them specifically for target hardware in the NVIDIA family, be it an A100, TITAN V, Jetson Xavier, or NVIDIA's Deep Learning Accelerator. TensorRT performs a couple of sets of optimizations to achieve this: it fuses layers and tensors in the model graph, then uses a large kernel library to select the implementations that perform best on the target GPU. TensorRT also has strong support for reduced-precision execution, which lets users leverage the Tensor Cores on Volta and newer GPUs and reduces memory and computation footprints on the device.

TRTorch is a compiler that uses TensorRT to optimize TorchScript code, compiling standard TorchScript modules into ones that internally run with TensorRT optimizations. This lets you remain in the PyTorch ecosystem and keep using features such as module composability, its flexible tensor implementation, data loaders, and more. TRTorch is available for use with both PyTorch and LibTorch.

### Learning objectives

This notebook demonstrates the steps for compiling a TorchScript module with TRTorch on a simple LeNet network. 

## Content
1. [Requirements](#1)
1. [Creating TorchScript modules](#2)
1. [Compiling with TRTorch](#3)

<a id="1"></a>
## 1. Requirements
First, we will install several required extra packages, then proceed to compile and install TRTorch from source.

Make sure you choose GPU as the execution environment via `Runtime -> Change runtime type -> GPU`.

In [None]:
!nvidia-smi

Tue Jul  7 13:59:05 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   30C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
! add-apt-repository -y ppa:graphics-drivers/ppa
! apt update


 Fresh drivers from upstream, currently shipping Nvidia.

## Current Status

Current long-lived branch release: `nvidia-430` (430.40)
Dropped support for Fermi series (https://nvidia.custhelp.com/app/answers/detail/a_id/4656)

Old long-lived branch release: `nvidia-390` (390.129)

For GF1xx GPUs use `nvidia-390` (390.129)
For G8x, G9x and GT2xx GPUs use `nvidia-340` (340.107)
For NV4x and G7x GPUs use `nvidia-304` (304.137) End-Of-Life!

Support timeframes for Unix legacy GPU releases:
https://nvidia.custhelp.com/app/answers/detail/a_id/3142

## What we're working on right now:

- Normal driver updates
- Help Wanted: Mesa Updates for Intel/AMD users, ping us if you want to help do this work, we're shorthanded.


This PPA is currently in testing, you should be experienced with packaging before you dive in here:

Volunteers welcome!

### How you can help:

## Install PTS and benchmark your gear:

    sudo apt-get install phoronix-test-suite

Run the benchmark:

    phoronix-test-suite de

In [None]:
!apt-get update && apt-get install -y build-essential


Hit:1 https://storage.googleapis.com/bazel-apt stable InRelease
Ign:2 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:3 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:4 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease

In [None]:
!sudo apt install nvidia-driver-440

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-driver-440 : Depends: libnvidia-gl-440 (= 440.100-0ubuntu0.18.04.1) but it is not going to be installed
                     Depends: nvidia-dkms-440 (<= 440.100-1)
                     Depends: nvidia-dkms-440 (>= 440.100)
                     Depends: nvidia-kernel-common-440 (<= 440.100-1) but it is not going to be installed
                     Depends: nvidia-kernel-common-440 (>= 440.100) but it is not going to be installed
                     Depends: nvidia-kernel-source-440 (= 440.100-0ubuntu0.18.04.1) but it is not going to be installed
 

In [None]:
!sudo apt-get install -f

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 60 not upgraded.
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1 and /etc/apt/sources.list.d/nvidia-ml.list:1


### Install Bazel

In [None]:
%%bash
apt update && apt install curl gnupg
curl https://bazel.build/bazel-release.pub.gpg | apt-key add -
echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | tee /etc/apt/sources.list.d/bazel.list

apt update && apt install bazel-3.2.0
ln -s /usr/bin/bazel-3.2.0 /usr/bin/bazel

Hit:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease
Ign:2 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Hit:4 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Hit:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:6 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:7 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Hit:9 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:10 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:11 http://ppa.launchpad.net/marutter/c2d4u3.5/ubuntu bionic InRelease
Hit:13 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Reading package lists...
Building dependency tree...
Reading state information...
65 packages can be upgraded. Run 'apt list --upgrada



W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1 and /etc/apt/sources.list.d/nvidia-ml.list:1
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1 and /etc/apt/sources.list.d/nvidia-ml.list:1


W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1 and /etc/apt/sources.list.d/nvidia-ml.list:1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3199  100  3199    0     0  35943      0 --:--:-- --:--:-- --:--:-- 35943


W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1 and /etc/apt/sources.list.d/nvidia-ml.list:1
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learnin

In [None]:
!bazel

Extracting Bazel installation...
                                                           [bazel release 3.2.0]
Usage: bazel <command> <options> ...

Available commands:
  analyze-profile     Analyzes build profile data.
  aquery              Analyzes the given targets and queries the action graph.
  build               Builds the specified targets.
  canonicalize-flags  Canonicalizes a list of bazel options.
  clean               Removes output files and optionally stops the server.
  coverage            Generates code coverage report for specified test targets.
  cquery              Loads, analyzes, and queries the specified targets w/ configurations.
  dump                Dumps the internal state of the bazel server process.
  fetch               Fetches external repositories that are prerequisites to the targets.
  help                Prints help for commands, or the index.
  info                Displays runtime info about the bazel server.
  license             Prints the licens

### Install CUDA

In [None]:
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.3 LTS
Release:	18.04
Codename:	bionic


In [None]:
%%bash
os="ubuntu1804"
cuda="10.2.89"
wget https://developer.download.nvidia.com/compute/cuda/repos/${os}/x86_64/cuda-repo-${os}_${cuda}-1_amd64.deb
sudo dpkg --force-all -i cuda-repo-*.deb

(Reading database ... 144495 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1804_10.2.89-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1804 (10.2.89-1) over (10.2.89-1) ...
Setting up cuda-repo-ubuntu1804 (10.2.89-1) ...


--2020-07-07 12:39:38--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.2.89-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2936 (2.9K) [application/x-deb]
Saving to: ‘cuda-repo-ubuntu1804_10.2.89-1_amd64.deb.1’

     0K ..                                                    100%  136M=0s

2020-07-07 12:39:38 (136 MB/s) - ‘cuda-repo-ubuntu1804_10.2.89-1_amd64.deb.1’ saved [2936/2936]


Configuration file '/etc/apt/sources.list.d/cuda.list'
 ==> File on system created by you or by a script.
 ==> File also in package provided by package maintainer.
 ==> Keeping old config file as default.


In [None]:
!apt install cuda-10-2

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  cuda-command-line-tools-10-2 cuda-compiler-10-2 cuda-cudart-10-2
  cuda-cudart-dev-10-2 cuda-cufft-10-2 cuda-cufft-dev-10-2 cuda-cuobjdump-10-2
  cuda-cupti-10-2 cuda-cupti-dev-10-2 cuda-curand-10-2 cuda-curand-dev-10-2
  cuda-cusolver-10-2 cuda-cusolver-dev-10-2 cuda-cusparse-10-2
  cuda-cusparse-dev-10-2 cuda-demo-suite-10-2 cuda-documentation-10-2
  cuda-driver-dev-10-2 cuda-gdb-10-2 cuda-libraries-10-2
  cuda-libraries-dev-10-2 cuda-license-10-2 cuda-memcheck-10-2
  cuda-misc-headers-10-2 cuda-npp-10-2 cuda-npp-dev-10-2 cuda-nsight-10-2
  cuda-nsight-compute-10-2 cuda-nsight-systems-10-2 cuda-nvcc-10-2
  cuda-nvdisasm-10-2 cuda-nvgraph-10-2 cuda-nvgraph-dev-10-2 cuda-nvjpeg-10-2
  cuda-nvjpeg-dev-10-2

In [None]:
!ls -l /usr/local

total 72
drwxr-xr-x  1 root root 4096 Jul  7 12:30 bin
lrwxrwxrwx  1 root root    9 Jul  7 12:47 cuda -> cuda-10.2
drwxr-xr-x 16 root root 4096 Jun 26 16:18 cuda-10.0
drwxr-xr-x  1 root root 4096 Jun 26 16:20 cuda-10.1
drwxr-xr-x 16 root root 4096 Jul  7 12:47 cuda-10.2
drwxr-xr-x  1 root root 4096 Jun 26 16:27 etc
drwxr-xr-x  2 root root 4096 Oct 29  2019 games
drwxr-xr-x  2 root root 4096 Jun 26 16:38 _gcs_config_ops.so
drwxr-xr-x  1 root root 4096 Jun 26 16:46 include
drwxr-xr-x  1 root root 4096 Jun 26 16:47 lib
-rw-r--r--  1 root root 1636 Jun 26 16:40 LICENSE.txt
lrwxrwxrwx  1 root root    9 Oct 29  2019 man -> share/man
drwxr-xr-x  2 root root 4096 Oct 29  2019 sbin
-rw-r--r--  1 root root 7291 Jun 26 16:40 setup.cfg
drwxr-xr-x  1 root root 4096 Jun 26 16:27 share
drwxr-xr-x  2 root root 4096 Oct 29  2019 src
drwxr-xr-x  2 root root 4096 Jun 26 16:48 xgboost


### Install TensorRT

In [None]:
%%bash
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

dpkg -i nvidia-machine-learning-repo-*.deb
apt-get update

(Reading database ... 144495 files and directories currently installed.)
Preparing to unpack nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb ...
Unpacking nvidia-machine-learning-repo-ubuntu1804 (1.0.0-1) over (1.0.0-1) ...
Setting up nvidia-machine-learning-repo-ubuntu1804 (1.0.0-1) ...
Hit:1 https://storage.googleapis.com/bazel-apt stable InRelease
Ign:2 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:3 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Ign:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Hit:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:6 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:7 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:9 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease
Hit:10 http://archive.ubuntu.com/ubuntu bionic-update

--2020-07-07 12:40:38--  https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2926 (2.9K) [application/x-deb]
Saving to: ‘nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb.1’

     0K ..                                                    100%  117M=0s

2020-07-07 12:40:38 (117 MB/s) - ‘nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb.1’ saved [2926/2926]

W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1 and /etc/apt/sources.list.d/nvidia-ml.list:1
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1

In [None]:
%%bash
version="6.0.1-1+cuda10.2"
sudo apt-get install libnvinfer6=${version} libnvonnxparsers6=${version} libnvparsers6=${version} libnvinfer-plugin6=${version} libnvinfer-dev=${version} libnvonnxparsers-dev=${version} libnvparsers-dev=${version} libnvinfer-plugin-dev=${version} python-libnvinfer=${version} python3-libnvinfer=${version}

Reading package lists...
Building dependency tree...
Reading state information...
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
The following packages will be upgraded:
  libnvinfer-dev libnvinfer-plugin-dev libnvinfer-plugin6 libnvinfer6
  libnvonnxparsers-dev libnvonnxparsers6 libnvparsers-dev libnvparsers6
  python-libnvinfer python3-libnvinfer
10 upgraded, 0 newly installed, 0 to remove and 53 not upgraded.
Need to get 156 MB of archives.
After this operation, 33.8 MB of additional disk space will be used.
Get:1 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  libnvinfer-plugin-dev 6.0.1-1+cuda10.2 [1,874 kB]
Get:2 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  libnvonnxparsers-dev 6.0.1-1+cuda10.2 [166 kB]
Get:3 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  python3-libnvi

debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 10.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
W: Target Packages (Packages) is configured multiple times in /etc/apt/sources.list.d/nvidia-machine-learning.list:1 and /etc/apt/sources.list.d/nvidia-ml.list:1


Check the installed TensorRT version.

In [None]:
!dpkg -l | grep TensorRT

ii  libnvinfer-dev                          6.0.1-1+cuda10.2                                  amd64        TensorRT development libraries and headers
ii  libnvinfer-plugin-dev                   6.0.1-1+cuda10.2                                  amd64        TensorRT plugin libraries
ii  libnvinfer-plugin6                      6.0.1-1+cuda10.2                                  amd64        TensorRT plugin libraries
ii  libnvinfer6                             6.0.1-1+cuda10.2                                  amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                    6.0.1-1+cuda10.2                                  amd64        TensorRT ONNX libraries
ii  libnvonnxparsers6                       6.0.1-1+cuda10.2                                  amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                        6.0.1-1+cuda10.2                                  amd64        TensorRT parsers libraries
ii  libnvparsers6                           6.0.1-1+cu

### Install PyTorch

In [None]:
#!pip install --upgrade --force-reinstall torch==1.5.0
!pip install --upgrade --force-reinstall torch==1.5.1+cu101 torchvision

ERROR: Could not find a version that satisfies the requirement torch==1.5.1+cu101 (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 0.3.1, 0.4.0, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1)
ERROR: No matching distribution found for torch==1.5.1+cu101


In [None]:
import torch

print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
print(torch._C._cuda_getDriverVersion())

1.5.1
10.2
7605
10010


In [None]:
torch._C._cuda_isDriverSufficient()

False

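The version numbers printed above hint at why the driver check fails. CUDA encodes a version as `1000 * major + 10 * minor`, so the driver value `10010` corresponds to CUDA 10.1, which is older than the CUDA 10.2 runtime that this PyTorch build reports. A minimal sketch of that comparison follows; the helper names `decode_cuda_version` and `driver_supports_runtime` are illustrative and not part of PyTorch, and the check only mirrors the general rule that the driver's CUDA version should be at least the runtime's.

```python
def decode_cuda_version(encoded: int) -> tuple:
    """Split an integer CUDA version such as 10010 into (major, minor)."""
    major, remainder = divmod(encoded, 1000)
    return major, remainder // 10


def driver_supports_runtime(driver_encoded: int, runtime: str) -> bool:
    """Return True when the driver's CUDA version is at least the runtime's."""
    runtime_major, runtime_minor = (int(part) for part in runtime.split(".")[:2])
    return decode_cuda_version(driver_encoded) >= (runtime_major, runtime_minor)


# Values taken from the cell output above.
print(decode_cuda_version(10010))              # (10, 1)
print(driver_supports_runtime(10010, "10.2"))  # False
```
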
### Clone the repo and build TRTorch

In [None]:
%%bash
cd /content
git clone https://github.com/vinhngx/TRTorch



Cloning into 'TRTorch'...
bash: line 7: bazel: command not found
bash: line 9: cd: /workspace/TRTorch/py: No such file or directory
python3: can't open file 'setup.py': [Errno 2] No such file or directory


Finally, we are ready to build and install TRTorch.

In [None]:
%%bash
cd /content/TRTorch
cp notebooks/WORKSPACE .

bazel build //:libtrtorch --compilation_mode opt



Loading: 
Loading: 0 packages loaded
Analyzing: target //:libtrtorch (0 packages loaded, 0 targets configured)
INFO: Analyzed target //:libtrtorch (5 packages loaded, 2323 targets configured).

INFO: Found 1 target...
[0 / 9] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[260 / 516] [Prepa] Symlinking virtual headers for @libtorch//:c10
[449 / 516] checking cached actions
[449 / 516] [Prepa] action 'SolibSymlink _solib_k8/_U@tensorrt_S_S_Cnvinfer_Ulib___Uexternal_Stensorrt_Slib_Sx86_U64-linux-gnu/libnvinfer.so'
[452 / 516] checking cached actions
[452 / 516] [Prepa] action 'SolibSymlink _solib_k8/_U@cudnn_S_S_Ccudnn_Ulib___Uexternal_Scudnn_Slib_Sx86_U64-linux-gnu/libcudnn.so.7.6.5'
[454 / 516] Compiling cpp/api/src/extra_info.cpp; 6s processwrapper-sandbox
[455 / 516] Compiling core/conversion/conversion.cpp; 3s processwrapper-sandbox ... (2 actions, 1 running)
[456 / 516] Compiling core/conversion/conversion.cpp; 11s processwrapper-sandbox ... (2 actions running)
[457 / 516] Co

In [None]:
%%bash
cd /content/TRTorch/py
python setup.py install

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running install
building libtrtorch
creating version file
copying library into module
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/trtorch
copying trtorch/_version.py -> build/lib.linux-x86_64-3.6/trtorch
copying trtorch/logging.py -> build/lib.linux-x86_64-3.6/trtorch
copying trtorch/_extra_info.py -> build/lib.linux-x86_64-3.6/trtorch
copying trtorch/__init__.py -> build/lib.linux-x86_64-3.6/trtorch
copying trtorch/_types.py -> build/lib.linux-x86_64-3.6/trtorch
copying trtorch/_compiler.py -> build/lib.linux-x86_64-3.6/trtorch
running egg_info
creating trtorch.egg-info
writing trtorch.egg-info/PKG-INFO
writing dependency_links to trtorch.egg-info/dependency_links.txt
writing requirements to trtorch.egg-info/requires.txt
writing top-level names to trtorch.egg-info/top_level.txt
writing manifest file 'trtorch.egg-info/SOURCES.txt'
writing manifest file '

Loading: 
Loading: 0 packages loaded
INFO: Build options --cxxopt, --define, and --linkopt have changed, discarding analysis cache.
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 0 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62 targets configured)
Analyzing: target //cpp/api/lib:libtrtorch.so (0 packages loaded, 62

<a id="2"></a>
## 2. Creating TorchScript modules

Here we create two submodules, a feature extractor and a classifier, and stitch them together into a single LeNet module. For a network this small that is overkill, but modules give us granular control over our program, including where we decide to optimize and where we don't. The module is also the unit that the TorchScript compiler operates on, so you can convert and optimize only the feature extractor and leave the classifier in standard PyTorch, or you can convert the whole thing. When compiling your module to TorchScript, there are two paths: tracing and scripting. Tracing runs the module on an example input and records the operations that execute, while scripting compiles the module's Python source directly, so data-dependent control flow is preserved.

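The practical difference between the two paths shows up when `forward` contains data-dependent control flow. The sketch below uses a toy module, `SignGate`, which is hypothetical and unrelated to LeNet: tracing records only the branch taken for the example input, while scripting keeps both branches. It runs on the CPU and assumes only that PyTorch is installed.

```python
import warnings

import torch
from torch import nn


class SignGate(nn.Module):
    """Toy module whose output depends on the sign of the input sum."""

    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return x + 10


gate = SignGate().eval()
positive = torch.ones(3)
negative = -torch.ones(3)

with warnings.catch_warnings():
    # Tracing warns that the Python `if` is evaluated once and baked in.
    warnings.simplefilter("ignore")
    traced_gate = torch.jit.trace(gate, positive)
scripted_gate = torch.jit.script(gate)

print(traced_gate(negative))    # follows the recorded `x * 2` branch
print(scripted_gate(negative))  # re-evaluates the condition, takes `x + 10`
```
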
In [None]:
import torch 
from torch import nn
import torch.nn.functional as F

class LeNetFeatExtractor(nn.Module):
    def __init__(self):
        super(LeNetFeatExtractor, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return x

class LeNetClassifier(nn.Module):
    def __init__(self):
        super(LeNetClassifier, self).__init__()
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = torch.flatten(x,1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.feat = LeNetFeatExtractor()
        self.classifier = LeNetClassifier()

    def forward(self, x):
        x = self.feat(x)
        x = self.classifier(x)
        return x

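The `16 * 6 * 6` input size of `fc1` follows from the layer arithmetic: a 3x3 convolution without padding shrinks each spatial side by 2, and a 2x2 max pool halves it, rounding down. The sketch below reproduces that calculation in plain Python; the helper names `lenet_feature_side` and `lenet_flat_features` are made up for this illustration.

```python
def lenet_feature_side(side: int, kernel: int = 3, pool: int = 2, blocks: int = 2) -> int:
    """Spatial side length after `blocks` rounds of unpadded conv + max pool."""
    for _ in range(blocks):
        side = side - kernel + 1   # unpadded, stride-1 convolution
        side = side // pool        # max pooling with floor rounding
    return side


def lenet_flat_features(side: int, channels: int = 16) -> int:
    """Number of values handed to the classifier after flattening."""
    out = lenet_feature_side(side)
    return channels * out * out


for side in (32, 33, 34):
    print(side, lenet_feature_side(side), lenet_flat_features(side))
```

For a 32x32 input this gives a 6x6 map and 576 flattened features, matching `nn.Linear(16 * 6 * 6, 120)`. A 34x34 input would produce a 7x7 map, which no longer matches that layer.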

Let us define a helper function to benchmark a model.

In [None]:
import time
import numpy as np

def benchmark(model, input_shape=(1024, 1, 32, 32), dtype='fp32', nwarmup=50, nruns=10000):
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype == 'fp16':
        input_data = input_data.half()

    # Warm-up runs so one-time initialization does not skew the measurements.
    with torch.no_grad():
        for _ in range(nwarmup):
            results = model(input_data)
    torch.cuda.synchronize()

    time_arr = []
    with torch.no_grad():
        for i in range(1, nruns + 1):
            start_time = time.time()
            results = model(input_data)
            # CUDA kernels launch asynchronously; wait for them before reading the clock.
            torch.cuda.synchronize()
            time_arr.append(time.time() - start_time)

            if i % 1000 == 0:
                print('Iteration %d, ave batch time %.2f ms' % (i, np.mean(time_arr) * 1000))

    print('Average batch time: %.2f ms' % (np.mean(time_arr) * 1000))

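Average batch time is easier to compare across models when it is also expressed as throughput and accompanied by a median, which is less sensitive to occasional slow iterations. The sketch below assumes a list of per-batch times in seconds, like the `time_arr` collected in `benchmark`; `summarize_timings` is a hypothetical helper, not part of the notebook.

```python
import statistics


def summarize_timings(times_s, batch_size):
    """Summarize per-batch latencies (seconds) for a fixed batch size."""
    mean_s = statistics.mean(times_s)
    return {
        "mean_ms": mean_s * 1000,
        "median_ms": statistics.median(times_s) * 1000,
        "images_per_s": batch_size / mean_s,
    }


# Made-up timings for illustration only, not measured results.
summary = summarize_timings([0.004, 0.005, 0.006], batch_size=1024)
print(summary)
```
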
### PyTorch model

In [None]:
model = LeNet()
model.to("cuda").eval()


AssertionError: ignored

In [None]:
benchmark(model)

AssertionError: ignored

### Tracing

In [None]:
traced_model = torch.jit.trace(model, torch.empty([1,1,32,32]).to("cuda"))
traced_model

In [None]:
benchmark(traced_model)

### Scripting

In [None]:
model = LeNet().to("cuda").eval()
script_model = torch.jit.script(model)


In [None]:
script_model

In [None]:
benchmark(script_model)

<a id="3"></a>
## 3. Compiling with TRTorch

### TorchScript traced model

First, we compile the TorchScript traced model with TRTorch. Notice the performance impact.

In [None]:
import trtorch

compile_settings = {
    "input_shapes": [
        {
            # Batch dimension matches the default batch size used by benchmark().
            "min" : [1024, 1, 32, 32],
            "opt" : [1024, 1, 33, 33],
            "max" : [1024, 1, 34, 34],
        }
    ],
    "op_precision": torch.half # Run with FP16
}

trt_ts_module = trtorch.compile(traced_model, compile_settings)

input_data = torch.randn((1024, 1, 32, 32))
input_data = input_data.half().to("cuda")

result = trt_ts_module(input_data)
torch.jit.save(trt_ts_module, "trt_ts_module.ts")

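TensorRT builds the engine for the shape range given in `input_shapes`, so inputs passed to the compiled module are expected to fall between `min` and `max` in every dimension. A small pre-flight check can catch a mismatch, for example a benchmark batch size outside the range, before it surfaces as a runtime error. `shape_in_range` below is a hypothetical helper written for this sketch, not a TRTorch API, and the range used is an arbitrary example.

```python
def shape_in_range(shape, shape_range):
    """Return True if every dimension of `shape` lies within [min, max]."""
    lower, upper = shape_range["min"], shape_range["max"]
    if not (len(shape) == len(lower) == len(upper)):
        return False
    return all(low <= dim <= high for dim, low, high in zip(shape, lower, upper))


# Arbitrary example range, in the same dictionary layout as `input_shapes` entries.
example_range = {
    "min": [1, 1, 32, 32],
    "opt": [8, 1, 32, 32],
    "max": [16, 1, 32, 32],
}

print(shape_in_range((8, 1, 32, 32), example_range))     # True
print(shape_in_range((1024, 1, 32, 32), example_range))  # False
```
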
In [None]:
benchmark(trt_ts_module, dtype="fp16")

### TorchScript script model

Next, we compile the TorchScript script model with TRTorch. Notice the performance impact.

In [None]:
import trtorch

compile_settings = {
    "input_shapes": [
        {
            # Batch dimension matches the default batch size used by benchmark().
            "min" : [1024, 1, 32, 32],
            "opt" : [1024, 1, 33, 33],
            "max" : [1024, 1, 34, 34],
        }
    ],
    "op_precision": torch.half # Run with FP16
}

trt_script_module = trtorch.compile(script_model, compile_settings)

input_data = torch.randn((1024, 1, 32, 32))
input_data = input_data.half().to("cuda")

result = trt_script_module(input_data)
torch.jit.save(trt_script_module, "trt_script_module.ts")

In [None]:
benchmark(trt_script_module, dtype="fp16")

# Conclusion

In this notebook, we have walked through the complete process of compiling TorchScript models with TRTorch and tested the performance impact of the optimization.

## What's next
Now it's time to try TRTorch on your own model. If you run into problems, file an issue at https://github.com/NVIDIA/TRTorch. Your involvement will help the future development of TRTorch.
