## PyTorch Inference Optimization 

## E6692 Spring 2022

In this notebook you will measure the throughput of your custom trained YOLOv4-Tiny model.

In [14]:
import torch # import modules
import sys
import time
import numpy as np

from darknet_utils.darknet_to_pytorch import load_pytorch, load_darknet_as_pytorch # import custom modules
from darknet_utils.inference import measure_throughput, plot_execution_times

cfg_path = './cfg/yolov4-tiny-person-vehicle.cfg' # path to configuration file of custom model

torch_weights_path = './darknet/backup/yolov4-tiny-person-vehicle_final.weights' # path to the trained Darknet weights of the custom model

device = 'cuda'

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


**TODO:** Load the custom trained YOLOv4-Tiny PyTorch model in the cell below. 

TIP: If you don't want to see the model summary, you can insert `%%capture` as the first line in the cell to hide all output.

In [15]:
# %%capture
# TODO: load the custom trained YOLOv4-Tiny with load_pytorch()

# pytorch_model = load_pytorch(cfg_path, torch_weights_path)
pytorch_model = load_darknet_as_pytorch(cfg_path, torch_weights_path)


### Define throughput in the context of deep learning models.

**TODO:** Your answer here.

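In the context of deep learning models, throughput is the number of inputs (for example, images or video frames) that a model processes per unit of time, usually reported as images per second.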

**TODO:** Complete the function **measure_throughput()** in **darknet_utils/inference.py**.
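As a rough illustration of what such a measurement involves, the sketch below times repeated forward passes on random input and divides the number of processed images by the elapsed time. It is not the course's `measure_throughput()`; the name `throughput_sketch`, the parameters `n_warmup` and `n_runs`, and the toy convolution layer are all assumptions made for this example.

```python
import time
import torch

def throughput_sketch(model, input_shape, n_warmup=5, n_runs=20):
    """Return images per second for `model` on random input of `input_shape` (hypothetical helper)."""
    device = next(model.parameters()).device      # run on whatever device the model lives on
    x = torch.rand(input_shape, device=device)    # random input; only the shape matters for timing
    with torch.no_grad():                         # gradients are not needed for inference
        for _ in range(n_warmup):                 # warm-up passes are not timed
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()              # wait for queued GPU work before starting the clock
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()              # wait for the timed passes to finish
        elapsed = time.perf_counter() - start
    return n_runs * input_shape[0] / elapsed      # images processed divided by seconds elapsed

# Toy stand-in model so the sketch runs without the YOLOv4-Tiny weights
toy_model = torch.nn.Conv2d(3, 8, kernel_size=3)
fps = throughput_sketch(toy_model, (2, 3, 32, 32))
print(f"{fps:.1f} images/s")
```

CUDA kernels are launched asynchronously, so without the `torch.cuda.synchronize()` calls the timer could stop before the GPU has finished its work. The warm-up passes keep one-time costs such as memory allocation out of the measurement.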

In [26]:
# TODO: use measure_throughput() to measure the throughput of your PyTorch YOLOv4-Tiny
#       what is a reasonable input shape to measure throughput?
measure_throughput(pytorch_model, (1,3,480,640))

115.84283724449033

Now you will investigate the PyTorch [JIT](https://en.wikipedia.org/wiki/Just-in-time_compilation) functionality to further optimize YOLOv4-Tiny for inference on the Jetson Nano. [PyTorch JIT](https://pytorch.org/docs/stable/jit.html) has two methods for converting a standard PyTorch model to a [TorchScript](https://pytorch.org/docs/stable/jit.html#:~:text=TorchScript%20is%20a%20way%20to,there%20is%20no%20Python%20dependency.) model: script and trace. TorchScript models are optimizable and serializable. That means they can be saved independently of Python and used in other contexts, such as a C++ program.
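The two conversion paths can be sketched on a toy module. The `Gate` class below is a hypothetical example, not part of the lab code; it contains a data-dependent branch so that the two resulting TorchScript models behave differently. The sketch runs on the CPU and serializes to an in-memory buffer instead of a file.

```python
import io
import torch

class Gate(torch.nn.Module):
    """Toy module (hypothetical) whose output depends on a data-dependent branch."""
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        return torch.zeros_like(x)

model = Gate()
positive = torch.ones(3)
negative = -torch.ones(3)

# trace runs the module once on `positive` and records only the operations
# that were executed (PyTorch emits a TracerWarning about the branch)
traced = torch.jit.trace(model, positive)

# script compiles the Python source, so both branches are preserved
scripted = torch.jit.script(model)

print(traced(negative))    # replays the recorded branch: x * 2
print(scripted(negative))  # evaluates the condition and returns zeros

# TorchScript models can be serialized and reloaded without the Gate class definition
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)
print(reloaded(positive))
```

In this sketch the traced model silently keeps the branch taken by the sample input, which is why tracing is usually reserved for models whose control flow does not depend on the input values.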

### What is meant by eager execution and graph execution?

**TODO:** Your answer here.

### Explain the difference between "script" and "trace". When would you use jit.script and when would you use jit.trace? Why do you need to include a sample input when using jit.trace?

**TODO:** Your answer here.

**TODO:** Use `torch.jit.trace()` to create a traced TorchScript model. Save the traced model (its code and weights) to the weights directory with `torch.jit.save()`.

In [41]:
# TODO: Use torch.jit to generate a traced PyTorch version of YOLOv4-Tiny.

save_path = './weights/jit_trace_model_weights.pt'

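example_forward_input = torch.rand(1,3,480,640).to(device) # sample input; the trace records the operations run on it
pytorch_model.to(device) # the model and the sample input must be on the same device

traced_model = torch.jit.trace(pytorch_model, example_forward_input) # trace the model's forward pass

torch.jit.save(traced_model, save_path) # serialize the traced model to disk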

In [42]:
# Verify that output of traced model matches output of original model

input_array = torch.randn((4, 3, 480, 640)).to(device) # define input tensor

torch_output = pytorch_model(input_array)[0].cpu().detach().numpy() # pass input through models
jit_output = traced_model(input_array)[0].cpu().detach().numpy()

print("JIT traced model matches PyTorch: ", np.allclose(torch_output, jit_output)) # compare outputs

JIT traced model matches PyTorch:  True


In [43]:
# TODO: use measure_throughput() to measure the throughput of your traced YOLOv4-Tiny
measure_throughput(traced_model, (1,3,480,640))

132.20748047196855

**TODO:** Calculate and plot throughput as a function of batch size for both models. Calculate throughput for the largest batch size you can fit into the Jetson Nano's memory. You're welcome to use **plot_execution_times()** or your own plotting function.
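One possible way to structure the sweep is sketched below. The helper `sweep_batch_sizes` and the stand-in `fake_measure` are hypothetical names introduced for this example, and the throughput values that `fake_measure` returns are placeholders, not measurements. The sketch relies on PyTorch reporting CUDA out-of-memory conditions as a `RuntimeError` (or a subclass of it), so the loop stops at the first batch size that does not fit.

```python
def sweep_batch_sizes(measure, batch_sizes):
    """Return {batch_size: throughput}, stopping at the first batch size that fails."""
    results = {}
    for batch_size in batch_sizes:
        try:
            results[batch_size] = measure(batch_size)
        except RuntimeError as err:
            # Treat a RuntimeError (e.g. CUDA out of memory) as the memory limit
            print(f"stopping at batch size {batch_size}: {err}")
            break
    return results

# Placeholder measurement so the sketch runs anywhere; the numbers are made up
def fake_measure(batch_size):
    if batch_size > 8:
        raise RuntimeError("out of memory (simulated)")
    return 100.0 + 5.0 * batch_size

results = sweep_batch_sizes(fake_measure, [1, 2, 4, 8, 16, 32])
print(results)
```

In the notebook, `measure` could be something like `lambda b: measure_throughput(pytorch_model, (b, 3, 480, 640))`, assuming `measure_throughput()` returns a single number as in the cells above. Running the sweep once per model gives two dictionaries that can be plotted against batch size.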

In [None]:
# TODO: calculate throughput as a function of batch size for both models


In [None]:
# TODO: plot throughput as a function of batch size for both models


### Discuss the results of your throughput measurements. Are there any surprising results? Which batch sizes result in the largest throughputs? Is this batch size reasonable to use in real-time applications?

**TODO:** Your answer here.