## Lab - TensorRT + Profiling - Profiling
## E6692 Spring 2022

In this part you will write two Python scripts: `pytorch_inference.py`, which runs inference with the PyTorch YOLOv4-Tiny model, and `trt_inference.py`, which runs inference with the TensorRT YOLOv4-Tiny model. Follow these guidelines when writing the scripts:

* Model weights and configuration file paths should be passed as command line arguments. Use `sys.argv` to manage the command line arguments.
* Use the OpenCV function `cv2.VideoCapture()` to read frames from the original video and `cv2.VideoWriter()` to write frames to the output file.
* Use the `time` module to measure the inference speed of the model and the end-to-end speed of the script, which covers **reading, preprocessing, inference, postprocessing, and frame writing**. You're welcome to do more in-depth timing, but only end-to-end and inference timing are required. Record the measurements in the table below.
* Generate a detected version of the first-floor intersection video **test-lowres.mp4**. Name the output videos of the PyTorch and TensorRT scripts **test-lowres-pytorch-detected.mp4** and **test-lowres-tensorrt-detected.mp4**, respectively.
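
A minimal sketch of the timing structure these guidelines describe is below. It is an illustration under assumptions, not a required design: the argument order in `parse_args` is hypothetical, and `run_pipeline` (also a hypothetical name) takes the read, preprocess, inference, postprocess, and write stages as plain callables so that the timing logic stays independent of PyTorch, TensorRT, and OpenCV. Only the model call sits inside the inference timer, while the end-to-end timer spans the whole loop.

```python
import sys
import time


def parse_args(argv):
    """Return (weights, config, input_video, output_video) from sys.argv-style input."""
    # Hypothetical argument order; adapt it to your own scripts.
    if len(argv) != 5:
        raise SystemExit(f"usage: {argv[0]} <weights> <config> <input.mp4> <output.mp4>")
    return argv[1], argv[2], argv[3], argv[4]


def run_pipeline(read_frame, preprocess, infer, postprocess, write_frame,
                 clock=time.perf_counter):
    """Run read -> preprocess -> infer -> postprocess -> write until the input ends.

    read_frame() must return (ok, frame), like cv2.VideoCapture.read().
    Returns (num_frames, inference_seconds, end_to_end_seconds).
    """
    num_frames = 0
    inference_seconds = 0.0
    start = clock()                       # end-to-end timer covers the whole loop
    while True:
        ok, frame = read_frame()
        if not ok:                        # no more frames to read
            break
        model_input = preprocess(frame)
        t0 = clock()                      # inference timer covers only the model call
        outputs = infer(model_input)
        inference_seconds += clock() - t0
        write_frame(postprocess(frame, outputs))
        num_frames += 1
    end_to_end_seconds = clock() - start
    return num_frames, inference_seconds, end_to_end_seconds
```

In the actual scripts, the paths would come from `parse_args(sys.argv)`, `read_frame` could be the bound method `cap.read` of a `cv2.VideoCapture`, and `write_frame` could be `writer.write` of a `cv2.VideoWriter`. Because PyTorch launches CUDA kernels asynchronously, the PyTorch `infer` callable should call `torch.cuda.synchronize()` before returning; otherwise the inference timer may stop before the GPU work has finished.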

| Model Type | Model Input Size | Inference Speed (FPS) | End-to-end speed (FPS) |
| --- | --- | --- | --- |
| PyTorch | TODO | TODO | TODO |
| TensorRT | TODO | TODO | TODO |
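
The two FPS columns follow directly from the totals accumulated during a run. The helper below (a hypothetical name, not part of any library) divides the frame count by each total and also reports what fraction of the end-to-end time was spent inside the model.

```python
def fps_summary(num_frames, inference_seconds, end_to_end_seconds):
    """Return (inference_fps, end_to_end_fps, inference_share) for the table."""
    if num_frames <= 0 or inference_seconds <= 0 or end_to_end_seconds <= 0:
        raise ValueError("need at least one frame and positive timings")
    inference_fps = num_frames / inference_seconds             # frames per second of model time
    end_to_end_fps = num_frames / end_to_end_seconds           # frames per second of wall-clock time
    inference_share = inference_seconds / end_to_end_seconds   # fraction of time spent in the model
    return inference_fps, end_to_end_fps, inference_share
```

For example, with illustrative (not measured) totals of 100 frames, 2.0 s of inference, and 10.0 s end to end, `fps_summary(100, 2.0, 10.0)` returns `(50.0, 10.0, 0.2)`. Because the inference time is contained in the end-to-end time, the end-to-end FPS can never exceed the inference FPS; a small inference share suggests that the non-model stages dominate.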

After you've written the video detection scripts and visually inspected their output for correctness, the next step is CUDA profiling, which gives insight into how each program performs. In this lab we will use the `nvprof` command-line profiling tool. Read through the [user guide](https://docs.nvidia.com/cuda/profiler-users-guide/index.html) to familiarize yourself with `nvprof`.

Profiling tools report specific metrics on memory usage, computational bottlenecks, and power consumption.

**TODO:** Enter the command `nvprof --query-metrics` to list metrics available for profiling. Choose three that you think could be useful for our use case and describe what they indicate about the program.

A useful feature for identifying where a program could be further optimized is the [dependency analysis](https://docs.nvidia.com/cuda/profiler-users-guide/index.html#dependency-analysis) tool. Briefly explain what the dependency analysis tool does.
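
As a conceptual aid only: dependency analysis is built around the critical path, the longest chain of dependent CPU and GPU activities, because shortening an activity that is not on that path does not reduce the total run time. The simplified sketch below computes a critical path over a hypothetical activity graph with made-up durations. It ignores other quantities that `nvprof` reports, such as waiting time, and is not how `nvprof` is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (total_time, path) for the longest chain of dependent activities.

    durations: {activity: duration}
    deps: {activity: set of activities that must finish before it starts}
    Every activity named in deps must also appear in durations.
    """
    graph = {a: deps.get(a, set()) for a in durations}
    finish = {}      # earliest finish time of each activity
    best_pred = {}   # predecessor that finishes last, i.e. the one this activity waits on
    for a in TopologicalSorter(graph).static_order():
        preds = deps.get(a, set())
        if preds:
            p = max(preds, key=lambda q: finish[q])
            best_pred[a] = p
            start = finish[p]
        else:
            best_pred[a] = None
            start = 0.0
        finish[a] = start + durations[a]
    # The activity that finishes last ends the critical path; walk back from it.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return total, path[::-1]


# Hypothetical activities with made-up durations (ms), for illustration only.
durations = {"h2d_copy": 1.0, "kernel_a": 4.0, "kernel_b": 2.0,
             "d2h_copy": 1.0, "cpu_read_next": 3.0}
deps = {"kernel_a": {"h2d_copy"}, "kernel_b": {"h2d_copy"},
        "d2h_copy": {"kernel_a", "kernel_b"}}
print(critical_path(durations, deps))  # (6.0, ['h2d_copy', 'kernel_a', 'd2h_copy'])
```

In this toy graph, speeding up `kernel_b` or `cpu_read_next` would not shorten the run, since neither lies on the critical path `h2d_copy -> kernel_a -> d2h_copy`.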


**TODO:** Your answer here. 

Next, you will profile your scripts `pytorch_inference.py` and `trt_inference.py`. To profile from the command line, run `nvprof <profiling_options> python3 <script_name> <script_options>`. Specify `--unified-memory-profiling off` to disable unified memory profiling (not supported on the Jetson Nano) and `--dependency-analysis` to generate the dependency analysis report. Write the profiling results to the text files `profiling_torch_log.txt` and `profiling_trt_log.txt` by adding `--log-file <txt_file_path>` to the profiling options.
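
If you prefer to launch the profiler from Python, the sketch below assembles the same command as an argument list. The function name `nvprof_command` and the script arguments in the example are hypothetical placeholders; only the `nvprof` options themselves come from the instructions above.

```python
import shlex


def nvprof_command(log_file, script, script_args=()):
    """Build the nvprof invocation as an argument list for subprocess.run()."""
    return [
        "nvprof",
        "--unified-memory-profiling", "off",  # unified memory profiling is unsupported on Jetson Nano
        "--dependency-analysis",              # generate the dependency analysis report
        "--log-file", log_file,               # send nvprof's output to this text file
        "python3", script, *script_args,
    ]


# Hypothetical script arguments; replace them with your actual weights, config, and video paths.
torch_cmd = nvprof_command("profiling_torch_log.txt", "pytorch_inference.py",
                           ["yolov4-tiny.weights", "yolov4-tiny.cfg"])
print(shlex.join(torch_cmd))  # the equivalent shell command line
```

To actually run the profiler, pass the list to `subprocess.run(torch_cmd, check=True)`. Passing a list instead of a single string avoids shell quoting problems with paths.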

**TODO:** Profile `pytorch_inference.py` and `trt_inference.py` to the specifications outlined above.

## Discussion

### Provide commentary on the results of the inference speed and the end-to-end speed measurements for the two detection scripts.

**TODO:** Your answer here.

### Identify some differences between the TensorRT and the PyTorch script profile output.

**TODO:** Your answer here.

### What, if anything, does the dependency analysis indicate can be optimized in each of the detection scripts?

**TODO:** Your answer here.