Commit 7e54db0 (parent b31a935): Update to consistent TensorRT and Triton versions

Showing 4 changed files with 91 additions and 95 deletions.
# Converting PyTorch to ONNX and TensorRT
This repository is your guide to converting PyTorch models to ONNX format and then further optimizing them for efficient inference on NVIDIA GPUs using TensorRT.

## Index

- [Converting PyTorch to ONNX](#converting-pytorch-to-onnx)
- [Converting ONNX to TensorRT](#converting-onnx-to-tensorrt)
- [Download PyTorch model in Development Environment](#download-pytorch-model-in-development-environment)
- [Convert model in TensorRT Docker Environment - Release 23.01](#convert-model-in-tensorrt-docker-environment---release-2301)
- [References](#references)
### Converting PyTorch to ONNX

- Export & load ONNX
- Inference using ONNX
- Comparing output and time efficiency between ONNX and PyTorch
- Setting the batch size of input data: explicit batch or implicit batch

A minimal export-and-compare sketch is shown below.
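The following sketch exports a model to ONNX with an explicit dynamic batch axis, then compares ONNX Runtime output and latency against PyTorch. The ResNet-50 backbone, the 1x3x224x224 input, and the tensor names `input`/`output` are assumptions for illustration; the repository's `pytorch_to_onnx.py` is the authoritative version.

```python
import time

import numpy as np
import onnxruntime as ort
import torch
import torchvision

# Assumed example model; the repository's actual model may differ.
model = torchvision.models.resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export with a dynamic batch axis (explicit batch), so the ONNX model
# accepts any batch size at inference time.
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)

# Run the same input through ONNX Runtime and PyTorch.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
x = dummy.numpy()

t0 = time.perf_counter()
onnx_out = session.run(None, {"input": x})[0]
t1 = time.perf_counter()
with torch.no_grad():
    torch_out = model(dummy).numpy()
t2 = time.perf_counter()

# The outputs should agree within floating-point tolerance.
print("max abs diff:", np.abs(onnx_out - torch_out).max())
print(f"ONNX Runtime: {t1 - t0:.4f}s, PyTorch: {t2 - t1:.4f}s")
```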
### Converting ONNX to TensorRT

- Building & loading TensorRT engine
- Setting the batch size of input data: explicit batch or implicit batch
- Key trtexec options
  - Precision of engine: FP32, FP16
  - optShapes: set the most used input data size of the model for inference (the size the engine is optimized for)
  - minShapes: set the minimum input data size the engine must accept
  - maxShapes: set the maximum input data size the engine must accept
- Inference using the TensorRT engine
- Comparing output and time efficiency among TensorRT, ONNX, and PyTorch

A minimal engine-building sketch is shown below.
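To make the shape options concrete, here is a minimal engine-building sketch using the TensorRT Python API, the programmatic counterpart of trtexec's `--fp16`, `--minShapes`, `--optShapes`, and `--maxShapes` flags. The tensor name `input` and the chosen shapes are assumptions; the repository's `onnx_to_tensorrt.py` is the authoritative version.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Explicit batch: the batch dimension is part of the network definition.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # build an FP16 engine

# Dynamic-shape profile: minimum, optimal, and maximum sizes for "input"
# (assumed tensor name and shapes).
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (16, 3, 224, 224))
config.add_optimization_profile(profile)

# Serialize the engine to disk for later inference.
with open("model.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```

The equivalent trtexec invocation would pass `--fp16 --minShapes=input:1x3x224x224 --optShapes=input:8x3x224x224 --maxShapes=input:16x3x224x224`.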
### Download PyTorch model in Development Environment

1. **Create Environment and Install Packages**

   ```shell
   conda create -n convert-model python=3.9
   ```

   ```shell
   conda activate convert-model
   ```

   ```shell
   pip install -r requirements.txt
   ```

2. **Download PyTorch model**

   ```shell
   python fetch_model.py
   ```

   A hypothetical sketch of what such a fetch script might do is shown below.
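For orientation only, a fetch step might download pretrained weights and save a local copy, as in the sketch below. The ResNet-50 choice and the `model.pt` filename are assumptions; `fetch_model.py` may do something different.

```python
import torch
import torchvision

# Hypothetical fetch step: download pretrained weights and save them locally.
# The repository's fetch_model.py may target a different model entirely.
weights = torchvision.models.ResNet50_Weights.DEFAULT
model = torchvision.models.resnet50(weights=weights)
torch.save(model.state_dict(), "model.pt")
```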
### Convert model in TensorRT Docker Environment - Release 23.01

Release 23.01 is based on CUDA 12.0.1, which requires NVIDIA driver release 525 or later. However, if you are running on a data center GPU (for example, a T4), you can use NVIDIA driver release 450.51 (or later R450), 470.57 (or later R470), 510.47 (or later R510), 515.65 (or later R515), or 525.85 (or later R525). You can check the installed driver version with `nvidia-smi`.

1. **Download TensorRT Docker image**

   ```shell
   docker pull nvcr.io/nvidia/tensorrt:23.01-py3
   ```

2. **Run TensorRT Docker container**

   ```shell
   nvidia-docker run -it --rm -v ./../image-search-engine:/workspace -w /workspace nvcr.io/nvidia/tensorrt:23.01-py3 bash
   ```

   Note: the Docker CLI generally requires an absolute host path for `-v` bind mounts; if the relative path is rejected, use `-v "$(pwd)/../image-search-engine:/workspace"` instead.

3. **Install Packages**

   ```shell
   cd model_repository
   ```

   ```shell
   pip install -r requirements.txt
   ```

4. **Convert PyTorch model to ONNX**

   ```shell
   python pytorch_to_onnx.py --dynamic_axes True --batch_size {batch_size}
   ```

5. **Convert ONNX to TensorRT (FP16)**

   ```shell
   python onnx_to_tensorrt.py --dynamic_axes True --batch_size {batch_size} --engine_precision FP16
   ```

A hedged sketch of loading and running the resulting engine follows these steps.
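Once the engine is built, it can be deserialized and run. The sketch below is a minimal illustration, assuming the engine and tensor names from the earlier sketches, the engine's default binding order (input first, output second), and the `cuda-python` package pinned in requirements.txt for device memory.

```python
import numpy as np
import tensorrt as trt
from cuda import cudart  # cuda-python, pinned in requirements.txt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Pick a concrete batch size inside the profile's [min, max] range.
x = np.random.rand(8, 3, 224, 224).astype(np.float32)
context.set_input_shape("input", x.shape)  # TensorRT >= 8.5 API
out = np.empty(tuple(context.get_tensor_shape("output")), dtype=np.float32)

# Allocate device buffers and copy the input to the GPU.
_, d_in = cudart.cudaMalloc(x.nbytes)
_, d_out = cudart.cudaMalloc(out.nbytes)
cudart.cudaMemcpy(d_in, x.ctypes.data, x.nbytes,
                  cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)

# Bindings are passed in engine order: input first, output second (assumed).
context.execute_v2([int(d_in), int(d_out)])
cudart.cudaMemcpy(out.ctypes.data, d_out, out.nbytes,
                  cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost)
cudart.cudaFree(d_in)
cudart.cudaFree(d_out)
print("output shape:", out.shape)
```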
## References

- [GitHub Repository](https://github.com/qbxlvnf11/convert-pytorch-onnx-tensorrt)
- [TensorRT Container Release Notes, Release 23.01](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel-23-01.html#rel-23-01)
- [Triton Inference Server Issue #2377](https://github.com/triton-inference-server/server/issues/2377)
- [Triton Inference Server Issue #6059](https://github.com/triton-inference-server/server/issues/6059)

Please follow the sections above to understand how to use the provided code and how to perform the PyTorch to ONNX and ONNX to TensorRT conversions. For any issues or questions, consult the references above or the GitHub repository.
## requirements.txt

According to the diff header, this commit trims requirements.txt from 45 pinned packages down to 6. The pre-change pins were:

```text
charset-normalizer==3.3.1
cmake==3.27.7
coloredlogs==15.0.1
cuda-python==12.3.0
filelock==3.13.0
flatbuffers==23.5.26
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
lit==17.0.3
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.3.2.9
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.3.52
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.3.52
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.4.25
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
onnx==1.15.0
onnxruntime==1.16.1
packaging==23.2
Pillow==10.1.0
protobuf==4.24.4
requests==2.31.0
sympy==1.12
tensorrt==8.6.1.post1
tensorrt-bindings==8.6.1
tensorrt-libs==8.6.1
torch==1.13.0
torchaudio==0.13.0
torchvision==0.14.0
triton==2.0.0
typing_extensions==4.8.0
urllib3==2.0.7
```