
Commit

Update consistent version tensorrt and triton
vectornguyen76 committed Nov 9, 2023
1 parent b31a935 commit 7e54db0
Showing 4 changed files with 91 additions and 95 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yaml
@@ -2,7 +2,7 @@ version: "3"
services:
# triton_server:
# container_name: triton_server
# image: nvcr.io/nvidia/tritonserver:23.09-py3
# image: nvcr.io/nvidia/tritonserver:23.01-py3
# ports:
# - 9000:8000
# - 9001:8001
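If the commented-out triton_server service is re-enabled with the port mapping above (host 9000 → container 8000 for HTTP), a minimal readiness probe from the host might look like the sketch below; the host name and port are assumptions taken from that mapping.

```python
# Hypothetical readiness probe for the triton_server service shown above.
# Host and port are assumptions based on the 9000:8000 mapping.
import requests

response = requests.get("http://localhost:9000/v2/health/ready", timeout=5)
print("Triton ready:", response.status_code == 200)
```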
12 changes: 12 additions & 0 deletions image-search-engine/app.py
@@ -119,3 +119,15 @@ async def search_image_base64(data: ImageBase64Request):
result = [Product.from_point(point) for point in search_results.result]

return result


@app.post("/search-image-test")
async def test(file: UploadFile = File(...)):
image_path = await save_image_file(file=file)

# Extract features from the uploaded image using the feature extractor
await feature_extractor.triton_extract_feature(
image_path=image_path, model_name=settings.TENSORRT_MODEL_NAME
)

return "done"
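As a rough usage sketch for the new endpoint (not part of the commit), a multipart upload can be sent with `requests`; the base URL, port, and file name below are assumptions and should be adjusted to wherever the image-search-engine service is running.

```python
# Hypothetical client call for the /search-image-test endpoint.
# BASE_URL and the sample file are assumptions, not values from the repository.
import requests

BASE_URL = "http://localhost:8000"  # adjust to the actual service address

with open("sample.jpg", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/search-image-test",
        files={"file": ("sample.jpg", f, "image/jpeg")},
    )

print(response.status_code, response.json())  # expected body: "done"
```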
133 changes: 78 additions & 55 deletions image-search-engine/model_repository/README.md
@@ -1,63 +1,86 @@
# Converting PyTorch to ONNX and TensorRT

This repository provides a guide and code for converting PyTorch models to ONNX format and subsequently converting ONNX models to TensorRT for more efficient inference on NVIDIA GPUs. This README document will help you understand the project's structure and how to use it effectively.

## Table of Contents

1. [Converting PyTorch to ONNX](#converting-pytorch-to-onnx)

- Export & load ONNX
- Inference using ONNX
- Comparing output and time efficiency between ONNX and PyTorch
- Setting the batch size of input data: explicit batch or implicit batch

2. [Converting ONNX to TensorRT](#converting-onnx-to-tensorrt)

- Building & loading TensorRT engine
- Setting the batch size of input data: explicit batch or implicit batch
- Key trtexec options
- Precision of engine: FP32, FP16
- optShapes: set the most used input data size of the model for inference
- minShapes: set the minimum input data size of the model for inference
- maxShapes: set the maximum input data size of the model for inference
- Inference using the TensorRT engine
- Comparing output and time efficiency among TensorRT, ONNX, and PyTorch

3. [Environment Development](#environment-development)

1. **Create Environment and Install Packages**

```shell
conda create -n convert-model python=3.9
```

```shell
conda activate convert-model
```

```shell
pip install -r requirements.txt
```

2. **Run**

```shell
python fetch_model.py
```

```shell
python pytorch_to_onnx.py
```

```shell
python onnx_to_tensorrt.py
```
This repository is your guide to converting PyTorch models to ONNX format and then further optimizing them for efficient inference on NVIDIA GPUs using TensorRT.

## Index

- [Converting PyTorch to ONNX](#converting-pytorch-to-onnx)
- [Converting ONNX to TensorRT](#converting-onnx-to-tensorrt)
- [Download PyTorch model in Development Environment](#download-pytorch-model-in-development-environment)
- [Convert model in TensorRT Docker Environment - Release 23.01](#convert-model-in-tensorrt-docker-environment---release-2301)
- [References](#references)

### Converting PyTorch to ONNX

- Export & load ONNX
- Inference using ONNX
- Comparing output and time efficiency between ONNX and PyTorch
- Setting the batch size of input data: explicit batch or implicit batch
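A minimal sketch of the export-and-compare flow listed above is shown below; the backbone (`resnet50`), the 1×3×224×224 input size, the opset, and the file name are placeholders, and `pytorch_to_onnx.py` in this folder remains the authoritative script.

```python
# Minimal export sketch with a placeholder model and input size;
# see pytorch_to_onnx.py for the script this project actually uses.
import numpy as np
import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()  # placeholder backbone
dummy_input = torch.randn(1, 3, 224, 224)                 # assumed input size

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    # explicit batch: mark the batch dimension as dynamic
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=16,
)

# Compare ONNX Runtime output against the PyTorch output
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
onnx_out = session.run(None, {"input": dummy_input.numpy()})[0]
with torch.no_grad():
    torch_out = model(dummy_input).numpy()
print("max abs diff:", np.abs(onnx_out - torch_out).max())
```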

### Converting ONNX to TensorRT

- Building & loading TensorRT engine
- Setting the batch size of input data: explicit batch or implicit batch
- Key trtexec options
- Precision of engine: FP32, FP16
- optShapes: set the most used input data size of the model for inference
- minShapes: set the minimum input data size of the model for inference
- maxShapes: set the maximum input data size of the model for inference
- Inference using the TensorRT engine
- Comparing output and time efficiency among TensorRT, ONNX, and PyTorch
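To show how the FP16 flag and the min/opt/max shape settings fit together, here is a rough sketch with the TensorRT Python API; the input tensor name, the shape values, and the file names are assumptions, and `onnx_to_tensorrt.py` is the script the project actually uses.

```python
# Rough build sketch with the TensorRT Python API; the input tensor name,
# shapes, and file names are assumptions -- onnx_to_tensorrt.py is the
# project's actual conversion script.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch network definition
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # engine precision: FP16

# Optimization profile: min / opt / max sizes for the dynamic batch dimension
profile = builder.create_optimization_profile()
profile.set_shape(
    "input",              # assumed input tensor name
    (1, 3, 224, 224),     # minShapes: smallest input the engine accepts
    (8, 3, 224, 224),     # optShapes: the most frequently used input size
    (16, 3, 224, 224),    # maxShapes: largest input the engine accepts
)
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```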

### Download PyTorch model in Development Environment

1. **Create Environment and Install Packages**

```shell
conda create -n convert-model python=3.9
```

```shell
conda activate convert-model
```

```shell
pip install -r requirements.txt
```

2. **Download PyTorch model**
```shell
python fetch_model.py
```

### Convert model in TensorRT Docker Environment - Release 23.01

Release 23.01 is based on CUDA 12.0.1, which requires NVIDIA Driver release 525 or later. However, if you are running on a data center GPU (for example, T4 or any other data center GPU), you can use NVIDIA driver release 450.51 (or later R450), 470.57 (or later R470), 510.47 (or later R510), 515.65 (or later R515), or 525.85 (or later R525).

1. **Download TensorRT Docker environment**
```
docker pull nvcr.io/nvidia/tensorrt:23.01-py3
```
2. **Run TensorRT Docker environment**
```
nvidia-docker run -it --rm -v ./../image-search-engine:/workspace -w /workspace nvcr.io/nvidia/tensorrt:23.01-py3 bash
```
3. **Install Packages**
```
cd model_repository
```
```
pip install -r requirements.txt
```
4. **Convert PyTorch model to ONNX**
```
python pytorch_to_onnx.py --dynamic_axes True --batch_size {batch_size}
```
5. **Convert ONNX to TensorRT (FP16)**
```
python onnx_to_tensorrt.py --dynamic_axes True --batch_size {batch_size} --engine_precision FP16
```
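As a quick sanity check after step 5, the generated engine can be deserialized and its I/O shapes inspected; the sketch below assumes a TensorRT release with the 8.5+ tensor API, and the `model.plan` file name is an assumption (use whatever path the conversion script wrote).

```python
# Sanity check: deserialize the generated engine and print its I/O tensor shapes.
# "model.plan" is an assumed file name; point this at the engine you built.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_mode(name), engine.get_tensor_shape(name))
```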

## References

- [GitHub Repository](https://github.com/qbxlvnf11/convert-pytorch-onnx-tensorrt)
- [TensorRT Container Release Notes](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel-23-01.html#rel-23-01)
- [Triton Inference Server Issue #2377](https://github.com/triton-inference-server/server/issues/2377)
- [Triton Inference Server Issue #6059](https://github.com/triton-inference-server/server/issues/6059)

Please follow the sections above to understand how to use the provided code and to perform the PyTorch to ONNX and ONNX to TensorRT conversions. For any issues or questions, refer to the provided references or the GitHub repository for further assistance.
39 changes: 0 additions & 39 deletions image-search-engine/model_repository/requirements.txt
@@ -1,45 +1,6 @@
charset-normalizer==3.3.1
cmake==3.27.7
coloredlogs==15.0.1
cuda-python==12.3.0
filelock==3.13.0
flatbuffers==23.5.26
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
lit==17.0.3
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.3.2.9
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.3.52
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.3.52
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.4.25
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
onnx==1.15.0
onnxruntime==1.16.1
packaging==23.2
Pillow==10.1.0
protobuf==4.24.4
requests==2.31.0
sympy==1.12
tensorrt==8.6.1.post1
tensorrt-bindings==8.6.1
tensorrt-libs==8.6.1
torch==1.13.0
torchaudio==0.13.0
torchvision==0.14.0
triton==2.0.0
typing_extensions==4.8.0
urllib3==2.0.7
