
Commit

Update consistent version tensorrt and triton
vectornguyen76 committed Nov 9, 2023
1 parent b31a935 commit 7e54db0
Showing 4 changed files with 91 additions and 95 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yaml
@@ -2,7 +2,7 @@ version: "3"
services:
# triton_server:
# container_name: triton_server
# image: nvcr.io/nvidia/tritonserver:23.09-py3
# image: nvcr.io/nvidia/tritonserver:23.01-py3
# ports:
# - 9000:8000
# - 9001:8001
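If the commented-out triton_server service is re-enabled with the port mapping above (host 9000 → container 8000 for HTTP), a minimal readiness probe from the host might look like the sketch below; the host name and port are assumptions taken from that mapping.

```python
# Hypothetical readiness probe for the triton_server service shown above.
# Host and port are assumptions based on the 9000:8000 mapping.
import requests

response = requests.get("http://localhost:9000/v2/health/ready", timeout=5)
print("Triton ready:", response.status_code == 200)
```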
12 changes: 12 additions & 0 deletions image-search-engine/app.py
@@ -119,3 +119,15 @@ async def search_image_base64(data: ImageBase64Request):
result = [Product.from_point(point) for point in search_results.result]

return result


@app.post("/search-image-test")
async def test(file: UploadFile = File(...)):
image_path = await save_image_file(file=file)

# Extract features from the uploaded image using the feature extractor
await feature_extractor.triton_extract_feature(
image_path=image_path, model_name=settings.TENSORRT_MODEL_NAME
)

return "done"
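As a rough usage sketch for the new endpoint (not part of the commit), a multipart upload can be sent with `requests`; the base URL, port, and file name below are assumptions and should be adjusted to wherever the image-search-engine service is running.

```python
# Hypothetical client call for the /search-image-test endpoint.
# BASE_URL and the sample file are assumptions, not values from the repository.
import requests

BASE_URL = "http://localhost:8000"  # adjust to the actual service address

with open("sample.jpg", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/search-image-test",
        files={"file": ("sample.jpg", f, "image/jpeg")},
    )

print(response.status_code, response.json())  # expected body: "done"
```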
133 changes: 78 additions & 55 deletions image-search-engine/model_repository/README.md
@@ -1,63 +1,86 @@
# Converting PyTorch to ONNX and TensorRT

This repository provides a guide and code for converting PyTorch models to ONNX format and subsequently converting ONNX models to TensorRT for more efficient inference on NVIDIA GPUs. This README document will help you understand the project's structure and how to use it effectively.

## Table of Contents

1. [Converting PyTorch to ONNX](#converting-pytorch-to-onnx)

- Export & load ONNX
- Inference using ONNX
- Comparing output and time efficiency between ONNX and PyTorch
- Setting the batch size of input data: explicit batch or implicit batch

2. [Converting ONNX to TensorRT](#converting-onnx-to-tensorrt)

- Building & loading TensorRT engine
- Setting the batch size of input data: explicit batch or implicit batch
- Key trtexec options
- Precision of engine: FP32, FP16
- optShapes: set the most used input data size of the model for inference
- minShapes: set the minimum input data size of the model for inference
- maxShapes: set the maximum input data size of the model for inference
- Inference using the TensorRT engine
- Comparing output and time efficiency among TensorRT, ONNX, and PyTorch

3. [Environment Development](#environment-development)

1. **Create Environment and Install Packages**

```shell
conda create -n convert-model python=3.9
```

```shell
conda activate convert-model
```

```shell
pip install -r requirements.txt
```

2. **Run**

```shell
python fetch_model.py
```

```shell
python pytorch_to_onnx.py
```

```shell
python onnx_to_tensorrt.py
```
This repository is your guide to converting PyTorch models to ONNX format and then further optimizing them for efficient inference on NVIDIA GPUs using TensorRT.

## Index

- [Converting PyTorch to ONNX](#converting-pytorch-to-onnx)
- [Converting ONNX to TensorRT](#converting-onnx-to-tensorrt)
- [Download PyTorch model in Development Environment](#download-pytorch-model-in-development-environment)
- [Convert model in TensorRT Docker Environment - Release 23.01](#convert-model-in-tensorrt-docker-environment---release-2301)
- [References](#references)

### Converting PyTorch to ONNX

- Export & load ONNX
- Inference using ONNX
- Comparing output and time efficiency between ONNX and PyTorch
- Setting the batch size of input data: explicit batch or implicit batch
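A minimal sketch of the export-and-compare flow listed above is shown below; the backbone (`resnet50`), the 1×3×224×224 input size, the opset, and the file name are placeholders, and `pytorch_to_onnx.py` in this folder remains the authoritative script.

```python
# Minimal export sketch with a placeholder model and input size;
# see pytorch_to_onnx.py for the script this project actually uses.
import numpy as np
import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()  # placeholder backbone
dummy_input = torch.randn(1, 3, 224, 224)                 # assumed input size

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    # explicit batch: mark the batch dimension as dynamic
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=16,
)

# Compare ONNX Runtime output against the PyTorch output
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
onnx_out = session.run(None, {"input": dummy_input.numpy()})[0]
with torch.no_grad():
    torch_out = model(dummy_input).numpy()
print("max abs diff:", np.abs(onnx_out - torch_out).max())
```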

### Converting ONNX to TensorRT

- Building & loading TensorRT engine
- Setting the batch size of input data: explicit batch or implicit batch
- Key trtexec options
- Precision of engine: FP32, FP16
- optShapes: set the most used input data size of the model for inference
- minShapes: set the minimum input data size of the model for inference
- maxShapes: set the maximum input data size of the model for inference
- Inference using the TensorRT engine
- Comparing output and time efficiency among TensorRT, ONNX, and PyTorch
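To show how the FP16 flag and the min/opt/max shape settings fit together, here is a rough sketch with the TensorRT Python API; the input tensor name, the shape values, and the file names are assumptions, and `onnx_to_tensorrt.py` is the script the project actually uses.

```python
# Rough build sketch with the TensorRT Python API; the input tensor name,
# shapes, and file names are assumptions -- onnx_to_tensorrt.py is the
# project's actual conversion script.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch network definition
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # engine precision: FP16

# Optimization profile: min / opt / max sizes for the dynamic batch dimension
profile = builder.create_optimization_profile()
profile.set_shape(
    "input",              # assumed input tensor name
    (1, 3, 224, 224),     # minShapes: smallest input the engine accepts
    (8, 3, 224, 224),     # optShapes: the most frequently used input size
    (16, 3, 224, 224),    # maxShapes: largest input the engine accepts
)
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```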

### Download PyTorch model in Development Environment

1. **Create Environment and Install Packages**

```shell
conda create -n convert-model python=3.9
```

```shell
conda activate convert-model
```

```shell
pip install -r requirements.txt
```

2. **Download PyTorch model**
```shell
python fetch_model.py
```

### Convert model in TensorRT Docker Environment - Release 23.01

Release 23.01 is based on CUDA 12.0.1, which requires NVIDIA Driver release 525 or later. However, if you are running on a data center GPU (for example, T4 or any other data center GPU), you can use NVIDIA driver release 450.51 (or later R450), 470.57 (or later R470), 510.47 (or later R510), 515.65 (or later R515), or 525.85 (or later R525).

1. **Download TensorRT Docker environment**
```
docker pull nvcr.io/nvidia/tensorrt:23.01-py3
```
2. **Run TensorRT Docker environment**
```
nvidia-docker run -it --rm -v ./../image-search-engine:/workspace -w /workspace nvcr.io/nvidia/tensorrt:23.01-py3 bash
```
3. **Install Packages**
```
cd model_repository
```
```
pip install -r requirements.txt
```
4. **Convert PyTorch model to ONNX**
```
python pytorch_to_onnx.py --dynamic_axes True --batch_size {batch_size}
```
5. **Convert ONNX to TensorRT (FP16)**
```
python onnx_to_tensorrt.py --dynamic_axes True --batch_size {batch_size} --engine_precision FP16
```
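As a quick sanity check after step 5, the generated engine can be deserialized and its I/O shapes inspected; the sketch below assumes a TensorRT release with the 8.5+ tensor API, and the `model.plan` file name is an assumption (use whatever path the conversion script wrote).

```python
# Sanity check: deserialize the generated engine and print its I/O tensor shapes.
# "model.plan" is an assumed file name; point this at the engine you built.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("model.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_mode(name), engine.get_tensor_shape(name))
```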

## References

- [GitHub Repository](https://github.com/qbxlvnf11/convert-pytorch-onnx-tensorrt)
- [TensorRT Container Release Notes](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel-23-01.html#rel-23-01)
- [Triton Inference Server Issue #2377](https://github.com/triton-inference-server/server/issues/2377)
- [Triton Inference Server Issue #6059](https://github.com/triton-inference-server/server/issues/6059)

Please follow the sections above to understand how to use the provided code and to perform the PyTorch to ONNX and ONNX to TensorRT conversions. For any issues or questions, refer to the provided references or the GitHub repository for further assistance.
39 changes: 0 additions & 39 deletions image-search-engine/model_repository/requirements.txt
@@ -1,45 +1,6 @@
charset-normalizer==3.3.1
cmake==3.27.7
coloredlogs==15.0.1
cuda-python==12.3.0
filelock==3.13.0
flatbuffers==23.5.26
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
lit==17.0.3
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.3.2.9
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.3.52
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.3.52
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.4.25
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
onnx==1.15.0
onnxruntime==1.16.1
packaging==23.2
Pillow==10.1.0
protobuf==4.24.4
requests==2.31.0
sympy==1.12
tensorrt==8.6.1.post1
tensorrt-bindings==8.6.1
tensorrt-libs==8.6.1
torch==1.13.0
torchaudio==0.13.0
torchvision==0.14.0
triton==2.0.0
typing_extensions==4.8.0
urllib3==2.0.7
