Windows TensorRT Python interface compatibility #390

Closed
IamNaQi opened this issue Apr 14, 2022 · 6 comments · Fixed by #396
Labels
bug / fix Something isn't working enhancement New feature or request question Further information is requested

Comments

@IamNaQi
Contributor

IamNaQi commented Apr 14, 2022

🐛 Describe the bug

Hi, I ran the notebook you provide for Python inference on Windows 10:
https://github.com/zhiqwang/yolov5-rt-stack/blob/main/notebooks/onnx-graphsurgeon-inference-tensorrt.ipynb

but I could not get good results.
Here is the code sample I used from your notebook.
I tried different thresholds, but did not try other precisions since only fp32 is supported for now.

import torch

from yolort.runtime.trt_helper import export_tensorrt_engine

batch_size = 1
img_size = 640
size_divisible = 64
fixed_shape = True
score_thresh = 0.35
nms_thresh = 0.45
detections_per_img = 100
precision = "fp32"  # Currently only fp32 is supported
model_path = r"C:\Users\Naqi\Downloads\yolov5n6.pt"
onnx_path = "yolov5n6.onnx"
engine_path = "yolov5n6.engine"

input_sample = torch.rand(batch_size, 3, img_size, img_size)
export_tensorrt_engine(
    model_path,
    score_thresh=score_thresh,
    nms_thresh=nms_thresh,
    onnx_path=onnx_path,
    engine_path=engine_path,
    input_sample=input_sample,
    detections_per_img=detections_per_img,
)

Output: the model is saved and the network description is shown

Saved ONNX model to yolov5n6.onnx
Network Description
Input 'images' with shape (8, 3, 640, 640) and dtype DataType.FLOAT
Output 'num_detections' with shape (8, 1) and dtype DataType.INT32
Output 'detection_boxes' with shape (8, 1000, 4) and dtype DataType.FLOAT
Output 'detection_scores' with shape (8, 1000) and dtype DataType.FLOAT
Output 'detection_classes' with shape (8, 1000) and dtype DataType.INT32
Building fp32 Engine in yolov5n6.engine
Using fp32 mode.
Serialize engine success, saved as yolov5n6.engine

While predicting

import torch

from yolort.runtime import PredictorTRT

device = torch.device("cuda")
engine_path = "yolov5n6.engine"
y_runtime = PredictorTRT(engine_path, device=device)
y_runtime.warmup()
predictions_trt = y_runtime.predict(r"new_york.jpg")
predictions_trt

Problem: it does detect on the first image, but the remaining batch entries come back as empty tensors of size (0, 4):

[{'scores': tensor([0.72805, 0.63373, 0.63361, 0.60388, 0.52863], device='cuda:0'),
  'labels': tensor([0, 0, 0, 0, 0], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([[1718.20227,  469.38287, 1798.75513,  772.85791],
          [ 785.19324,  310.30093, 1113.29858, 1166.43042],
          [1403.16125,  484.76923, 1505.06226,  794.00586],
          [1610.10339,  479.57529, 1727.66260,  761.01874],
          [1495.20166,  512.73260, 1584.91882,  764.76276]], device='cuda:0')},
 {'scores': tensor([], device='cuda:0'),
  'labels': tensor([], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([], device='cuda:0', size=(0, 4))},
 {'scores': tensor([], device='cuda:0'),
  'labels': tensor([], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([], device='cuda:0', size=(0, 4))},
 {'scores': tensor([], device='cuda:0'),
  'labels': tensor([], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([], device='cuda:0', size=(0, 4))},
 {'scores': tensor([], device='cuda:0'),
  'labels': tensor([], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([], device='cuda:0', size=(0, 4))},
 {'scores': tensor([], device='cuda:0'),
  'labels': tensor([], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([], device='cuda:0', size=(0, 4))},
 {'scores': tensor([], device='cuda:0'),
  'labels': tensor([], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([], device='cuda:0', size=(0, 4))},
 {'scores': tensor([], device='cuda:0'),
  'labels': tensor([], device='cuda:0', dtype=torch.int32),
  'boxes': tensor([], device='cuda:0', size=(0, 4))}]

Here is the output image:
output

Please help me out; I hope I have explained the issue clearly.

Versions

PyTorch version: 1.8.2+cu111
CUDA used to build PyTorch: 11.1
We're using TensorRT: 8.4.0.6 on cuda device: 0.
OS: Microsoft Windows 10 Home
CMake version: version 3.23.0

@zhiqwang zhiqwang added the question Further information is requested label Apr 15, 2022
@zhiqwang
Owner

Hi @IamNaQi ,

Because we use PyTorch to do the data binding in the TensorRT Python interface, this involves pointer manipulation, and that approach may have some cross-platform limitations.

https://github.com/zhiqwang/yolov5-rt-stack/blob/0c88e4f44646092078d5d55caed575dc2d26823d/yolort/runtime/y_tensorrt.py#L159-L160
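The pointer-aliasing pattern at the heart of this binding step, and the part that is fragile across platforms, can be sketched as follows. This is only an illustration on host memory with NumPy/ctypes, since the real code in y_tensorrt.py hands raw CUDA pointers from PyTorch tensors to the engine; the fake_engine_execute helper is hypothetical and merely stands in for TensorRT writing through the bindings.

```python
import ctypes

import numpy as np

def fake_engine_execute(bindings, n):
    # Stand-in for the engine: it receives only raw addresses and
    # writes results directly through them, just as TensorRT does
    # with the device pointers it gets in its bindings list.
    out = (ctypes.c_float * n).from_address(bindings[0])
    for i in range(n):
        out[i] = float(i)

# Preallocate the output buffer up front, analogous to the
# preallocated output tensors in PredictorTRT.
output = np.zeros(4, dtype=np.float32)

# Pass only the raw pointer, analogous to tensor.data_ptr().
bindings = [output.ctypes.data]
fake_engine_execute(bindings, output.size)

# The engine's writes are visible in the preallocated buffer.
print(output.tolist())  # prints [0.0, 1.0, 2.0, 3.0]
```

If the runtime and the caller disagree about the lifetime or layout of the memory behind those raw addresses, which is easier to get wrong across platforms, the caller simply sees unchanged (empty-looking) buffers rather than an error.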

We have verified the accuracy of the C++ example on Windows in #389; we should add more tests and more docs for this.

@IamNaQi
Contributor Author

IamNaQi commented Apr 19, 2022

Thank you very much for your kind response.

The C++ example on Windows works very smoothly. I tested it without copying the DLLs into the debug folder, building with the new CMake list, and the result is excellent.
My earlier failure was my own mistake: I have an RTX 3060, but CUDA 10.2 was installed, which is not compatible with the RTX 3060. I updated to CUDA 11.6 and rebuilt with the new CMake configuration in Visual Studio 2019.

Environment

PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.23.0
Libc version: N/A

Python version: 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 511.65
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.11.0+cu113
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0+cu113
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h59b6b97_2
[conda] libblas                   3.9.0              12_win64_mkl    conda-forge
[conda] libcblas                  3.9.0              12_win64_mkl    conda-forge
[conda] liblapack                 3.9.0              12_win64_mkl    conda-forge
[conda] mkl                       2021.4.0           h0e2418a_729    conda-forge
[conda] mkl-service               2.4.0            py39h6b0492b_0    conda-forge
[conda] mkl_fft                   1.3.1            py39h0cb33c3_1    conda-forge
[conda] mkl_random                1.2.2            py39h2e25243_0    conda-forge
[conda] mypy_extensions           0.4.3            py39hcbf5309_5    conda-forge
[conda] numpy                     1.22.3                   pypi_0    pypi
[conda] numpy-base                1.20.3           py39hc2deb75_0
[conda] numpydoc                  1.2.1              pyhd8ed1ab_2    conda-forge
[conda] pytorch                   1.11.0          py3.9_cuda11.3_cudnn8_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.11.0               py39_cu113    pytorch
[conda] torchvision               0.12.0               py39_cu113    pytorch

Result of the C++ example on Windows

result_new_yolork

Python inference is still giving errors because of an issue with my environment. I am working on it and will update when it is solved.

@zhiqwang
Owner

The C++ inference results are perfect! It also seems that you're using a TensorRT EA version. EA stands for early access (a pre-release build); GA stands for general availability. TensorRT GA is the stable version, fully tested by the NVIDIA team. Could you try the latest GA release, TensorRT 8.2 GA Update 3 for the x86_64 architecture?

@xinsuinizhuan

Where is the ppl.nn forward?

@zhiqwang
Owner

zhiqwang commented Apr 21, 2022

@IamNaQi , since the C++ TensorRT inference can be reproducibly verified, I suspect TensorRT's Python interface does not support Windows well. I think this issue has been resolved, so I'll close this thread for now.

@xinsuinizhuan , thanks for your interest. We don't support ppl.nn yet. We did have a pplnn branch before, but we found that the ONNX exported by yolort did not work properly on pplnn (#147). I'm not sure how well pplnn supports yolov5 (or yolort) now; I will create a new ticket for pplnn support later to keep this thread clean, or you can create one if that is convenient for you.

@zhiqwang zhiqwang changed the title The result is not very accurate while Deploying yolort on TensorRT Windows TensorRT Python interface compatibility Apr 26, 2022
@zhiqwang
Owner

zhiqwang commented Apr 26, 2022

As described in NVIDIA/TensorRT#1945 (comment), TensorRT's Windows Python interface has a compatibility issue with PyTorch. Reopening this ticket because we should make yolort compatible with Windows.

@zhiqwang zhiqwang reopened this Apr 26, 2022
@zhiqwang zhiqwang added the enhancement New feature or request label Apr 26, 2022
@zhiqwang zhiqwang added the bug / fix Something isn't working label Apr 29, 2022