Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ablation experiment results with EfficientNMS_TRT #290

Merged
merged 2 commits into from
Jan 26, 2022
Merged

Conversation

zhiqwang
Copy link
Owner

@zhiqwang zhiqwang commented Jan 26, 2022

Unlike other pipelines that deal with yolov5 on TensorRT, we embed the whole post-processing into the Graph with onnx-graghsurgeon. We gain a lot with this whole pipeline. The ablation experiment results are below. The first one is the result without running EfficientNMS_TRT, and the second one is the result with EfficientNMS_TRT embedded. As you can see, the inference time is even reduced, we guess it is because the data copied to the device will be much less after doing EfficientNMS_TRT. (The mean Latency of D2H is reduced from 0.868048 ms to 0.0102295 ms, running on Nivdia Geforce GTX 1080ti, using TensorRT 8.2 with yolov5n6 and scaling images to 512x640.)

And onnx-graphsurgeon is easy to install, you can just use their prebuilt wheels:

python3 -m pip install onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com

The detailed results:

[I] === Performance summary w/o EfficientNMS_TRT plugin ===
[I] Throughput: 383.298 qps
[I] Latency: min = 3.66479 ms, max = 5.41199 ms, mean = 4.00543 ms, median = 3.99316 ms, percentile(99%) = 4.23831 ms
[I] End-to-End Host Latency: min = 3.76599 ms, max = 6.45874 ms, mean = 5.08597 ms, median = 5.07544 ms, percentile(99%) = 5.50839 ms
[I] Enqueue Time: min = 0.743408 ms, max = 5.27966 ms, mean = 0.940805 ms, median = 0.924805 ms, percentile(99%) = 1.37329 ms
[I] H2D Latency: min = 0.502045 ms, max = 0.62674 ms, mean = 0.538255 ms, median = 0.537354 ms, percentile(99%) = 0.582153 ms
[I] GPU Compute Time: min = 2.23233 ms, max = 3.92395 ms, mean = 2.59913 ms, median = 2.58661 ms, percentile(99%) = 2.8201 ms
[I] D2H Latency: min = 0.851807 ms, max = 0.900421 ms, mean = 0.868048 ms, median = 0.867676 ms, percentile(99%) = 0.889191 ms
[I] Total Host Walltime: 3.0081 s
[I] Total GPU Compute Time: 2.99679 s
[I] Explanations of the performance metrics are printed in the verbose logs.
[I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # trtexec --onnx=yolov5n6-no-nms.onnx --workspace=8096
[I] === Performance summary w/ EfficientNMS_TRT plugin ===
[I] Throughput: 389.234 qps
[I] Latency: min = 2.81482 ms, max = 9.77234 ms, mean = 3.1062 ms, median = 3.07642 ms, percentile(99%) = 3.33548 ms
[I] End-to-End Host Latency: min = 2.82202 ms, max = 11.6749 ms, mean = 4.939 ms, median = 4.95587 ms, percentile(99%) = 5.45207 ms
[I] Enqueue Time: min = 0.999878 ms, max = 11.3833 ms, mean = 1.28942 ms, median = 1.18579 ms, percentile(99%) = 4.53088 ms
[I] H2D Latency: min = 0.488159 ms, max = 0.633881 ms, mean = 0.546754 ms, median = 0.546631 ms, percentile(99%) = 0.570557 ms
[I] GPU Compute Time: min = 2.30298 ms, max = 9.21094 ms, mean = 2.54921 ms, median = 2.51904 ms, percentile(99%) = 2.78528 ms
[I] D2H Latency: min = 0.00610352 ms, max = 0.302734 ms, mean = 0.0102295 ms, median = 0.00976562 ms, percentile(99%) = 0.0151367 ms
[I] Total Host Walltime: 3.00591 s
[I] Total GPU Compute Time: 2.98258 s
[I] Explanations of the performance metrics are printed in the verbose logs.
[I]
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # trtexec --onnx=yolov5n6-efficient-nms.onnx --workspace=8096

@zhiqwang zhiqwang force-pushed the trt-experiments branch 2 times, most recently from 85cd952 to 814c358 Compare January 26, 2022 05:06
@zhiqwang zhiqwang added the documentation Improvements or additions to documentation label Jan 26, 2022
@codecov
Copy link

codecov bot commented Jan 26, 2022

Codecov Report

Merging #290 (57c2af8) into main (3d943e7) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #290   +/-   ##
=======================================
  Coverage   94.01%   94.01%           
=======================================
  Files          11       11           
  Lines         718      718           
=======================================
  Hits          675      675           
  Misses         43       43           
Flag Coverage Δ
unittests 94.01% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d943e7...57c2af8. Read the comment docs.

@zhiqwang zhiqwang merged commit 0dbc93d into main Jan 26, 2022
@zhiqwang zhiqwang deleted the trt-experiments branch January 26, 2022 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant