How to specify the TensorRT version in Triton Server for inference? #7188
Comments
I've read through some issues where adjustments were made to the Triton Server containers by selecting appropriate versions. Is it possible to upgrade only the TensorRT version within the current container, or should I wait for an official NGC container that includes TensorRT 10?
Hi @Gcstk, thanks for bringing this up. Some API changes and fixes are needed to compile the TRT backend with TRT 10. I'd recommend waiting until we officially support TRT 10, which will happen with Triton 24.05. Note that the integration is still in progress, and not all features will be supported as of 24.05.
Hello, could you please tell us when we should expect the 24.05 version with TensorRT 10 support?
@Prots 24.05 will be released at the end of the month, which I believe will be later this week or early next week.
The 24.05 containers have been released. These containers support TensorRT 10.
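For anyone verifying locally, here is one quick check of the bundled TensorRT major version; this is a sketch assuming the standard NGC tag naming and the Debian-packaged library layout used by recent NGC images:

```sh
docker pull nvcr.io/nvidia/tritonserver:24.05-py3
# List the libnvinfer shared libraries shipped in the image; the major
# version in the filename should be 10. Path assumes an x86_64 Ubuntu base.
docker run --rm nvcr.io/nvidia/tritonserver:24.05-py3 \
  bash -c "ls /usr/lib/x86_64-linux-gnu/libnvinfer.so*"
```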
@tanmayv25 I see LABEL TRT_VERSION=9.3.0.1 in the image layers: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/layers
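Incidentally, the label can also be read from the image metadata directly once the image has been pulled; a sketch, with the tag names assumed to follow the NGC convention:

```sh
# Read the TRT_VERSION label straight from the local image metadata
# instead of browsing the NGC layer view.
docker inspect --format '{{ index .Config.Labels "TRT_VERSION" }}' \
  nvcr.io/nvidia/tritonserver:24.05-py3
# Same check for the TRT-LLM variant (tag name assumed).
docker inspect --format '{{ index .Config.Labels "TRT_VERSION" }}' \
  nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
```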
@Prots There was a delay in releasing the TRT-LLM Triton container because of some major issues reported for TRT-LLM rel-0.10.0, which supports TRT 10. See the issue here. That being said, the TRT-LLM container is a special Triton container comprising only the Python and TRT-LLM backends. The regular 24.05 Triton container release, with the rest of the backends, ships with the TRT 10 library.
So @tanmayv25, when should we expect a Triton container based on TRT-LLM v0.10.0 with TRT 10.x.x? It's a bit confusing that the versions differ, and it takes some time to understand which one you should use.
Description:
I am currently facing an issue with specifying the TensorRT version in Triton Server. I exported my models as .plan files using TensorRT 10.0, because TensorRT 8.6.1 does not support INT64 operations, which led to significant precision loss. Additionally, there were errors related to batch processing when exporting the bce-rerank model with TensorRT 8.6.1. After consulting the documentation and doing some research, it appears that TensorRT 10.0 resolves these issues.
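For reference, the export was done roughly as follows; the model file and shape ranges here are illustrative placeholders, not the exact values used:

```sh
# Build a TensorRT 10 engine from ONNX. TRT 10 handles INT64 natively,
# whereas TRT 8.6 demotes INT64 weights to INT32 (the precision-loss issue).
trtexec --onnx=bce_rerank.onnx \
        --saveEngine=model.plan \
        --minShapes=input_ids:1x1 \
        --optShapes=input_ids:8x128 \
        --maxShapes=input_ids:16x512
```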
However, the latest NGC container for Triton Server only includes TensorRT 8.6.3, which fails to load my model. I attempted the following approaches to upgrade the TensorRT version:
1. Pulled the full Triton Server 24.03 container and upgraded to TensorRT 10 inside it, but the server still attempts to load version 8.6.3, which led me to believe that a backend change is necessary. Hence, I tried the next step.
2. Pulled the TensorRT backend from this GitHub repository and attempted to compile it against TensorRT 10, but encountered errors that also seem to indicate a version mismatch (a build sketch follows this list).
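For what it's worth, a minimal sketch of the build I attempted, assuming the r24.05 branches (the first to target TRT 10) and the usual Triton backend CMake options; the TensorRT include-path option name is an assumption to verify against the repo's CMakeLists.txt:

```sh
# Build the Triton TensorRT backend from source against a locally
# installed TensorRT 10.
git clone -b r24.05 https://github.com/triton-inference-server/tensorrt_backend.git
cd tensorrt_backend && mkdir build && cd build
cmake .. \
  -DCMAKE_INSTALL_PREFIX:PATH=$(pwd)/install \
  -DTRITON_BACKEND_REPO_TAG=r24.05 \
  -DTRITON_CORE_REPO_TAG=r24.05 \
  -DTRITON_COMMON_REPO_TAG=r24.05 \
  -DTRITON_TENSORRT_INCLUDE_PATHS=/usr/include/x86_64-linux-gnu  # assumed option name
make -j install
```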
Question:
How can I resolve this issue and use TensorRT 10 for inference in Triton Server? Any advice or insights on how to successfully deploy and run inference with the latest version of TensorRT in Triton Server would be greatly appreciated!
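For reference, the intended deployment looks roughly like the sketch below; the model name and paths are placeholders, and the 24.05 tag assumes the standard NGC naming:

```sh
# Hypothetical model-repository layout for a TensorRT engine; Triton's
# TensorRT backend expects a model.plan file in each version directory.
mkdir -p model_repository/bce_rerank/1
cp model.plan model_repository/bce_rerank/1/
cat > model_repository/bce_rerank/config.pbtxt <<'EOF'
platform: "tensorrt_plan"
max_batch_size: 16
EOF
# Serve with the 24.05 container, which bundles TensorRT 10.
docker run --rm --gpus all -v $(pwd)/model_repository:/models \
  nvcr.io/nvidia/tritonserver:24.05-py3 tritonserver --model-repository=/models
```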
Triton Information
Triton Server 24.03 container
Expected behavior
Triton Server loads the .plan engines exported with TensorRT 10 and serves inference without version-mismatch errors.