
TensorRT version is not compatible, expecting library version 9.2.0.4 got 9.2.0.5 #194

Closed
wjueyao opened this issue Dec 6, 2023 · 14 comments

@wjueyao

wjueyao commented Dec 6, 2023

I encountered the following error when using the newest image nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3

The engine plan file is not compatible with this version of TensorRT, expecting library version 9.2.0.4 got 9.2.0.5, please rebuild

[screenshot of the error message]

I followed the exact steps listed in #128 (comment) to build the model files, except that instead of the main branch I used v0.6.1.

I also checked the TensorRT version in install_tensorrt.sh, and it does use 9.2.0.5:
https://github.com/NVIDIA/TensorRT-LLM/blob/v0.6.1/docker/common/install_tensorrt.sh

#!/bin/bash

set -ex

TRT_VER="9.2.0.5"
CUDA_VER="12.2"
CUDNN_VER="8.9.4.25-1+cuda12.2"
NCCL_VER="2.18.3-1+cuda12.2"
CUBLAS_VER="12.2.5.6-1"

Therefore, I was wondering which version is the correct one to use.

@anjalibshah

Can you use Triton server's 23.10 container with v0.6.1 as documented (https://github.com/NVIDIA/TensorRT-LLM/tree/v0.6.1#release-notes) and let us know?

@zTaoplus

zTaoplus commented Dec 7, 2023

Same issue here. I used the nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 container to run the v0.6.1 engine, but encountered an error:
[screenshot of the error]

@wjueyao
Author

wjueyao commented Dec 7, 2023

Can you use Triton server's 23.10 container with v0.6.1 as documented (https://github.com/NVIDIA/TensorRT-LLM/tree/v0.6.1#release-notes) and let us know?

Sure. I converted the engine file using Triton server's 23.10 container (nvcr.io/nvidia/tritonserver:23.10-py3) with v0.6.1, and build.py works fine.

However, when I tried to run the model using the newest image nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3, I encountered the above error.

I checked the TensorRT version inside nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3, and it uses 9.2.0.4.
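
A quick way to compare the two versions (assuming the tensorrt Python package is available in the image, which it appears to be in the trtllm variants):

# Print the TensorRT version shipped in a container image.
docker run --rm nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 python3 -c "import tensorrt; print(tensorrt.__version__)"

# Compare against the version TensorRT-LLM pins for the engine build
# (run from the TensorRT-LLM checkout used to build the engine).
grep '^TRT_VER=' docker/common/install_tensorrt.sh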

@byshiue added the triaged (Issue has been triaged by maintainers) label Dec 7, 2023
@kaiyux
Collaborator

kaiyux commented Dec 8, 2023

@zTaoplus @wjueyao If you want to use the Tritonserver container (nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3), could you please use the same container to build the engine? The container has TRT 9.2.0.4 installed, which will not work with engines built against TRT 9.2.0.5 as used in v0.6.1.
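
A rough sketch of that flow, in case it helps (the mount path is an example, and the build.py line is a placeholder for the model-specific build command from the TensorRT-LLM examples):

# Build the engine inside the same image that will serve it, so the serialized
# plan matches the TensorRT version (9.2.0.4) shipped in that image.
docker run --rm -it --gpus all -v $(pwd):/workspace nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 bash

# Inside the container, run your model's build script from the TensorRT-LLM examples, e.g.:
# python3 examples/<model>/build.py --output_dir /workspace/engines ...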

If you want to use the engine built by v0.6.1, please try deploying the model using the container built from https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/dockerfile/Dockerfile.trt_llm_backend

Thanks.

@FightingMan

FightingMan commented Dec 10, 2023

nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 uses TRT 9.2.0.4:
[screenshot showing TensorRT 9.2.0.4 in the container]
The engine build uses TRT 9.2.0.5:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/common/install_tensorrt.sh#L5
https://github.com/NVIDIA/TensorRT-LLM/blob/v0.6.1/docker/common/install_tensorrt.sh#L5

Running tritonserver reports the error:

[TensorRT-LLM][ERROR] 6: The engine plan file is not compatible with this version of TensorRT, expecting library version 9.2.0.4 got 9.2.0.5, please rebuild.

When I build TensorRT-LLM, how can I get version 9.2.0.4? I passed 9.2.0.4 as TRT_VER, but it can't be downloaded:

264.2 + RELEASE_URL_TRT=https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.4.linux.x86_64-gnu.cuda-12.2.tar.gz
264.2 + wget --no-verbose https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.4.linux.x86_64-gnu.cuda-12.2.tar.gz -O /tmp/TensorRT.tar
266.0 https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.4.linux.x86_64-gnu.cuda-12.2.tar.gz:
266.0 2023-12-10 12:33:44 ERROR 404: Not Found.
------

@FightingMan

@kaiyux Help, how can I get the right URL for the specific version 9.2.0.4, or could you support 9.2.0.5 in tritonserver?

@shixianc

Building a new container using https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/dockerfile/Dockerfile.trt_llm_backend works for me. Still, it would be nice to have an NVIDIA pre-built container for 23.11 with 9.2.0.5.

@kaiyux
Collaborator

kaiyux commented Dec 15, 2023

@FightingMan Please use nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 which has TensorRT 9.2.0.4 pre-installed to build the engines, then you can deploy the engine with the same container.

If you want to use TensorRT 9.2.0.5 in the tritonserver container, please use Dockerfile.trt_llm_backend to build the container and do the deployment. Thanks.
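
For reference, building that container roughly follows the tensorrtllm_backend README (the triton_trt_llm image tag below is arbitrary):

git clone https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend
# TensorRT-LLM is vendored as a submodule and is needed for the build.
git lfs install
git submodule update --init --recursive
# Build the Triton image with the matching TensorRT-LLM / TensorRT versions baked in.
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .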

@ccf-yang

I encountered the same issue. However, when using the nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 container, the issue does not occur.

@shannonphu

shannonphu commented Dec 20, 2023

Please use nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 which has TensorRT 9.2.0.4 pre-installed to build the engines, then you can deploy the engine with the same container.

Do we need to rebuild TensorRT-LLM in nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 in order to rebuild the engine? I tried cloning https://github.com/NVIDIA/TensorRT-LLM/tree/rel within the container, but I'm not sure which steps in https://github.com/NVIDIA/TensorRT-LLM/blob/rel/docs/source/installation.md to use when already inside this container. Do we then need to run python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt?
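
For context, the full sequence I was attempting inside the container is roughly the following, based on the installation doc (the --trt_root path is the one quoted above; I am not certain it matches this image's layout):

# Inside nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3
git clone -b rel https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git lfs install
git submodule update --init --recursive
# Build the wheel against the TensorRT already present in the image.
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
# Install the resulting wheel (output path per the installation doc).
pip3 install ./build/tensorrt_llm*.whl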
@kaiyux

@shannonphu

I ended up using Dockerfile.trt_llm_backend as suggested, and it worked. It seems to use TensorRT 9.2.0.5, and the TensorRT-LLM release also lists this version number in install_tensorrt.sh.

Will a future version of nvcr.io/nvidia/tritonserver:2x.xx-trtllm-python-py3 be on TensorRT 9.2.0.5 to align with the TensorRT-LLM version I use to build the engine separately? It would be easier for deployments if the Triton trtllm backend container did not need to be built each time, since that takes a few hours. Please correct me if my understanding of the versions for trtllm_backend vs. trtllm is wrong.

@kaiyux
Collaborator

kaiyux commented Dec 25, 2023

@shannonphu You are right about the versions for trtllm_backend vs. trtllm, thanks for your patience on that.

The latest image nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 has TensorRT 9.2.0.5 included, and you may want to use it together with the latest version of TensorRT-LLM. Thanks.

@kaiyux
Collaborator

kaiyux commented Dec 25, 2023

I'm closing this issue, please feel free to leave comments or open new issues if you're still seeing problems. Thanks.

@LeatherDeerAU

@FightingMan Please use nvcr.io/nvidia/tritonserver:23.11-trtllm-python-py3 which has TensorRT 9.2.0.4 pre-installed to build the engines, then you can deploy the engine with the same container.

I am trying to run the example with nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3. How can I build engines in a pre-built container if the tensorrt_llm package is needed for the engine build and building it from source takes a couple of hours when following the steps here?
https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#prepare-tensorrt-llm-engines

Is there an option to not build the package, but instead use a ready pre-built package, e.g. pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com, inside the Triton container? (I tried and it didn't work out, but maybe it is possible.)
