
Can I use the Triton server tensorrtllm backend to host other TensorRT-built models? If not, what do you suggest if our model stack is a mix of LLM and non-LLM models? #242

Description

@zmy1116

Hello,

So our current model stack consists of a set of models built with TensorRT, plus the Whisper ASR model.

I'd like to use Triton server to host all of these models. Since Whisper can be converted using TensorRT-LLM, I tried to host all of the models on the Triton server with the tensorrtllm backend, and I'm seeing this error:

E1220 20:46:48.038714 1 model_lifecycle.cc:621] failed to load 'fer_2' version 1: Invalid argument: unable to find 'libtriton_tensorrt.so' or 'tensorrt/model.py' for model 'fer_2', in /opt/tritonserver/backends/tensorrt
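For reference, the failing model is configured as a plain TensorRT engine. A minimal, illustrative config.pbtxt for it would look roughly like the sketch below (the input/output names, shapes, and dtypes are placeholders, not our exact config):

```
# Illustrative config.pbtxt for the non-LLM model 'fer_2'
# (placeholder I/O names, shapes, and dtypes)
name: "fer_2"
backend: "tensorrt"        # this is what makes Triton look for libtriton_tensorrt.so
max_batch_size: 8
input [
  {
    name: "input_image"    # placeholder input name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "scores"         # placeholder output name
    data_type: TYPE_FP32
    dims: [ 7 ]
  }
]
```

The Whisper TensorRT-LLM engine sits in the same model repository, with backend: "tensorrtllm" in its own config.pbtxt.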

I think this suggests that the Triton server tensorrtllm backend does not support plain TensorRT models. Is that the case?

If so, what should I do? What do you recommend if we have a large model stack with a mix of LLM and non-LLM models?

Thank you.
