Hello,
Our current model stack consists of a set of models built with TensorRT plus the Whisper ASR model.
I'd like to use Triton Inference Server to host all of these models. Since Whisper can be converted with TensorRT-LLM, I tried to host everything on a Triton server using the tensorrtllm backend, but I'm seeing this error:
E1220 20:46:48.038714 1 model_lifecycle.cc:621] failed to load 'fer_2' version 1: Invalid argument: unable to find 'libtriton_tensorrt.so' or 'tensorrt/model.py' for model 'fer_2', in /opt/tritonserver/backends/tensorrt
I think this suggests that the Triton tensorrtllm backend (or the container it ships in) does not support plain TensorRT models. Is that the case?
If so, what should I do? What would you recommend for a large model stack that mixes LLM and non-LLM models?
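For context, my understanding is that each model in the repository can pin its own backend in its config.pbtxt, roughly like the sketch below. The directory names other than fer_2 are placeholders, and I'm assuming a single container image that ships both the tensorrt and tensorrtllm backends, which is exactly the part I'm unsure about:

```
models/
├── fer_2/                 # existing TensorRT engine
│   ├── 1/
│   │   └── model.plan
│   └── config.pbtxt
└── whisper/               # Whisper engine built with TensorRT-LLM
    ├── 1/
    └── config.pbtxt

# fer_2/config.pbtxt — plain TensorRT engine
name: "fer_2"
backend: "tensorrt"        # needs libtriton_tensorrt.so under /opt/tritonserver/backends/tensorrt

# whisper/config.pbtxt — engine served through the TensorRT-LLM backend
name: "whisper"
backend: "tensorrtllm"
```

Is a mixed repository like this supposed to work on one server instance, or should the two backend families be served from separate containers?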
Thank you.