Description
Hi, I am trying to deploy a mistral-7b-instruct model on the Triton server but have run into difficulties. I have successfully converted my Mistral model with trtllm-build, following the llama example in the TensorRT-LLM repo, but I am not sure how to deploy it on the Triton server. There seem to be many ways to do so; I have tried creating a tensorrt_llm backend and an ensemble backend, but neither works. Could you advise on what I should do? I would like to create an endpoint so that I can pass a prompt to the Mistral model on the Triton server and get generated text back.
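To make the goal concrete, this is roughly how I hope to be able to query the server once it is up (just a sketch; I am assuming the generate endpoint and the text_input/max_tokens input names from the tensorrtllm_backend examples, which may not be right for my setup):
curl -X POST localhost:8000/v2/models/ensemble/generate \
     -d '{"text_input": "What is machine learning?", "max_tokens": 64}'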
Here are the steps I have taken:
After pulling the Mistral model weights, I converted the raw weights into the TensorRT-LLM checkpoint format:
python convert_checkpoint.py --model_dir Mistral-7B-Instruct-v0.2 \
--output_dir Mistral-7B-Instruct-TensorRT/ \
--dtype float16 \
--weight_only_precision int8
I then built the engine (this produces a config.json and a rank0.engine file):
trtllm-build --checkpoint_dir Mistral-7B-Instruct-TensorRT/ \
--output_dir Mistral-7B-Instruct-compiled/ \
--gpt_attention_plugin float16 \
--gemm_plugin float16 \
--max_input_len 32256
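To deploy, I set up a model repository roughly like this (a sketch of what I attempted; the directory layout is my own, based on my reading of the tensorrtllm_backend examples, so it may not be what the backend expects):
# copy the built engine and its config into a version directory
mkdir -p triton_model_repo/tensorrt_llm/1
cp Mistral-7B-Instruct-compiled/rank0.engine triton_model_repo/tensorrt_llm/1/
cp Mistral-7B-Instruct-compiled/config.json triton_model_repo/tensorrt_llm/1/
# config.pbtxt for the model sits one level up:
# triton_model_repo/tensorrt_llm/config.pbtxt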
I then pulled the latest Triton server image (version 24.02) and tried to deploy the tensorrt_llm model from this repository, but hit the following error:
UNAVAILABLE: Invalid argument: unable to find backend library for backend 'tensorrtllm', try specifying runtime on the model configuration
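For reference, the config.pbtxt I used for the tensorrt_llm model looks roughly like this (trimmed to the relevant parts; the parameter names are copied from my reading of the tensorrtllm_backend templates, so they are assumptions on my part). I have not set the runtime field that the error message mentions, and I am not sure what it should point to (the backend shared library, perhaps?):
name: "tensorrt_llm"
backend: "tensorrtllm"
max_batch_size: 1
parameters: {
  key: "gpt_model_type"
  value: { string_value: "inflight_fused_batching" }
}
parameters: {
  key: "gpt_model_path"
  value: { string_value: "/models/tensorrt_llm/1" }
}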