Inconsistent Batch Index Order in Decoupled Mode with trt-llm and triton trtllm backend #705

@Oldpan

System Info

Question:

While using trt-llm (nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3) with decoupled mode enabled and a batch size greater than 1, I observed an issue where the batch_index in the returned data does not always match the expected order of inputs.

For example, if I send [A, B, C] as one batch (batch size 3), I expect the model to return [a, b, c] in the same order, with batch_index 0, 1, and 2 respectively. However, in some cases the output batch indices are shuffled:

data: {"batch_index":2, "model_name":"nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"b"}
data: {"batch_index":1, "model_name":"nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"c"}
data: {"batch_index":0, "model_name":"nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"a"}

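Here, the batch_index values do not match the input positions: "b" (the second input) comes back with batch_index 2 and "c" (the third input) with batch_index 1, so the results end up in an unexpected order.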
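
To make the mismatch easier to check, here is a minimal sketch of a client-side helper, not part of the backend. It parses streamed `data:` lines like the ones above, concatenates `text_output` per `batch_index`, and compares the result with the expected per-input outputs. The function names `group_by_batch_index` and `find_mismatches` are hypothetical, and the expected outputs ["a", "b", "c"] are taken from the example in this issue.

```python
import json


def group_by_batch_index(sse_lines):
    """Concatenate streamed text_output chunks for each batch_index."""
    outputs = {}
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = json.loads(line[len("data:"):])
        idx = payload["batch_index"]
        outputs[idx] = outputs.get(idx, "") + payload["text_output"]
    return outputs


def find_mismatches(outputs, expected):
    """Return {index: (expected_text, received_text)} for every wrong slot."""
    return {
        i: (expected[i], outputs.get(i))
        for i in range(len(expected))
        if outputs.get(i) != expected[i]
    }


# The three events from the example above.
lines = [
    'data: {"batch_index":2, "model_name":"nougat", "model_version":"1", '
    '"sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"b"}',
    'data: {"batch_index":1, "model_name":"nougat", "model_version":"1", '
    '"sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"c"}',
    'data: {"batch_index":0, "model_name":"nougat", "model_version":"1", '
    '"sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"a"}',
]

grouped = group_by_batch_index(lines)
mismatches = find_mismatches(grouped, ["a", "b", "c"])
print(mismatches)
```

For the events above, this reports that index 1 and index 2 are swapped. Note that regrouping by batch_index on the client does not fix the problem, because the index itself is attached to the wrong output.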

Is this behavior expected in decoupled mode, or is there a way to ensure the output follows the correct sequence order?

Could this issue be related to triton_trtllm_backend? Thanks!

Who can help?

@byshiue @schetlur-nv

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

none

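Expected behavior

Each response's batch_index matches the position of its input in the batch, so the outputs follow the input order.

actual behavior

The batch_index values are sometimes shuffled relative to the input order, as in the example above.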

additional notes

none


Labels: bug (Something isn't working)