Closed
Labels: bug (Something isn't working)
Description
System Info
Question:
While using trt-llm (nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3) with decoupled mode enabled and a batch size greater than 1, I observed that the batch_index in the returned responses does not always match the order of the inputs.
For example, if I send a batch [A, B, C] (batch size = 3), I expect the model to return [a, b, c] in the same order. However, in some cases the returned batch indices are shuffled:
data: {"batch_index":2, "model_name":"nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"b"}
data: {"batch_index":1, "model_name":"nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"c"}
data: {"batch_index":0, "model_name":"nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"a"}
Here, the batch_index values are incorrect, leading to an unexpected ordering of results.
Is this behavior expected in decoupled mode, or is there a way to ensure the outputs follow the input order?
Could this issue be related to the triton_trtllm_backend? Thanks!
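For reference, in decoupled mode responses may arrive in any order, and the client is expected to use batch_index to put them back into input order. A minimal sketch of that reassembly (field names taken from the SSE payloads above) shows why a wrong batch_index breaks the result, since the value "b" here ends up in slot 2 even though input C produced it:

```python
# Sketch: reassemble streamed decoupled-mode responses by batch_index.
# Field names ("batch_index", "text_output") follow the SSE payloads above.

def reorder_responses(responses, batch_size):
    """Place each streamed response at the slot given by its batch_index."""
    ordered = [None] * batch_size
    for resp in responses:
        idx = resp["batch_index"]
        if not 0 <= idx < batch_size:
            raise ValueError(f"batch_index {idx} out of range")
        ordered[idx] = resp["text_output"]
    return ordered

# The stream from the issue, arriving out of order:
stream = [
    {"batch_index": 2, "text_output": "b"},
    {"batch_index": 1, "text_output": "c"},
    {"batch_index": 0, "text_output": "a"},
]
print(reorder_responses(stream, 3))  # -> ['a', 'c', 'b'], not the expected ['a', 'b', 'c']
```

If the batch_index values emitted by the backend are themselves wrong, no client-side sorting can recover the correct mapping, which is why this looks like a backend bug rather than a client issue.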
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
none
Expected behavior
Output returned in the right order, matching the input
Actual behavior
Output batch indices shuffled, as shown in the example above
Additional notes
none