Triton server + LoRA: sending the same input multiple times returns different results

### System Info

GPU: A100
TensorRT-LLM: v0.11.0
tensorrtllm_backend: v0.11.0
Image: Triton Inference Server 24.07-trtllm-python-py3
Model: llama-7b
LoRA: Japanese-Alpaca-LoRA-7b-v0


### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Reference doc: https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.11.0/inflight_batcher_llm

python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py \
	--request-output-len 10 \
	--text "hello" \
	--tokenizer-dir /opt/app/ori_models/llama_7b \
	--lora-path /opt/app/TensorRT-LLM/examples/llama/Japanese-Alpaca-LoRA-7b-v0-weights \
	--lora-task-id 1
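
To make the mismatch easier to confirm, the command above can be run several times in a loop and the outputs compared. The following is only a rough sketch, not part of the official client: `run_client` and `count_distinct_outputs` are hypothetical helper names, and it assumes the client's stdout contains nothing that legitimately varies between runs (timestamps, latencies). If it does, the output would need filtering before comparison.

```python
import subprocess
from collections import Counter

# Command copied from the reproduction step; the paths are specific to this setup.
CLIENT_CMD = [
    "python3", "inflight_batcher_llm/client/inflight_batcher_llm_client.py",
    "--request-output-len", "10",
    "--text", "hello",
    "--tokenizer-dir", "/opt/app/ori_models/llama_7b",
    "--lora-path", "/opt/app/TensorRT-LLM/examples/llama/Japanese-Alpaca-LoRA-7b-v0-weights",
    "--lora-task-id", "1",
]


def run_client(cmd=CLIENT_CMD):
    """Run the client once and return its stdout as stripped text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def count_distinct_outputs(run_once, attempts=5):
    """Call run_once() `attempts` times and count how often each output occurs."""
    return Counter(run_once() for _ in range(attempts))


if __name__ == "__main__":
    counts = count_distinct_outputs(run_client, attempts=5)
    for output, n in counts.items():
        print(f"--- seen {n} time(s) ---\n{output}\n")
    # More than one distinct output means identical requests disagreed.
    print("identical" if len(counts) == 1 else "outputs differ between runs")
```

With the behavior described in this issue, the script would be expected to report more than one distinct output.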

### Expected behavior

Sending the same request with the same parameters multiple times should return the same result every time.

### Actual behavior

![image](https://github.com/user-attachments/assets/f2b206fd-c390-4f62-a01c-0752e887b71c)
![image](https://github.com/user-attachments/assets/0e5c8207-b6d1-4421-8700-bd52de16f53f)




### Additional notes

`config.pbtxt` of the `tensorrt_llm` model:
[config.pbtxt.txt](https://github.com/user-attachments/files/16736237/config.pbtxt.txt)
