I tested Baichuan using the TRT-LLM In-flight Triton Server and found many cases of repetition in the test dataset. When I reran those repetitive cases with offline inference and set repetition_penalty, some of the inputs stopped repeating at the end.
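For context, the offline test was along these lines. This is a minimal sketch assuming the ModelRunner API from the TRT-LLM examples; the engine directory and Baichuan checkpoint name are placeholders:

```python
# Minimal offline sketch, assuming the ModelRunner API from the TRT-LLM
# examples; engine dir and model name below are placeholders.
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan2-13B-Chat", trust_remote_code=True)
runner = ModelRunner.from_dir(engine_dir="./baichuan_engine")

# One of the prompts that repeats when no penalty is set.
batch_input_ids = [tokenizer("<repetitive prompt>", return_tensors="pt").input_ids[0]]

output_ids = runner.generate(
    batch_input_ids,
    max_new_tokens=512,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.eos_token_id,
    repetition_penalty=1.3,  # values > 1.0 penalize already-generated tokens
)
# Output shape is [batch, beam, seq]; decode the first beam of the first request.
print(tokenizer.decode(output_ids[0][0], skip_special_tokens=True))
```

With repetition_penalty > 1.0 here, the tail-end repetition disappears for some of the problematic inputs.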

However, during inference on the In-flight Triton Server, the repeated output does not change no matter how I adjust repetition_penalty.
In my tests, the top_k, top_p, and temperature sampling parameters all take effect, while presence_penalty and repetition_penalty do not.
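For reproduction, a request along these lines exercises the parameter. This is a minimal sketch using tritonclient against the tensorrtllm_backend ensemble model; the tensor names (text_input, max_tokens, repetition_penalty, presence_penalty, text_output) are assumptions taken from that backend's sample configs and may need adjusting to your config.pbtxt:

```python
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype

def make_tensor(name, arr):
    # Wrap a numpy array as a Triton input tensor.
    t = grpcclient.InferInput(name, list(arr.shape), np_to_triton_dtype(arr.dtype))
    t.set_data_from_numpy(arr)
    return t

client = grpcclient.InferenceServerClient(url="localhost:8001")

inputs = [
    make_tensor("text_input", np.array([["<repetitive prompt>"]], dtype=object)),
    make_tensor("max_tokens", np.array([[512]], dtype=np.int32)),
    # Changing this value has no visible effect on the repeated output.
    make_tensor("repetition_penalty", np.array([[1.3]], dtype=np.float32)),
    make_tensor("presence_penalty", np.array([[0.5]], dtype=np.float32)),
]

result = client.infer("ensemble", inputs)
print(result.as_numpy("text_output"))
```

Sending different repetition_penalty values through this request produces identical repeated output, whereas varying top_k, top_p, or temperature changes the result as expected.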
Could you try to reproduce this issue? You can verify whether repetition_penalty takes effect in the TRT-LLM in-flight batching backend with the Baichuan model.