The repetition_penalty sampling parameter in the in-flight Triton server seems to have no effect #93

@StarrickLiu

Description

I tested Baichuan with the TRT-LLM in-flight Triton server and found many cases of repetition in the test dataset. In offline inference, I re-ran these repetitive cases with repetition_penalty set, and that prevented some of the inputs from repeating at the end of the output.

However, when running inference through the in-flight Triton server, the repetition in the results does not change no matter how I set repetition_penalty.

Based on my testing, the top_k, top_p, and temperature sampling parameters all take effect, while presence_penalty and repetition_penalty do not.
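For context on what the parameter is expected to do: repetition_penalty follows the CTRL-style formulation, where the logits of already-generated tokens are divided by the penalty if positive and multiplied by it if negative, so a value above 1.0 discourages repetition and 1.0 is a no-op. Below is a minimal self-contained sketch of that transform (not the TRT-LLM kernel itself, just the reference behavior one can compare server output against):

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty):
    """CTRL-style repetition penalty on a 1-D logits vector.

    Tokens already present in generated_ids are made less likely:
    positive logits are divided by the penalty, negative logits are
    multiplied by it. penalty == 1.0 leaves logits unchanged.
    """
    out = np.array(logits, dtype=np.float64, copy=True)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink a favorable logit
        else:
            out[tok] *= penalty   # push an unfavorable logit further down
    return out

# Example: tokens 0 and 1 were already generated, penalty = 1.5
logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, [0, 1], 1.5)
# token 0: 2.0 / 1.5; token 1: -1.0 * 1.5; token 2 untouched
```

If the server honored the parameter, sweeping repetition_penalty from 1.0 upward should visibly reduce repeated n-grams in the output; in my tests it does not.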

Could you try to reproduce this? The effectiveness of repetition_penalty can be tested in the TRT-LLM in-flight backend with the Baichuan model.

Labels: triaged (issue has been triaged by maintainers)
