Fix: Output text is always truncated in some models #3016

Merged
3 commits merged into vllm-project:main on Mar 1, 2024

Conversation

HyperdriveHustle (Contributor)

When running inference with Yi-34B-Chat on vllm==0.3.2, the text in CompletionOutput is always truncated (see the example below). This PR ensures that when stop_token_ids is configured in the sampling_params, the engine first checks that the stop_str is actually present before truncating the sequence.

Truncated CompletionOutput example:

CompletionOutput(index=17, text='\n你好!我是一个人工智能助手,由零一万物训练,专门设计用来回答问题和提供信息。你可以问我任何问题,我会尽力帮助你。有什', token_ids=[144, 25902, 103, 59646, 4748, 13992, 44307, 101, 59903, 60738, 59625, 24018, 6994, 101, 11515, 2958, 13231, 8992, 53032, 2479, 2530, 102, 14760, 22126, 3944, 1650, 101, 14107, 32151, 4002, 59725, 102, 9300, 29287, 21645, 19610, 104, 7], cumulative_logprob=-11.930044378333832, logprobs=None, finish_reason=stop)
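
For illustration, the behavior the PR describes can be sketched as follows; the helper name and signature below are hypothetical and only approximate the logic explained above, not the exact vLLM code:

def finalize_output_text(output_text: str, stop_str: str) -> str:
    # Hypothetical sketch of the described fix: only strip the stop string
    # from the end of the output if it is actually there.
    if stop_str and output_text.endswith(stop_str):
        return output_text[: -len(stop_str)]
    # When the stop was triggered by a special stop_token_id whose decoded
    # text never appears in the visible output (e.g. a chat end-of-turn
    # token), blindly slicing off len(stop_str) characters would cut
    # legitimate text, which is the truncation reported here.
    return output_text

# Example: a stop token that decodes to "<|im_end|>" but was never appended
# to the visible text; the output is returned untouched.
print(finalize_output_text("你好!我是一个人工智能助手。", "<|im_end|>"))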

HyperdriveHustle changed the title from "Add stop_str check before truncating according to stop token length" to "Fix: Output text is always truncated in some models" on Feb 26, 2024
HyperdriveHustle (Contributor, Author)

Fixes #3026 and #3034.

learninmou (Contributor)

This PR solved the accuracy issue in my eval task, thanks.

WoosukKwon self-requested a review on March 1, 2024 at 06:51
WoosukKwon (Collaborator) left a comment


LGTM. Thanks for submitting the PR!

WoosukKwon enabled auto-merge (squash) on March 1, 2024 at 07:26
WoosukKwon merged commit 54d3544 into vllm-project:main on Mar 1, 2024
21 of 22 checks passed
xjpang pushed a commit to xjpang/vllm that referenced this pull request on Mar 4, 2024
currenttime

output = llm.generate(text, sampling_params=SamplingParams(max_tokens=512))
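
For context, a self-contained variant of that call might look like the following sketch; the model name, prompt, and stop_token_ids value are placeholders chosen for illustration, not taken from the comment above:

from vllm import LLM, SamplingParams

# Placeholder model and prompt for illustration.
llm = LLM(model="01-ai/Yi-34B-Chat")
text = "你好"

# stop_token_ids here is an assumption: pass whichever end-of-turn token
# id(s) the model's chat template actually uses.
sampling_params = SamplingParams(max_tokens=512, stop_token_ids=[7])
outputs = llm.generate(text, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)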
