Fix: Output text is always truncated in some models #3016
Merged
When I use Yi-34B-Chat for inference with vllm==0.3.2, the text in CompletionOutput is always truncated (see the example below). This PR ensures that when `stop_token_ids` is configured in the `sampling_params`, the engine first checks that the `stop_str` actually appears in the output before truncating the sequence.
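For illustration, a minimal sketch of that check is shown below. The helper `truncate_at_stop` and the simplified `SamplingParams` container are hypothetical stand-ins, not the actual vLLM code touched by this PR; they only demonstrate the idea of verifying the stop string is present before slicing the text.

```python
# Illustrative sketch only -- names here are hypothetical and do not
# mirror the real vLLM internals changed by this PR.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamplingParams:
    stop_token_ids: List[int] = field(default_factory=list)


def truncate_at_stop(output_text: str, stop_str: str, last_token_id: int,
                     params: SamplingParams) -> str:
    """Truncate output_text at stop_str, but only if stop_str is actually present."""
    if last_token_id in params.stop_token_ids:
        idx = output_text.find(stop_str)
        if idx != -1:
            # Stop string found: cut the text at its first occurrence.
            return output_text[:idx]
        # Stop token fired but the decoded stop string is absent:
        # leave the text untouched instead of blindly slicing it.
    return output_text


# Example: the stop token id fires, but the stop string never appears
# in the visible output, so nothing should be removed.
print(truncate_at_stop("Hello, how can I help you today?", "<|im_end|>", 7,
                       SamplingParams(stop_token_ids=[7])))
```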
Truncated CompletionOutput example: