Conversation

@yewentao256 yewentao256 commented Sep 26, 2025

Purpose

Temporary fix for #25582

Special thanks to @ywang96 for the context

The default XFORMERS backend is broken here: it produces repeated-token output.

The FlashAttn backend fails as well, with `This flash attention build does not support headdim not being a multiple of 32.', please check the stack trace above for the root cause`.

Let's land this temporary fix first, requiring users to use torch SDPA; the remaining issues can be fixed in follow-up PRs.
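A minimal sketch of the kind of guard this implies (the enum, function name, and head dim here are illustrative assumptions, not the PR's actual code): reject unsupported ViT attention backends up front instead of failing mid-inference.

```python
# Hypothetical sketch of a ViT attention-backend guard. The names
# `AttentionBackend` and `check_vit_attn_backend` are assumptions for
# illustration, not vLLM's actual identifiers.
from enum import Enum


class AttentionBackend(Enum):
    TORCH_SDPA = "TORCH_SDPA"
    XFORMERS = "XFORMERS"
    FLASH_ATTN = "FLASH_ATTN"


def check_vit_attn_backend(backend: AttentionBackend, head_dim: int) -> None:
    # Some FlashAttention builds require head_dim % 32 == 0, which the
    # Qwen3-VL vision tower's head dim does not satisfy.
    if backend is AttentionBackend.FLASH_ATTN and head_dim % 32 != 0:
        raise ValueError(
            f"FLASH_ATTN does not support head_dim={head_dim} "
            "(not a multiple of 32); use TORCH_SDPA instead."
        )
    # XFORMERS runs but produces repeated-token garbage for this ViT.
    if backend is AttentionBackend.XFORMERS:
        raise ValueError(
            "XFORMERS produces repeated tokens for this ViT; "
            "use TORCH_SDPA instead."
        )


check_vit_attn_backend(AttentionBackend.TORCH_SDPA, head_dim=72)  # passes
```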

Test

vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct --tensor-parallel-size 4 --enforce_eager

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen3-VL-235B-A22B-Instruct",
    "temperature": 0.1,
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {"type":"image_url","image_url":{"url":"https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"}},
          {"type":"text","text":"Read all the text in the image."}
        ]
      }
    ]
  }' | jq .

Origin

Generated text: '!!!!!!!!!!!!!!!!…' (output truncated; the model emits only repeated `!` tokens)

Now

Wentao Ye
  [5:34 PM](https://vllm-dev.slack.com/archives/D09HJSC0VC4/p1758922499070009)
        "role": "assistant",
        "content": "Auntie Anne's\n\nCINNAMON SUGAR\n1 x 17,000                      17,000\nSUB TOTAL                       17,000\n\nGRAND TOTAL                     17,000\nCASH IDR                        20,000\nCHANGE DUE                      3,000",
        "refusal": null,

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@mergify mergify bot added the qwen Related to Qwen models label Sep 26, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a temporary fix for an issue in Qwen3-VL by enforcing the use of the TORCH_SDPA attention backend. While the intent is correct, the implementation uses an assert statement for configuration validation, which is not robust as it can be disabled. My review suggests replacing this with a ValueError to ensure the check is always performed, and also points out a potential logic flaw in the backend selection that could negatively impact user experience.
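The assert-vs-ValueError point from this review can be illustrated in a short sketch (the function names are hypothetical): `assert` statements are stripped when Python runs with `-O`, so a configuration check written as an assert can silently vanish, whereas raising `ValueError` is always enforced.

```python
# Illustration of the review comment. `check_backend_assert` and
# `check_backend_raise` are hypothetical names, not code from this PR.

def check_backend_assert(backend: str) -> None:
    # Silently skipped when the interpreter runs with -O
    # (assertions disabled), so the bad config slips through.
    assert backend == "TORCH_SDPA", f"unsupported backend: {backend}"


def check_backend_raise(backend: str) -> None:
    # Always enforced, regardless of interpreter flags.
    if backend != "TORCH_SDPA":
        raise ValueError(f"unsupported backend: {backend}")
```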

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 yewentao256 changed the title [Bug] Temple Fix for Qwen3-VL Issue [Bug] Only Allow SDPA on B200 for Qwen3-VL-235B-A22B Sep 26, 2025
@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 26, 2025
@yewentao256
Member Author

@tlrmchlsmth CC

@yewentao256 yewentao256 changed the title [Bug] Only Allow SDPA on B200 for Qwen3-VL-235B-A22B [Bug] Only Allow SDPA Backend on B200 for Qwen3-VL Sep 26, 2025
@ywang96 ywang96 changed the title [Bug] Only Allow SDPA Backend on B200 for Qwen3-VL [Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL Sep 26, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
yewentao256 and others added 7 commits September 26, 2025 16:19
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Member

@ywang96 ywang96 left a comment


I cleaned up the PR a bit - thanks for fixing this!

@ywang96 ywang96 added this to the v0.11.0 Cherry Picks milestone Sep 27, 2025
@ywang96 ywang96 merged commit c242c98 into main Sep 27, 2025
51 checks passed
@ywang96 ywang96 deleted the wentao-temple-fix-for-qwen3vl branch September 27, 2025 03:44
simon-mo pushed a commit that referenced this pull request Sep 28, 2025
@DarkLight1337 DarkLight1337 mentioned this pull request Sep 29, 2025
1 task
yewentao256 added a commit that referenced this pull request Oct 3, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>