Conversation

@yewentao256 yewentao256 commented Sep 26, 2025

Purpose

Temporary fix for #25582

Special thanks to @ywang96 for the context

The default XFORMERS backend is broken here: it produces repeated-token output.

The FlashAttn backend fails as well, with `This flash attention build does not support headdim not being a multiple of 32.', please check the stack trace above for the root cause`.

Let's land this temporary fix first, requiring users to use torch SDPA; the remaining issues can be fixed in follow-up PRs.
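A minimal sketch of the kind of guard this implies (the enum, function name, and head dim here are illustrative assumptions, not the PR's actual code): reject unsupported ViT attention backends up front instead of failing mid-inference.

```python
# Hypothetical sketch of a ViT attention-backend guard. The names
# `AttentionBackend` and `check_vit_attn_backend` are assumptions for
# illustration, not vLLM's actual identifiers.
from enum import Enum


class AttentionBackend(Enum):
    TORCH_SDPA = "TORCH_SDPA"
    XFORMERS = "XFORMERS"
    FLASH_ATTN = "FLASH_ATTN"


def check_vit_attn_backend(backend: AttentionBackend, head_dim: int) -> None:
    # Some FlashAttention builds require head_dim % 32 == 0, which the
    # Qwen3-VL vision tower's head dim does not satisfy.
    if backend is AttentionBackend.FLASH_ATTN and head_dim % 32 != 0:
        raise ValueError(
            f"FLASH_ATTN does not support head_dim={head_dim} "
            "(not a multiple of 32); use TORCH_SDPA instead."
        )
    # XFORMERS runs but produces repeated-token garbage for this ViT.
    if backend is AttentionBackend.XFORMERS:
        raise ValueError(
            "XFORMERS produces repeated tokens for this ViT; "
            "use TORCH_SDPA instead."
        )


check_vit_attn_backend(AttentionBackend.TORCH_SDPA, head_dim=72)  # passes
```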

Test

vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct --tensor-parallel-size 4 --enforce_eager

curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen3-VL-235B-A22B-Instruct",
    "temperature": 0.1,
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {"type":"image_url","image_url":{"url":"https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"}},
          {"type":"text","text":"Read all the text in the image."}
        ]
      }
    ]
  }' | jq .

Origin

Generated text: '!!!!!!!!!!!!!!!!…' (output truncated; the model emits only repeated `!` tokens)

Now

Wentao Ye
  [5:34 PM](https://vllm-dev.slack.com/archives/D09HJSC0VC4/p1758922499070009)
        "role": "assistant",
        "content": "Auntie Anne's\n\nCINNAMON SUGAR\n1 x 17,000                      17,000\nSUB TOTAL                       17,000\n\nGRAND TOTAL                     17,000\nCASH IDR                        20,000\nCHANGE DUE                      3,000",
        "refusal": null,

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@mergify mergify bot added the qwen Related to Qwen models label Sep 26, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a temporary fix for an issue in Qwen3-VL by enforcing the use of the TORCH_SDPA attention backend. While the intent is correct, the implementation uses an assert statement for configuration validation, which is not robust as it can be disabled. My review suggests replacing this with a ValueError to ensure the check is always performed, and also points out a potential logic flaw in the backend selection that could negatively impact user experience.
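The assert-vs-ValueError point from this review can be illustrated in a short sketch (the function names are hypothetical): `assert` statements are stripped when Python runs with `-O`, so a configuration check written as an assert can silently vanish, whereas raising `ValueError` is always enforced.

```python
# Illustration of the review comment. `check_backend_assert` and
# `check_backend_raise` are hypothetical names, not code from this PR.

def check_backend_assert(backend: str) -> None:
    # Silently skipped when the interpreter runs with -O
    # (assertions disabled), so the bad config slips through.
    assert backend == "TORCH_SDPA", f"unsupported backend: {backend}"


def check_backend_raise(backend: str) -> None:
    # Always enforced, regardless of interpreter flags.
    if backend != "TORCH_SDPA":
        raise ValueError(f"unsupported backend: {backend}")
```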

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 yewentao256 changed the title [Bug] Temple Fix for Qwen3-VL Issue [Bug] Only Allow SDPA on B200 for Qwen3-VL-235B-A22B Sep 26, 2025
@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 26, 2025
@yewentao256
Member Author

@tlrmchlsmth CC

@yewentao256 yewentao256 changed the title [Bug] Only Allow SDPA on B200 for Qwen3-VL-235B-A22B [Bug] Only Allow SDPA Backend on B200 for Qwen3-VL Sep 26, 2025
@ywang96 ywang96 changed the title [Bug] Only Allow SDPA Backend on B200 for Qwen3-VL [Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL Sep 26, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
yewentao256 and others added 7 commits September 26, 2025 16:19
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Member

@ywang96 ywang96 left a comment


I cleaned up the PR a bit - thanks for fixing this!

@ywang96 ywang96 added this to the v0.11.0 Cherry Picks milestone Sep 27, 2025
@ywang96 ywang96 merged commit c242c98 into main Sep 27, 2025
51 checks passed
@ywang96 ywang96 deleted the wentao-temple-fix-for-qwen3vl branch September 27, 2025 03:44
simon-mo pushed a commit that referenced this pull request Sep 28, 2025
@DarkLight1337 DarkLight1337 mentioned this pull request Sep 29, 2025
1 task
yewentao256 added a commit that referenced this pull request Oct 3, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>