[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL #25788
Conversation
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request introduces a temporary fix for an issue in Qwen3-VL by enforcing the use of the TORCH_SDPA attention backend. While the intent is correct, the implementation uses an assert statement for configuration validation, which is not robust as it can be disabled. My review suggests replacing this with a ValueError to ensure the check is always performed, and also points out a potential logic flaw in the backend selection that could negatively impact user experience.
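The reviewer's point can be sketched as follows. This is a minimal illustration, not the actual vLLM code; the function name and the `is_blackwell` flag are hypothetical. The key distinction is that `assert` statements are stripped when Python runs with `-O`, while a raised `ValueError` always fires:

```python
# Hypothetical sketch of the suggested fix: raise ValueError for
# configuration validation instead of using assert, which can be
# disabled with `python -O`.
def validate_vit_attn_backend(backend: str, is_blackwell: bool) -> None:
    """Reject unsupported ViT attention backends on B200 (Blackwell)."""
    if is_blackwell and backend != "TORCH_SDPA":
        # Always executed, even under `python -O`.
        raise ValueError(
            "ViT attention on B200 currently supports only the TORCH_SDPA "
            f"backend, got {backend!r}.")
```

An `assert backend == "TORCH_SDPA"` at the same spot would silently pass in optimized mode and let the broken backend run.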
CC @tlrmchlsmth
I cleaned up the PR a bit - thanks for fixing this!
Signed-off-by: simon-mo <simon.mo@hey.com>
Purpose
Temporary fix for #25582
Special thanks to @ywang96 for the context
The default XFORMERS backend is broken and produces repeated tokens.
The FlashAttn backend fails as well, with:
"This flash attention build does not support headdim not being a multiple of 32.", please check the stack trace above for the root cause
Let's land this temporary fix first, requiring users to use torch SDPA; the remaining backend issues can be fixed in follow-up PRs.
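The FlashAttention failure above comes down to a head-dimension constraint that torch SDPA does not have. A minimal illustration, assuming Qwen3-VL's vision encoder uses a head dim that is not a multiple of 32 (72 is used here purely for illustration):

```python
# Illustration of the FlashAttention constraint hit by Qwen3-VL's ViT:
# this flash attention build only supports head dims that are multiples
# of 32, while torch SDPA imposes no such restriction.
def flash_attn_supports_head_dim(head_dim: int) -> bool:
    """Mimics the head-dim check that produces the error above."""
    return head_dim % 32 == 0

# e.g. a ViT head dim of 72 (hypothetical value) is rejected,
# while a typical LLM head dim of 128 is fine.
```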
Test
vllm serve Qwen/Qwen3-VL-235B-A22B-Instruct --tensor-parallel-size 4 --enforce_eager
Origin / Now: before-and-after output screenshots (not reproduced here).