Your current environment
On vLLM v0.11.1rc0 - NVIDIA GH200 (arm64)
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 190, in forward
browser-1 | (EngineCore_DP0 pid=546) x = x + self.attn(self.norm1(x),
browser-1 | (EngineCore_DP0 pid=546) ^^^^^^^^^^^^^^^^^^^^^^^^
browser-1 | (EngineCore_DP0 pid=546) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
browser-1 | (EngineCore_DP0 pid=546) return self._call_impl(*args, **kwargs)
browser-1 | (EngineCore_DP0 pid=546) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
browser-1 | (EngineCore_DP0 pid=546) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
browser-1 | (EngineCore_DP0 pid=546) return forward_call(*args, **kwargs)
browser-1 | (EngineCore_DP0 pid=546) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
browser-1 | (EngineCore_DP0 pid=546) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_5_vl.py", line 384, in forward
browser-1 | (EngineCore_DP0 pid=546) output = flash_attn_varlen_func(q,
browser-1 | (EngineCore_DP0 pid=546) ^^^^^^^^^^^^^^^^^^^^^^^^^
browser-1 | (EngineCore_DP0 pid=546) File "/usr/local/lib/python3.12/dist-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 233, in flash_attn_varlen_func
browser-1 | (EngineCore_DP0 pid=546) out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
browser-1 | (EngineCore_DP0 pid=546) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
browser-1 | (EngineCore_DP0 pid=546) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1158, in __call__
browser-1 | (EngineCore_DP0 pid=546) return self._op(*args, **(kwargs or {}))
browser-1 | (EngineCore_DP0 pid=546) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
browser-1 | (EngineCore_DP0 pid=546) RuntimeError: This flash attention build does not support headdim not being a multiple of 32.
🐛 Describe the bug
Using vLLM with FA2 or FA3 results in this error.
For FA3, I used the latest FA3 built from source, which should support head dimensions that are multiples of 8, but it looks like the FA2 functions bundled with vLLM get pulled in instead.
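For reference, the failure can be triggered outside the model with a tiny standalone call. This is a minimal sketch, not taken from the original report: the import path follows the file shown in the traceback, and the head dim of 72 is my assumption about what the Qwen3-VL vision attention passes in (any value that is a multiple of 8 but not of 32 should hit the same check).

```python
# Minimal repro sketch (assumption: head_dim=72, a multiple of 8 but not of 32,
# similar to what the Qwen3-VL vision tower uses).
import torch
from vllm.vllm_flash_attn.flash_attn_interface import flash_attn_varlen_func

seqlen, num_heads, head_dim = 16, 4, 72  # head_dim not a multiple of 32

q = torch.randn(seqlen, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
cu_seqlens = torch.tensor([0, seqlen], dtype=torch.int32, device="cuda")

# With the bundled FA2 kernels this raises:
# RuntimeError: This flash attention build does not support headdim not being a multiple of 32.
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_k=cu_seqlens,
    max_seqlen_q=seqlen,
    max_seqlen_k=seqlen,
)
```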