[qwen] refactor attentions for vision/audio #38930

zucchini-nlp · 2025-06-20T08:22:17Z

What does this PR do?

As per the title, this refactors the attention in the vision and audio encoders, which were left out when the VLMs were refactored earlier.

HuggingFaceDocBuilderDev · 2025-06-20T08:35:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Cyrilvallez left a comment

Very, very much welcome!! 🤗 My main concern is the kwargs: they are not flowing correctly in this PR, and I'm not sure we can flow them all the way through in general (e.g., seqlens passed as kwargs for the language model would clash with the ones explicitly defined in the other modalities). Does it make sense to have them at all?
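The seqlens clash described above follows from plain Python keyword-argument rules. The sketch below is hypothetical: `attention_forward`, `vision_encoder_forward`, `filtered_vision_encoder_forward` and the `cu_seqlens` key are illustrative stand-ins, not the transformers API and not what this PR implements. It shows that forwarding the language model's kwargs into an encoder that also passes the same key explicitly raises a `TypeError`, and that dropping the key is one possible workaround.

```python
# Hypothetical stand-ins (NOT the transformers API): an attention function that
# takes variable-length sequence boundaries, and a vision encoder that computes
# its own boundaries and forwards any extra **kwargs it receives.
def attention_forward(hidden_states, cu_seqlens=None, **kwargs):
    # A real kernel would use cu_seqlens to restrict attention per sequence;
    # here we simply return the boundaries it received.
    return cu_seqlens


def vision_encoder_forward(pixel_values, **kwargs):
    # The encoder derives its own boundaries (a single "image" here)...
    cu_seqlens = [0, len(pixel_values)]
    # ...and blindly forwards kwargs that were meant for the language model.
    return attention_forward(pixel_values, cu_seqlens=cu_seqlens, **kwargs)


def filtered_vision_encoder_forward(pixel_values, **kwargs):
    cu_seqlens = [0, len(pixel_values)]
    # One possible mitigation: drop any caller kwarg the encoder defines itself.
    kwargs = {k: v for k, v in kwargs.items() if k != "cu_seqlens"}
    return attention_forward(pixel_values, cu_seqlens=cu_seqlens, **kwargs)


# Boundaries for packed *text* sequences, passed down from the top-level forward.
text_kwargs = {"cu_seqlens": [0, 7, 12]}

try:
    vision_encoder_forward([1, 2, 3], **text_kwargs)
    clash_error = None
except TypeError as err:  # "... got multiple values for keyword argument 'cu_seqlens'"
    clash_error = err

safe_result = filtered_vision_encoder_forward([1, 2, 3], **text_kwargs)
print(type(clash_error).__name__, safe_result)
```

Filtering avoids the crash but silently discards the caller's value, which is exactly the trade-off raised in the review: whether forwarding these kwargs to the modality encoders makes sense at all.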

src/transformers/models/qwen2_5_omni/modular_qwen2_5_omni.py

src/transformers/models/qwen2_vl/modeling_qwen2_vl.py

refactor attentions in vision/audio

d2b326a

zucchini-nlp requested a review from Cyrilvallez June 20, 2025 08:22

Cyrilvallez approved these changes Jun 20, 2025

zucchini-nlp added 5 commits June 20, 2025 12:48

remove fa2 import

e98a128

make config the only args

bc4022f

pass along kwargs from modality encoders

526b46d

Merge remote-tracking branch 'upstream/main' into qwen-attentions

17ca65f

style

9728610

zucchini-nlp merged commit d3d835d into huggingface:main Jun 24, 2025
14 checks passed

ydshieh mentioned this pull request Jun 26, 2025

[qwen2-vl] fix vision attention scaling #39043

Merged

petkokp mentioned this pull request Jun 29, 2025

Fixed flash attention 2 crash in qwen vl and omni models #39110

Closed

5 tasks

ydshieh mentioned this pull request Jul 7, 2025

[vlm] fix loading of retrieval VLMs #39242

Open
