[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect `logical_not` #25925

zhoukezi · 2025-09-30T04:40:09Z

Purpose

In PR #25854, we addressed an issue in MiDashengLM’s audio encoder attention. The initial fix used a hand-written attention implementation, which, per review feedback, was then replaced with an SDPA-based implementation before merge.

SDPA interprets attention masks in the opposite sense from our original hand-written code, so the logical inversion previously applied to the mask should have been removed along with it. That adjustment was inadvertently omitted from the submitted changes, causing the audio encoder to produce incorrect embeddings.

This PR removes the leftover inversion and restores correct outputs from the MiDashengLM encoder.
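
To illustrate the polarity mismatch described above, here is a minimal pure-Python sketch; it is not the vLLM code. It assumes that SDPA's boolean mask uses `True` for positions that take part in attention, and that the hand-written path used the opposite convention, as the description states. The function names and the toy scores and values are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_sdpa_style(scores, values, keep):
    # SDPA-style boolean mask (assumed): True means the position is attended.
    masked = [s if k else float("-inf") for s, k in zip(scores, keep)]
    weights = softmax(masked)
    return sum(w * v for w, v in zip(weights, values))


# Toy single-query example: two valid frames followed by one padded frame.
scores = [0.0, 0.0, 0.0]
values = [1.0, 2.0, 100.0]
valid = [True, True, False]

# Correct usage: pass the "valid" mask straight through.
correct = attend_sdpa_style(scores, values, valid)

# Leftover inversion: only the padded frame is attended.
buggy = attend_sdpa_style(scores, values, [not v for v in valid])

print(correct, buggy)
```

In this toy case the inverted mask makes the query attend only to the padded frame, which is the kind of incorrect output the leftover inversion would produce.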

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

- The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- The test plan, such as providing test command.
- The test results, such as pasting the results comparison before and after, or e2e results.
- (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
- (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zhoukz <me@zhoukz.com>

gemini-code-assist

Code Review

This pull request provides a crucial bugfix for the MiDashengLM audio encoder. The change correctly removes an erroneous `logical_not()` call on the attention mask. This inversion was a leftover from a previous implementation and is incompatible with the `scaled_dot_product_attention` function, which expects `True` values for unmasked positions. By removing the inversion, this PR ensures the attention mask is correctly interpreted, fixing the generation of incorrect audio embeddings. The change is precise, well-explained, and I find no further issues.

Fix MiDashengLM audio encoder mask by removing incorrect logical_not

f1dd8c2

Signed-off-by: zhoukz <me@zhoukz.com>

gemini-code-assist bot reviewed Sep 30, 2025

DarkLight1337 requested a review from Isotr0py September 30, 2025 04:52

Isotr0py approved these changes Sep 30, 2025

Isotr0py enabled auto-merge (squash) September 30, 2025 06:29

github-actions bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Sep 30, 2025

Isotr0py merged commit 2e1b8bc into vllm-project:main Sep 30, 2025
52 checks passed

pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025

[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorr…

e78363f

…ect `logical_not` (vllm-project#25925) Signed-off-by: zhoukz <me@zhoukz.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorr…

0da98ff

…ect `logical_not` (#25925) Signed-off-by: zhoukz <me@zhoukz.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
