[Bugfix] Fix MTP+FlashInfer crash when trtllm kernels are available but disabled #26361
Conversation
Code Review
This pull request correctly fixes a crash that occurs when TRTLLM attention is available but explicitly disabled. The fix synchronizes the logic in `can_use_trtllm_attention` with `use_trtllm_attention` by checking whether the user has force-disabled the feature. My review includes a suggestion to refactor the newly added code for better readability and conciseness, which is important for this critical piece of logic.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
…ut disabled (vllm-project#26361) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Purpose
The `reorder_batch_threshold` is set based on `can_use_trtllm_attention`, assuming that `use_trtllm_attention` will return True if spec is enabled and it can be used. There is a missing edge case here for the force-disable trtllm attention flag. In this case `can_use_trtllm_attention` says True, but `use_trtllm_attention` says False, and the mismatch causes incorrect padding, leading to the crash observed in #26312.
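The mismatch above can be sketched as follows. This is a hypothetical, simplified model, not the actual vLLM code: the function bodies, the `kernels_available` helper, and the use of a plain `VLLM_USE_TRTLLM_ATTENTION` environment variable are assumptions for illustration. The point is that both predicates must consult the same force-disable signal, otherwise callers that size buffers from one and dispatch from the other disagree.

```python
import os


def force_disabled() -> bool:
    # Assumed stand-in for vLLM's force-disable flag: treat an explicit
    # "0" as the user disabling trtllm attention.
    return os.environ.get("VLLM_USE_TRTLLM_ATTENTION", "") == "0"


def kernels_available() -> bool:
    # Hypothetical stand-in for the real hardware/library probe.
    return True


def can_use_trtllm_attention() -> bool:
    # Before the fix this checked only availability, so it could return
    # True while use_trtllm_attention() returned False. The fix makes it
    # also honor the force-disable flag, keeping the two in sync.
    return kernels_available() and not force_disabled()


def use_trtllm_attention() -> bool:
    # Always honored the force-disable flag.
    return kernels_available() and not force_disabled()
```

With both predicates agreeing, `reorder_batch_threshold` (which is derived from `can_use_trtllm_attention`) can no longer diverge from the kernel actually selected at dispatch time.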