[Bugfix] Fix MTP+FlashInfer crash when trtllm kernels are available but disabled #26361
Conversation
Code Review
This pull request correctly fixes a crash that occurs when TRTLLM attention is available but explicitly disabled. The fix synchronizes the logic in `can_use_trtllm_attention` with `use_trtllm_attention` by checking whether the user has force-disabled the feature. My review includes a suggestion to refactor the newly added code for better readability and conciseness, which is important for this critical piece of logic.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
…ut disabled (vllm-project#26361) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Purpose
The `reorder_batch_threshold` is set based on `can_use_trtllm_attention`, assuming that `use_trtllm_attention` will return True if spec is enabled and it can be used. There is a missing edge case here for the force-disable trtllm attention flag. In this case `can_use_trtllm_attention` says True, but `use_trtllm_attention` says False, and the mismatch causes incorrect padding, leading to the crash observed in #26312.
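The mismatch above can be sketched as follows. This is a hypothetical, simplified model, not the actual vLLM code: the function bodies, the `kernels_available` helper, and the use of a plain `VLLM_USE_TRTLLM_ATTENTION` environment variable are assumptions for illustration. The point is that both predicates must consult the same force-disable signal, otherwise callers that size buffers from one and dispatch from the other disagree.

```python
import os


def force_disabled() -> bool:
    # Assumed stand-in for vLLM's force-disable flag: treat an explicit
    # "0" as the user disabling trtllm attention.
    return os.environ.get("VLLM_USE_TRTLLM_ATTENTION", "") == "0"


def kernels_available() -> bool:
    # Hypothetical stand-in for the real hardware/library probe.
    return True


def can_use_trtllm_attention() -> bool:
    # Before the fix this checked only availability, so it could return
    # True while use_trtllm_attention() returned False. The fix makes it
    # also honor the force-disable flag, keeping the two in sync.
    return kernels_available() and not force_disabled()


def use_trtllm_attention() -> bool:
    # Always honored the force-disable flag.
    return kernels_available() and not force_disabled()
```

With both predicates agreeing, `reorder_batch_threshold` (which is derived from `can_use_trtllm_attention`) can no longer diverge from the kernel actually selected at dispatch time.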