
Conversation

@wenscarl (Contributor) commented Sep 25, 2025

Fix the routing_bias dtype to bf16 for flashinfer.fused_moe.trtllm_fp4_block_scale_moe
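
For illustration, a minimal self-contained sketch of the cast this PR introduces before the FlashInfer kernel call (the tensor below is a stand-in for the layer's e_score_correction_bias; the actual change appears in the diff further down):

    import torch

    # Stand-in for e_score_correction_bias; in vLLM this is the MoE layer's
    # expert-score correction bias.
    e_score_correction_bias = torch.randn(128, dtype=torch.float32)

    # The fix: cast routing_bias to bfloat16 (when present) before handing it
    # to flashinfer.fused_moe.trtllm_fp4_block_scale_moe.
    routing_bias = e_score_correction_bias
    if routing_bias is not None:
        routing_bias = routing_bias.to(torch.bfloat16)

    print(routing_bias.dtype)  # torch.bfloat16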

Purpose

Test Plan

Test Result



Signed-off-by: Shu Wang. <shuw@nvidia.com>
@gemini-code-assist (bot) left a comment


Code Review

This pull request aims to fix the data type of routing_bias before passing it to a FlashInfer kernel. While the change correctly identifies the need for a type cast, it hardcodes torch.bfloat16. This can lead to data type mismatches and potential runtime errors if the model uses a different activation data type, such as torch.float16. My review provides a more robust solution that dynamically casts routing_bias to match the data type of routing_logits, ensuring consistency.

Comment on lines +1457 to +1459
            routing_bias = e_score_correction_bias
            if routing_bias is not None:
                routing_bias = routing_bias.to(torch.bfloat16)

critical

Hardcoding the routing_bias dtype to torch.bfloat16 can cause a dtype mismatch with routing_logits, which could lead to runtime errors or incorrect behavior for models using other dtypes like torch.float16.

The routing_logits passed to the kernel have dtype router_logits.dtype (when use_llama4_routing is true) or torch.float32 otherwise. To ensure consistency, routing_bias should be cast to the same dtype as the routing_logits being passed to the kernel.

The suggested change determines the target dtype dynamically.

            routing_bias = e_score_correction_bias
            if routing_bias is not None:
                target_dtype = router_logits.dtype if use_llama4_routing else torch.float32
                routing_bias = routing_bias.to(target_dtype)
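
To make the dtype argument concrete, here is a small standalone sketch (not from the PR) contrasting the hardcoded bfloat16 cast with the dynamic cast above; it assumes the non-Llama4 path where the kernel receives float32 routing logits, and the tensor shapes and flag value are illustrative:

    import torch

    # Illustrative: the module's router logits and the expert-score correction bias.
    router_logits = torch.randn(4, 8, dtype=torch.float32)
    e_score_correction_bias = torch.randn(8, dtype=torch.float32)

    # Non-Llama4 path: the kernel receives float32 routing logits.
    use_llama4_routing = False
    routing_logits = router_logits if use_llama4_routing else router_logits.float()

    # Hardcoded cast: the bias becomes bfloat16 while the logits are float32.
    hardcoded_bias = e_score_correction_bias.to(torch.bfloat16)
    assert hardcoded_bias.dtype != routing_logits.dtype

    # Dynamic cast from the suggestion: the bias matches the kernel's logits.
    target_dtype = router_logits.dtype if use_llama4_routing else torch.float32
    dynamic_bias = e_score_correction_bias.to(target_dtype)
    assert dynamic_bias.dtype == routing_logits.dtype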

Contributor


Agree with this comment

@alexm-redhat (Collaborator) left a comment


@wenscarl thanks for the fix!

@alexm-redhat enabled auto-merge (squash) September 25, 2025 21:26
@github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 25, 2025
@alexm-redhat merged commit 081b559 into vllm-project:main Sep 25, 2025
58 of 60 checks passed
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>