Conversation

@jinzhen-lin
Contributor

fix #28220

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides a critical fix for a race condition in the MoE Marlin kernel. The change in csrc/moe/marlin_moe_wna16/marlin_template.h correctly addresses an issue where multiple threads could write to the same shared memory location, which would lead to incorrect results. By introducing a proper boundary check, the fix ensures memory safety and correctness. Additionally, this PR enables overlapped execution for Marlin kernels in vllm/model_executor/layers/fused_moe/shared_fused_moe.py, a performance optimization that was likely blocked by this bug. The changes are well-implemented and address the issue effectively.

Comment on lines 492 to 501
if (idx < block_num_valid_tokens) {
if constexpr (w_type == vllm::kFE2M1f && s_type == vllm::kFE4M3fn) {
sh_block_topk_weights[idx] = __hmul2(
global_scale, Dtype::num2num2(Dtype::float2num(
topk_weights_ptr[sh_block_sorted_ids[idx]])));
} else {
sh_block_topk_weights[idx] = Dtype::num2num2(
Dtype::float2num(topk_weights_ptr[sh_block_sorted_ids[idx]]));
}
}
Contributor


critical

This change correctly fixes a critical bug. The previous logic, which reset out-of-bounds idx values to 0, could lead to a race condition where multiple threads would write to sh_block_topk_weights[0] simultaneously. This would cause incorrect results and undefined behavior. By wrapping the operation in an if (idx < block_num_valid_tokens) check, you ensure that out-of-bounds accesses are safely skipped. This is the correct and robust approach to prevent this issue.
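
For readers skimming the diff, a minimal sketch of the two patterns being contrasted may help. This is not the real Marlin kernel: the names mirror the snippet above, but the types are plain float instead of half2 and the FP4/FP8 scaling branch is dropped.

// Simplified illustration only (assumed shapes and types, not vLLM source).
// Pre-fix pattern: an out-of-bounds idx was clamped to 0, so several threads
// could write sh_block_topk_weights[0] at once and could read
// sh_block_sorted_ids[0] before it held a valid token id.
__device__ void load_topk_weights_before_fix(
    float* sh_block_topk_weights, const int* sh_block_sorted_ids,
    const float* topk_weights_ptr, int idx, int block_num_valid_tokens) {
  if (idx >= block_num_valid_tokens) idx = 0;  // clamp -> data race
  sh_block_topk_weights[idx] = topk_weights_ptr[sh_block_sorted_ids[idx]];
}

// Post-fix pattern (this PR): out-of-bounds threads simply skip the load, so
// each shared-memory slot is written by at most one thread.
__device__ void load_topk_weights_after_fix(
    float* sh_block_topk_weights, const int* sh_block_sorted_ids,
    const float* topk_weights_ptr, int idx, int block_num_valid_tokens) {
  if (idx < block_num_valid_tokens) {
    sh_block_topk_weights[idx] = topk_weights_ptr[sh_block_sorted_ids[idx]];
  }
}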

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a critical bug in the MoE marlin kernel and re-enables a related optimization. The change in csrc/moe/marlin_moe_wna16/marlin_template.h correctly fixes a race condition where multiple threads could write to the same shared memory location when handling tokens near block boundaries. The previous logic incorrectly reset an out-of-bounds index to 0, causing this data race. The new implementation properly guards the memory access with a conditional check, resolving the issue. The second change in vllm/model_executor/layers/fused_moe/shared_fused_moe.py re-enables overlapped execution for marlin kernels. This optimization was likely disabled due to the bug, and its re-introduction is a good performance improvement. The changes are correct and effectively address the underlying problem.

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
@youkaichao
Member

cc @mgoin @vadiklyutiy

@vadiklyutiy
Collaborator

vadiklyutiy commented Nov 13, 2025

So, the illegal memory access appeared when we read a garbage value x = sh_block_sorted_ids[0] and then used it as the index in topk_weights_ptr[x]?

@jinzhen-lin
Contributor Author

So, the illegal memory access appeared when we read a garbage value x = sh_block_sorted_ids[0] and then used it as the index in topk_weights_ptr[x]?

Yes.

@vadiklyutiy
Collaborator

So, the illegal memory access appeared when we read a garbage value x = sh_block_sorted_ids[0] and then used it as the index in topk_weights_ptr[x]?

Yes.

Just wondering how multi-stream execution affects the appearance of this issue...

@jinzhen-lin
Contributor Author

jinzhen-lin commented Nov 13, 2025

Just wondering how multi-stream execution affects the appearance of this issue...

I’m not sure. Before we load values from global memory into sh_block_sorted_ids[0], that slot might still hold a leftover value from a previous kernel execution, or some other garbage value. In addition, the layout of allocations in GPU memory might also affect whether an illegal memory access (IMA) actually occurs.
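
To make the allocation-layout point concrete, here is a hypothetical toy kernel (assumed names, not vLLM code) that reads a stale shared-memory slot and uses it as a global index, the pattern described above:

// Hypothetical toy kernel, not vLLM source: reads shared memory before any
// thread has written it, then uses the stale value as a global-memory index.
__global__ void stale_index_demo(const float* topk_weights_ptr, float* out) {
  __shared__ int sh_block_sorted_ids[64];
  // Intentionally no initialization and no __syncthreads():
  // sh_block_sorted_ids[0] may hold whatever a previous kernel left behind,
  // or any other garbage value.
  int x = sh_block_sorted_ids[0];
  // Whether this access faults depends on how large the stale value is and on
  // where topk_weights_ptr sits relative to neighboring allocations, which is
  // why the illegal memory access shows up only intermittently.
  out[threadIdx.x] = topk_weights_ptr[x];
}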

@vadiklyutiy
Collaborator

Although I still have a slight feeling that something is left unsaid, because I don't understand how multi-stream execution impacted this.
Meanwhile, the changes look reasonable: I don't see a reason to write to idx = 0 when we are out of bounds.

@youkaichao youkaichao enabled auto-merge (squash) November 20, 2025 16:19
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 20, 2025
Member

@yewentao256 yewentao256 left a comment


Let's try again before force merging

@jinzhen-lin
Contributor Author

@yewentao256 the failed LoRA test seems unrelated

@vllm-bot vllm-bot merged commit a67dec7 into vllm-project:main Nov 27, 2025
88 of 90 checks passed

Labels

ready (ONLY add when PR is ready to merge/full CI is needed)


Development

Successfully merging this pull request may close these issues.

[Bug]: Find the root cause of SHARED_EXPERTS_STREAM fail

6 participants