[Bugfix] Disable shared expert overlap if Marlin MoE is used #28410
Conversation
Signed-off-by: mgoin <mgoin64@gmail.com>
Code Review
This pull request introduces a bugfix to disable shared expert overlap when Marlin MoE kernels are used. This is achieved by adding a use_marlin_kernels property to the FusedMoE layer, which checks for a use_marlin attribute in the quantization method. The SharedFusedMoE layer is then updated to use this property to conditionally disable overlapping computation. The relevant Marlin-based MoE quantization methods (AWQMoEMethod, CompressedTensorsWNA16MarlinMoEMethod, GPTQMarlinMoEMethod, and Mxfp4MoEMethod) have been correctly updated to set this use_marlin flag. The changes are well-contained and correctly implemented to address the issue. I have no further comments.
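As a rough illustration of the mechanism described above (a sketch only: the `use_marlin_kernels` property, the `use_marlin` attribute, and the `SharedFusedMoE` name come from the review text, while the surrounding class bodies and the `_use_overlapped` field are simplified stand-ins, not the actual vLLM implementation):

```python
class FusedMoE:
    # `quant_method` is the per-layer quantization method instance
    # (e.g. AWQMoEMethod, GPTQMarlinMoEMethod, Mxfp4MoEMethod, ...).

    @property
    def use_marlin_kernels(self) -> bool:
        # True when the selected quantization method routes through the
        # Marlin MoE kernels (those methods set `use_marlin` on themselves).
        return getattr(self.quant_method, "use_marlin", False)


class SharedFusedMoE(FusedMoE):
    @property
    def use_overlapped(self) -> bool:
        # Shared-expert overlap runs the shared experts concurrently with the
        # routed experts; skip it when Marlin kernels are in use, since they
        # may not be safe under multi-stream execution.
        return self._use_overlapped and not self.use_marlin_kernels
```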
Can you also revert the disabling of overlap in that CI test? Otherwise LGTM
Signed-off-by: mgoin <mgoin64@gmail.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
vadiklyutiy left a comment
I second the motion
But actually my worry is: how do we know that other MoE backends are multi-stream safe?
It is good if it fails with an illegal memory access, as in this case: at least we know there is a bug. But frequently a race condition might just corrupt the hidden state and we get random incorrect output...
I agree with this - it looks like we are disabling this because it's failing in CI, but we need to do more comprehensive testing for the other kernels as well (especially on older HW with less memory). cc @mgoin / @vadiklyutiy
My experience says that comprehensive testing alone isn't enough (the bugs are rare and random). The multi-stream part of the code should be designed and reviewed to be stream-safe.
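One way to make that testing a bit more systematic, as a minimal sketch only (`run_moe_layer` and its `overlap` flag are hypothetical hooks, not an existing vLLM API): run identical inputs with shared-expert overlap on and off and require the outputs to agree, repeating many trials because races fire nondeterministically.

```python
import torch


def check_overlap_stream_safety(run_moe_layer, hidden_states, iters=50):
    """Compare overlapped vs. non-overlapped MoE outputs over many trials.

    `run_moe_layer(x, overlap)` is a hypothetical hook that runs the MoE
    forward pass with shared-expert overlap enabled or disabled. A single
    matching trial proves little, so repeat and fail on the first divergence.
    """
    reference = run_moe_layer(hidden_states, overlap=False)
    for i in range(iters):
        candidate = run_moe_layer(hidden_states, overlap=True)
        if not torch.allclose(candidate, reference, rtol=1e-3, atol=1e-3):
            raise AssertionError(f"Output diverged on trial {i}: possible race")
```

Even so, this only catches races that happen to fire during the trials, which is the point above: the multi-stream path also needs to be designed and reviewed for stream safety.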
Purpose
For now it seems that Marlin MoE might not be safe with multiple CUDA streams, which come into play when shared expert overlap is used. This was disabled within CI in #28324, so this PR disables shared expert overlap whenever Marlin is used, both to avoid user issues and to fix CI.
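For context, a generic PyTorch sketch (not vLLM's actual code) of what shared expert overlap implies at the CUDA level: the shared experts are launched on a side stream while the routed experts run on the current stream, so any kernel that is not stream-safe can race; disabling overlap falls back to a single-stream, in-order path.

```python
import torch


def moe_forward(x, routed_experts, shared_experts, overlap=True):
    if not overlap or not torch.cuda.is_available():
        # Safe fallback: everything runs in order on the current stream.
        return routed_experts(x) + shared_experts(x)

    side_stream = torch.cuda.Stream()
    side_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side_stream):
        # Shared experts run concurrently with the routed-expert kernels below.
        shared_out = shared_experts(x)
    routed_out = routed_experts(x)  # e.g. Marlin MoE kernels on the current stream
    torch.cuda.current_stream().wait_stream(side_stream)
    return routed_out + shared_out
```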
Test Plan
Test Result