[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels #25488
Conversation
Code Review
This pull request enables GPT-OSS DP/EP with Marlin kernels by introducing a MarlinExperts modular kernel and integrating it into the mxfp4 quantization backend. The changes correctly refactor the MoE logic to use the modular kernel framework. However, I've found a high-severity issue in the MarlinExperts implementation where workspace management is incorrect, which could lead to performance degradation due to repeated memory allocations. I've provided a suggestion to fix this.
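For illustration, a minimal sketch of the workspace-caching pattern such a fix typically takes; the class and method names below are hypothetical and do not match the actual MarlinExperts code.

```python
# Hypothetical sketch only: shows caching a workspace buffer so it is not
# re-allocated on every forward pass. Names do not match the real
# MarlinExperts implementation in vLLM.
from typing import Optional

import torch


class WorkspaceCache:
    def __init__(self) -> None:
        self._buf: Optional[torch.Tensor] = None

    def get(self, numel: int, device: torch.device) -> torch.Tensor:
        # Allocate only when the cached buffer is missing, too small, or on
        # the wrong device; otherwise reuse a slice of the existing buffer.
        if (self._buf is None or self._buf.numel() < numel
                or self._buf.device != device):
            self._buf = torch.zeros(numel, dtype=torch.int32, device=device)
        return self._buf[:numel]
```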
Looks good to me. Just had a few minor comments. Will this PR interfere with #21166?
Force-pushed from c1c70de to d9d38b0.
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
LGTM!
Great work dealing with the tricky cases in Marlin, LGTM!
def _is_marlin_mxfp4_w4an(quant_config: Optional[FusedMoEQuantConfig] = None):
nit: call it _is_marlin_mxfp4_w4aN to make it clearer
Given Marlin packed weight matrices w1_packed and w2_packed,
return the MoE intermediate size N
"""
marlin_tile_size = 16
What does this tile size actually correspond to? Would be good to leave a note.
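For illustration, a hedged sketch of the calculation this constant supports, assuming a GPTQ-Marlin-style layout in which the packed weight's reduction dimension is folded by a 16-element tile; the shape assumed below is illustrative, not the actual vLLM layout.

```python
# Illustrative sketch only: assumes the packed w2 tensor has shape
# (num_experts, N // marlin_tile_size, packed_hidden_dim), i.e. Marlin folds
# the reduction (intermediate) dimension by its 16-element tile.
# The real layout in vLLM may differ.
import torch

MARLIN_TILE_SIZE = 16  # assumed Marlin tile along the reduction dimension


def moe_intermediate_size(w2_packed: torch.Tensor) -> int:
    # w2 maps intermediate -> hidden, so undoing the tiling on its reduction
    # dimension recovers the MoE intermediate size N.
    return w2_packed.shape[1] * MARLIN_TILE_SIZE
```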
Purpose
Enable GPT-OSS DP/EP for the DeepEPHighThroughput All2All backend using the Marlin codepath. This can serve as an alternative to matmul_ogs from Triton.
Example serving command:
VLLM_MXFP4_USE_MARLIN=1 VLLM_ALL2ALL_BACKEND="deepep_high_throughput" canhazgpu run -g2 -- vllm serve openai/gpt-oss-120b --tensor-parallel-size 1 --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching --port 9010
Test Plan
GPT-OSS evals
Server command:
VLLM_MXFP4_USE_MARLIN=1 VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-120b --tensor-parallel-size 1 --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching --port 9010
Eval command:
OPENAI_API_KEY=empty python -m gpt_oss.evals --model openai/gpt-oss-120b --eval gpqa --n-threads 128 --base-url http://localhost:9010/v1
Test Result
Benchmarks
Please find results here
TL;DR:
Compared with TP: better TTFT; worse TPOT, since deepep_high_throughput enforces eager mode.
Compared with OAITritonExperts: worse TTFT and TPOT.