[QWEN NEXT] Fused MoE kernels Optimization configs #24924

samanamp · 2025-09-15T23:45:40Z

Purpose

This PR adds optimized fused MoE kernel configs for GB200 at tp=1, 2, and 4.

Test Plan

python benchmarks/kernels/benchmark_moe_old.py --model $MODEL --dtype "fp8_w8a8" --tp 4

Test Result


Signed-off-by: Saman Keon <samanamp@outlook.com>

gemini-code-assist

Code Review

This pull request introduces optimized configurations for Fused MoE kernels on the NVIDIA GB200 architecture. The changes consist of adding three new JSON configuration files for different tensor parallelism sizes (tp=1, 2, and 4) for a model with 512 experts and fp8_w8a8 data type. The configurations appear to be the result of performance tuning, are well-structured, and follow the existing conventions. The changes are straightforward and I don't see any issues. The PR looks good to merge.
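For readers unfamiliar with these files, here is a minimal sketch of how a tuned fused MoE config is typically laid out and looked up. It assumes the layout used by vLLM's existing tuned configs: a JSON object keyed by token batch size, where each entry holds Triton launch parameters, stored under a file name that encodes the expert count (E), the per-shard intermediate size (N), the device, and the dtype. The helper names `config_file_name` and `pick_config`, the N value of 128, and every number in the sample are hypothetical and are not taken from this PR.

```python
import json

# Illustrative config only: keys are token batch sizes (M), values are
# Triton launch parameters. None of these numbers come from the PR.
EXAMPLE_CONFIG_JSON = """
{
  "1":    {"BLOCK_SIZE_M": 16,  "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 3},
  "64":   {"BLOCK_SIZE_M": 32,  "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 16, "num_warps": 4, "num_stages": 4},
  "1024": {"BLOCK_SIZE_M": 128, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 32, "num_warps": 8, "num_stages": 4}
}
"""


def config_file_name(num_experts: int, shard_n: int,
                     device_name: str, dtype: str) -> str:
    """Hypothetical helper: build a file name in the E=...,N=... style."""
    return (f"E={num_experts},N={shard_n},"
            f"device_name={device_name},dtype={dtype}.json")


def pick_config(configs: dict, num_tokens: int) -> dict:
    """Return the entry whose batch-size key is closest to num_tokens."""
    by_m = {int(k): v for k, v in configs.items()}  # JSON keys are strings
    nearest = min(by_m, key=lambda m: abs(m - num_tokens))
    return by_m[nearest]


configs = json.loads(EXAMPLE_CONFIG_JSON)

# N=128 is an assumed per-shard size, used here only to show the naming.
print(config_file_name(512, 128, "NVIDIA_GB200", "fp8_w8a8"))
print(pick_config(configs, 50))
```

Because the per-shard size N changes with the tensor-parallel degree, each tp value needs its own file, which would explain why the PR ships three configs.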

Signed-off-by: Saman Keon <samanamp@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

mergify bot added the qwen (Related to Qwen models) label Sep 15, 2025

QWEN next gb200 kernels

7a6a89c

Signed-off-by: Saman Keon <samanamp@outlook.com>

gemini-code-assist bot reviewed Sep 15, 2025


samanamp force-pushed the qwen-next-kernels branch from f0ce07d to 7a6a89c on September 15, 2025 23:46

jeejeelee approved these changes Sep 16, 2025


Merge branch 'main' into qwen-next-kernels

812effe

jeejeelee added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 16, 2025

DarkLight1337 merged commit 238c4c1 into vllm-project:main Sep 16, 2025
55 of 56 checks passed

samanamp deleted the qwen-next-kernels branch September 16, 2025 13:31

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[QWEN NEXT] Fused MoE kernels Optimization configs (vllm-project#24924)

67301af

Signed-off-by: Saman Keon <samanamp@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
