@samanamp samanamp commented Sep 15, 2025

Purpose

This PR adds optimized fused MoE kernel configs for GB200 at tp=1, 2, and 4.

Test Plan

python benchmarks/kernels/benchmark_moe_old.py --model $MODEL --dtype "fp8_w8a8" --tp 4
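
Since the PR covers tp=1, 2, and 4, the same command would presumably be repeated once per TP size. A minimal sketch of building those command lines, assuming only the flags shown above; `benchmark_commands` and the `MODEL_NAME` placeholder are hypothetical names used for illustration, not part of the repository:

```python
# Sketch: build one benchmark command per tensor-parallel size.
# Only the flags that appear in the test plan (--model, --dtype, --tp) are used.
# benchmark_commands and "MODEL_NAME" are hypothetical, for illustration only.

def benchmark_commands(model, tp_sizes=(1, 2, 4), dtype="fp8_w8a8"):
    """Return a list of argv lists, one per TP size."""
    script = "benchmarks/kernels/benchmark_moe_old.py"
    return [
        ["python", script, "--model", model, "--dtype", dtype, "--tp", str(tp)]
        for tp in tp_sizes
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each argv list to
    # subprocess.run(argv, check=True) to actually execute it.
    for argv in benchmark_commands("MODEL_NAME"):
        print(" ".join(argv))
```
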

Test Result

[Image: Screenshot 2025-09-15 at 4.44.57 PM] [Image: Screenshot 2025-09-15 at 4.48.02 PM]
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the qwen Related to Qwen models label Sep 15, 2025
Signed-off-by: Saman Keon <samanamp@outlook.com>
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces optimized configurations for Fused MoE kernels on the NVIDIA GB200 architecture. The changes consist of adding three new JSON configuration files for different tensor parallelism sizes (tp=1, 2, and 4) for a model with 512 experts and fp8_w8a8 data type. The configurations appear to be the result of performance tuning, are well-structured, and follow the existing conventions. The changes are straightforward and I don't see any issues. The PR looks good to merge.
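
For readers unfamiliar with these files, here is a rough sketch of how a tuned config of this kind is typically consumed. It assumes each JSON file maps a token-batch size (as a string key) to a set of kernel launch parameters, and that the nearest tuned batch size is selected at runtime. The key names and all values below are illustrative assumptions, not the contents of the files added in this PR, and `load_config` and `pick_config` are hypothetical helpers:

```python
import json

# Illustrative placeholder config; NOT the tuned values from this PR.
# Assumed layout: batch size (string key) -> kernel launch parameters.
EXAMPLE_CONFIG_JSON = """
{
  "1":    {"BLOCK_SIZE_M": 16,  "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 3},
  "64":   {"BLOCK_SIZE_M": 32,  "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 16, "num_warps": 4, "num_stages": 4},
  "1024": {"BLOCK_SIZE_M": 128, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 32, "num_warps": 8, "num_stages": 4}
}
"""


def load_config(text):
    """Parse the JSON text and convert batch-size keys from str to int."""
    return {int(batch): params for batch, params in json.loads(text).items()}


def pick_config(configs, num_tokens):
    """Return the parameters tuned for the batch size closest to num_tokens."""
    nearest = min(configs, key=lambda batch: abs(batch - num_tokens))
    return configs[nearest]


if __name__ == "__main__":
    configs = load_config(EXAMPLE_CONFIG_JSON)
    # 50 tokens is closer to the entry tuned for 64 than to the one for 1.
    print(pick_config(configs, 50))
```

Under this assumption, a separate file per tp size makes sense because tensor parallelism changes the per-GPU shard shape, so each shape gets its own tuned table.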

@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 16, 2025
@DarkLight1337 DarkLight1337 merged commit 238c4c1 into vllm-project:main Sep 16, 2025
55 of 56 checks passed
@samanamp samanamp deleted the qwen-next-kernels branch September 16, 2025 13:31
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Saman Keon <samanamp@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>