[Refactor] Remove DeepGEMM OP Register #25710

yewentao256 · 2025-09-25T21:07:10Z

Purpose

A follow-up to #19085.

DeepGEMM now has a warm-up step, so JIT compilation is no longer a problem.

After checking the code, we also don't use torch.compile for DeepGEMM, so it is safe to remove the redundant op registration logic.

Test

Acc

lm_eval --model vllm --model_args "pretrained=Qwen/Qwen3-30B-A3B-FP8,max_model_len=32768,enforce_eager=True" --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto

# now
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8529|±  |0.0098|
|     |       |strict-match    |     5|exact_match|↑  |0.8855|±  |0.0088|
# main
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8529|±  |0.0098|
|     |       |strict-match    |     5|exact_match|↑  |0.8855|±  |0.0088|

Perf

vllm bench throughput --model Qwen/Qwen3-30B-A3B-FP8 --load-format dummy --input-len 1000 --output-len 100 --trust_remote_code --enable-expert-parallel

# now
Throughput: 41.33 requests/s, 45353.06 total tokens/s, 4133.26 output tokens/s
# main
Throughput: 41.28 requests/s, 45299.81 total tokens/s, 4128.41 output tokens/s
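
To put the two throughput lines side by side, the relative change can be computed directly. This is a small sketch using only the figures reported above; the dictionary keys and the `relative_change` helper are illustrative names, not part of vLLM.

```python
# Throughput figures copied from the benchmark output above.
now = {"requests/s": 41.33, "total tokens/s": 45353.06, "output tokens/s": 4133.26}
main = {"requests/s": 41.28, "total tokens/s": 45299.81, "output tokens/s": 4128.41}


def relative_change(new: float, old: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Percentage change of this branch relative to main, per metric.
deltas = {metric: relative_change(now[metric], main[metric]) for metric in now}

for metric, pct in deltas.items():
    print(f"{metric}: {pct:+.2f}%")
```

Each metric comes out at roughly +0.12%. With one run per branch reported, that is best read as "no measurable regression" rather than a speedup.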

Signed-off-by: yewentao256 <zhyanwentao@126.com>

gemini-code-assist

Code Review

This pull request is a good refactoring that simplifies the DeepGEMM integration. It removes the redundant custom operator registration for w8a8_block_fp8_matmul_deepgemm by deleting the vllm/model_executor/layers/quantization/deepgemm.py file. The usage of the custom op is correctly replaced with a direct call to fp8_gemm_nt in vllm/model_executor/layers/quantization/utils/fp8_utils.py. This change removes unnecessary indirection and makes the code cleaner and easier to follow. The justification provided in the pull request description is sound, and the changes appear correct.
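
The kind of indirection described in the review can be illustrated with a toy example. This is only a sketch of the general pattern, assuming a simple name-to-function registry: `OP_REGISTRY`, `register_op`, `gemm_kernel`, `linear_before`, and `linear_after` are all hypothetical names, the kernel is a plain-Python stand-in, and none of it reproduces vLLM's actual custom op registration or the real `fp8_gemm_nt` signature.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def gemm_kernel(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical stand-in kernel: plain-Python matrix multiply (a @ b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# "Before" pattern: the kernel is wrapped and reached through a named registry.
OP_REGISTRY: Dict[str, Callable[[Matrix, Matrix], Matrix]] = {}


def register_op(name: str, fn: Callable[[Matrix, Matrix], Matrix]) -> None:
    """Record fn under name so callers can look it up indirectly."""
    OP_REGISTRY[name] = fn


register_op("matmul_via_registry", gemm_kernel)


def linear_before(a: Matrix, b: Matrix) -> Matrix:
    # Extra lookup step that exists only because of the registration layer.
    return OP_REGISTRY["matmul_via_registry"](a, b)


# "After" pattern: the call site invokes the kernel directly.
def linear_after(a: Matrix, b: Matrix) -> Matrix:
    return gemm_kernel(a, b)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result_before = linear_before(a, b)
result_after = linear_after(a, b)
```

Both paths return the same result, which is the property the accuracy and throughput checks in the PR description are meant to confirm for the real change.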

mgoin

Good call, unnecessary overhead

remove deepgemm register

2a64311

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 requested review from mgoin, robertgshaw2-redhat and tlrmchlsmth as code owners September 25, 2025 21:07

gemini-code-assist bot reviewed Sep 25, 2025

yewentao256 changed the title ~~[Refactor] Remove DeepGEMM Register~~ [Refactor] Remove DeepGEMM OP Register Sep 25, 2025

yewentao256 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 25, 2025

mgoin approved these changes Sep 26, 2025

mgoin merged commit 9fe4c2b into main Sep 26, 2025
63 checks passed

mgoin deleted the wentao-remove-deepgemm-register branch September 26, 2025 00:13

yewentao256 added a commit that referenced this pull request Oct 3, 2025

[Refactor] Remove DeepGEMM OP Register (#25710)

3a32aa8

Signed-off-by: yewentao256 <zhyanwentao@126.com>
