
Conversation

@yewentao256 (Member) commented Sep 23, 2025

Purpose

vllm bench throughput --model Qwen/Qwen3-30B-A3B-FP8 --load-format dummy --input-len 1000 --output-len 100 --trust_remote_code --enable-expert-parallel

will raise the following error:

(EngineCore_DP0 pid=265229)     self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/v1/executor/abstract.py", line 75, in initialize_from_config
(EngineCore_DP0 pid=265229)     self.collective_rpc("compile_or_warm_up_model")
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=265229)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=265229)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/utils/__init__.py", line 3010, in run_method
(EngineCore_DP0 pid=265229)     return func(*args, **kwargs)
(EngineCore_DP0 pid=265229)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_worker.py", line 341, in compile_or_warm_up_model
(EngineCore_DP0 pid=265229)     kernel_warmup(self)
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/model_executor/warmup/kernel_warmup.py", line 34, in kernel_warmup
(EngineCore_DP0 pid=265229)     deep_gemm_warmup(model, max_tokens)
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/model_executor/warmup/deep_gemm_warmup.py", line 228, in deep_gemm_warmup
(EngineCore_DP0 pid=265229)     deepgemm_grouped_fp8_gemm_nt_contiguous_warmup(model, max_tokens)
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/model_executor/warmup/deep_gemm_warmup.py", line 221, in deepgemm_grouped_fp8_gemm_nt_contiguous_warmup
(EngineCore_DP0 pid=265229)     _extract_data_from_fused_moe_module(dgm))
(EngineCore_DP0 pid=265229)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=265229)   File "/home/wentao/vllm-source/vllm/model_executor/warmup/deep_gemm_warmup.py", line 56, in _extract_data_from_fused_moe_module
(EngineCore_DP0 pid=265229)     w13_s = getattr(m, "w13_weight_scale_inv", m.w13_weight_scale)
(EngineCore_DP0 pid=265229)                                                ^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=265229)   File "/home/wentao/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1940, in __getattr__
(EngineCore_DP0 pid=265229)     raise AttributeError(
(EngineCore_DP0 pid=265229) AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv'?

This PR fixes that by deferring access to the fallback attribute, so w13_weight_scale is only looked up when w13_weight_scale_inv is absent.
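
The root cause is Python's eager evaluation of call arguments: the default passed to getattr is computed before getattr runs, so the fallback attribute is accessed even when it would never be needed. Below is a minimal sketch of the failure mode, using a hypothetical FakeMoE stand-in for vllm's FusedMoE (only the attribute names come from the traceback; the class and tensor shape are illustrative assumptions):

import torch
import torch.nn as nn

class FakeMoE(nn.Module):
    """Stand-in for FusedMoE: only the `_inv` scale is registered,
    as with FP8 block-quantized weights."""
    def __init__(self):
        super().__init__()
        self.w13_weight_scale_inv = nn.Parameter(torch.ones(1), requires_grad=False)

m = FakeMoE()

# getattr's third argument is evaluated before the call, so
# m.w13_weight_scale is looked up (and raises AttributeError) even though
# m.w13_weight_scale_inv exists and the default would never be used.
try:
    w13_s = getattr(m, "w13_weight_scale_inv", m.w13_weight_scale)
except AttributeError as e:
    print(e)  # 'FakeMoE' object has no attribute 'w13_weight_scale'. ...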

Test

With the fix, the benchmark command above runs to completion:

Throughput: 43.80 requests/s, 48056.39 total tokens/s, 4379.63 output tokens/s
Total num prompt tokens:  997271
Total num output tokens:  100000

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request correctly addresses a critical AttributeError that occurred during model warmup in deep_gemm_warmup.py. The original code's use of getattr with a direct attribute access as the default value led to an eager evaluation that caused a crash when the fallback attribute didn't exist. The new implementation correctly defers access to the fallback attribute, resolving the issue. The fix is sound. I've added a couple of suggestions to make the implementation more concise and Pythonic by using conditional expressions.
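
Reusing the FakeMoE stand-in from the sketch above, a deferred-fallback rewrite along the lines the review describes could use a conditional expression (an assumption about the shape of the merged code, not the actual diff):

# Only touch the fallback attribute when the primary one is absent, so no
# AttributeError can be raised for modules that lack w13_weight_scale.
w13_s = (m.w13_weight_scale_inv
         if hasattr(m, "w13_weight_scale_inv")
         else m.w13_weight_scale)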

yewentao256 and others added 4 commits September 23, 2025 14:41
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025
@mgoin mgoin enabled auto-merge (squash) September 24, 2025 00:01
@mgoin mgoin merged commit 88d7bdb into vllm-project:main Sep 24, 2025
50 of 52 checks passed
@yewentao256 yewentao256 deleted the wye-fix-w13_weight_scale_inv-no-attr-error branch September 24, 2025 00:15
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…ght_scale'. Did you mean: 'w13_weight_scale_inv' (vllm-project#25519)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
yewentao256 added a commit that referenced this pull request Oct 3, 2025
…ght_scale'. Did you mean: 'w13_weight_scale_inv' (#25519)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Labels
ready ONLY add when PR is ready to merge/full CI is needed