
Conversation


@CSWYF3634076 (Contributor) commented Sep 30, 2025

Fixes issue #25833.

Reference implementation in transformers: https://github.com/huggingface/transformers/blob/main/src/transformers/models/ernie4_5_moe/modeling_ernie4_5_moe.py#L342

moe_gate -> float32
e_score_correction_bias -> float32
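
As a reading aid, here is a minimal PyTorch sketch of the intent (hypothetical class and argument names, not the actual vLLM or transformers code): the router gate weights and the e_score_correction_bias are created in float32, and routing scores are computed in float32, even when the rest of the model runs in bf16/fp16.

```python
# Minimal sketch of the intended dtype handling (hypothetical names; not the
# actual vLLM or transformers code).
import torch
import torch.nn as nn

class MoERouterSketch(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        # Gate weights created directly in float32.
        self.moe_gate = nn.Linear(hidden_size, num_experts, bias=False,
                                  dtype=torch.float32)
        # Per-expert score-correction bias, also float32.
        self.e_score_correction_bias = nn.Parameter(
            torch.zeros(num_experts, dtype=torch.float32))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Upcast activations so routing scores are computed in float32.
        logits = self.moe_gate(hidden_states.to(torch.float32))
        return logits + self.e_score_correction_bias
```

In the vLLM layers this corresponds to passing params_dtype=torch.float32 when the gate is constructed, as the diffs reviewed below show.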

Signed-off-by: wangyafeng <wangyafeng@baidu.com>

@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly changes the data type of the MoE gate and bias parameters to float32 for the ernie45_moe model, aligning it with the original implementation. However, a similar change in ernie45_vl_moe.py is incomplete. While the params_dtype is set to float32, the model's general quantization configuration is still passed to the gate layers. This could lead to the gates being quantized, which would negate the intended fix and potentially cause correctness issues. I've added comments to explicitly set quant_config=None for the gate layers in ernie45_vl_moe.py to ensure they remain unquantized.

config.moe_num_experts[0],
params_dtype=torch.float32,
bias=False,
quant_config=quant_config,

critical

To ensure the gate layer is not quantized, quant_config should be explicitly set to None. Currently, it's passing the general quant_config from the model, which could lead to the gate being quantized if any quantization method is enabled for the model. This would contradict the purpose of setting params_dtype=torch.float32.

Suggested change:
- quant_config=quant_config,
+ quant_config=None,

config.moe_num_experts[1],
bias=False,
params_dtype=torch.float32,
quant_config=quant_config,

critical

Similar to the text_experts_gate, the quant_config for vision_experts_gate should also be set to None to prevent it from being quantized. This ensures the gate operates in float32 as intended.

Suggested change:
- quant_config=quant_config,
+ quant_config=None,

@CSWYF3634076 (Contributor, Author)

Fixed.

Signed-off-by: wangyafeng <wangyafeng@baidu.com>

@jeejeelee (Collaborator) left a comment


LGTM

bias=False,
quant_config=quant_config,
params_dtype=torch.float32,
quant_config=None,
@jeejeelee (Collaborator)

QQ: Why is it set to None? Is it because of GPTQ quantization?

@CSWYF3634076 (Contributor, Author) commented Sep 30, 2025

> QQ: Why is it set to None? Is it because of GPTQ quantization?

Based on the advice of the AI assistant and on how the gate is handled in other models.

@jeejeelee (Collaborator)

Then please don't modify it. Setting quant_config directly to None may affect quantized models.
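
To make the trade-off concrete, here is a conceptual sketch (hypothetical names, not vLLM's actual dispatch code): when the model-wide quant_config is passed through, the quantization setup can still decide per layer whether to quantize, whereas hard-coding None unconditionally forces the unquantized path, which may not match what a quantized checkpoint expects for that layer.

```python
# Conceptual sketch only (hypothetical names, not vLLM's real dispatch logic).
from typing import Optional

class QuantConfigSketch:
    """Stand-in for a model-wide quantization config."""
    def wants_quantized(self, layer_name: str) -> bool:
        # A real config could already exclude router/gate layers here.
        return not layer_name.endswith("gate")

def pick_linear_method(layer_name: str,
                       quant_config: Optional[QuantConfigSketch]) -> str:
    if quant_config is None:
        # Hard-coded None: always unquantized, regardless of the checkpoint.
        return "unquantized"
    # Config passed through: the quantization setup decides per layer.
    return "quantized" if quant_config.wants_quantized(layer_name) else "unquantized"

print(pick_linear_method("mlp.gate", None))                      # unquantized
print(pick_linear_method("mlp.gate", QuantConfigSketch()))       # unquantized
print(pick_linear_method("mlp.down_proj", QuantConfigSketch()))  # quantized
```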

@CSWYF3634076 (Contributor, Author)

> Then please don't modify it. Setting quant_config directly to None may affect quantized models.

Okay, it's already done

Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 30, 2025
@jeejeelee jeejeelee merged commit ef6e0e7 into vllm-project:main Sep 30, 2025
50 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>