[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 #25936
Conversation
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Code Review
This pull request correctly changes the data type of the MoE gate and bias parameters to float32 for the ernie45_moe model, aligning it with the original implementation. However, a similar change in ernie45_vl_moe.py is incomplete: while params_dtype is set to float32, the model's general quantization configuration is still passed to the gate layers. This could lead to the gates being quantized, which would negate the intended fix and potentially cause correctness issues. I've added comments to explicitly set quant_config=None for the gate layers in ernie45_vl_moe.py to ensure they remain unquantized.
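For illustration, a minimal, self-contained sketch (plain PyTorch, not the vLLM code under review) of the concern: the tiny router gate gains almost nothing from weight quantization, but a quantized gate can shift the logits that decide expert selection. The sizes, seed, and naive int8 round-trip below are illustrative assumptions, not the actual quantization path.

```python
import torch

torch.manual_seed(0)
hidden_size, num_experts, top_k = 64, 8, 2

x = torch.randn(1, hidden_size)            # one token's activation
w = torch.randn(num_experts, hidden_size)  # float32 gate weight

# Naive per-tensor int8 round-trip, standing in for weight quantization.
scale = w.abs().max() / 127
w_int8 = (w / scale).round().clamp(-127, 127) * scale

logits_fp32 = x @ w.t()
logits_int8 = x @ w_int8.t()

# Compare expert choices from full-precision vs. quantized gate logits,
# and the worst-case logit drift introduced by the round-trip.
print(logits_fp32.topk(top_k).indices)
print(logits_int8.topk(top_k).indices)
print((logits_fp32 - logits_int8).abs().max())
```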
config.moe_num_experts[0],
params_dtype=torch.float32,
bias=False,
quant_config=quant_config,
To ensure the gate layer is not quantized, quant_config should be explicitly set to None. Currently, it's passing the general quant_config from the model, which could lead to the gate being quantized if any quantization method is enabled for the model. This would contradict the purpose of setting params_dtype=torch.float32.
Suggested change:
- quant_config=quant_config,
+ quant_config=None,
config.moe_num_experts[1],
bias=False,
params_dtype=torch.float32,
quant_config=quant_config,
This has been fixed.
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
LGTM
  bias=False,
- quant_config=quant_config,
+ params_dtype=torch.float32,
+ quant_config=None,
QQ: Why is it set to None? Is it because of GPTQ quantization?
QQ: Why is it set to None? Is it because of GPTQ quantization?
It was based on the AI assistant's suggestion and on how the gate is handled in other models.
Then please don't modify it. Setting quant_config directly to None may affect quantized models.
Then please don't modify it. Setting quant_config directly to None may affect quantized models.
Okay, it's already done
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
…ct#25936) Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Fixes issue #25833.
Reference (Transformers): https://github.com/huggingface/transformers/blob/main/src/transformers/models/ernie4_5_moe/modeling_ernie4_5_moe.py#L342
moe_gate -> float32
e_score_correction_bias -> float32
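For context, a minimal, self-contained sketch (plain PyTorch, not the actual Transformers or vLLM implementation) of the pattern the linked reference follows: the gate weight and the correction bias are kept in float32 so that top-k expert selection is not perturbed by bf16 rounding. The tensor sizes, the softmax scoring, and the variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
hidden_size, num_experts, top_k = 64, 8, 2

hidden_states = torch.randn(4, hidden_size, dtype=torch.bfloat16)       # model runs in bf16
moe_gate = torch.randn(num_experts, hidden_size, dtype=torch.float32)   # gate weight kept in float32
e_score_correction_bias = torch.zeros(num_experts, dtype=torch.float32)  # bias kept in float32

# Routing alone is computed in float32; the expert FFNs still run in bf16.
router_logits = torch.nn.functional.linear(hidden_states.float(), moe_gate)
scores = torch.softmax(router_logits, dim=-1) + e_score_correction_bias  # scoring fn is illustrative
topk_weights, topk_ids = torch.topk(scores, k=top_k, dim=-1)
print(topk_ids)
```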