Update hf2mcore_deepseek_v3_moe.py #495

lmc8133 · 2025-03-07T09:16:44Z

When expert-tensor-parallel-size=1 is set, the routed-expert weights are saved repeatedly, once for each tp_rank. This consumes ~1.3T*${TP} of disk space, which is far too much.

When expert-tensor-parallel-size=1 is set, the routed-expert weights are saved repeatedly, once for each tp_rank.
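To illustrate the duplication described above, here is a minimal sketch of one way a converter could skip the redundant copies. It is not the actual change made in this PR. The function names, the `etp_size` parameter, and the key-matching rule are all hypothetical, and the sketch assumes that with expert-tensor-parallel-size=1 the routed-expert weights are identical on every tp_rank, so writing them once (here, on tp_rank 0) would be enough.

```python
def looks_like_routed_expert(key):
    # Hypothetical naming rule: routed experts live under ".experts.",
    # while shared experts (which are kept on every rank) do not match.
    return ".experts." in key and "shared_experts" not in key


def filter_state_for_rank(state, tp_rank, etp_size):
    """Return the entries of `state` that this tp_rank should write to disk."""
    # If experts are sharded across TP (etp_size > 1), every rank holds a
    # different slice, so nothing can be dropped. tp_rank 0 always writes all.
    if etp_size != 1 or tp_rank == 0:
        return dict(state)
    # Assumption: with etp_size == 1 the routed-expert tensors are the same
    # on every tp_rank, so the other ranks skip them.
    return {k: v for k, v in state.items() if not looks_like_routed_expert(k)}


# Toy state dict with placeholder values instead of tensors.
state = {
    "decoder.layers.3.mlp.experts.local_experts.0.linear_fc1.weight": "w_expert",
    "decoder.layers.3.mlp.shared_experts.linear_fc1.weight": "w_shared",
    "decoder.layers.3.self_attention.linear_proj.weight": "w_attn",
}

rank0 = filter_state_for_rank(state, tp_rank=0, etp_size=1)
rank1 = filter_state_for_rank(state, tp_rank=1, etp_size=1)
```

Under these assumptions `rank0` keeps all three entries while `rank1` keeps only the shared-expert and attention entries, so the routed-expert weights reach disk once rather than ${TP} times. Whether a loader can then find the expert weights for every tp_rank depends on the checkpoint layout, which this sketch does not address.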

CLAassistant · 2025-03-07T09:16:51Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Update hf2mcore_deepseek_v3_moe.py

5d693f5

When expert-tensor-parallel-size=1 is set, the routed-expert weights are saved repeatedly, once for each tp_rank.

lmc8133 mentioned this pull request Mar 7, 2025

Bug in the DeepSeek V3 weight conversion script #494

Closed

jerryli1981 force-pushed the main branch from 0f87624 to 2753cac Compare March 11, 2025 12:29
