feat: add w8a8_dynamic quant & support deepseek quant #391

Merged
wangxiyuan merged 3 commits into vllm-project:v0.7.3-dev from zzzzwwjj:v0.7.3-dev
Mar 28, 2025

zzzzwwjj (Collaborator) commented Mar 25, 2025

add w8a8_dynamic quant & support deepseek quant
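
For readers unfamiliar with the term, "w8a8 dynamic" is commonly used for int8 weights combined with int8 activations whose scales are computed at runtime instead of being calibrated offline. The sketch below illustrates that idea for a single activation row in plain Python. It assumes symmetric per-token quantization; the helper names `quantize_per_token` and `dequantize` are hypothetical, and this is not the NPU implementation added by this PR.

```python
def quantize_per_token(row):
    """Symmetric int8 quantization of one activation row.

    The scale is derived from the row itself at runtime, which is what
    makes the scheme "dynamic" (assumption: per-token, symmetric).
    """
    max_abs = max(abs(v) for v in row)
    # Guard against an all-zero row, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in row]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floating-point values."""
    return [v * scale for v in q]


row = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_per_token(row)
restored = dequantize(q, scale)
```

Because the scale is recomputed for each row, the largest-magnitude value in the row always maps to the edge of the int8 range, and the rounding error of every element is bounded by half the scale.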

MengqingCao (Collaborator) commented Mar 27, 2025

Recording the reasons why we patch the deepseekv2 model here; please add anything I have missed:

  1. Load the offset weight in fused_moe_perchannel_weight_loader. The original weight loader in vllm 0.7.3 could only load scale and zero.
  2. Flatten the weight to adapt to the npu ops:
            if "scale" in name or "offset" in name:
                loaded_weight = loaded_weight.flatten()
  3. Add packed_modules_mapping to the deepseekv2 model so that the gate proj and up proj can be split.
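
The second and third points can be sketched in plain Python as follows. The mapping contents are an assumption that follows the `gate_up_proj` convention vLLM uses for fused MLP layers, and may differ from what this PR adds. The helpers `resolve_packed_name`, `needs_flatten`, and `flatten` are hypothetical, with `flatten` standing in for `tensor.flatten()`.

```python
# Assumed mapping: one fused parameter backed by two checkpoint modules.
PACKED_MODULES_MAPPING = {"gate_up_proj": ["gate_proj", "up_proj"]}


def resolve_packed_name(weight_name, mapping=PACKED_MODULES_MAPPING):
    """Return (packed parameter name, shard index) for a checkpoint weight.

    Weights that do not belong to a packed module are returned unchanged
    with a shard index of None.
    """
    parts = weight_name.split(".")
    for packed, shards in mapping.items():
        for shard_id, shard in enumerate(shards):
            if shard in parts:
                parts[parts.index(shard)] = packed
                return ".".join(parts), shard_id
    return weight_name, None


def needs_flatten(name):
    """Same name check as in the snippet above."""
    return "scale" in name or "offset" in name


def flatten(nested):
    """Pure-Python stand-in for tensor.flatten() on nested lists."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


name = "model.layers.0.mlp.up_proj.weight_scale"
packed_name, shard_id = resolve_packed_name(name)
loaded_weight = [[0.1], [0.2], [0.3]]  # shape [3, 1] in the checkpoint
if needs_flatten(name):
    loaded_weight = flatten(loaded_weight)  # shape [3]
```

With the mapping in place, a loader can tell that `up_proj` is the second shard of `gate_up_proj`, so the fused parameter can be split back into its gate and up halves when quantization parameters are loaded.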

TODO:

zzzzwwjj and others added 3 commits March 27, 2025 22:23
Signed-off-by: zzzzwwjj <1183291235@qq.com>
fix vllm_ascend/models/deepseek_v2.py license

Co-authored-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: zzzzwwjj <1183291235@qq.com>
MengqingCao (Collaborator)

LGTM, thx!

wangxiyuan merged commit 12390af into vllm-project:v0.7.3-dev on Mar 28, 2025
13 checks passed
wangxiyuan pushed a commit that referenced this pull request Apr 7, 2025
- support deepseek quant
  - add w8a8_dynamic quant
see #391

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
- support deepseek quant
  - add w8a8_dynamic quant
see vllm-project#391

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
- support deepseek quant
  - add w8a8_dynamic quant
see vllm-project#391

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>