D²Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs

Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Shaoqiu Zhang, Ruobing Xie, Xingwu Sun, and Yulun Zhang, "D²Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs", arXiv, 2026


🔥 News

  • 2026-01-30: This repo is released.

Abstract: Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quantization (PTQ) is appealing, as it reduces memory usage and enables practical speedup without low-bit operators or specialized hardware. However, accuracy often degrades significantly in weight-only PTQ at sub-4-bit precision, and our analysis identifies two main causes: (1) down-projection matrices are a well-known quantization bottleneck, but maintaining their fidelity often requires extra bit-width; (2) weight quantization induces activation deviations, but effective correction strategies remain underexplored. To address these issues, we propose D²Quant, a novel weight-only PTQ framework that improves quantization from both the weight and activation perspectives. On the weight side, we design a Dual-Scale Quantizer (DSQ) tailored to down-projection matrices, with an absorbable scaling factor that significantly improves accuracy without increasing the bit budget. On the activation side, we propose Deviation-Aware Correction (DAC), which incorporates a mean-shift correction within LayerNorm to mitigate quantization-induced activation distribution shifts. Extensive experiments across multiple LLM families and evaluation metrics show that D²Quant delivers superior performance for weight-only PTQ at sub-4-bit precision. The code and models will be available at https://github.com/XIANGLONGYAN/D2Quant.
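To make the weight-side idea concrete, the sketch below illustrates the general principle behind a dual-scale quantizer: a per-output-channel scaling factor (which, as the abstract notes, can be absorbed into an adjacent layer so it adds no extra bit budget at inference) is applied on top of ordinary per-group min-max quantization. This is a minimal illustration of the concept only; the function name, group size, and scaling scheme are assumptions, not the paper's actual DSQ algorithm.

```python
import numpy as np

def dual_scale_quantize(w, bits=2, group_size=64):
    """Hedged sketch of dual-scale low-bit weight quantization.

    A per-channel scale normalizes each output row before standard
    per-group min-max quantization. In a real pipeline the per-channel
    factor would be absorbed into the preceding layer, so the stored
    weights stay at the same bit-width. Illustrative only.
    """
    # First scale: per-output-channel (absorbable) normalization.
    ch_scale = np.abs(w).max(axis=1, keepdims=True)
    ch_scale[ch_scale == 0] = 1.0
    w_norm = w / ch_scale

    # Second scale: per-group asymmetric min-max quantization.
    qmax = 2**bits - 1
    out = np.empty_like(w_norm)
    _, cols = w_norm.shape
    for g in range(0, cols, group_size):
        blk = w_norm[:, g:g + group_size]
        lo = blk.min(axis=1, keepdims=True)
        hi = blk.max(axis=1, keepdims=True)
        step = np.where(hi > lo, (hi - lo) / qmax, 1.0)
        q = np.clip(np.round((blk - lo) / step), 0, qmax)
        out[:, g:g + group_size] = q * step + lo  # dequantize

    # Re-apply the absorbable channel scale.
    return out * ch_scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 128)).astype(np.float32)
w_hat = dual_scale_quantize(w, bits=2, group_size=64)
err = float(np.mean((w - w_hat) ** 2))
```

The mean-squared reconstruction error `err` should be well below the variance of `w`, since each 64-element group is fit with its own range; the per-channel factor further tightens that range for rows with outlier magnitudes, which is the intuition behind spending a second scale on the down-projection bottleneck.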


⚒️ TODO

  • Complete this repository

🔎 Results

D²Quant demonstrates superior performance on the Qwen-3 model series under 2-bit weight-only quantization.

D²Quant demonstrates superior performance on the LLaMA-3 and LLaMA-3.1 model series under 2-bit weight-only quantization.

Citation

If you find the code helpful in your research or work, please cite the following paper.

@article{yan2026d,
  title={D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs},
  author={Yan, Xianglong and Bao, Chengzhu and Li, Zhiteng and Zhang, Tianao and Zhang, Shaoqiu and Xie, Ruobing and Sun, Xingwu and Zhang, Yulun},
  journal={arXiv preprint arXiv:2602.02546},
  year={2026}
}

💡 Acknowledgements

This work is released under the Apache 2.0 license.
