Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Shaoqiu Zhang, Ruobing Xie, Xingwu Sun, and Yulun Zhang, "D²Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs", arXiv, 2026
- 2026-01-30: This repo is released.
Abstract: Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quantization (PTQ) is appealing, as it reduces memory usage and enables practical speedup without low-bit operators or specialized hardware. However, accuracy often degrades significantly in weight-only PTQ at sub-4-bit precision, and our analysis identifies two main causes: (1) down-projection matrices are a well-known quantization bottleneck, but maintaining their fidelity often requires extra bit-width; (2) weight quantization induces activation deviations, but effective correction strategies remain underexplored. To address these issues, we propose D²Quant, a novel weight-only PTQ framework that improves quantization from both the weight and activation perspectives. On the weight side, we design a Dual-Scale Quantizer (DSQ) tailored to down-projection matrices, with an absorbable scaling factor that significantly improves accuracy without increasing the bit budget. On the activation side, we propose Deviation-Aware Correction (DAC), which incorporates a mean-shift correction within LayerNorm to mitigate quantization-induced activation distribution shifts. Extensive experiments across multiple LLM families and evaluation metrics show that D²Quant delivers superior performance for weight-only PTQ at sub-4-bit precision. The code and models will be available at https://github.com/XIANGLONGYAN/D2Quant.
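For intuition, below is a minimal, heavily simplified sketch of the two ideas in PyTorch. It is not the D²Quant implementation: the helper names (`quantize_groupwise`, `dual_scale_quantize`, `mean_shift_correct`), the per-channel scale heuristic, and the calibration procedure are illustrative assumptions; the exact DSQ and DAC formulations are defined in the paper.

```python
# Illustrative sketch only -- not the paper's implementation.
import torch
import torch.nn as nn

def quantize_groupwise(w: torch.Tensor, bits: int = 2, group_size: int = 128) -> torch.Tensor:
    """Uniform asymmetric per-group quantize-dequantize of a 2-D weight matrix.
    Assumes in_features is divisible by group_size."""
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    wmin, wmax = wg.amin(-1, keepdim=True), wg.amax(-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-8) / (2**bits - 1)
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(wg / scale) + zero, 0, 2**bits - 1)
    return ((q - zero) * scale).reshape(out_f, in_f)

# Weight side: a dual-scale flavor -- an extra per-input-channel scale s
# flattens the channel dynamic range before group quantization. Because s
# multiplies input channels, it can be absorbed into the preceding layer's
# output, so it costs no additional bits for the stored weights.
def dual_scale_quantize(w: torch.Tensor, bits: int = 2, group_size: int = 128):
    s = w.abs().amax(dim=0).clamp(min=1e-8).sqrt()  # heuristic choice of s
    w_q = quantize_groupwise(w / s, bits, group_size)
    return w_q, s  # effective weight is w_q * s, with s folded upstream

# Activation side: estimate the per-dimension mean deviation that weight
# quantization induces in a block's output on calibration data, then fold
# the opposite shift into the following LayerNorm's bias to re-center
# the activation distribution.
@torch.no_grad()
def mean_shift_correct(ln: nn.LayerNorm, x_calib: torch.Tensor, block_fp, block_q):
    delta = (block_fp(x_calib) - block_q(x_calib)).mean(dim=(0, 1))
    ln.bias.add_(delta)  # assumes an affine LayerNorm with a bias term
```

Note that LLaMA-style models use RMSNorm, which has no bias term, so a mean-shift correction there would need an explicit additive parameter rather than folding into an existing bias.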
- Complete this repository
- Post-training quantization
- Models
- Results
- Citation
- Acknowledgements
D2Quant demonstrates superior performance on the Qwen-3 model series under 2-bit weight-only quantization.
D2Quant demonstrates superior performance on the LLaMA-3 and LLaMA-3.1 model series under 2-bit weight-only quantization.
If you find the code helpful in your research or work, please cite the following paper.
@article{yan2026d,
title={D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs},
author={Yan, Xianglong and Bao, Chengzhu and Li, Zhiteng and Zhang, Tianao and Zhang, Shaoqiu and Xie, Ruobing and Sun, Xingwu and Zhang, Yulun},
journal={arXiv preprint arXiv:2602.02546},
year={2026}
}

This work is released under the Apache 2.0 license.