Xianglong Yan, Chengzhu Bao, Zhiteng Li, Tianao Zhang, Shaoqiu Zhang, Ruobing Xie, Xingwu Sun, and Yulun Zhang, "D²Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs", arXiv, 2026
- 2026-01-30: This repo is released.
Abstract: Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quantization (PTQ) is appealing, as it reduces memory usage and enables practical speedup without low-bit operators or specialized hardware. However, accuracy often degrades significantly in weight-only PTQ at sub-4-bit precision, and our analysis identifies two main causes: (1) down-projection matrices are a well-known quantization bottleneck, but maintaining their fidelity often requires extra bit-width; (2) weight quantization induces activation deviations, but effective correction strategies remain underexplored. To address these issues, we propose D²Quant, a novel weight-only PTQ framework that improves quantization from both the weight and activation perspectives. On the weight side, we design a Dual-Scale Quantizer (DSQ) tailored to down-projection matrices, with an absorbable scaling factor that significantly improves accuracy without increasing the bit budget. On the activation side, we propose Deviation-Aware Correction (DAC), which incorporates a mean-shift correction within LayerNorm to mitigate quantization-induced activation distribution shifts. Extensive experiments across multiple LLM families and evaluation metrics show that D²Quant delivers superior performance for weight-only PTQ at sub-4-bit precision. The code and models will be available at https://github.com/XIANGLONGYAN/D2Quant.
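For intuition, below is a minimal, heavily simplified sketch of the two ideas in PyTorch. It is not the D²Quant implementation: the helper names (`quantize_groupwise`, `dual_scale_quantize`, `mean_shift_correct`), the per-channel scale heuristic, and the calibration procedure are illustrative assumptions; the exact DSQ and DAC formulations are defined in the paper.

```python
# Illustrative sketch only -- not the paper's implementation.
import torch
import torch.nn as nn

def quantize_groupwise(w: torch.Tensor, bits: int = 2, group_size: int = 128) -> torch.Tensor:
    """Uniform asymmetric per-group quantize-dequantize of a 2-D weight matrix.
    Assumes in_features is divisible by group_size."""
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    wmin, wmax = wg.amin(-1, keepdim=True), wg.amax(-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-8) / (2**bits - 1)
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(wg / scale) + zero, 0, 2**bits - 1)
    return ((q - zero) * scale).reshape(out_f, in_f)

# Weight side: a dual-scale flavor -- an extra per-input-channel scale s
# flattens the channel dynamic range before group quantization. Because s
# multiplies input channels, it can be absorbed into the preceding layer's
# output, so it costs no additional bits for the stored weights.
def dual_scale_quantize(w: torch.Tensor, bits: int = 2, group_size: int = 128):
    s = w.abs().amax(dim=0).clamp(min=1e-8).sqrt()  # heuristic choice of s
    w_q = quantize_groupwise(w / s, bits, group_size)
    return w_q, s  # effective weight is w_q * s, with s folded upstream

# Activation side: estimate the per-dimension mean deviation that weight
# quantization induces in a block's output on calibration data, then fold
# the opposite shift into the following LayerNorm's bias to re-center
# the activation distribution.
@torch.no_grad()
def mean_shift_correct(ln: nn.LayerNorm, x_calib: torch.Tensor, block_fp, block_q):
    delta = (block_fp(x_calib) - block_q(x_calib)).mean(dim=(0, 1))
    ln.bias.add_(delta)  # assumes an affine LayerNorm with a bias term
```

Note that LLaMA-style models use RMSNorm, which has no bias term, so a mean-shift correction there would need an explicit additive parameter rather than folding into an existing bias.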
- Complete this repository
- Post-training quantization
- Models
- Results
- Citation
- Acknowledgements
D2Quant demonstrates superior performance on the Qwen-3 model series under 2-bit weight-only quantization.
D2Quant demonstrates superior performance on the LLaMA-3 and LLaMA-3.1 model series under 2-bit weight-only quantization.
If you find the code helpful in your research or work, please cite the following paper.
@article{yan2026d,
title={D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs},
author={Yan, Xianglong and Bao, Chengzhu and Li, Zhiteng and Zhang, Tianao and Zhang, Shaoqiu and Xie, Ruobing and Sun, Xingwu and Zhang, Yulun},
journal={arXiv preprint arXiv:2602.02546},
year={2026}
}

This work is released under the Apache 2.0 license.