Release v0.1.3 · yanfeiwong/adafactor-8bit

Release Notes (v0.1.3)

This release overhauls the core quantization logic, introducing log-space mapping and on-the-fly CUDA kernels to improve numerical stability and memory efficiency:

Log-Space Quantization Overhaul: The second moment (variance) is now quantized in log space (log2/exp2) instead of linear space. A log-scale grid fits the long-tailed distribution of variance values better, so very small variances keep their relative precision instead of being truncated to zero. Because the update divides by the square root of the second moment, a zeroed entry inflates the step; avoiding this mitigates the resulting weight oscillations and improves numerical stability.
Zero-Materialization CUDA Kernels: Intermediate tensors are no longer materialized on the Python side. Dequantization, the EMA update, and requantization are fused and performed on the fly entirely inside CUDA, reducing memory reads/writes and peak memory usage.
Streamlined State Management: The step counter is now a native Python scalar instead of a GPU tensor, reducing device-side memory overhead and keeping the control flow lightweight.
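To illustrate the log-space quantization item above, here is a minimal sketch of an 8-bit codec that quantizes log2(v) on a uniform grid and decodes with exp2, next to a linear 8-bit baseline. The grid bounds (`LOG2_MIN`, `LOG2_MAX`), the "code 0 means exactly zero" convention, and all function names are hypothetical choices for this sketch; the release does not document its actual encoding parameters.

```python
import math

# Hypothetical 8-bit log-space codec for non-negative second-moment values.
# LOG2_MIN, LOG2_MAX and "code 0 == exact zero" are illustrative assumptions,
# not the parameters used by adafactor-8bit.
LOG2_MIN = -40.0   # smallest nonzero representable value: 2**-40
LOG2_MAX = 8.0     # largest representable value: 2**8
STEPS = 254        # codes 1..255 span [LOG2_MIN, LOG2_MAX] in 254 equal steps


def quantize_log(v: float) -> int:
    """Encode v >= 0 as a uint8 code by uniformly quantizing log2(v)."""
    if v <= 0.0:
        return 0
    x = min(max(math.log2(v), LOG2_MIN), LOG2_MAX)  # clamp onto the grid
    return 1 + round((x - LOG2_MIN) / (LOG2_MAX - LOG2_MIN) * STEPS)


def dequantize_log(code: int) -> float:
    """Decode a uint8 code back to a float with exp2."""
    if code == 0:
        return 0.0
    x = LOG2_MIN + (code - 1) / STEPS * (LOG2_MAX - LOG2_MIN)
    return 2.0 ** x


def quantize_linear(v: float, vmax: float) -> int:
    """Baseline: uniform 8-bit grid on [0, vmax], i.e. linear-space quantization."""
    return round(min(max(v, 0.0), vmax) / vmax * 255)


def dequantize_linear(code: int, vmax: float) -> float:
    return code / 255 * vmax


if __name__ == "__main__":
    vmax = 1.0  # assume the largest second-moment value in the block is 1.0
    for v in (1e-9, 1e-6, 1e-3, 0.5):
        via_linear = dequantize_linear(quantize_linear(v, vmax), vmax)
        via_log = dequantize_log(quantize_log(v))
        print(f"v={v:.0e}  linear={via_linear:.3e}  log={via_log:.3e}")
```

With a block maximum of 1.0, the linear grid's smallest nonzero level is 1/255, so 1e-9, 1e-6 and 1e-3 all decode to 0.0. The log grid keeps every value in range within roughly 7% relative error, because its step is a fixed ratio (2^(48/254)) rather than a fixed absolute amount.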
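The zero-materialization item above can be pictured with a pure-Python sketch of what one fused kernel does per element: decode the 8-bit state, apply the EMA of the squared gradient, and re-encode, all on a single scalar. Each loop iteration stands in for one CUDA thread. The codec, `fused_ema_update`, and `unfused_ema_update` are hypothetical illustrations (the real kernels are CUDA and their signatures are not shown here), and the state is treated element-wise for simplicity.

```python
import math

# Hypothetical stand-in codec (illustrative log2/exp2 grid, not the real kernel's).
LOG2_MIN, LOG2_MAX, STEPS = -40.0, 8.0, 254


def encode(v: float) -> int:
    if v <= 0.0:
        return 0
    x = min(max(math.log2(v), LOG2_MIN), LOG2_MAX)
    return 1 + round((x - LOG2_MIN) / (LOG2_MAX - LOG2_MIN) * STEPS)


def decode(code: int) -> float:
    if code == 0:
        return 0.0
    return 2.0 ** (LOG2_MIN + (code - 1) / STEPS * (LOG2_MAX - LOG2_MIN))


def fused_ema_update(codes: bytearray, grad_sq: list, beta2: float) -> None:
    """Update 8-bit second-moment codes in place, one element at a time.

    Decode, EMA update and re-encode touch a single scalar per iteration,
    so no full-precision copy of the state is ever allocated.
    """
    for i, g2 in enumerate(grad_sq):
        v = decode(codes[i])                 # dequantize one element
        v = beta2 * v + (1.0 - beta2) * g2   # EMA of the squared gradient
        codes[i] = encode(v)                 # requantize and write back


def unfused_ema_update(codes: bytearray, grad_sq: list, beta2: float) -> bytearray:
    """Contrast: builds full-precision intermediate lists before requantizing."""
    v = [decode(c) for c in codes]           # materialized intermediate
    v = [beta2 * a + (1.0 - beta2) * g2 for a, g2 in zip(v, grad_sq)]
    return bytearray(encode(a) for a in v)


if __name__ == "__main__":
    codes = bytearray(4)                     # all-zero second moment
    grad_sq = [1e-6, 1e-3, 1.0, 4.0]
    fused_ema_update(codes, grad_sq, beta2=0.9)
    print([decode(c) for c in codes])
```

Both paths compute the same codes; the fused one simply never holds more than one decoded value at a time, which is the property that lowers peak memory and memory traffic on the GPU.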
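For the state-management item above, a minimal sketch of a host-side step counter: the step lives in a Python `int`, and the step-dependent decay is computed on the CPU so it can be passed to a kernel as a plain float argument. The decay schedule beta2_t = 1 - t^(-0.8) comes from the original Adafactor paper; whether adafactor-8bit uses exactly this schedule, and the `StepCounter` class itself, are assumptions for illustration.

```python
class StepCounter:
    """Hypothetical host-side step counter (not the package's actual API).

    Keeping `step` as a Python int means reading or branching on it never
    needs a device-to-host copy, and values derived from it can be handed
    to a kernel as ordinary scalar arguments.
    """

    def __init__(self) -> None:
        self.step = 0  # plain Python int, stored on the host

    def advance(self, decay_rate: float = -0.8) -> float:
        """Increment the step and return beta2_t = 1 - step**decay_rate."""
        self.step += 1
        return 1.0 - self.step ** decay_rate


if __name__ == "__main__":
    counter = StepCounter()
    for _ in range(3):
        beta2 = counter.advance()
        print(counter.step, beta2)
```

The first step yields beta2 = 0, so the running average starts from the first squared gradient, and later steps move beta2 toward 1. With a tensor counter, the same arithmetic would either run on the device or require a synchronizing read.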
