v0.1.4
Pre-release
Release Notes (v0.1.4)
Mathematical Logic Alignment & Training Robustness Enhancements
- Clamping aligned strictly with the official implementation: Removed hard truncation, restored the `eps1²` clamping mechanism, and added FP32 underflow protection.
- Fixed state pollution bug: Corrected an issue in the non-quantized 1D path where in-place operations accidentally overwrote the EMA state.
- Added NaN/Inf defense: The CUDA operators now sanitize extreme gradients so that a loss spike cannot destroy local quantization states.
- Added decoupled weight decay: Provides a `decoupled_weight_decay` option.
- Default parameter alignment: Aligned the default values of `eps` and `relative_step` with the official PyTorch defaults.
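
The changes listed above can be illustrated with a small, stdlib-only sketch. It is not the project's CUDA kernel and makes several assumptions: the helper names (`to_fp32`, `clamp_floor`, `sanitize`, `sketch_step`) are hypothetical, the update is a simplified unfactored second-moment rule without update clipping or relative step sizes, and `eps1=1e-30` is used only because its square underflows to zero in FP32. Only the `decoupled_weight_decay` name comes from the notes.

```python
import math
import struct

FP32_TINY = 2.0 ** -126  # smallest normal float32, about 1.18e-38


def to_fp32(x):
    """Round a Python float to the nearest float32 value (stdlib-only emulation)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def clamp_floor(eps1):
    """Return eps1**2 as FP32 arithmetic would see it, guarded against underflow."""
    squared = to_fp32(to_fp32(eps1) * to_fp32(eps1))  # 1e-30 squared becomes 0.0 in FP32
    return max(squared, FP32_TINY)                    # assumed guard: never clamp to zero


def sanitize(g):
    """Map NaN/Inf gradient entries to 0 so they cannot reach the running state."""
    return g if math.isfinite(g) else 0.0


def sketch_step(params, grads, exp_avg_sq, lr=1e-2, beta2=0.999,
                eps1=1e-30, weight_decay=0.0, decoupled_weight_decay=True):
    """One simplified optimizer step on plain lists; returns new params and new state."""
    floor = clamp_floor(eps1)
    new_params, new_state = [], []
    for p, g, v in zip(params, grads, exp_avg_sq):
        g = sanitize(g)
        if weight_decay and not decoupled_weight_decay:
            g += weight_decay * p                   # coupled: decay folded into the gradient
        v_new = beta2 * v + (1.0 - beta2) * g * g   # fresh value; the caller's state is untouched
        update = g / math.sqrt(max(v_new, floor))   # soft floor instead of a hard cutoff
        if weight_decay and decoupled_weight_decay:
            p *= 1.0 - lr * weight_decay            # decoupled: shrink the weight directly
        new_params.append(p - lr * update)
        new_state.append(v_new)
    return new_params, new_state


if __name__ == "__main__":
    bad_grads = [float("nan"), float("inf"), 0.0]
    params, state = sketch_step([1.0, 1.0, 1.0], bad_grads, [0.0, 0.0, 0.0],
                                weight_decay=0.1)
    print(params, state)
```

Without the floor guard, a zero gradient on a fresh state would divide 0 by 0 and produce NaN. The sketch also returns new lists rather than writing into `exp_avg_sq`, which is one simple way to avoid the kind of in-place state overwrite the notes describe.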