v0.1.4

Pre-release

@yanfeiwong yanfeiwong released this 08 Jun 10:47
· 18 commits to main since this release

Release Notes (v0.1.4)

Mathematical Logic Alignment & Training Robustness Enhancements

  • Clamping aligned with the official implementation: Removed hard truncation, restored the eps1² clamping mechanism, and added FP32 underflow protection.
  • Fixed state pollution bug: In the non-quantized 1D path, in-place operations accidentally overwrote the EMA state. This is now corrected.
  • Added NaN/Inf defense: The CUDA operators now sanitize extreme gradients, so a loss spike can no longer destroy the local quantization state.
  • Added decoupled weight decay: A new decoupled_weight_decay option is available.
  • Default parameter alignment: The default values of eps and relative_step now match the official PyTorch defaults.
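
The changes above can be illustrated with a minimal pure-Python sketch of a simplified 1D update. It is not the project's CUDA code: the names sanitize, step_1d, and FP32_TINY are hypothetical, the update omits update clipping and relative step sizing, and it assumes that the underflow protection floors eps1² at the smallest normal FP32 value and that the clamp is applied to a copy of the EMA state rather than to the state itself.

```python
import math

# Smallest positive normal FP32 value; assumed here as the underflow floor.
FP32_TINY = 1.1754943508222875e-38


def sanitize(grad):
    """Replace NaN/Inf entries with 0.0 so one bad step cannot poison the state."""
    return [g if math.isfinite(g) else 0.0 for g in grad]


def step_1d(param, grad, exp_avg_sq, lr=1e-2, beta2=0.999, eps1=1e-30,
            weight_decay=0.0, decoupled_weight_decay=False):
    """One simplified 1D step. Returns (new_param, new_state); inputs are not mutated."""
    grad = sanitize(grad)

    # Coupled weight decay: fold the decay term into the gradient.
    if weight_decay and not decoupled_weight_decay:
        grad = [g + weight_decay * p for g, p in zip(grad, param)]

    # EMA of squared gradients, written to a new list instead of in place.
    new_state = [beta2 * v + (1.0 - beta2) * g * g
                 for v, g in zip(exp_avg_sq, grad)]

    # Clamp a copy at eps1^2. In FP32, eps1 = 1e-30 would give 1e-60, which
    # underflows to zero, so the floor never drops below FP32_TINY.
    floor = max(eps1 * eps1, FP32_TINY)
    denom = [math.sqrt(max(v, floor)) for v in new_state]

    # Decoupled weight decay: shrink the parameters directly.
    if weight_decay and decoupled_weight_decay:
        param = [p * (1.0 - lr * weight_decay) for p in param]

    new_param = [p - lr * g / d for p, g, d in zip(param, grad, denom)]
    return new_param, new_state
```

Because the clamp only affects the temporary denom list, the stored EMA keeps its true value between steps, which is the behavior the state pollution fix describes. A NaN gradient entry is treated as zero, so both the parameter and its EMA entry stay finite.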