v0.1.4

Pre-release

yanfeiwong released this 08 Jun 10:47

4210192

Release Notes (v0.1.4)

Mathematical Logic Alignment and Training Robustness Enhancements

Strict clamping alignment: Removed the hard truncation, restored the official eps1² clamping mechanism, and added FP32 underflow protection.
Fixed state pollution bug: In the non-quantized 1D path, in-place operations could unintentionally overwrite the EMA state. This is now corrected.
Added NaN/Inf defense: The CUDA kernels now sanitize extreme (NaN/Inf) gradients so that a loss spike cannot destroy local quantization state.
Added decoupled weight decay: A new decoupled_weight_decay option is available.
Default parameter alignment: The default values of eps and relative_step now match the official PyTorch defaults.
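
For readers who want a concrete picture of the safeguards listed above, here is a minimal pure-Python sketch. It is not this project's CUDA implementation: it uses a simplified, unfactored second-moment update, and the helper names (`sanitize`, `clamp_variance`, `step`), the function signature, and the choice of the smallest normal FP32 value as the underflow floor are all assumptions made for illustration. Only the `decoupled_weight_decay` name comes from the notes.

```python
import math

# Smallest positive normal float32 value (2**-126). Assumed here as the
# underflow floor; the project's actual floor is not stated in the notes.
FP32_TINY = 2.0 ** -126


def sanitize(grad):
    """Replace NaN/Inf entries with 0.0 so they cannot poison running state."""
    return [g if math.isfinite(g) else 0.0 for g in grad]


def clamp_variance(var, eps1):
    """Floor each variance estimate at eps1**2, guarded against FP32 underflow."""
    floor = max(eps1 * eps1, FP32_TINY)
    return [max(v, floor) for v in var]


def step(param, grad, var, lr, beta2, eps1, weight_decay, decoupled_weight_decay):
    """One illustrative update. Returns new lists and never mutates its inputs,
    so the caller's EMA state cannot be overwritten by accident."""
    grad = sanitize(grad)
    if not decoupled_weight_decay:
        # Coupled (L2-style) decay: folded into the gradient before the EMA.
        grad = [g + weight_decay * p for g, p in zip(grad, param)]
    # EMA of squared gradients, built as a fresh list rather than in place.
    new_var = [beta2 * v + (1.0 - beta2) * g * g for v, g in zip(var, grad)]
    new_var = clamp_variance(new_var, eps1)
    if decoupled_weight_decay:
        # Decoupled decay: shrink the weights directly, bypassing the EMA.
        param = [p * (1.0 - lr * weight_decay) for p in param]
    new_param = [p - lr * g / math.sqrt(v) for p, g, v in zip(param, grad, new_var)]
    return new_param, new_var
```

With `decoupled_weight_decay=True` the decay term never enters the second-moment estimate, so the decay strength is not rescaled by the adaptive denominator. With it set to `False`, the decay is treated as part of the gradient.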
