Release Notes (v0.1.9)
New Features
- Fira Limiter Support for the Adafactor Path
  Added the enable_fira_for_adafactor option. The standard Adafactor update path can now use the Fira Norm-Growth Limiter to smooth parameter updates and reduce loss spikes. In many training setups, this makes external gradient clipping (e.g. clip_grad_norm_) unnecessary, which simplifies the training pipeline.
- Configurable fira_margin
  Added the global fira_margin parameter (default: 0.01). The 1% norm-growth threshold that was previously hardcoded in the APOLLO path is now configurable and shared by both the APOLLO and Adafactor update paths. Users can tune how aggressively norm growth is limited for different workloads.
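These notes do not spell out the limiter's rule. The sketch below assumes the norm-growth limiter rule proposed with Fira: if the L2 norm of the current update exceeds (1 + fira_margin) times the norm of the previous update, the update is rescaled to exactly that bound. Under this assumption, the default fira_margin of 0.01 reproduces the former hardcoded 1% threshold. The class NormGrowthLimiter and its call interface are hypothetical, and plain Python lists stand in for tensors. This is not the library's implementation.

```python
import math
from typing import List, Optional


class NormGrowthLimiter:
    """Hypothetical sketch of a Fira-style norm-growth limiter (not the library's code)."""

    def __init__(self, fira_margin: float = 0.01) -> None:
        if fira_margin < 0:
            raise ValueError("fira_margin must be non-negative")
        # Maximum allowed ratio between consecutive update norms.
        self.gamma = 1.0 + fira_margin
        # L2 norm of the last update returned (after limiting); None before the first step.
        self.prev_norm: Optional[float] = None

    def __call__(self, update: List[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in update))
        # No bound on the first step, or when the previous norm was zero.
        limit = None if not self.prev_norm else self.gamma * self.prev_norm
        if limit is not None and norm > limit:
            # Rescale to the bound while keeping the update's direction.
            scale = limit / norm
            update = [x * scale for x in update]
            norm = limit
        self.prev_norm = norm
        return update


limiter = NormGrowthLimiter(fira_margin=0.01)
first = limiter([3.0, 4.0])     # norm 5.0 on the first step: returned unchanged
second = limiter([30.0, 40.0])  # norm 50.0 > 1.01 * 5.0: rescaled to norm 5.05
```

Because the bound is relative to the previous update, one sudden large gradient can raise the update norm by at most fira_margin per step. This is why such a limiter can damp spikes that might otherwise call for clip_grad_norm_.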
Documentation & Examples
- Advanced Hybrid Routing Example
  Added examples/advanced_usage.py, demonstrating practical hybrid routing strategies for architectures containing different parameter types and tensor shapes. The example also shows how to combine APOLLO, Adafactor, quantization, and the Fira Limiter in a single optimizer configuration.
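Here is a minimal sketch of shape-based hybrid routing. It is not taken from examples/advanced_usage.py. The routing rule (large 2D matrices go to APOLLO, everything else goes to Adafactor), the min_dim threshold, the function names, and the parameter names and shapes are all hypothetical. Real code would pass the resulting groups to the optimizer's own configuration API.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def route_parameter(shape: Shape, min_dim: int = 128) -> str:
    """Hypothetical rule: choose an update path from a parameter's shape."""
    # Large 2D weight matrices (e.g. linear projections) take the APOLLO path.
    if len(shape) == 2 and min(shape) >= min_dim:
        return "apollo"
    # Biases, norm scales, small matrices, and >2D tensors take the Adafactor path.
    return "adafactor"


def build_groups(shapes: Dict[str, Shape], min_dim: int = 128) -> Dict[str, List[str]]:
    """Group parameter names by the update path chosen for each shape."""
    groups: Dict[str, List[str]] = {"apollo": [], "adafactor": []}
    for name, shape in shapes.items():
        groups[route_parameter(shape, min_dim)].append(name)
    return groups


# Hypothetical parameter names and shapes for a small model.
shapes = {
    "stem.conv.weight": (64, 3, 7, 7),
    "attn.q_proj.weight": (1024, 1024),
    "attn.q_proj.bias": (1024,),
    "mlp.up_proj.weight": (4096, 1024),
    "norm.weight": (1024,),
    "head.weight": (10, 1024),
}

groups = build_groups(shapes)
print(groups["apollo"])     # ['attn.q_proj.weight', 'mlp.up_proj.weight']
print(groups["adafactor"])  # ['stem.conv.weight', 'attn.q_proj.bias', 'norm.weight', 'head.weight']
```

Keeping the routing rule a pure function of shape makes it easy to inspect and test before any optimizer is constructed.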