v0.1.9

@yanfeiwong released this 21 Jun 12:02 · 2 commits to main since this release

Release Notes (v0.1.9)

New Features

  • Fira Limiter Support for the Adafactor Path

    Added the enable_fira_for_adafactor option. The standard Adafactor update path can now use the Fira norm-growth limiter to smooth parameter updates and reduce loss spikes. In many training setups this makes external gradient clipping (e.g. clip_grad_norm_) unnecessary, which simplifies the training pipeline.

  • Configurable fira_margin

    Added the global fira_margin parameter (default: 0.01, i.e. 1%). The 1% norm-growth threshold that was previously hardcoded in the APOLLO path is now configurable and applies to both the APOLLO and Adafactor paths. This lets users tune how aggressively norm growth is limited for different workloads.
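
    To illustrate what a norm-growth limiter with a margin does, here is a minimal standalone sketch. It is not this project's implementation: the class name NormGrowthLimiter and the helper l2_norm are hypothetical, and the rule (cap each update's norm at (1 + margin) times the previous update's norm) is an assumption based on the 1% threshold described above.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


class NormGrowthLimiter:
    """Hypothetical sketch: cap step-to-step growth of the update norm.

    An update whose norm exceeds (1 + margin) * previous norm is rescaled
    down to that limit; its direction is left unchanged.
    """

    def __init__(self, margin=0.01):
        self.margin = margin
        self.prev_norm = None  # no history before the first step

    def __call__(self, update):
        norm = l2_norm(update)
        if self.prev_norm is not None and self.prev_norm > 0.0:
            limit = (1.0 + self.margin) * self.prev_norm
            if norm > limit:
                scale = limit / norm
                update = [v * scale for v in update]
                norm = limit
        # Remember the (possibly limited) norm for the next step.
        self.prev_norm = norm
        return update


limiter = NormGrowthLimiter(margin=0.01)
first = limiter([3.0, 4.0])    # norm 5.0; nothing to compare against yet
spike = limiter([30.0, 40.0])  # norm 50.0; rescaled to at most 1.01 * 5.0
small = limiter([0.3, 0.4])    # norm 0.5; below the limit, passes through
```

    In this sketch a larger margin tolerates faster growth of the update norm, while a smaller one damps spikes more strongly. Unlike clip_grad_norm_, which compares against a fixed absolute threshold, the limit here is relative to the previous step.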

Documentation & Examples

  • Advanced Hybrid Routing Example

    Added examples/advanced_usage.py, demonstrating practical hybrid routing strategies for architectures containing different parameter types and tensor shapes. The example also showcases how to combine APOLLO, Adafactor, quantization, and the Fira Limiter in a single optimizer configuration.
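
    As a rough idea of what routing by tensor shape can look like, the sketch below splits parameters into two groups by dimensionality. It does not reproduce examples/advanced_usage.py: the function route_by_shape, the rule (matrices to one path, vectors and scalars to the other), and the parameter names are all hypothetical, and the strings "apollo" and "adafactor" are plain labels rather than library options.

```python
from collections import defaultdict


def route_by_shape(named_shapes):
    """Hypothetical routing rule: group parameter names by tensor rank.

    Tensors with two or more dimensions are labelled "apollo";
    everything else (biases, norm weights) is labelled "adafactor".
    """
    groups = defaultdict(list)
    for name, shape in named_shapes.items():
        route = "apollo" if len(shape) >= 2 else "adafactor"
        groups[route].append(name)
    return dict(groups)


# Illustrative parameter names and shapes, not taken from a real model.
shapes = {
    "attn.q_proj.weight": (512, 512),
    "attn.q_proj.bias": (512,),
    "norm.weight": (512,),
}
groups = route_by_shape(shapes)
```

    In a real setup each group would be handed to the optimizer with its own settings; refer to examples/advanced_usage.py for the configuration the project actually recommends.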