v0.1.6
Release Notes (v0.1.6)
CUDA Numerical Stability Enhancements & Kernel Performance Optimization
- Log-Space Computation in CUDA Kernels: Refactored the update logic for 1D and 2D parameters, replacing linear-space multiplications and divisions with log-space additions and subtractions while keeping the math strictly equivalent.
- Underflow Mitigation for Small Variances: Fixed a floating-point underflow in which multiplying very small variances under extreme gradients collapsed the result to zero, improving numerical robustness for long-tailed distributions.
- CUDA Math Instruction Optimization: Reworked the low-level math instruction mix to use hardware SFU (Special Function Unit) instructions in place of some of the more expensive exponential and square-root operations, reducing kernel execution overhead.
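
The log-space idea in the notes above can be illustrated with a minimal host-side C++ sketch. It is not the kernel code from this release: the helper names `variance_product_linear` and `inv_sqrt_product_log`, the two-variance form, and the `1 / sqrt(v_row * v_col)` scaling are assumptions chosen only to show why adding logarithms avoids the underflow that a direct float32 product can hit.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the release): linear-space product of two
// variances. In float32 this underflows to 0 when both inputs are tiny.
inline float variance_product_linear(float v_row, float v_col) {
    float product = v_row * v_col;
    return product;
}

// Hypothetical helper: computes 1 / sqrt(v_row * v_col) in log space.
// Since log(v_row * v_col) = log(v_row) + log(v_col), the tiny product
// itself is never formed; only its logarithm is.
inline float inv_sqrt_product_log(float v_row, float v_col) {
    assert(v_row > 0.0f && v_col > 0.0f);  // log needs positive inputs
    float log_product = std::log(v_row) + std::log(v_col);
    return std::exp(-0.5f * log_product);
}
```

With `v_row = v_col = 1e-25f`, the exact product 1e-50 lies below the smallest float32 subnormal (about 1.4e-45), so `variance_product_linear` returns zero and any later division by its square root would blow up. `inv_sqrt_product_log` returns roughly 1e25 for the same inputs, which is well within float32 range.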