[Intel MKL] SparseAdam: an optimizer to improve the accuracy of LazyAdam #24788
LazyAdam does not converge on NCF (MLPerf requires NCF to reach 0.635 HR within 10 epochs, but with LazyAdam the HR cannot reach that target).
The main problem with LazyAdam is that it does not update m and v for a parameter slice when that slice's gradient is zero. In the original Adam algorithm, m and v decay on every step, even for parameters whose gradient is zero, so the converged accuracy of LazyAdam is lower than Adam's.
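To make the difference concrete, here is a minimal sketch (plain Python, not the actual TensorFlow kernels; all names are illustrative) of how dense Adam and LazyAdam treat the moment estimates when a gradient entry is zero:

```python
def adam_moments(m, v, g, beta1=0.9, beta2=0.999):
    # Dense Adam decays m and v on every step, even when g == 0.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    return m, v


def lazy_adam_moments(m, v, g, beta1=0.9, beta2=0.999):
    # LazyAdam skips the update entirely for rows whose gradient is zero,
    # so their m and v stay stale instead of decaying toward zero.
    if g == 0:
        return m, v
    return adam_moments(m, v, g, beta1, beta2)
```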
I have designed a method that optimizes Adam for sparse data, named SparseAdam. SparseAdam provides semantics close to the original Adam algorithm, so its convergence is also close to Adam's, while its throughput (TPT) is about the same as LazyAdam's.
The only change relative to LazyAdam is that we compute the learning rate based on the number of steps each slice has skipped.
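As a hedged sketch of that idea, assuming per-row bookkeeping of the last step at which each slice was updated (the `last_step` field, `SparseAdamSketch` class, and `apply_sparse` method below are my own illustration, not the submitted implementation), one way to realize the same semantics is to fold the decay a row missed into its update when it finally receives a nonzero gradient:

```python
import numpy as np

class SparseAdamSketch:
    """Illustrative sketch: LazyAdam plus catch-up decay for skipped steps."""

    def __init__(self, shape, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)
        self.v = np.zeros(shape)
        self.step = 0                                         # global step
        self.last_step = np.zeros(shape[0], dtype=np.int64)   # per-row step

    def apply_sparse(self, rows, grads, params):
        """rows: indices with nonzero gradients; grads: matching slices."""
        self.step += 1
        for i, g in zip(rows, grads):
            # Steps this row skipped since its last update.
            skipped = self.step - self.last_step[i] - 1
            # Catch up the decay the row missed while its gradient was zero;
            # this matches what dense Adam would have done on those steps.
            self.m[i] *= self.beta1 ** skipped
            self.v[i] *= self.beta2 ** skipped
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias-corrected learning rate, as in standard Adam.
            lr_t = (self.lr * np.sqrt(1 - self.beta2 ** self.step)
                    / (1 - self.beta1 ** self.step))
            params[i] -= lr_t * self.m[i] / (np.sqrt(self.v[i]) + self.eps)
            self.last_step[i] = self.step
```

Because the catch-up only runs for rows that actually receive gradients, the per-step cost stays proportional to the number of nonzero slices, which is why the throughput should stay close to LazyAdam's.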
Normally I'd accept this, but we're in the process of transitioning out of contrib and into the new addons SIG (see https://groups.google.com/a/tensorflow.org/forum/#!forum/addons for more information). So until that release is done I'd like to soft-freeze contrib to make the migration easier.