AdaBound.iterations #4
The implementation is correct. State initialization occurs here: https://github.com/titu1994/keras-adabound/blob/master/adabound.py#L66
Saving of the exponential moving-average weights is done automatically by these lines: https://github.com/titu1994/keras-adabound/blob/master/adabound.py#L76-L82
Lastly, before accusing something of being "wrong", be quite sure that you fully understand how the class works.
Show me where iterations is saved by the base Optimizer: https://github.com/keras-team/keras/blob/master/keras/optimizers.py
There is no code in Keras which saves .iterations: https://github.com/keras-team/keras/search?q=.iterations&unscoped_q=.iterations
Fair point, iterations is not saved, and that's a simple fix. It affects continued retraining, not initial training.
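For reference, a minimal sketch of the Keras 2 optimizer pattern being discussed, with the fix applied. This is illustrative Adam-style code, not the actual adabound.py source; names such as SketchAdam, ms and vs are made up for the example. The key point is that whatever is collected in self.weights is what Keras serializes with the model, so the iteration counter has to be included there explicitly.

```python
from keras import backend as K
from keras.optimizers import Optimizer


class SketchAdam(Optimizer):
    """Illustrative Adam-style optimizer, for exposition only."""

    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, **kwargs):
        super(SketchAdam, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.beta_1 = K.variable(beta_1, name='beta_1')
            self.beta_2 = K.variable(beta_2, name='beta_2')
        self.epsilon = epsilon

    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]
        t = K.cast(self.iterations, K.floatx()) + 1

        # State initialization: one pair of moment accumulators per parameter.
        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]

        # The simple fix: include self.iterations in self.weights so the step
        # count is saved and restored together with the moment accumulators.
        self.weights = [self.iterations] + ms + vs

        lr_t = self.lr * (K.sqrt(1. - K.pow(self.beta_2, t)) /
                          (1. - K.pow(self.beta_1, t)))

        for p, g, m, v in zip(params, grads, ms, vs):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)
            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(v, v_t))
            self.updates.append(K.update(p, p_t))
        return self.updates
```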
Sure.
and .weights are also not saved, because they consume
so I think you don't know how Keras works.
https://github.com/keras-team/keras/blob/master/keras/engine/network.py#L1090-L1162 That's how Optimizer weights are saved and restored, not by the optimizer class directly. Please read how to save models and their optimizers jointly in the Keras documentation: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
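To make that concrete, a minimal sketch of the documented workflow for saving a model together with its optimizer and restoring both. The import path and constructor arguments for AdaBound are assumptions for this example.

```python
from keras.models import Sequential, load_model
from keras.layers import Dense
from adabound import AdaBound  # assumes adabound.py from this repo is importable

model = Sequential([Dense(1, input_shape=(4,))])
model.compile(optimizer=AdaBound(lr=1e-3, final_lr=0.1), loss='mse')
# ... model.fit(x, y, ...) ...

# model.save() stores the architecture, the weights and the optimizer state.
model.save('model_with_optimizer.h5')

# Later, or in another process: restores the model *and* its optimizer state,
# so training can be continued where it left off.
model = load_model('model_with_optimizer.h5',
                   custom_objects={'AdaBound': AdaBound})
```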
ok then, sorry
@titu1994 do you know what will happen to Adam and the model if all moving-average weights are reinitialized to zero every N iterations?
Any optimizer that uses bias correction, such as Adam or RMSprop I believe, requires the iteration count to steadily increase so that the exponential bias-correction factor can decay towards zero. If the iteration count is reset, this will affect the moving averages ws and vs. If these moving averages are affected, then the gradient update will be somewhat incorrect, since the weight update is a moving average between the current gradient gt and the history of gradients, ws. As ws is compromised by being set to zeros, the gradient update is somewhat weaker, and that will affect accuracy. The same applies to vs, which will affect the effective learning rate of the Adam optimizer.
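A small NumPy sketch of that point (the constants and the constant toy gradient are made up purely for illustration): with the same accumulated moving averages, feeding the wrong step count into the bias correction changes the effective update.

```python
import numpy as np

beta1, beta2, eps, lr = 0.9, 0.999, 1e-8, 1e-3

def adam_update(m, v, t):
    # Bias correction divides by (1 - beta ** t), which depends on the step count t.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return lr * m_hat / (np.sqrt(v_hat) + eps)

# Accumulate the moving averages for 1000 steps with a constant toy gradient.
g, m, v = 0.5, 0.0, 0.0
for t in range(1, 1001):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g

print(adam_update(m, v, t=1000))  # ~1e-3: the intended step size
print(adam_update(m, v, t=1))     # ~4e-4: same m and v, but with the step count
                                  # reset, the corrections inflate v far more
                                  # than m, so the update is distorted
```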
This param is not saved.
I looked at the official PyTorch implementation from the original paper: https://github.com/Luolc/AdaBound/blob/master/adabound/adabound.py
It has state that is saved with the optimizer.
It also has other values that should be saved as well.
So your Keras implementation is wrong.
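For comparison, a short sketch of how the PyTorch version keeps that state through the standard Optimizer.state_dict() mechanism. The import path and constructor arguments below are assumptions; the exact state keys are best checked against the linked file.

```python
import torch
from adabound import AdaBound  # assumed import from the linked repository

param = torch.nn.Parameter(torch.zeros(3))
opt = AdaBound([param], lr=1e-3, final_lr=0.1)

param.grad = torch.ones(3)
opt.step()

sd = opt.state_dict()
# 'state' carries the per-parameter entries (step count, moving averages, ...)
# and 'param_groups' the hyperparameters; both round-trip through
# torch.save(sd, path) and opt.load_state_dict(torch.load(path)).
print(sd.keys())             # dict_keys(['state', 'param_groups'])
print(sd['state'][0].keys())
```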