
AdaBound.iterations #4

Closed · iperov opened this issue Mar 11, 2019 · 10 comments

iperov commented Mar 11, 2019

This parameter is not saved.

I looked at the official PyTorch implementation from the original paper:
https://github.com/Luolc/AdaBound/blob/master/adabound/adabound.py

It has:

# State initialization
if len(state) == 0:
    state['step'] = 0

The state dict is saved together with the optimizer.

It also has:

# Exponential moving average of gradient values
state['exp_avg'] = torch.zeros_like(p.data)
# Exponential moving average of squared gradient values
state['exp_avg_sq'] = torch.zeros_like(p.data)

These values should also be saved.

So your Keras implementation is wrong.
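For reference, the behaviour being described can be reproduced with the stock torch.optim API; a hedged illustration using torch.optim.Adam rather than the AdaBound class from the linked repo:

# Illustration only: the per-parameter state ('step', 'exp_avg', 'exp_avg_sq')
# lives in the optimizer's state_dict and therefore round-trips through a checkpoint.
import torch

param = torch.nn.Parameter(torch.zeros(3))
opt = torch.optim.Adam([param])

param.grad = torch.ones(3)
opt.step()                                  # state is lazily initialized on the first step

torch.save(opt.state_dict(), 'opt.pth')     # saves the step count and both moving averages
opt.load_state_dict(torch.load('opt.pth'))  # restores them on resume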

titu1994 commented Mar 11, 2019

The implementation is correct.

State initialization occurs here: https://github.com/titu1994/keras-adabound/blob/master/adabound.py#L66

self.iterations is inherited from the base class Optimizer.

Saving of the exponential moving averages is handled automatically by these lines: https://github.com/titu1994/keras-adabound/blob/master/adabound.py#L76-L82

self.weights and self.iterations are saved when you call model.save() or use ModelCheckpoint with save_weights_only=False.

Lastly, before accusing something of being "wrong", be quite sure that you fully understand how the class works.
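To make the saving path concrete, here is a minimal usage sketch; hedged: it assumes this repository's adabound.py is on the import path and that AdaBound's constructor defaults are usable as-is.

# Hedged sketch: full-model checkpointing serializes the optimizer's self.weights too.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
from adabound import AdaBound   # adabound.py from this repository

model = Sequential([Dense(1, input_dim=4)])
model.compile(optimizer=AdaBound(), loss='mse')

x = np.random.rand(64, 4)
y = np.random.rand(64, 1)
# save_weights_only=False (the default) writes the optimizer state into model.h5
model.fit(x, y, epochs=1,
          callbacks=[ModelCheckpoint('model.h5', save_weights_only=False)])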

iperov commented Mar 11, 2019

Show me where iterations is saved by the base Optimizer:

https://github.com/keras-team/keras/blob/master/keras/optimizers.py

iperov commented Mar 11, 2019

There is no code in Keras that saves .iterations:

https://github.com/keras-team/keras/search?q=.iterations&unscoped_q=.iterations
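For context, the persistence hooks on the Keras 2.x base Optimizer behave roughly as below (a paraphrased sketch, not a verbatim copy of keras/optimizers.py): whatever a subclass puts into self.weights is exactly what gets serialized, so .iterations survives a save only if it is registered there.

# Paraphrased sketch of the Keras 2.x base Optimizer's save/restore path.
from keras import backend as K


class OptimizerSketch(object):
    def __init__(self):
        self.weights = []          # subclasses append their state variables here

    def get_weights(self):
        # model.save() ultimately serializes this return value as the optimizer state.
        return K.batch_get_value(self.weights)

    def set_weights(self, weights):
        # The restore path pushes the saved values back into the same variables.
        K.batch_set_value(list(zip(self.weights, weights)))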

titu1994 commented Mar 11, 2019

Fair point, iterations is not saved, and that's a simple fix. It affects continued retraining, not initial training.
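For reference, a minimal sketch of what such a fix could look like, mirroring how the stock Keras Adam registers its state; this is hypothetical and may not match the commit that closed this issue.

# Inside get_updates(), after creating the slot variables (hypothetical sketch):
ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
# Including self.iterations here means model.save() also serializes the step
# count, so LR decay and the bound schedule resume correctly after a restore.
self.weights = [self.iterations] + ms + vs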

iperov commented Mar 11, 2019

Sure.
The LR decay just resets every time I restore the model from a save, LOL.

titu1994 added a commit that referenced this issue Mar 11, 2019
iperov commented Mar 11, 2019

And .weights are also not saved, because they consume twice as much memory as the trainable weights, due to:

ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]

So I think you don't know how Keras works.
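(For scale, the memory overhead being referred to is easy to estimate; a rough back-of-the-envelope sketch assuming a 10M-parameter float32 model:)

# Rough estimate of the slot-variable overhead (float32 assumed):
n_params = 10_000_000                 # e.g. a 10M-parameter model
weight_mb = n_params * 4 / 1e6        # ~40 MB for the trainable weights themselves
slot_mb = 2 * weight_mb               # ms + vs together add roughly twice that, ~80 MB
print(weight_mb, slot_mb)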

titu1994 commented Mar 11, 2019

iperov commented Mar 11, 2019

ok then, sorry

titu1994 added a commit that referenced this issue Mar 11, 2019
iperov commented Mar 11, 2019

@titu1994 do you know what will happen to Adam and the model if all the moving averages are reinitialized to zero every N iterations?

titu1994 commented

Any optimizer that uses bias correction (Adam, and I believe RMSProp as well) requires the iteration count to increase steadily so that the bias terms beta1^t and beta2^t in the correction factors can decay to zero.

If the iteration count is reset, this throws off the bias correction applied to the moving averages ms and vs.

If these moving averages are affected, the gradient update will be somewhat incorrect, since the weight update blends the current gradient gt with the history of gradients held in ms. Because ms is compromised by being reset to zeros, the gradient update is somewhat weaker than it should be, and that will affect accuracy.

The same applies to vs, which affects the effective learning rate of the Adam optimizer.
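A small numeric sketch of the two failure modes discussed in this thread (plain Python, illustration only): restoring the step count with zeroed moments, and restoring the moments with a reset step count.

# Illustration only: Adam-style moments of a constant gradient g = 0.1.
beta1, beta2, g = 0.9, 0.999, 0.1

m = v = 0.0
for t in range(1, 1001):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g

# Consistent state (t = 1000): the bias-corrected estimates match the true moments.
print(m / (1 - beta1 ** 1000), v / (1 - beta2 ** 1000))    # ~0.1, ~0.01

# Step count kept but moments zeroed (the question above): after one new step the
# estimated gradient history is far too small, so the momentum contribution is weak.
m2 = (1 - beta1) * g
v2 = (1 - beta2) * g * g
print(m2 / (1 - beta1 ** 1001), v2 / (1 - beta2 ** 1001))  # ~0.01, ~1.6e-05

# Moments kept but step count reset to 1 (the un-saved iterations case): the heavy
# early-step correction is re-applied to already-warmed-up moments and inflates them.
print(m / (1 - beta1 ** 1), v / (1 - beta2 ** 1))          # ~1.0, ~6.3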
