Add support for scheduled weight decays in RectifiedAdam. #1974
Conversation
RAdam implements weight decay following AdamW, and the latter supports scheduling of both the learning rate and the weight decay as part of its warm-restarts variant. This patch extends the existing support for Keras schedulers from the learning rate to the weight decay, matching the weight-decay features of AdamW.
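The mechanism can be sketched in plain Python, without a TensorFlow dependency: a hyperparameter is either a constant or a schedule (any callable of the training step), and the optimizer resolves it each step. The names `resolve_hyper` and `ExponentialDecay` below are illustrative toys, not the actual `tfa.optimizers.RectifiedAdam` internals.

```python
def resolve_hyper(value, step):
    """Return the value of a hyperparameter at a given training step.

    A schedule is modeled as any callable taking the step; a plain
    number is returned unchanged.
    """
    if callable(value):
        return value(step)
    return value


class ExponentialDecay:
    """Toy stand-in for tf.keras.optimizers.schedules.ExponentialDecay."""

    def __init__(self, initial_value, decay_steps, decay_rate):
        self.initial_value = initial_value
        self.decay_steps = decay_steps
        self.decay_rate = decay_rate

    def __call__(self, step):
        return self.initial_value * self.decay_rate ** (step / self.decay_steps)


# With this patch, weight decay can be scheduled the same way as the
# learning rate: pass either a constant or a schedule object.
wd_schedule = ExponentialDecay(1e-4, decay_steps=1000, decay_rate=0.5)
print(resolve_hyper(1e-4, step=1000))         # constant: 0.0001
print(resolve_hyper(wd_schedule, step=1000))  # decayed:  5e-05
```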
This problem appears to already affect the deserialization of learning rate schedulers, regardless of this patch.
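The deserialization issue can be illustrated with a minimal sketch: when a hyperparameter may be a schedule object, `get_config()` must store a serializable form, and the deserialization path must rebuild the schedule object rather than leave a raw dict behind. All class and function names below are hypothetical stand-ins, not the Keras or Addons API.

```python
class ToySchedule:
    """Stand-in for a Keras LearningRateSchedule."""

    def __init__(self, initial_value):
        self.initial_value = initial_value

    def __call__(self, step):
        return self.initial_value / (1 + step)

    def get_config(self):
        return {"initial_value": self.initial_value}


def serialize_hyper(value):
    """Constants pass through; schedules become a name+config dict."""
    if hasattr(value, "get_config"):
        return {"class_name": type(value).__name__,
                "config": value.get_config()}
    return value


def deserialize_hyper(value, registry):
    """Rebuild a schedule object from its serialized dict, if any."""
    if isinstance(value, dict) and "class_name" in value:
        cls = registry[value["class_name"]]
        return cls(**value["config"])
    return value


registry = {"ToySchedule": ToySchedule}
cfg = serialize_hyper(ToySchedule(0.01))
restored = deserialize_hyper(cfg, registry)
print(type(restored).__name__, restored(0))  # ToySchedule 0.01
```

Without the rebuild step in `deserialize_hyper`, a restored optimizer would carry a plain dict where a callable schedule is expected, which is the failure mode described above.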
Generally LGTM :-) Thanks for the contribution.
LGTM. Thanks!
…#1974)

* Add support for scheduled weight decays in RectifiedAdam. RAdam implements weight decay based on AdamW, and the latter supports scheduling for both learning rate and weight decays as part of its warm restarts version. This patch extends existing support of Keras schedulers for the learning rate to weight decay, matching the weight decay features of AdamW.
* Fix code style issues.
* Fix deserialization of schedulers. This problem appears to be already affecting the deserialization of learning rate schedulers regardless of this patch.
* Fix comparison when using the optimizer inside a tf.function.
fixes #1908