
Adaptive gradient clipping shouldn't be used in the final classifier layer #8

Closed
sidml opened this issue Feb 17, 2021 · 2 comments


@sidml

sidml commented Feb 17, 2021

Describe the bug
In the paper it is mentioned that AGC is NOT used in the final layer.
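For reference, the unit-wise clipping rule from the NFNets paper is summarized below. Notation follows the paper: $G_i^{\ell}$ and $W_i^{\ell}$ are the $i$-th rows (output units) of the gradient and weight of layer $\ell$, $\lambda$ is the clipping threshold, and $\epsilon$ is a small constant ($10^{-3}$ in the paper). Per the paper, this rule is applied to every layer except the final linear classifier.

```math
G_i^{\ell} \leftarrow
\begin{cases}
\lambda \dfrac{\lVert W_i^{\ell} \rVert_F^{\star}}{\lVert G_i^{\ell} \rVert_F} \, G_i^{\ell}, & \text{if } \dfrac{\lVert G_i^{\ell} \rVert_F}{\lVert W_i^{\ell} \rVert_F^{\star}} > \lambda, \\[4pt]
G_i^{\ell}, & \text{otherwise,}
\end{cases}
\qquad
\lVert W_i^{\ell} \rVert_F^{\star} = \max\left(\lVert W_i^{\ell} \rVert_F, \epsilon\right).
```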

Expected behavior
It would be great if AGC could be disabled for the final layer. Adding a check here may be sufficient to get the desired behavior.
Please let me know if I am misunderstanding something. Thanks.
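A minimal sketch of such a check, written in plain Python so it runs without PyTorch. It assumes each parameter is stored as a list of rows (one row per output unit) together with its gradient; the function name `agc_clip_`, the `skip` argument, and the parameter names are hypothetical and not this repository's API.

```python
import math


def unit_norm(row):
    """L2 norm of one output unit (one row of a weight or gradient matrix)."""
    return math.sqrt(sum(x * x for x in row))


def agc_clip_(params, clipping=1e-2, eps=1e-3, skip=()):
    """Unit-wise adaptive gradient clipping, applied to the gradients in place.

    params:   dict mapping a parameter name to a (weight_rows, grad_rows) pair,
              each a list of rows with one row per output unit.
    clipping: the threshold lambda on ||G_i|| / max(||W_i||, eps).
    skip:     parameter names left unclipped (e.g. the final classifier).
    """
    for name, (weight, grad) in params.items():
        if name in skip:
            continue  # the requested check: no AGC for this layer
        for i in range(len(grad)):
            w_norm = max(unit_norm(weight[i]), eps)  # ||W_i||* = max(||W_i||, eps)
            g_norm = unit_norm(grad[i])
            max_norm = clipping * w_norm
            if g_norm > max_norm:
                # Rescale the row so that ||G_i|| / ||W_i||* equals lambda.
                scale = max_norm / g_norm
                grad[i] = [g * scale for g in grad[i]]


# Usage: two layers with identical weights and gradients, only one of them skipped.
params = {
    "conv1.weight": ([[3.0, 4.0]], [[0.3, 0.4]]),  # ||G|| / ||W|| = 0.5 / 5 = 0.1 > 0.01
    "fc.weight": ([[3.0, 4.0]], [[0.3, 0.4]]),
}
agc_clip_(params, clipping=1e-2, skip={"fc.weight"})
print(params["conv1.weight"][1])  # approximately [[0.03, 0.04]]
print(params["fc.weight"][1])     # [[0.3, 0.4]], unchanged
```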

@vballoli
Owner

Yeah, I did notice that in the paper, but I forgot to take note of it. I'll definitely add it. Thank you!

@vballoli
Owner

Hi, the SGD_AGC implementation is somewhat restrictive here, since checking for linear layers would make it unusable for Transformer-like architectures (where linear layers appear throughout the network, not just in the classifier). Instead, I generalized this in the generic AGC optimizer, where modules are ignored by name rather than by layer type. You can find the code in commit eaa4b87, and it should be available in the latest release. Do let me know if this works out!
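The difference between excluding layers by type and by name can be sketched as follows, assuming parameter names use PyTorch's dotted `module.submodule.weight` convention (as produced by `Module.named_parameters()`). The helpers `is_ignored` and `params_to_clip` and the layer names below are hypothetical illustrations, not the code in commit eaa4b87.

```python
def is_ignored(param_name, ignore):
    """True if param_name belongs to one of the module paths listed in `ignore`.

    "head" matches "head.weight" and "head.bias" but not "header.weight".
    """
    return any(param_name == m or param_name.startswith(m + ".") for m in ignore)


def params_to_clip(param_names, ignore):
    """Names of the parameters AGC should clip, keeping the original order."""
    return [n for n in param_names if not is_ignored(n, ignore)]


# Hypothetical Transformer-like model: (parameter name, layer type).
vit_params = [
    ("patch_embed.proj.weight", "Conv2d"),
    ("blocks.0.attn.qkv.weight", "Linear"),
    ("blocks.0.mlp.fc1.weight", "Linear"),
    ("blocks.0.mlp.fc2.weight", "Linear"),
    ("head.weight", "Linear"),
]

# Type-based check: skipping every Linear layer also drops attention and MLP weights.
by_type = [name for name, kind in vit_params if kind != "Linear"]

# Name-based check: only the classifier head is excluded.
by_name = params_to_clip([name for name, _ in vit_params], ignore={"head"})

print(by_type)  # ['patch_embed.proj.weight']
print(by_name)  # every parameter except 'head.weight'
```

Matching on whole dotted path components from the root keeps an ignore entry such as `fc` from accidentally catching nested layers like `blocks.0.mlp.fc1`.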
