Describe the bug
In the paper it is mentioned that AGC is NOT used in the final layer.
Expected behavior
It would be great if AGC could be disabled for the final layer. Adding a check here may be sufficient to get the desired behavior.
Please let me know if I am misunderstanding something. Thanks.
Hi, the SGD_AGC implementation is somewhat restrictive here: checking for linear layers would make it unusable for Transformer-like architectures. I was able to generalize this in the generic AGC optimizer instead, where modules are ignored by name. You can find the code in commit eaa4b87, and it should be available in the latest release. Do let me know if this works out!
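For anyone following along, here is a minimal, framework-free sketch of the idea discussed above: unit-wise adaptive gradient clipping that skips layers by name. It is not the code from the commit. The function names (`agc_clip`, `unitwise_norm`), the `ignore` argument, and the dict-of-rows layout are all hypothetical and chosen only for illustration. The defaults for `clipping` and `eps` are assumptions too.

```python
import math


def unitwise_norm(row):
    """L2 norm of one unit (here: one output row of a weight matrix)."""
    return math.sqrt(sum(x * x for x in row))


def agc_clip(params, grads, clipping=0.01, eps=1e-3, ignore=()):
    """Return clipped copies of `grads`; layers named in `ignore` pass through.

    `params` and `grads` map a layer name to a list of rows, one row per unit.
    This layout and the `ignore` argument are assumptions for illustration.
    """
    clipped = {}
    for name, g_rows in grads.items():
        if name in ignore:
            # Skipped layer (e.g. the final classifier): gradient is unchanged.
            clipped[name] = [list(row) for row in g_rows]
            continue
        new_rows = []
        for w_row, g_row in zip(params[name], g_rows):
            # Floor the weight norm so zero-initialized units can still train.
            w_norm = max(unitwise_norm(w_row), eps)
            g_norm = unitwise_norm(g_row)
            max_norm = clipping * w_norm
            if g_norm > max_norm:
                # Rescale the unit's gradient so its norm equals max_norm.
                scale = max_norm / g_norm
                new_rows.append([x * scale for x in g_row])
            else:
                new_rows.append(list(g_row))
        clipped[name] = new_rows
    return clipped
```

Calling `agc_clip(params, grads, ignore=("head",))` would clip every layer except the one named `head`. Matching on names rather than on layer type is what keeps the check usable for architectures whose final layer is not a plain linear layer.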