I think the NFNet implementation does not update beta. In the paper https://arxiv.org/pdf/2101.08692.pdf , the authors update beta after each residual block as Var(x_{l+1}) = Var(x_l) + α^2, with β_l = sqrt(Var(x_l)).
In the implementation, it seems to me that beta keeps its initial value in every block. Is this correct?
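To make the question concrete, here is a minimal sketch of the per-block beta schedule as I read it from the paper. It is not code from this repository: the function name `expected_betas`, the `transition_blocks` parameter, and the default `alpha=0.2` are all hypothetical names and values chosen for illustration, and it assumes the input to the first block has unit variance.

```python
import math


def expected_betas(num_blocks, alpha=0.2, transition_blocks=()):
    """Return the analytically expected beta for each residual block.

    Hypothetical helper, not part of this repository.
    Assumes the input to the first block has unit variance.
    """
    betas = []
    expected_var = 1.0
    for block_idx in range(num_blocks):
        # beta_l is the expected standard deviation of the block's input.
        betas.append(math.sqrt(expected_var))
        # Assumption: a transition block also rescales its shortcut,
        # so the expected variance restarts from 1 before adding alpha^2.
        if block_idx in transition_blocks:
            expected_var = 1.0
        # Each residual branch adds alpha^2 to the expected variance.
        expected_var += alpha ** 2
    return betas


if __name__ == "__main__":
    for idx, beta in enumerate(expected_betas(4, alpha=0.2)):
        print(f"block {idx}: beta = {beta:.4f}")
```

With a schedule like this, each block would receive its own beta instead of sharing the initial value, which is the behaviour I expected to find.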
Answered by vballoli, Apr 21, 2021
Yeah, it'll be fixed soon, along with the training scripts. Thanks a lot for noticing this and bringing it up. Appreciate the effort!
Answer selected by simomagi