Doubts on the paper "universal transformers". #1215
Comments
I'm pretty sure there's a typo in equation 4.
@senarvi thanks, I think the same as you.
I believe Eq. 4 has a typo. Eq. 5 may have one as well, but it could also be a misinterpretation of Figure 4. I think you can check the code to figure it out.
Yes! there are small typos as well as a problem in fig4 in the current arXiv version of the paper. We'll update it soon. In the meantime, you can check the slides here and, as always, a better way to understand what's going on exactly is digging into the code :) |
@MostafaDehghani Very lucky to have the slides, thanks!
Hi @MostafaDehghani, thank you for the slides! They are really helpful. On a side note, may I ask whether UT and the Transformer both use the default EN-DE data generator provided in the tensor2tensor library? I noticed the version is the same, but I want to be certain.
Yes, we used
thank you |
Thanks @MostafaDehghani and others. |
Description
The detailed Figure 4 in the appendix does not seem to follow the iterative equations (4) and (5) in the paper. If I follow the figure, it should be H^t = LayerNorm(A^t + Transition(A^t)), with A^t = LayerNorm(H^(t-1) + P^t + MultiHeadSelfAttention(H^(t-1) + P^t)). It is very confusing. Could anyone help me figure this out? Thank you!
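For what it's worth, the recurrence as read from the figure can be sketched in NumPy. This is only a toy illustration of the update order being discussed (residual + LayerNorm after attention, then residual + LayerNorm after the transition), not the actual tensor2tensor implementation; the identity Q/K/V projections, the single linear `transition`, and the function names are all simplifications I've invented for clarity.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's vector over the feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def self_attention(x):
    # Single-head scaled dot-product self-attention with identity
    # Q/K/V projections (a stand-in for the learned projections).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def transition(x, w):
    # Toy position-wise transition (one linear map here; the paper
    # uses a feed-forward block or a separable convolution).
    return x @ w

def ut_step(h_prev, p_t, w):
    # One step in the order the figure suggests:
    #   X   = H^{t-1} + P^t            (add the coordinate embedding)
    #   A^t = LayerNorm(X + SelfAttention(X))
    #   H^t = LayerNorm(A^t + Transition(A^t))
    x = h_prev + p_t
    a_t = layer_norm(x + self_attention(x))
    h_t = layer_norm(a_t + transition(a_t, w))
    return h_t
```

Running `ut_step` for T steps with a fresh P^t each step gives the full recurrence; comparing this order against equations (4)-(5) is where the mismatch shows up.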