Hi, first of all, thank you for sharing this amazing work!
I have a question regarding how you derived equations (13) and (14) in your paper (the part about implicit differentiation when calculating the gradients).
Please correct me if I'm wrong, but I think the sign of the second term in (13) should be negative.
Could you explain in detail how you derived (13) from (12)?
Thank you.
Equation (13) is derived by taking the derivative of equation (12) with respect to the network weights sigma.
The derivative of the first term in (12), d_sigma(x, B), gives the two terms on the left-hand side of (13): since x itself depends on sigma, the total derivative expands via the chain rule into a direct partial with respect to sigma plus a partial with respect to x times dx/dsigma. Hence the plus sign in (13).
The derivative of the second term in (12), x', is 0, because x' is the query input and is independent of the network weights.
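The plus sign from the total derivative can also be sanity-checked numerically. Below is a minimal sketch where `f` is a toy stand-in for d_sigma(x, B) and `x_of` is a toy stand-in for the sigma-dependent point x; these are illustrative functions, not the actual network from the paper.

```python
import math

def f(sigma, x):
    # Toy stand-in for d_sigma(x, B): depends on sigma both directly
    # and through the sigma-dependent point x.
    return sigma * x + math.sin(sigma)

def x_of(sigma):
    # Toy stand-in for the point x(sigma).
    return sigma ** 2

def total(sigma):
    # f evaluated along x(sigma).
    return f(sigma, x_of(sigma))

h = 1e-6
sigma = 0.7

# Numerical total derivative d/dsigma f(sigma, x(sigma)).
num_total = (total(sigma + h) - total(sigma - h)) / (2 * h)

# The two terms of the chain-rule expansion:
# partial wrt sigma (x held fixed) + partial wrt x times dx/dsigma.
df_dsigma = (f(sigma + h, x_of(sigma)) - f(sigma - h, x_of(sigma))) / (2 * h)
df_dx = (f(sigma, x_of(sigma) + h) - f(sigma, x_of(sigma) - h)) / (2 * h)
dx_dsigma = (x_of(sigma + h) - x_of(sigma - h)) / (2 * h)

# The two terms ADD (plus sign), matching the total-derivative expansion.
print(abs(num_total - (df_dsigma + df_dx * dx_dsigma)) < 1e-4)
```

If the second term carried a negative sign instead, the check above would fail, which is a quick way to convince yourself the plus sign in (13) is correct.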
I hope this helps. Let me know if something is still not clear.