
Question #9

Closed
songwoh opened this issue May 24, 2022 · 2 comments

Comments


songwoh commented May 24, 2022

Hi, first of all, thank you for sharing this amazing work!
I have a question regarding how you derived equations (13) and (14) in your paper
(the part about implicit differentiation when calculating the gradients).

Please correct me if I'm wrong, but I think the sign of the second term in (13) should be negative.

Could you explain in detail how you derived (13) from (12)?

Thank you.

xuchen-ethz (Owner) commented

Hi @songwoh,

Equation (13) is derived by taking the derivative of Equation (12) with respect to the network weights sigma.

The derivative of the first term in (12), d_sigma(x, B), gives the two terms on the left-hand side of (13): the total derivative splits, via the chain rule, into the explicit dependence on sigma and the implicit dependence through the canonical point. Hence the plus sign in (13), which comes from the total derivative.

The derivative of the second term in (12), x', is 0, because x' is the query input and is independent of the network weights.

I hope this helps. Let me know if something is still not clear.
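To make the sign question concrete, here is a minimal numerical sketch of the implicit-differentiation step described above. It uses a toy scalar relation in place of d_sigma(x, B) - x' = 0; the function names and the specific relation are illustrative assumptions, not code from this repository.

```python
# Toy check of implicit differentiation: a relation f(sigma, x) = 0
# implicitly defines x as a function of sigma (as Eq. (12) defines the
# canonical point as a function of the network weights).

def f(sigma, x):
    # Illustrative implicit relation: sigma * x - 1 = 0  =>  x(sigma) = 1 / sigma
    return sigma * x - 1.0

def df_dx(sigma, x):      # partial derivative of f w.r.t. x
    return sigma

def df_dsigma(sigma, x):  # partial derivative of f w.r.t. sigma
    return x

sigma = 2.0
x = 1.0 / sigma  # root of f(sigma, .) = 0

# Differentiating f(sigma, x(sigma)) = 0 gives the total derivative
#   df/dsigma + (df/dx) * (dx/dsigma) = 0,
# i.e. the PLUS sign appears on the left-hand side (as in Eq. (13));
# the minus sign only appears after rearranging for dx/dsigma:
dx_dsigma_implicit = -df_dsigma(sigma, x) / df_dx(sigma, x)

# Finite-difference check against the explicit solution x(sigma) = 1 / sigma
eps = 1e-6
dx_dsigma_numeric = (1.0 / (sigma + eps) - 1.0 / (sigma - eps)) / (2 * eps)

print(dx_dsigma_implicit)  # -0.25
print(dx_dsigma_numeric)   # approximately -0.25
```

Both values agree, confirming that the plus sign belongs inside the total derivative, while the negative sign emerges only when solving for the gradient of the implicitly defined variable.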


songwoh commented May 24, 2022

Thank you for clearing this up for me! Now it makes sense.

songwoh closed this as completed May 24, 2022