Hi, first of all, thank you for sharing this amazing work!
I have a question regarding how you derived equations (13) and (14) in your paper (the part about implicit differentiation when calculating the gradients).
Please correct me if I'm wrong, but I think the sign of the second term in (13) should be negative.
Could you explain in detail how you derived (13) from (12)?
Thank you.
Equation (13) is derived by taking the derivative of equation (12) with respect to the network weights sigma.
The derivative of the first term in (12), d_sigma(x, B), gives the two terms on the left-hand side of (13): since x itself depends on sigma, the total derivative expands via the chain rule into a direct partial with respect to sigma plus a partial with respect to x times dx/dsigma. Hence the plus sign in (13).
The derivative of the second term in (12), x', is 0, because x' is the query input and is independent of the network weights.
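The plus sign from the total derivative can also be sanity-checked numerically. Below is a minimal sketch where `f` is a toy stand-in for d_sigma(x, B) and `x_of` is a toy stand-in for the sigma-dependent point x; these are illustrative functions, not the actual network from the paper.

```python
import math

def f(sigma, x):
    # Toy stand-in for d_sigma(x, B): depends on sigma both directly
    # and through the sigma-dependent point x.
    return sigma * x + math.sin(sigma)

def x_of(sigma):
    # Toy stand-in for the point x(sigma).
    return sigma ** 2

def total(sigma):
    # f evaluated along x(sigma).
    return f(sigma, x_of(sigma))

h = 1e-6
sigma = 0.7

# Numerical total derivative d/dsigma f(sigma, x(sigma)).
num_total = (total(sigma + h) - total(sigma - h)) / (2 * h)

# The two terms of the chain-rule expansion:
# partial wrt sigma (x held fixed) + partial wrt x times dx/dsigma.
df_dsigma = (f(sigma + h, x_of(sigma)) - f(sigma - h, x_of(sigma))) / (2 * h)
df_dx = (f(sigma, x_of(sigma) + h) - f(sigma, x_of(sigma) - h)) / (2 * h)
dx_dsigma = (x_of(sigma + h) - x_of(sigma - h)) / (2 * h)

# The two terms ADD (plus sign), matching the total-derivative expansion.
print(abs(num_total - (df_dsigma + df_dx * dx_dsigma)) < 1e-4)
```

If the second term carried a negative sign instead, the check above would fail, which is a quick way to convince yourself the plus sign in (13) is correct.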
I hope this helps. Let me know if something is still not clear.