Hi, thank you for your great paper. I'm sorry, but I can't understand parts of the proof of Theorem 2.

(1) You assume L() denotes the cross-entropy loss with a softmax operation. Does that mean y is the input to the softmax function? Only in that case does the linear relation between the target y and the source y's hold.

(2) I'm also confused about how you get from the first line to the second line, in particular how the sum over j is moved outside the loss function and how p* is changed to pj.

(3) Furthermore, you say you used the upper bound of the log-sum-exp function to derive the second line, and that {a1, a2, ..., an} are the outputs of the softmax function. In that case I'm not sure where the exponential comes from, since log-sum-exp already takes the exponential of a.

Thank you for your kind reply.
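For concreteness, here is a small numerical check of the two standard facts that questions (1) and (3) touch on, under my own assumption that L(z, y) = -Σᵢ yᵢ log softmax(z)ᵢ with logits z (this is just a sketch of the general identities, not the paper's derivation):

```python
import math

def log_sum_exp(a):
    # Numerically stable LSE(a) = log(sum_i exp(a_i)).
    m = max(a)
    return m + math.log(sum(math.exp(x - m) for x in a))

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(z, y):
    # L(z, y) = -sum_i y_i * log softmax(z)_i
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

z = [1.0, 2.0, 0.5]

# (1) The loss is linear in the target y (for fixed logits z):
#     L(z, a*y1 + (1-a)*y2) = a*L(z, y1) + (1-a)*L(z, y2)
y1, y2, a = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.3
mix = [a * u + (1 - a) * v for u, v in zip(y1, y2)]
lhs = cross_entropy(z, mix)
rhs = a * cross_entropy(z, y1) + (1 - a) * cross_entropy(z, y2)
print(abs(lhs - rhs) < 1e-12)  # True

# (3) The log-sum-exp upper bound (exp applied to the raw a_i):
#     max(a) <= LSE(a) <= max(a) + log(n)
lse = log_sum_exp(z)
print(max(z) <= lse <= max(z) + math.log(len(z)))  # True
```

Note the linearity in (1) holds only when y appears linearly inside the loss, i.e. y multiplies the log-probabilities; that is why the question of whether y is the softmax input matters.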