Hi, thank you for your great paper. I'm sorry, but I can't understand parts of the proof of Theorem 2.

(1) You assume L() denotes the cross-entropy loss with a softmax operation. Does that mean y is the input to the softmax function? Only in that case does the linear relation between the target y and the source y's hold.

(2) I'm also confused about how you get from the first line to the second line, in particular how the sum over j is moved outside the loss function and how p* is changed to pj.

(3) Furthermore, you say you used the upper bound of the log-sum-exp function to derive the second line, and that {a1, a2, ..., an} are the outputs of the softmax function. In that case I'm not sure where the exponential comes from, since log-sum-exp already takes the exponential of a.

Thank you for your kind reply.
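For concreteness, here is a small numerical check of the two standard facts that questions (1) and (3) touch on, under my own assumption that L(z, y) = -Σᵢ yᵢ log softmax(z)ᵢ with logits z (this is just a sketch of the general identities, not the paper's derivation):

```python
import math

def log_sum_exp(a):
    # Numerically stable LSE(a) = log(sum_i exp(a_i)).
    m = max(a)
    return m + math.log(sum(math.exp(x - m) for x in a))

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(z, y):
    # L(z, y) = -sum_i y_i * log softmax(z)_i
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

z = [1.0, 2.0, 0.5]

# (1) The loss is linear in the target y (for fixed logits z):
#     L(z, a*y1 + (1-a)*y2) = a*L(z, y1) + (1-a)*L(z, y2)
y1, y2, a = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.3
mix = [a * u + (1 - a) * v for u, v in zip(y1, y2)]
lhs = cross_entropy(z, mix)
rhs = a * cross_entropy(z, y1) + (1 - a) * cross_entropy(z, y2)
print(abs(lhs - rhs) < 1e-12)  # True

# (3) The log-sum-exp upper bound (exp applied to the raw a_i):
#     max(a) <= LSE(a) <= max(a) + log(n)
lse = log_sum_exp(z)
print(max(z) <= lse <= max(z) + math.log(len(z)))  # True
```

Note the linearity in (1) holds only when y appears linearly inside the loss, i.e. y multiplies the log-probabilities; that is why the question of whether y is the softmax input matters.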