Proof for Theorem 2 #3

Closed
cnyanhao opened this issue Nov 12, 2020 · 1 comment
@cnyanhao

Hi, thank you for your great paper. I'm sorry, but I can't follow parts of the proof of Theorem 2.

(1) Here you assume L() denotes the cross-entropy loss with the softmax operation. Does that mean y is the input to the softmax function? Only in that case does the linear relation between the target y and the source y's hold.

(2) I'm also confused about how you get from the first line to the second line (including how you take the sum over j outside the loss function and how you change p* to pj).

(3) What's more, you said you used the upper bound of the log-sum-exp function to derive the second line, with {a1, a2, ..., an} being the outputs of the softmax function. In that case, I'm not sure where the exponential of a in the log-sum-exp function comes from.

Thank you for your kind reply.

image

@wyf0912
Owner

wyf0912 commented Nov 24, 2020

Hi, thanks for your interest.

  1. Yes, y is the input to the softmax, i.e., y has not yet gone through the softmax operation.
  2. The derivation from the first line to the second uses the property of convex functions.
  3. The exponential comes from the softmax operation inside the cross-entropy loss. You can refer to the PyTorch documentation:
    https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html?highlight=cross%20entrop#torch.nn.CrossEntropyLoss
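
For readers following along, here is a minimal numerical sketch of the three points above. It is not the paper's derivation, only a check of the standard facts the answers rely on. With the softmax folded in, the cross-entropy of logits y for class c is -y_c + log(sum_k exp(y_k)), which is where the exponential and the log-sum-exp term come from. That expression is convex in y, so the loss of a weighted sum of logits is at most the weighted sum of the losses, provided the weights are non-negative and sum to 1. The logits, weights, and function names below are hypothetical and chosen only for illustration.

```python
import math

def log_sum_exp(a):
    """Numerically stable log(sum(exp(a_k)))."""
    m = max(a)
    return m + math.log(sum(math.exp(x - m) for x in a))

def cross_entropy(logits, target):
    """Cross-entropy with softmax folded in; `logits` are pre-softmax scores."""
    return log_sum_exp(logits) - logits[target]

def mix(weights, logit_sets):
    """Weighted sum of several logit vectors (the 'linear relation' on logits)."""
    n = len(logit_sets[0])
    return [sum(w * ys[k] for w, ys in zip(weights, logit_sets)) for k in range(n)]

# Hypothetical source logits and mixing weights (assumed non-negative, summing to 1).
source_logits = [[2.0, -1.0, 0.5], [0.1, 1.5, -0.3], [-0.7, 0.2, 1.1]]
weights = [0.5, 0.3, 0.2]
target = 0

mixed = mix(weights, source_logits)

# Convexity: loss of the mixture <= mixture of the losses.
loss_of_mix = cross_entropy(mixed, target)
mix_of_losses = sum(w * cross_entropy(ys, target)
                    for w, ys in zip(weights, source_logits))

# Standard log-sum-exp bounds: max(a) <= LSE(a) <= max(a) + log(n).
lse = log_sum_exp(mixed)
lower = max(mixed)
upper = max(mixed) + math.log(len(mixed))

print(loss_of_mix <= mix_of_losses, lower <= lse <= upper)
```

Both printed comparisons should hold for any choice of logits, as long as the weights form a convex combination.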

@wyf0912 wyf0912 closed this as completed Dec 23, 2020