
Some questions about this paper #7

Closed

haohang96 opened this issue Jul 6, 2020 · 7 comments

@haohang96

Really nice work! But I have some minor questions:

1: I think Eq. 5 should be written as $\log E(p,q) + \log N = \log\langle Q, -\log P\rangle$. Can you give some pointers on why Eq. 5 holds as stated? (Although it does not affect any conclusion in the paper, I just want to confirm it.)

2: After reading your answer to "How do you transition from the probability matrix Q to the labels?" on the OpenReview website, I have some confusion:
1) Do you mean that in step 1 (representation learning), the Q of Eq. 6 is actually a one-hot matrix obtained by applying argmax to the probability matrix Q^* (where Q^* is the direct solution of Eq. 7 in step 2)?
2) If I understand correctly, how about using Q^* directly in step 1 to compute the cross-entropy loss, instead of argmax(Q^*)? Would a soft label be better?

P.S. I asked the above questions on the OpenReview website, but maybe you did not receive them, so sorry to bother you again here.

@yukimasano
Owner

yukimasano commented Jul 6, 2020

Hi, thanks for your comments and your interest :)!

1:

$\log E(p,q) + \log N = \log\langle Q, -\log P\rangle$

No, it needs to be $E(p,q) + \log N = \langle Q, -\log P\rangle$: the cross-entropy is already $-\sum q \log p$, and you wouldn't want to take two logs on $P$.

2.1: Yes, we use the argmax on Q and just use one-hot cross-entropy for learning the CNN, see here
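For concreteness, here is a minimal sketch of that step in PyTorch; the names `Q` and `logits` are hypothetical stand-ins for the assignment matrix from step 2 and the CNN outputs on the same examples:

```python
import torch
import torch.nn.functional as F

N, K = 8, 4                   # hypothetical: 8 examples, 4 pseudo-classes
Q = torch.rand(N, K)          # stand-in for the assignment matrix from step 2
logits = torch.randn(N, K)    # stand-in for the CNN outputs on the same examples

labels = Q.argmax(dim=1)                 # hard pseudo-labels via argmax
loss = F.cross_entropy(logits, labels)   # standard one-hot cross-entropy
```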

2.2: Good point; it seems to work too. In fact, the current SOTA does that (and a lot of other things).
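A soft-label variant would be a small change to the sketch above (again with hypothetical names; here `Q` plays the role of the soft $Q^*$ rather than its argmax):

```python
import torch
import torch.nn.functional as F

N, K = 8, 4
Q = torch.rand(N, K)
Q = Q / Q.sum(dim=1, keepdim=True)   # each row of Q* is a distribution over labels
logits = torch.randn(N, K)

log_p = F.log_softmax(logits, dim=1)
soft_loss = -(Q * log_p).sum(dim=1).mean()   # soft cross-entropy against Q*
```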

Let me know if you have any other questions! 👍

@haohang96
Author

[WeChat screenshot]

@haohang96
Author

haohang96 commented Jul 6, 2020

Here is a simple derivation; I still cannot figure out why Eq. 5 holds.

https://www.overleaf.com/8871293439jtyhrmghmkkb

Feel free to edit the above Overleaf project if that is convenient (it is just a temporary project for the derivation above). Thanks very much!

@yukimasano
Owner

yukimasano commented Jul 6, 2020

Hi, there's no reason to take $\log E$; we have $E(p,q) = -\frac{1}{N}\sum_{i}\sum_{y} q(y|x_i)\log p(y|x_i)$. I've adjusted my comment above too.
Note our definitions of $P$ and $Q$:
$P_{yi} = p(y|x_i)\frac{1}{N}$ and $Q_{yi} = q(y|x_i)\frac{1}{N}$

@haohang96
Author

Oh, I understand it now. My mistake was confusing $p(y|x_i)$ with $P_{yi}$, a really silly mistake.

Thanks very much for your patient answer!

@yukimasano
Owner

For anyone interested, here's a nicer derivation of the above:

[image: derivation of Eq. 5]
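Spelled out from the definitions of $P$ and $Q$ above (a reconstruction, since the image itself is not preserved here):

$$
\begin{aligned}
\langle Q, -\log P \rangle
&= -\sum_{y,i} Q_{yi}\,\log P_{yi} \\
&= -\frac{1}{N}\sum_{y,i} q(y|x_i)\bigl[\log p(y|x_i) - \log N\bigr] \\
&= \underbrace{-\frac{1}{N}\sum_{y,i} q(y|x_i)\,\log p(y|x_i)}_{=\,E(p,q)} \;+\; \frac{\log N}{N}\sum_{i}\underbrace{\sum_{y} q(y|x_i)}_{=\,1} \\
&= E(p,q) + \log N .
\end{aligned}
$$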

@yukimasano
Owner

Of course!
