
Some questions about this paper #7

Closed

haohang96 opened this issue Jul 6, 2020 · 7 comments

@haohang96

Really nice work! But I have some minor questions:

1: I think Eq. 5 should be written as $\log E(p,q) + \log N = \log\langle Q, -\log P\rangle$. Can you give some pointers on why Eq. 5 holds as stated? (Although it does not affect any conclusion in the paper, I just want to confirm it.)

2: After reading your answer to "How do you transition from the probability matrix Q to the labels?" on the OpenReview website, I have some confusion:
1) Do you mean that in step 1 (representation learning), the Q of Eq. 6 is actually a one-hot matrix obtained by applying argmax to the probability matrix Q^* (where Q^* is the direct solution of Eq. 7 in step 2)?
2) If I understand correctly, how about using Q^* directly in step 1 to compute the cross-entropy loss, instead of argmax(Q^*)? Would a soft label be better?

P.S. I asked the above questions on the OpenReview website, but maybe you did not receive them, so sorry to bother you again here.

@yukimasano
Owner

yukimasano commented Jul 6, 2020

Hi, thanks for your comments and your interest :)!

1:

$\log E(p,q) + \log N = \log\langle Q, -\log P\rangle$

No, it needs to be $E(p,q) + \log N = \langle Q, -\log P\rangle$: the cross-entropy is already $-\sum q \log p$, and you wouldn't want to take two logs on $P$.

2.1: Yes, we use the argmax on Q and just use one-hot cross-entropy for learning the CNN, see here
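For concreteness, here is a minimal sketch of that step in PyTorch; the names `Q` and `logits` are hypothetical stand-ins for the assignment matrix from step 2 and the CNN outputs on the same examples:

```python
import torch
import torch.nn.functional as F

N, K = 8, 4                   # hypothetical: 8 examples, 4 pseudo-classes
Q = torch.rand(N, K)          # stand-in for the assignment matrix from step 2
logits = torch.randn(N, K)    # stand-in for the CNN outputs on the same examples

labels = Q.argmax(dim=1)                 # hard pseudo-labels via argmax
loss = F.cross_entropy(logits, labels)   # standard one-hot cross-entropy
```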

2.2: Good point; it seems to work too. In fact, the current SOTA does that (and a lot of other things).
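A soft-label variant would be a small change to the sketch above (again with hypothetical names; here `Q` plays the role of the soft $Q^*$ rather than its argmax):

```python
import torch
import torch.nn.functional as F

N, K = 8, 4
Q = torch.rand(N, K)
Q = Q / Q.sum(dim=1, keepdim=True)   # each row of Q* is a distribution over labels
logits = torch.randn(N, K)

log_p = F.log_softmax(logits, dim=1)
soft_loss = -(Q * log_p).sum(dim=1).mean()   # soft cross-entropy against Q*
```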

Let me know if you have any other questions! 👍

@haohang96
Author

[WeChat screenshot]

@haohang96
Author

haohang96 commented Jul 6, 2020

Here is a simple derivation; I still cannot figure out why Eq. 5 holds.

https://www.overleaf.com/8871293439jtyhrmghmkkb

Feel free to edit the above Overleaf project if that is convenient (it is just a temporary project for the derivation above). Thanks very much!

@yukimasano
Owner

yukimasano commented Jul 6, 2020

Hi, there's no reason to take $\log E$; we have $E(p,q) = -\frac{1}{N}\sum_{i}\sum_{y} q(y|x_i)\log p(y|x_i)$. I've adjusted my comment above too.
Note our definitions of $P$ and $Q$:
$P_{yi} = p(y|x_i)\frac{1}{N}$ and $Q_{yi} = q(y|x_i)\frac{1}{N}$

@haohang96
Author

Oh, I understand it now. My mistake was confusing $p(y|x_i)$ with $P_{yi}$, a really silly mistake.

Thanks very much for your patient answer!

@yukimasano
Owner

For anyone interested, here's a nicer derivation of the above:

[image: derivation of Eq. 5]
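Spelled out from the definitions of $P$ and $Q$ above (a reconstruction, since the image itself is not preserved here):

$$
\begin{aligned}
\langle Q, -\log P \rangle
&= -\sum_{y,i} Q_{yi}\,\log P_{yi} \\
&= -\frac{1}{N}\sum_{y,i} q(y|x_i)\bigl[\log p(y|x_i) - \log N\bigr] \\
&= \underbrace{-\frac{1}{N}\sum_{y,i} q(y|x_i)\,\log p(y|x_i)}_{=\,E(p,q)} \;+\; \frac{\log N}{N}\sum_{i}\underbrace{\sum_{y} q(y|x_i)}_{=\,1} \\
&= E(p,q) + \log N .
\end{aligned}
$$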

@yukimasano
Owner

Of course!
