
key-value updates #4

Open
devraj89 opened this issue Feb 13, 2019 · 5 comments

Comments


devraj89 commented Feb 13, 2019

Hi

Thanks for the wonderful work. I found it to be a great read and very easy to understand.

(1) What loss function did you use for the key-value updates?
Based on eqn. (3), it seems a mean-squared-error loss has been used, something like
loss_{k_j} = 1/(n_j + 1) * sum_{i=1}^{n_j} (k_j - x_i)^2
Is there any particular reason that 1/(n_j + 1) has been chosen instead of 1/n_j?

(2) I am also wondering: if the MND and ME losses were defined only for the unlabeled portion of the data, how would the performance degrade?

(3) Lastly, what happens if, instead of updating the keys and values after every epoch, we simply average the intermediate representations and the softmax outputs of the labeled data to redefine the key-value pairs?

(4) Are there any plans to release the code in PyTorch? I am not very familiar with TensorFlow but would like to understand your method better by studying the code.

Any clarifications would be helpful.

Thanks
Devraj


yanbeic commented Feb 15, 2019

Hi Devraj,

Here are my answers:
(1) Using 1/(n_j + 1) avoids the case n_j = 0, which would make 1/n_j = 1/0, i.e. NaN.
(2) Both losses are defined over labelled vs. unlabelled data.
(3) Updating every iteration converges to a more reliable distribution, i.e. it reflects the up-to-date distribution.
(4) We can currently only provide code in TensorFlow; sorry for the inconvenience.
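
Point (1) can be sketched numerically. The following is a minimal NumPy illustration of the normalisation, not the repository's actual implementation; the function name and the exact placement of the 1/(n_j + 1) factor are assumptions for the sketch:

```python
import numpy as np

def key_update_loss(key, assigned_features):
    """Mean-squared distance between key k_j and its n_j assigned features.

    Normalising by (n_j + 1) rather than n_j keeps the loss finite
    (and zero) when no features are assigned to this key (n_j = 0).
    """
    n_j = len(assigned_features)
    sq_dists = sum(np.sum((f - key) ** 2) for f in assigned_features)
    return sq_dists / (n_j + 1)

key = np.array([0.0, 0.0])
feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
loss = key_update_loss(key, feats)   # (1 + 1) / (2 + 1) = 2/3
empty = key_update_loss(key, [])     # 0 / 1 = 0, no division by zero
```

With a plain 1/n_j normaliser, the `empty` case would be 0/0 and propagate NaN into the update, which is exactly what the +1 guards against.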

devraj89 (Author) commented

Hi @yanbeic
Thanks for your prompt reply!
I see in your code that you use a parameter called label-ratio. Why is it needed? Can you clarify?

Thanks


yanbeic commented Feb 22, 2019

Hi,
It is a parameter that controls the proportion of labelled data in each mini-batch.
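
A toy sketch of what such a parameter might do, assuming (hypothetically) that each mini-batch mixes the two pools at a fixed ratio; the names `sample_mini_batch` and `label_ratio` are illustrative, not the repository's:

```python
import random

def sample_mini_batch(labeled, unlabeled, batch_size=32, label_ratio=0.25):
    """Draw one mini-batch containing a fixed proportion of labelled examples."""
    n_lab = int(round(batch_size * label_ratio))  # e.g. 8 of 32 at ratio 0.25
    return random.sample(labeled, n_lab) + random.sample(unlabeled, batch_size - n_lab)

labeled_pool = list(range(100))          # toy labelled sample ids
unlabeled_pool = list(range(100, 1000))  # toy unlabelled sample ids
batch = sample_mini_batch(labeled_pool, unlabeled_pool)
n_labeled_in_batch = sum(x < 100 for x in batch)  # always 8 at ratio 0.25
```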


devraj89 commented Mar 1, 2019

Hi
Thanks again for your response.

I am curious about one thing.

Since the unlabeled portion of the data is much larger than the labeled portion, how does the data loading work? For example, you use around 10,000 labeled and 40,000 unlabeled samples. With a batch size of 32, the labeled portion will obviously be exhausted earlier.

Does that mean we need to repeat the labeled portion of the data multiple times?

Thanks again for all your help!


yanbeic commented Mar 6, 2019

In this implementation, yes.

But one can also use annealing to decrease this proportion during training.
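
One simple way to realise "repeat the labelled portion" is to cycle over it while iterating once over the unlabelled stream. A toy sketch, with sizes and names purely illustrative rather than taken from the repository's loader:

```python
from itertools import cycle

# Toy stand-ins for the ~10k labeled / ~40k unlabeled splits
labeled = list(range(10))
unlabeled = list(range(10, 50))

labeled_iter = cycle(labeled)  # restarts from the beginning once exhausted
unlabeled_iter = iter(unlabeled)

def next_batch(batch_size=8, label_ratio=0.25):
    """One mini-batch: the labeled stream repeats, the unlabeled one does not."""
    n_lab = int(batch_size * label_ratio)
    lab = [next(labeled_iter) for _ in range(n_lab)]
    unlab = [next(unlabeled_iter) for _ in range(batch_size - n_lab)]
    return lab, unlab

batches = [next_batch() for _ in range(5)]  # 5 batches * 2 = 10 labeled draws
lab6, _ = next_batch()                      # labeled stream wraps around to 0, 1
```

If the label ratio were annealed downward during training, `n_lab` would shrink over epochs and the labelled split would be repeated less often; the sketch keeps the ratio fixed for clarity.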
