
How to ensure that the negatively sampled words are not the target word? #9

Closed
jeffchy opened this issue Nov 20, 2018 · 1 comment

jeffchy commented Nov 20, 2018

First, thanks for your excellent code :)

In model.py, the following piece of code suggests that we may draw the positive (target) word when doing negative sampling, though the probability is very small:
nwords = t.multinomial(self.weights, batch_size * context_size * self.n_negs, replacement=True).view(batch_size, -1)
I'm wondering why you didn't perform an equality check. Is that because it doesn't affect the quality of the trained word vectors but would slow down training?
Or are there other reasons?
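For example, one cheap way I can imagine adding the check is to redraw only the colliding entries after the batched draw, something like the sketch below (assuming torch is imported as t as in model.py; `sample_negatives` and `max_retries` are just names I made up for illustration):

```python
import torch as t

def sample_negatives(weights, iword, context_size, n_negs, max_retries=3):
    """Draw negatives as in model.py, then resample any that collide with the target.

    `weights` is the unigram sampling distribution and `iword` is the
    (batch_size,) tensor of target word indices.
    """
    batch_size = iword.size(0)
    nwords = t.multinomial(weights, batch_size * context_size * n_negs,
                           replacement=True).view(batch_size, -1)
    for _ in range(max_retries):
        # True wherever a sampled negative equals its row's target word
        collisions = nwords.eq(iword.unsqueeze(1))
        if not collisions.any():
            break
        # redraw only the colliding entries instead of the whole batch
        nwords[collisions] = t.multinomial(weights, int(collisions.sum().item()),
                                           replacement=True)
    return nwords
```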

theeluwin (Owner) commented

Simply because of training speed, yes.
I once implemented the 'correct' sampling,

def sample(self, iword_b, owords_b):
but it was way too slow, whereas the faster sampling allows more training iterations in the same time.
Still, I believe that the correct sampling is the right way to go.
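For illustration, a rough reconstruction of what that per-example 'correct' sampling might look like (the body below is only a sketch, not the code that was actually removed; it assumes torch is imported as t and that owords_b is a (batch_size, context_size) tensor). The Python-level loop over the batch is what made this approach much slower than one batched multinomial call:

```python
def sample(self, iword_b, owords_b):
    # Rejection-sample per example so that no negative equals that
    # example's target word.
    context_size = owords_b.size(1)
    negs = []
    for iword in iword_b:
        row = t.multinomial(self.weights, context_size * self.n_negs, replacement=True)
        bad = row.eq(iword)
        while bad.any():
            # redraw only the entries that collided with the target word
            row[bad] = t.multinomial(self.weights, int(bad.sum().item()), replacement=True)
            bad = row.eq(iword)
        negs.append(row)
    return t.stack(negs)
```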
