negative sampling in Word2Vec tutorial #49490

Open
bizzyvinci opened this issue May 22, 2021 · 4 comments
Assignees
Labels
type:docs-bug Document issues

Comments

@bizzyvinci

URL(s) with the issue:

negative sampling
generate training data

Description of issue (what needs changing):

I was going through the skip-gram word2vec tutorial and noticed that positive sample candidates can also be drawn as negative sample candidates.

Clear description

For example, we have sentence = "The wide road shimmered in the hot sun" and window_size = 2 for tf.keras.preprocessing.sequence.skipgrams. Therefore, the positive skip-grams for road include (road, the), (road, wide), (road, shimmered), and (road, in).

I guess the, wide, shimmered, and in should not later be labeled as negative skip-grams for road, right?
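
To make the concern concrete, here is a minimal pure-Python sketch. It does not call the TensorFlow APIs from the tutorial: positive_context is a hypothetical helper standing in for the windowing idea behind skipgrams, and the negative draw is a plain uniform sample over the sentence's own vocabulary, which is an assumption made only to keep the example small.

```python
import random


def positive_context(tokens, target_index, window_size):
    """Return the words within window_size positions of the target word."""
    lo = max(0, target_index - window_size)
    hi = min(len(tokens), target_index + window_size + 1)
    return [tokens[i] for i in range(lo, hi) if i != target_index]


sentence = "The wide road shimmered in the hot sun"
tokens = sentence.lower().split()
target_index = tokens.index("road")

# Positive context words for "road" with window_size = 2.
context = positive_context(tokens, target_index, window_size=2)

# Naive negative sampling: draw 4 distinct words from the whole vocabulary
# without excluding the positive context (assumption: uniform sampling).
vocab = sorted(set(tokens))
rng = random.Random(0)
negatives = rng.sample(vocab, 4)

# Words that ended up as both a positive and a negative for "road".
overlap = set(negatives) & set(context)
print("context:  ", context)
print("negatives:", negatives)
print("overlap:  ", sorted(overlap))
```

The vocabulary here has 7 distinct words, 4 of which are context words for road, so any draw of 4 distinct words must include at least one context word. With a realistic vocabulary the overlap is much less likely, but nothing in a sampler of this kind rules it out.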

PS: I'm a newbie

@bizzyvinci bizzyvinci changed the title from "negative sampliing in Word2Vec tutorial" to "negative sampling in Word2Vec tutorial" May 22, 2021
@Saduf2019
Contributor

@bizzyvinci
Thank you for the request. We will update you on this as soon as possible.

@Saduf2019 Saduf2019 added the type:docs-bug Document issues label May 25, 2021
@gqqnbig
Contributor

gqqnbig commented Jun 6, 2021

The incorrect negative samples are calculated from tf.random.log_uniform_candidate_sampler, which may have bugs. See #44758.
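
For context, the documented base distribution of this sampler is approximately log-uniform (Zipfian): P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1). The sketch below only evaluates that formula in plain Python; log_uniform_probability is a hypothetical helper name and range_max = 8 is an arbitrary value chosen for illustration.

```python
import math


def log_uniform_probability(class_id, range_max):
    """Probability of class_id under the log-uniform (Zipfian) base distribution."""
    return (math.log(class_id + 2) - math.log(class_id + 1)) / math.log(range_max + 1)


range_max = 8  # arbitrary vocabulary size, for illustration only
probs = [log_uniform_probability(c, range_max) for c in range(range_max)]

for class_id, p in enumerate(probs):
    print(f"class {class_id}: {p:.4f}")

# The numerators telescope to log(range_max + 1), so the probabilities sum to 1.
print("total:", sum(probs))
```

The formula depends only on the class id and range_max, so low ids (frequent words, if the vocabulary is sorted by decreasing frequency) are favored whether or not they are context words for the current target.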

@8bitmp3
Contributor

8bitmp3 commented Jun 7, 2021

Thank you @bizzyvinci and @gqqnbig. We'll look into this cc @MarkDaoust

@Saduf2019 Saduf2019 removed their assignment Jun 8, 2021
@wangpengmit
Member

Please see my comment at #44758 (comment). The phrase "negative samples" may have different meanings in different candidate-sampling algorithms. For some algorithms, "negative samples" can overlap with the true classes.

We do need to mention this caveat in the tutorial though.
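
If a reader does want negatives that never coincide with the true classes, one option is to reject such draws. The following is a rough pure-Python sketch of that idea, not what the tutorial or the TensorFlow sampler does: sample_negatives_excluding is a hypothetical helper, the class ids are made up, and it samples uniformly rather than log-uniformly.

```python
import random


def sample_negatives_excluding(vocab_size, exclude, num_ns, seed=0):
    """Draw num_ns distinct ids from range(vocab_size), skipping ids in exclude."""
    excluded = {i for i in exclude if 0 <= i < vocab_size}
    if num_ns > vocab_size - len(excluded):
        raise ValueError("not enough non-excluded classes to sample from")

    rng = random.Random(seed)
    seen = set(excluded)
    negatives = []
    while len(negatives) < num_ns:
        candidate = rng.randrange(vocab_size)
        # Reject true classes and ids that were already drawn.
        if candidate not in seen:
            seen.add(candidate)
            negatives.append(candidate)
    return negatives


# Hypothetical ids: 3 is the target word, the others are its context words.
true_classes = {1, 2, 3, 4, 5}
negatives = sample_negatives_excluding(vocab_size=10, exclude=true_classes, num_ns=4)
print(negatives)
```

Whether this filtering is desirable depends on the loss being used; as noted above, some candidate-sampling algorithms are defined so that overlap with the true classes is expected.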


7 participants