negative sampling in Word2Vec tutorial #49490
Comments
@bizzyvinci:
The incorrect negative samples are calculated from
Thank you @bizzyvinci and @gqqnbig. We'll look into this. cc @MarkDaoust
Please see my comment in #44758 (comment). The phrase "negative samples" can mean different things in different candidate-sampling algorithms; for some algorithms, "negative samples" may overlap with the true classes. We do need to mention this caveat in the tutorial, though.
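To see how a candidate-sampling "negative" can collide with a true class, here is a minimal pure-Python sketch of log-uniform (Zipfian) sampling. The probability formula follows the one documented for tf.random.log_uniform_candidate_sampler; the vocabulary size, seed, and true class id are made up for illustration.

```python
import math
import random

vocab_size = 10  # toy vocabulary, for illustration only

# Log-uniform (Zipfian) class probabilities, per the documented formula:
# P(k) = (log(k + 2) - log(k + 1)) / log(vocab_size + 1)
probs = [(math.log(k + 2) - math.log(k + 1)) / math.log(vocab_size + 1)
         for k in range(vocab_size)]

random.seed(1)
true_class = 3  # id of the actual context word (hypothetical)
candidates = random.choices(range(vocab_size), weights=probs, k=5)

# The sampler draws independently of the true class, so nothing stops a
# "negative" candidate from equaling the true class; a training loop would
# then score a true (target, context) pair with a negative label.
collision = true_class in candidates
print(candidates, collision)
```

This is why some implementations accept the small probability of such collisions rather than filtering them out: with a large vocabulary the chance of drawing a true class is tiny, and filtering would change the sampling distribution.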
URL(s) with the issue:
- negative sampling
- generate training data
Description of issue (what needs changing):
I was going through the tutorial on skip-gram word2vec and noticed that positive sample candidates are also negative sample candidates.

Clear description

For example, we have sentence = "The wide road shimmered in the hot sun" and window_size = 2 for tf.keras.preprocessing.sequence.skipgrams. The positive skip-grams for road therefore include (road, the), (road, wide), (road, shimmered), and (road, in). I guess the, wide, shimmered, and in should not later be labeled as negative skip-grams for road, right?

PS: I'm a newbie.
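The overlap described above can be reproduced without TensorFlow. The sketch below builds the positive skip-gram contexts for road by hand and draws "negatives" uniformly from the vocabulary (a simple stand-in for the tutorial's sampler, which uses a log-uniform distribution); nothing in the draw excludes the true context words.

```python
import random

sentence = "The wide road shimmered in the hot sun"
tokens = sentence.lower().split()
vocab = sorted(set(tokens))

window_size = 2
target = "road"
i = tokens.index(target)

# Positive skip-gram context words for "road" within the window.
positive_context = {
    tokens[j]
    for j in range(max(0, i - window_size),
                   min(len(tokens), i + window_size + 1))
    if j != i
}

# Naive negative sampling: draw uniformly from the whole vocabulary.
random.seed(0)
negatives = [random.choice(vocab) for _ in range(4)]

# A drawn "negative" may coincide with a true context word.
overlap = positive_context.intersection(negatives)
print(sorted(positive_context))  # ['in', 'shimmered', 'the', 'wide']
print(negatives, overlap)
```

Running this a few times with different seeds shows the, wide, shimmered, or in turning up among the sampled negatives, which is exactly the situation the issue asks about.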