Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

negative sampling excludes positive class #1229

Merged
merged 3 commits into from Mar 5, 2024

Conversation

hoosierEE
Copy link
Contributor

Addressing issue #1228

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

docs/tutorials/word2vec.ipynb Outdated Show resolved Hide resolved
docs/tutorials/word2vec.ipynb Outdated Show resolved Hide resolved
docs/tutorials/word2vec.ipynb Outdated Show resolved Hide resolved
docs/tutorials/word2vec.ipynb Outdated Show resolved Hide resolved
docs/tutorials/word2vec.ipynb Outdated Show resolved Hide resolved
@cantonios
Copy link
Collaborator

Also, I just reviewed the documentation for tf.random.log_uniform_candidate_sampler. It explicitly states that it does not reject any accidental positive hits, and then links to the Candidate Sampling Algorithms Reference. In that reference, the "Negative Sampling" row says that it considers negative training classes to be the full set S_i, which does include positive samples. This is opposed to "Sampled Logistic", which considers the set (S_i - T_i). So it may be intentional that there could be accidental hits.

@hoosierEE
Copy link
Contributor Author

Your feedback was very helpful, thanks! Building a set from the positive_skip_grams ended up being much faster. On average I see around 10% of negative samples discarded because they overlap with the positive context, resulting in about 1 percentage point improvement in training accuracy at 20 epochs.

@cantonios cantonios merged commit 8d6b5e0 into tensorflow:master Mar 5, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants