Pick a context word; the first row of the list is the positive example, and from the second row on we pick the "target word" randomly from the vocabulary. We label the positive example "1" and each negative example "0". This gives us a supervised learning problem: given a pair of words (context, target), predict the label (1 or 0) for that pair.
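A minimal sketch of how such training pairs could be generated (the toy vocabulary, the helper name `make_training_pairs`, and the choice of k are assumptions for illustration, not the course's code):

```python
import random

def make_training_pairs(context, true_target, vocab, k):
    """Build one positive pair and k negative pairs for a context word."""
    pairs = [(context, true_target, 1)]       # positive example, label 1
    for _ in range(k):
        noise = random.choice(vocab)          # random vocabulary word as a negative target
        pairs.append((context, noise, 0))     # negative example, label 0
    return pairs

# Example: context "orange", true target "juice", k = 4 negatives
vocab = ["the", "of", "orange", "juice", "king", "book", "a"]
print(make_training_pairs("orange", "juice", vocab, k=4))
```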
We use k "negative examples" (plus the one positive example) to train the model. On top of the embedding vector we have, in effect, 10,000 possible logistic regression (binary classification) problems; one of them is the classifier "is the target word 'juice' or not?", another is "'king' or not?", and so on. But on each iteration we only train k + 1 of these binary classifiers (1 positive example and k negative examples), which is much cheaper than updating a 10,000-way softmax on every step.
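The per-pair model from the lecture is P(y = 1 | c, t) = sigmoid(theta_t · e_c), where e_c is the embedding of the context word and theta_t is the parameter vector for the target word. A minimal NumPy sketch of one SGD step on a single pair (the dimensions, learning rate, and variable names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed shapes: one row per vocabulary word
rng = np.random.default_rng(0)
vocab_size, dim = 10_000, 300
E = rng.normal(scale=0.01, size=(vocab_size, dim))      # context embeddings e_c
Theta = rng.normal(scale=0.01, size=(vocab_size, dim))  # per-word classifier params theta_t

def train_pair(c, t, label, lr=0.025):
    """One SGD step on a single (context, target, label) pair."""
    p = sigmoid(Theta[t] @ E[c])    # P(y = 1 | c, t)
    grad = p - label                # d(binary cross-entropy)/d(logit)
    theta_t, e_c = Theta[t].copy(), E[c].copy()
    Theta[t] -= lr * grad * e_c     # only k+1 rows of Theta are touched per iteration
    E[c]     -= lr * grad * theta_t
```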
One thing you could do is sample negative examples according to the empirical frequency of words in your corpus, but the problem is we will end up with the negatives dominated by very frequent words like "the", "a", "of", ...
The heuristic from the lecture samples word w_i with probability

P(w_i) = f(w_i)^(3/4) / Σ_j f(w_j)^(3/4)

where f(w_i) is the observed frequency of word w_i in the corpus. The 3/4 power puts the distribution somewhere between the empirical frequencies and the uniform distribution.
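A quick sketch of that sampling distribution in NumPy (the toy word counts are made up for illustration):

```python
import numpy as np

# Toy word counts standing in for corpus frequencies f(w_i)
counts = {"the": 50_000, "of": 30_000, "orange": 120, "juice": 90, "king": 60}
words = list(counts)
f = np.array([counts[w] for w in words], dtype=float)

# Negative-sampling distribution: P(w_i) proportional to f(w_i)^(3/4)
p = f ** 0.75
p /= p.sum()

for w, pi in zip(words, p):
    print(f"{w:>6}: {pi:.3f}")   # frequent words are damped relative to raw frequency

# Draw k negative examples from this distribution
k = 5
negatives = np.random.default_rng(0).choice(words, size=k, p=p)
print(negatives)
```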
Ref: https://www.coursera.org/learn/nlp-sequence-models/lecture/Iwx0e/negative-sampling