Purpose of unks in skipgram function #16

Open · mmlynarik opened this issue Nov 30, 2022 · 2 comments

mmlynarik commented Nov 30, 2022

Hi,

Can you please explain the purpose of including the <UNK> tokens in the owords vector produced by the skipgram function? What is the model supposed to learn from using these as training examples?
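
For reference, a minimal sketch of the kind of fixed-width skipgram that produces such padding (illustrative names and defaults, not necessarily the repo's exact code):

```python
def skipgram(sentence, i, window=5, unk='<UNK>'):
    """Return (center word, fixed-size context) for position i.

    Contexts near the sentence boundary are padded with unk so that
    every center word gets exactly 2 * window context words.
    """
    iword = sentence[i]
    left = sentence[max(i - window, 0):i]
    right = sentence[i + 1:i + 1 + window]
    # Pad both sides up to the full window size with the unknown token.
    owords = [unk] * (window - len(left)) + left + right + [unk] * (window - len(right))
    return iword, owords

# skipgram(['the', 'quick', 'brown', 'fox'], 0, window=2)
# -> ('the', ['<UNK>', '<UNK>', 'quick', 'brown'])
```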

Also, what is the purpose of the variable ws in the train function, if it's not used anywhere after its definition?

theeluwin (Owner)

<UNK> means the unknown token; it should be defined in the preprocessing step. ws means word score, which is calculated heuristically, so you can use it or not, as you like.
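
For illustration, a common heuristic for such a word score is the subsampling formula from Mikolov et al. (2013); here is a minimal sketch under the assumption that ws follows that formula (names like word_counts and ss_t are illustrative):

```python
import numpy as np

def word_scores(word_counts, ss_t=1e-5):
    """Heuristic word score per word, assuming the subsampling
    formula from Mikolov et al. (2013): score = 1 - sqrt(t / f(w)),
    where f(w) is the relative frequency of word w.
    """
    wf = np.asarray(word_counts, dtype=float)
    wf = wf / wf.sum()              # relative frequency f(w)
    ws = 1 - np.sqrt(ss_t / wf)     # frequent words get higher scores
    return np.clip(ws, 0, 1)        # rare words are clamped to 0
```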

mmlynarik (Author) commented Dec 2, 2022

Hi, yes, I understand what <UNK> means. What I don't understand is why these dummy tokens (by the way, I suppose they were probably meant to represent padding rather than unknown words) are included in the output of the skipgram function. I checked the original implementation of Word2Vec and I can't find this step of including padding or unknown tokens there. What information is the model supposed to learn from having them included as training examples?
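
For comparison, a rough sketch of how the original C implementation handles sentence boundaries, as I understand it: it draws a random effective window size and simply shrinks the context at the edges, with no padding (function name is illustrative):

```python
import random

def skipgram_shrinking(sentence, i, window=5):
    """Context extraction in the style of the original word2vec C code:
    a random effective window size b in [1, window] is drawn, and the
    context simply shrinks at sentence boundaries; nothing is padded.
    """
    b = random.randint(1, window)   # dynamic window size
    left = sentence[max(i - b, 0):i]
    right = sentence[i + 1:i + 1 + b]
    return sentence[i], left + right
```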
