MAX_CONTEXTS tied to preprocessing? #39
hsellik:
Hi @urialon,
I am trying to do some hyper-parameter tuning, but it seems to be a bit trickier than I thought.
In order to change MAX_CONTEXTS in config.py, do I have to preprocess the data with the same MAX_CONTEXTS value as well?
Does this also apply to WORD_VOCAB_SIZE and PATH_VOCAB_SIZE, since I see the following correspondence between preprocessing and training:
WORD_VOCAB_SIZE == config.MAX_TOKEN_VOCAB_SIZE
PATH_VOCAB_SIZE == config.MAX_PATH_VOCAB_SIZE

Comments
urialon:
Hi @hsellik,
You can use smaller values than the ones the data was preprocessed with, without re-preprocessing. For example, if the data was preprocessed with a vocabulary of 1M and you set the vocab size to 1K, then the code will load all 1M values, sort them by frequency (descending), and take only the top 1K. If you wish to use larger values for contexts or vocabularies than the values that the data was preprocessed with, then yes, in that case you will have to re-preprocess the data. However, I don't think that using more than 1000 contexts or larger vocabularies will help. I hope this helps; let me know if I was unclear or if you have any other questions.
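The load-then-truncate behavior described above is straightforward to picture. Here is a minimal, hypothetical sketch of keeping only the top-K most frequent entries of a larger preprocessed frequency dictionary; the function and variable names are illustrative, not the repository's actual loading code:

```python
# Hypothetical sketch: truncating a preprocessed vocabulary at load time.
from collections import Counter

def truncate_vocab(word_to_count, max_size):
    """Sort entries by frequency (descending) and keep the top `max_size`."""
    most_common = Counter(word_to_count).most_common(max_size)
    return {word: index for index, (word, _count) in enumerate(most_common)}

# E.g., a dictionary preprocessed with ~1M entries can be loaded with a
# 1K-entry training vocabulary without re-running preprocessing:
counts = {"get": 90000, "set": 75000, "value": 60000}  # ...imagine 1M entries
vocab = truncate_vocab(counts, max_size=1000)
```

This is why shrinking a vocabulary never requires re-preprocessing, while growing it does: the extra entries simply are not present in the stored dictionaries.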
hsellik:
Okay, that makes perfect sense. Does the same apply to the code2vec project?
urialon:
In code2vec -
hsellik:
Okay, nice and clear. Thank you for the quick answers! :)
On running a debugger, I get the message "Expect 1001 fields but have 2456 in record 0".
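This message matches the mismatch discussed above: the input reader is built for exactly 1 + MAX_CONTEXTS columns per example (1001 = one label plus 1000 contexts), so data preprocessed with a larger MAX_CONTEXTS has more fields per row than the parser expects. Below is an assumed, minimal reproduction using TensorFlow's CSV decoder; the constants and the row layout are illustrative, not the repository's exact input pipeline:

```python
# Assumed sketch: a CSV-style reader sized for 1 + MAX_CONTEXTS columns
# fails on rows written with a larger MAX_CONTEXTS.
import tensorflow as tf

MAX_CONTEXTS = 1000                              # value set in config.py
record_defaults = [[""]] * (1 + MAX_CONTEXTS)    # label + contexts = 1001 fields

# A row preprocessed with a larger MAX_CONTEXTS ends up with 2456 fields:
row = tf.constant("label " + " ".join(["path_context"] * 2455))
fields = tf.io.decode_csv(row, record_defaults=record_defaults, field_delim=" ")
# -> InvalidArgumentError: Expect 1001 fields but have 2456 in record 0
```

If this is the cause, either re-preprocess the data with the new MAX_CONTEXTS or set the training-time MAX_CONTEXTS to match the value used during preprocessing.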