Adding a new word to the vocabulary can cause a segmentation fault as the vocabulary grows. The reason is that only the vocab array is reallocated; the vocab_hash array remains untouched.
const int vocab_hash_size = 30000000;  // Maximum 30 * 0.7 = 21M words in the vocabulary
So if you have an extremely large vocabulary, a simple solution is to increase this constant so that vocab_hash_size is large enough.
Similarly, as you proposed, we could remove the reallocation of the vocab array and set its maximum size equal to that of vocab_hash. However, you would then need to allocate that much memory for the vocab array regardless of how small the corpus is. The reallocation procedure dynamically fits the storage to the vocabulary of the actual corpus, improving the memory efficiency of the code.
If you do not care about memory issues at all, then you can absolutely remove the reallocation part. However, to make the code generally more memory-efficient, I would like to keep the reallocation part for now.
Please let me know if you have any further suggestions!