Could you train on Glove.42B.300d.txt #4
Hi @liaocs2008, thank you for your interest in my work. I was able to replicate the problem with the glove.42B file. The code has been tested on dict2vec (2.3 million vectors) and fastText common crawl (2 million vectors) and works fine on those files, so it is strange to me that it does not work on the bigger glove.42B file.
The segmentation fault can be fixed by restricting the number of words read during embedding loading: the problem is that the while loop only terminates on EOF. After fixing that, you can train the embedding and inspect the binary vectors. The binary transformation is essentially a sign function, so an all-zero output means every value is negative; that is why I suspect convergence is the problem here.
Actually, I found that the segmentation fault is caused by an insufficient maximum word length: glove.42B.300d.txt contains words up to about 1K characters long. Increasing MAXWORDLENGTH to 1024 fixes it.
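The overflow can be sketched like this: word2vec-style loaders read each token into a fixed-size buffer, and a token longer than the buffer writes past its end. Bounding the copy avoids the crash by truncating over-long tokens instead. The `read_word` helper below is a hypothetical reconstruction, not the project's actual function; only the MAXWORDLENGTH name and the 1024 value come from the thread:

```c
#include <stdio.h>
#include <ctype.h>

#define MAXWORDLENGTH 1024  /* raised from a smaller default, per the fix */

/* Hypothetical loader helper: reads one whitespace-delimited token
 * into word, truncating (but still consuming) anything longer than
 * MAXWORDLENGTH - 1 characters so the buffer is never overrun.
 * Returns 0 on EOF before any token, 1 otherwise. */
static int read_word(FILE *f, char *word) {
    int c, len = 0;
    while ((c = fgetc(f)) != EOF && isspace(c)) ;  /* skip leading whitespace */
    if (c == EOF) return 0;
    do {
        if (len < MAXWORDLENGTH - 1)
            word[len++] = (char)c;
        /* else: silently drop extra characters instead of overflowing */
    } while ((c = fgetc(f)) != EOF && !isspace(c));
    word[len] = '\0';
    return 1;
}
```

A loader built on this can also cap the total number of tokens read (for example, stop after the vocabulary size announced in the w2v header) rather than looping until EOF, which addresses the earlier point about the while loop.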
Setting MAXWORDLENGTH to 1024 as suggested, I have fixed the problem now and was able to train on the file.
Hi,
I find your work interesting and was testing it on glove.6B.300d.txt (after converting it to the w2v format). The program works fine and the output vectors look normal.
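For reference, the conversion mentioned above amounts to prepending a "vocab_size dimension" header line to GloVe's headerless text format. A minimal two-pass sketch (the function name, file handling, and counting approach are mine, not from this thread):

```c
#include <stdio.h>

/* Sketch: convert GloVe's headerless space-separated text format to
 * the word2vec text format. First pass counts lines (vocab size) and
 * tokens on the first line (word + dim values); second pass writes a
 * "vocab dim" header followed by a verbatim copy of the contents.
 * Assumes the input ends with a trailing newline, as GloVe files do. */
static int glove_to_w2v(const char *in_path, const char *out_path) {
    FILE *in = fopen(in_path, "r");
    if (!in) return -1;
    long vocab = 0, dim = 0;
    int c, prev_space = 1, first_line = 1;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') { vocab++; first_line = 0; prev_space = 1; }
        else if (c == ' ') prev_space = 1;
        else { if (prev_space && first_line) dim++; prev_space = 0; }
    }
    dim -= 1;  /* the first token on each line is the word itself */
    rewind(in);
    FILE *out = fopen(out_path, "w");
    if (!out) { fclose(in); return -1; }
    fprintf(out, "%ld %ld\n", vocab, dim);
    while ((c = fgetc(in)) != EOF)
        fputc(c, out);
    fclose(in);
    fclose(out);
    return 0;
}
```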
However, when I run it on glove.42B.300d.txt, the program outputs zeros for all words. I am guessing it might be a convergence issue. Could you test it and help me figure out the problem?
Thanks!