Segfault in script demo-phrase-accuracy.sh #4

Closed
GoogleCodeExporter opened this issue Mar 1, 2016 · 6 comments

Comments

@GoogleCodeExporter

$ ./demo-phrase-accuracy.sh 
make: Nothing to be done for `all'.
Starting training using file text8
Words processed: 17000K     Vocab size: 4399K  
Vocab size (unigrams + bigrams): 2586139
Words in train file: 17005206
Words written: 17000K
real    0m21.130s
user    0m20.062s
sys 0m1.054s
Starting training using file text8-phrase
Vocab size: 123636
Words in train file: 16337523
Alpha: 0.000119  Progress: 99.59%  Words/thread/sec: 22.70k  
real    1m38.617s
user    12m0.795s
sys 0m1.501s
newspapers:
./demo-phrase-accuracy.sh: line 12: 36538 Segmentation fault: 11  
./compute-accuracy vectors-phrase.bin < questions-phrases.txt

I'm on OSX (latest non-beta), and had to replace "#include <malloc.h>" with 
"#include <stdlib.h>" to get it to compile, but made no other changes.

Original issue reported on code.google.com by benjamin...@gmail.com on 19 Aug 2013 at 7:41

@GoogleCodeExporter
Author

demo-word-accuracy.sh also crashes.
The other demos run great.

Original comment by benjamin...@gmail.com on 19 Aug 2013 at 7:45

@GoogleCodeExporter
Author

I'm on OS X Lion, compiled with clang.

Using valgrind, the issue appears to be on line 102 of compute-accuracy.c:

vec[a] = M[a + b2 * size] - M[a + b1 * size] + M[a + b3 * size];

With 30k passed on the command line as the number of words, M is 24,000,000 
bytes, i.e. a 6M-element float array, but an added if statement shows that the 
program regularly accesses memory outside this range.

Adding the if statement with a printf message stops the segfault.

I have:
  if (a + b3 * size > 6000000) printf("Memory overflow\n"); 

With this statement in place, the program prints a bunch of "Memory overflow" 
messages, but aside from that it seems to keep trucking along, and I get a 
final output of

ACCURACY TOP1: 18.77 % (122 / 650)
Total accuracy 26.19%  Semantic accuracy: 24.76% Syntactic accuracy: 26.91%
Questions seen / total: 12268 19544 62.77%

This is obviously not a fix. It has something to do with buffers, but I'm not 
a C expert by any means.

Original comment by dluna...@gmail.com on 22 Aug 2013 at 5:55
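
A minimal sketch of the bounds check described in the comment above, not the fix that was actually committed. It assumes M is a row-major matrix with one row per word: 6,000,000 floats over 30,000 words works out to 200 floats per row, so valid flat indices run from 0 to 5,999,999. The helper names row_in_range and analogy_vector are hypothetical and do not appear in compute-accuracy.c.

```c
/* Returns 1 if row index b lies inside a matrix with `words` rows, else 0. */
static int row_in_range(long long b, long long words) {
    return b >= 0 && b < words;
}

/*
 * Computes vec = M[b2] - M[b1] + M[b3] over `size` columns, where M is a
 * row-major words x size matrix (hypothetical helper, for illustration only).
 * Returns 0 on success, or -1 without touching vec if any row index is out
 * of range, instead of reading past the end of M.
 */
static int analogy_vector(const float *M, long long words, long long size,
                          long long b1, long long b2, long long b3,
                          float *vec) {
    if (!row_in_range(b1, words) || !row_in_range(b2, words) ||
        !row_in_range(b3, words)) {
        return -1;
    }
    for (long long a = 0; a < size; a++) {
        vec[a] = M[a + b2 * size] - M[a + b1 * size] + M[a + b3 * size];
    }
    return 0;
}
```

Checking the row indices once, before the loop, skips the bad question entirely rather than only printing a message and still reading out-of-range memory, which is what the printf-only workaround does.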

@GoogleCodeExporter
Author

Thanks for reporting this bug, it should be fixed now.

Original comment by tmiko...@google.com on 23 Aug 2013 at 6:08

  • Changed state: Fixed

@GoogleCodeExporter
Author

Seems still broken.
Deleted all data files, updated to latest, and re-applied the OSX fix (#include 
<malloc.h> becomes #include <stdlib.h>), then:
make clean
make
and re-ran the script.


Starting training using file text8
Words processed: 17000K     Vocab size: 4399K  
Vocab size (unigrams + bigrams): 2586139
Words in train file: 17005206
Words written: 17000K
real    0m20.452s
user    0m19.601s
sys 0m0.816s
Starting training using file text8-phrase
Vocab size: 123636
Words in train file: 16337523
Alpha: 0.000119  Progress: 99.59%  Words/thread/sec: 22.46k  
real    1m37.069s
user    12m8.130s
sys 0m1.240s
newspapers:
./demo-phrase-accuracy.sh: line 12:  1189 Segmentation fault: 11  
./compute-accuracy vectors-phrase.bin < questions-phrases.txt

Original comment by benjamin...@gmail.com on 23 Aug 2013 at 6:30

@GoogleCodeExporter
Author

No idea what I'm doing, but if it helps:

(gdb) run vectors-phrase.bin <questions-phrases.txt
Starting program: /Users/benjamin/Documents/code/word2vec/compute-accuracy 
vectors-phrase.bin <questions-phrases.txt
Reading symbols for shared libraries +.............................. done
newspapers:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000004b4b1e6d740
0x00000001000019fd in main ()

Original comment by benjamin...@gmail.com on 23 Aug 2013 at 6:37

@GoogleCodeExporter
Author

Removing -Ofast from the makefile seems to have helped.  But wow, is it slower, 
maybe a 90% speed reduction?

Output:

newspapers:
ACCURACY TOP1: 8.33 %  (1 / 12)
Total accuracy: 8.33 %   Semantic accuracy: 8.33 %   Syntactic accuracy: nan % 
ice_hockey:
ACCURACY TOP1: 0.00 %  (0 / 56)
Total accuracy: 1.47 %   Semantic accuracy: 1.47 %   Syntactic accuracy: nan % 
basketball:
ACCURACY TOP1: 0.00 %  (0 / 30)
Total accuracy: 1.02 %   Semantic accuracy: 1.02 %   Syntactic accuracy: nan % 
airlines:
ACCURACY TOP1: 14.29 %  (6 / 42)
Total accuracy: 5.00 %   Semantic accuracy: 5.00 %   Syntactic accuracy: nan % 
people-companies:
ACCURACY TOP1: 25.00 %  (1 / 4)
Total accuracy: 5.56 %   Semantic accuracy: 5.56 %   Syntactic accuracy: nan % 
Questions seen / total: 144 3218   4.47 % 

Original comment by benjamin...@gmail.com on 23 Aug 2013 at 7:05
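
The "nan %" in the syntactic column above is presumably a zero-divided-by-zero result: if no syntactic questions were seen, a floating-point 0 / 0 yields NaN under IEEE 754 arithmetic. That reading is an assumption, not something confirmed in this thread. A minimal sketch of a guarded percentage formatter follows; the name format_accuracy is hypothetical and is not part of compute-accuracy.c.

```c
#include <stdio.h>

/*
 * Writes "<pct> %" with two decimals into buf, or "n/a" when total is 0,
 * so an empty category is never formatted as NaN.
 * Returns the value returned by snprintf.
 */
static int format_accuracy(char *buf, size_t n, int correct, int total) {
    if (total == 0) {
        return snprintf(buf, n, "n/a");
    }
    return snprintf(buf, n, "%.2f %%", 100.0 * correct / total);
}
```

For example, format_accuracy(buf, sizeof buf, 1, 12) produces "8.33 %", matching the newspapers line in the output above, while a category with zero questions produces "n/a".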
