Signal 11 error #294
Comments
A backtrace will provide more information; see https://github.com/uwsampa/grappa/blob/master/doc/debugging.md
Natalia, since you're running on our cluster you should just send me an email with a pointer to the code that's failing, and I'll take a look when I get a chance.
I'm sitting with Natalia looking at the issue. We disassembled her binary to look at the assembly at the location of the segfault, but it wasn't clear precisely where it was occurring. We observed a number of calls to Boost library functions before and after the faulting address, but weren't able to track them down because the addresses were not included within the binary. The sizes of the vertices used increased, but not by more than 2KB. I believe we would have to rebuild Grappa in debug mode to get a backtrace, right? We don't own the cluster, so I believe that would be problematic. Is there another debug mechanism you could propose that would enable us to glean some insight from the segfault?
You're running on our cluster, and Natalia has sent me the details on the segfaulting binary, so it's easy for me to take a look at the backtrace myself; I just haven't had a chance yet. In fact, the backtrace from the non-debug binary is often useful in debugging these sorts of problems (we compile the optimized binary with debugging symbols too), but it's hard to make sense of it without understanding the guts of Grappa. When the backtrace just shows addresses instead of code, it usually means the problem actually occurred before the segfault happened, and that it corrupted some scheduler data structure and screwed up the stack. The backtrace won't be helpful in that case. I'll get back to you two as soon as I've found a moment to take a look at the code.
We tried some experiments. Since it's a label propagation algorithm, we have an array attached to each vertex. We changed its size and found that once it's big enough, we get a segmentation fault. For example, the program works for arrays of size 62 and doesn't work for 75 or more. We also tried another program which is similar to this one, and it fails as well. This might give you some idea about the failure.
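The experiment above brackets the failure between array sizes 62 (works) and 75 (fails). One way to narrow that bracket is bisection. The sketch below is only an illustration: `first_failing_size` and the `crashes` callback are hypothetical names, the callback stands in for rebuilding and rerunning the program at a given array size, and the search assumes the failure is monotonic in size, which the thread does not confirm.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest size in (works, fails] for which crashes(size) is true.
// Assumptions (not confirmed by the thread): crashes(works) is false,
// crashes(fails) is true, and once a size crashes every larger size crashes too.
// `crashes` is a hypothetical callback that would build and run the program
// with the per-vertex array set to `size` and report whether it segfaulted.
int first_failing_size(int works, int fails,
                       const std::function<bool(int)>& crashes) {
    assert(works < fails);
    while (fails - works > 1) {
        int mid = works + (fails - works) / 2;
        if (crashes(mid)) {
            fails = mid;   // failure reproduces: threshold is at or below mid
        } else {
            works = mid;   // still fine: threshold is above mid
        }
    }
    return fails;
}
```

Calling `first_failing_size(62, 75, crashes)` needs at most four runs to pin down the exact threshold, which can then be compared against any fixed-size buffers or stacks in the runtime.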
Hello,
I get this error when I try to run my Grappa program:
Graph memory breakdown:
locale_heap_size: 0.133796 GB
global_heap_size: 0.0735908 GB
graph_total_size: 0.207387 GB
Exiting due to signal 11 with siginfo 0x400149f6f270 and payload 0x400149f6f140
srun: error: n25: task 0: Exited with exit code 1
I'm also using your GraphLab implementation and running the program on the sampa server.
What can I do to solve the problem?
Please, let me know if you need more information.
Thanks