Running BFS with big(er) graphs #283
Try setting --locale_shared_fraction to 0.85 or so. This determines how much memory is available for the global heap (and other specialty allocators), and defaults to 0.5. It probably can't be larger than 0.9, since some memory must be reserved for the non-Grappa-controlled heap as well. (Looks like we need to update that wiki page to match the current state of the code.) I'll bet it will work if you just set that flag; if you still have problems you can increase --global_heap_fraction too. You probably don't want to go above 0.5 for that one.
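As a rough sketch of how these two fractions carve up a node's memory (this is my reading of the description above, not Grappa's actual allocator code; the function and its field names are illustrative):

```python
# Hypothetical sketch of how --locale_shared_fraction and
# --global_heap_fraction might divide a node's memory, based on the
# description above (not Grappa's actual accounting).

def memory_breakdown(node_ram_bytes, locale_shared_fraction=0.5,
                     global_heap_fraction=0.5):
    # locale_shared_fraction: portion of node RAM given to the shared
    # region (global heap plus specialty allocators); the remainder
    # stays with the ordinary, non-Grappa-controlled heap.
    shared = node_ram_bytes * locale_shared_fraction
    non_grappa = node_ram_bytes - shared
    # global_heap_fraction: portion of the shared region reserved for
    # the global heap itself.
    global_heap = shared * global_heap_fraction
    other_allocators = shared - global_heap
    return {
        "shared": shared,
        "non_grappa_heap": non_grappa,
        "global_heap": global_heap,
        "other_allocators": other_allocators,
    }

GB = 1024 ** 3
# A 64 GB node with the suggested settings:
b = memory_breakdown(64 * GB, locale_shared_fraction=0.85,
                     global_heap_fraction=0.5)
print({k: round(v / GB, 1) for k, v in b.items()})
```

With 0.85 and 0.5, a 64 GB node would end up with roughly 54.4 GB shared, of which about 27.2 GB goes to the global heap, leaving about 9.6 GB outside Grappa's control, which is why pushing locale_shared_fraction past ~0.9 starves the regular heap.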
I ran the application with the parameters and was able to execute the bfs implementation with a 2^23 scale graph. However, a 2^24 scale graph seems to crash grappa: Executed with: srun --nodes=4 --ntasks-per-node=12 -- ./bfs_beamer.exe --path=/root/kron_24_16.bedges --locale_shared_fraction 0.85 --global_heap_fraction 0.5 Same setup, the 2^24 graph was generated using graph500's kronecker generator (latest version from github). I tried different values with local_shared_fraction and global_heap_fraction but as you already indicated this causes issues with other things not having enough memory. Is there anything else I have to consider? Any further parameters to tweak? |
The strange thing is that for some reason it's not dividing up the memory among the cores on the node — there's 53 GB total for all the cores, but then it's not recognizing that you're running with …
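To make that symptom concrete, here is a back-of-the-envelope illustration (plain arithmetic, not Grappa's code) of what the split should look like versus what happens when each process fails to see its siblings:

```python
# Back-of-the-envelope: how a node's shared region should divide among
# the cores on the node, versus what happens when each process thinks
# it is running alone. Numbers taken from this thread.
node_shared_gb = 53        # total shared region reported for the node
tasks_per_node = 12        # from srun --ntasks-per-node=12

# Coordinated case: the region is split across the node's tasks.
per_core_expected = node_shared_gb / tasks_per_node
print(f"expected per-core share: {per_core_expected:.1f} GB")

# Mismatched srun/MPI case: each of the 12 processes behaves like a
# single-process job and tries to claim all 53 GB, oversubscribing the
# node's memory by roughly a factor of 12.
combined_claim_gb = tasks_per_node * node_shared_gb
print(f"combined claim when uncoordinated: {combined_claim_gb} GB")
```

That oversubscription is also why multiple "shared memory breakdown" printouts in one job log (discussed below in this thread) are a red flag.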
https://github.com/uwsampa/grappa/blob/master/system/Communicator.cpp#L141 This may be the line, and I have a workaround somewhere, although if this is in fact your problem I'd suggest trying a newer MPI first.
Ah, I didn't notice that in the log. It's more likely to be a srun/MPI mismatch than the problem @bmyerz suggests. You may use MPI on this machine enough to already know the answer to the question the following is intended to answer; but if not, try
If you see output like
life is good and we'll have to look elsewhere. If you see output like:
then your srun and MPI installation are not configured to communicate correctly, and you'll need to run with a command like
Now that I've properly looked at the full log you provided, it looks very likely that this is the problem. You should only see one of those memory breakdowns per job; the fact that you're seeing multiple ones suggests that each process doesn't know about the rest and is trying to solve the problem independently.
Thank you for the detailed replies and help so far.
The warning you are seeing indicates that the messaging system is running out of space for pending messages. You can provide the option … If you see the warning repeated by many cores during a run (as in your output), I would suggest killing the job, because it will probably go far too slowly.
I added
Again, I tried different combinations of settings for the memory parameters, but I always get this when trying a scale 2^26 graph.
I think a backtrace is needed for more information. https://github.com/uwsampa/grappa/blob/master/doc/debugging.md
I just tried this (export GRAPPA_FREEZE_ON_ERROR=1 before execution) with both the release and the debug build, but Grappa exits right after throwing the bus error. After that, I set the freeze_flag to true in the Grappa.cpp file (right after the two if blocks with the environment variable checks), but this doesn't work either.
Just setting the environment variable should be enough. It didn't work in this case because you're getting a bus error, and the code in master forgot to capture that. You can pull the appropriate commit in from a dev branch with this command:
and try again. (It's probably going to show that we're running out of memory in one of the memory pools.) |
That's working, thanks. Here is a backtrace from one of the Grappa processes in the frozen state:
Hi, I came across a similar issue when running pagerank on a large graph (508k edges and 75k nodes, which is not very big actually). I don't understand the info printed in the output. Can you explain what each of these means? For example, no matter how many nodes I use, the node total is still 31G (note that the memory size of each of my machines is 32G), which is the part I am confused about. And I still don't understand the remaining parameters after reading https://github.com/uwsampa/grappa/wiki/Setting-memory-region-sizes .
I also ran into the same issue when I ran pagerank. My command is ./pagerank.exe --path … My graph only contains #vertices: 81306, #edges: 1768135. Shared memory breakdown:
I successfully ran the dataset (#vertices: 81306, #edges: 1768135).
I tried starting Grappa's BFS implementation on 4 nodes, each equipped with 64 GB of RAM and one Intel Xeon CPU with 6 cores (two threads per core). I used Slurm to start the job and wanted to load a scale 26 graph generated by the graph500 graph generator (total file size ~16 GB).
Running it with "srun --nodes=4 --ntasks-per-node=12 -- ./bfs_beamer.exe --path=/some/path/rmat_26_16" throws the following error:
"Out of memory in the global heap: couldn't find a chunk of size 34359738368 to hold an allocation of 34359738368 bytes. Can you increase --global_heap_fraction?"
I changed --global_heap_fraction to various values (0.1 to 1.0), but none of them worked. I found your notes on setting the memory region sizes (https://github.com/uwsampa/grappa/wiki/Setting-memory-region-sizes), but that didn't help me figure out a working combination of settings.
Are there any other settings I have to change, or have I hit a limit with the given setup? For evaluation purposes, I have to find out the biggest synthetic graph I can load on the given nodes using Grappa.
Full output of the srun command without --global_heap_fraction option set:
http://pastebin.com/6GrCbPdN
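Some plain arithmetic on the numbers in the error message above (this uses only the figures quoted in this thread, not Grappa internals, and the per-vertex interpretation at the end is a guess):

```python
# The failed allocation from the error message, in bytes.
requested = 34359738368
assert requested == 2 ** 35            # exactly 32 GiB
print(requested / 1024 ** 3, "GiB")

# With the defaults (locale_shared_fraction=0.5, global_heap_fraction=0.5),
# each 64 GB node contributes 64 * 0.5 * 0.5 = 16 GB to the global heap,
# i.e. 64 GB across the 4 nodes -- so this single request would consume
# half of the default global heap, and more as the fractions shrink.
nodes, ram_per_node_gb = 4, 64
default_global_heap_gb = nodes * ram_per_node_gb * 0.5 * 0.5
print("default global heap across cluster:", default_global_heap_gb, "GB")

# For a scale-26 graph (2^26 vertices), the request works out to
# 2^35 / 2^26 = 512 bytes per vertex (interpretation is a guess).
print("bytes per vertex:", requested // 2 ** 26)
```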