Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recsplit core dump on large keyfiles #3

Closed
weaversa opened this issue Nov 8, 2019 · 3 comments
Closed

Recsplit core dump on large keyfiles #3

weaversa opened this issue Nov 8, 2019 · 3 comments

Comments

@weaversa
Copy link

weaversa commented Nov 8, 2019

Here I try to build an MPFH for 2^30 keys using recsplit. I'm running on an Amazon EC2 instance of type m5.4xlarge. Such an instance has 16 cores and 64 GiB of memory.

$ wc -l test.keys 
1073741823 test.keys
$ bin/recsplit_load_8 test.keys test.mphf
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
@vigna
Copy link
Owner

vigna commented Nov 8, 2019

So the map was built. I'd say your key file is too large (but you don't know the size). Why do you want to load the entire file in memory?

@weaversa
Copy link
Author

weaversa commented Nov 8, 2019

I'm sorry. I was a bit misleading. I should have said I was trying to query such an MPHF. You are right that the MPFH building tool claims success, and recsplit_load gives an abort. Maybe the map produced by the build tool is bad? Maybe the query tool processes keys in a less memory efficient way than the build tool? I'm not familiar enough with the project to say why, just report the abort. I feel that the list of keys I used is agnostic to the bug, so long as the list is large. If you are unable to reproduce, I am happy to send along my list of keys to assist diagnosis.

Thank you!

Here is the full run:

$ wc -l test.keys 
1073741823 test.keys
$ bin/recsplit_dump_8 test.keys 100 test.mphf
Building...
Elias-Fano cumul sizes:  7.469788 bits/bucket
Elias-Fano cumul bits:   6.376042 bits/bucket
Elias-Fano cumul sizes:  0.074698 bits/key
Elias-Fano cumul bits:   0.063760 bits/key
Rice-Golomb descriptors: 1.656336 bits/key
Total bits:              1.794794 bits/key
Construction time: 2078.431 s, 1936 ns/key
$ bin/recsplit_load_8 test.keys test.mphf
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

@vigna
Copy link
Owner

vigna commented Nov 8, 2019 via email

@weaversa weaversa closed this as completed Nov 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants