stage1 HashMap: store hash & do robin hood hashing #5779

Merged (2 commits) on Jul 3, 2020

andrewrk (Member) commented Jul 2, 2020

This adds these two fields to a HashMap Entry:

uint32_t hash
uint32_t distance_from_start_index
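
As a rough illustration of what these two fields buy, here is a minimal sketch of an entry that carries them. It is not the stage1 code: the `Entry` layout beyond the two named fields, and the helpers `entry_matches` and `distance_from_start`, are hypothetical names, and a power-of-two capacity is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical entry layout. Only `hash` and `distance_from_start_index`
// are named in this PR; the key/value types are placeholders.
struct Entry {
    std::string key;
    int value;
    uint32_t hash;                       // cached hash of `key`
    uint32_t distance_from_start_index;  // probe steps away from the ideal slot
};

// With the hash stored, most mismatches are rejected by one integer
// comparison; the (possibly expensive) key comparison runs only on a hash match.
inline bool entry_matches(const Entry &e, uint32_t hash, const std::string &key) {
    return e.hash == hash && e.key == key;
}

// Distance of `slot` from the ideal slot for `hash`, wrapping around the table.
// Assumes `capacity` is a power of two (an assumption of this sketch).
inline uint32_t distance_from_start(uint32_t slot, uint32_t hash, uint32_t capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    uint32_t start = hash & (capacity - 1);
    return (slot - start) & (capacity - 1);
}
```

Storing the distance rather than recomputing it trades 4 bytes per entry for not having to rehash during probing.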

Compared to the master branch, the standard library tests compiled 8.4%
faster and used a negligible 0.001% more memory. Memory usage is still
lower than it was before 8b82c40, which moved indexes to be stored
separately from entries.

So, it turns out, keeping robin hood hashing plus separating indexes
did result in a performance improvement. What happened previously is
that the gains from separating indexes balanced out the losses from
removing robin hood hashing, resulting in a wash.
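
For readers unfamiliar with the combination described above, the following is a minimal sketch of robin hood insertion over an index array kept separate from a dense entry array. It is an illustration under assumptions, not the stage1 implementation: `Map`, `put`, `get`, `hash_key`, and `kEmpty` are hypothetical names, the capacity is a fixed power of two, and growth and removal are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr uint32_t kEmpty = UINT32_MAX;  // marks an unused index slot

struct Entry {
    uint32_t key;
    uint32_t value;
    uint32_t hash;
    uint32_t distance_from_start_index;
};

struct Map {
    std::vector<Entry> entries;     // dense, in insertion order
    std::vector<uint32_t> indexes;  // slot -> position in `entries`, or kEmpty

    explicit Map(uint32_t capacity) : indexes(capacity, kEmpty) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Placeholder multiplicative hash, chosen only for the sketch.
    static uint32_t hash_key(uint32_t k) { return k * 2654435761u; }

    Entry *get(uint32_t key) {
        uint32_t mask = static_cast<uint32_t>(indexes.size()) - 1;
        uint32_t h = hash_key(key);
        uint32_t slot = h & mask;
        for (uint32_t dist = 0; dist < indexes.size(); ++dist, slot = (slot + 1) & mask) {
            uint32_t idx = indexes[slot];
            if (idx == kEmpty) return nullptr;
            Entry &e = entries[idx];
            // A resident closer to its home than we are to ours means the
            // key cannot be further along: robin hood would have placed it here.
            if (e.distance_from_start_index < dist) return nullptr;
            if (e.hash == h && e.key == key) return &e;
        }
        return nullptr;
    }

    void put(uint32_t key, uint32_t value) {
        if (Entry *e = get(key)) { e->value = value; return; }
        assert(entries.size() < indexes.size());  // no growth in this sketch
        uint32_t mask = static_cast<uint32_t>(indexes.size()) - 1;
        uint32_t h = hash_key(key);
        entries.push_back({key, value, h, 0});
        uint32_t carried = static_cast<uint32_t>(entries.size()) - 1;
        uint32_t slot = h & mask;
        for (;;) {
            uint32_t &resident = indexes[slot];
            if (resident == kEmpty) { resident = carried; return; }
            // Take the slot from a resident that is closer to its home,
            // then keep probing on behalf of the displaced entry.
            if (entries[resident].distance_from_start_index <
                entries[carried].distance_from_start_index) {
                std::swap(resident, carried);
            }
            entries[carried].distance_from_start_index += 1;
            slot = (slot + 1) & mask;
        }
    }
};
```

Only 4-byte indexes move during displacement, while the entries themselves stay put in the dense array, which is one plausible reason the two techniques compose well.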

This also serves as an inspiration for adding a benchmark to
std.AutoHashMap and improving the implementation.

data-man (Contributor) commented Jul 2, 2020

@andrewrk
Why not use third-party libraries?

E.g.:
Tessil's libraries (especially hat-trie)
parallel-hashmap
robin-hood-hashing

Hashmaps Benchmarks - Overview

andrewrk (Member, Author) commented Jul 2, 2020

  1. This is a prototype of potential improvements to the standard library hash map.

  2. The amount of meta-work associated with depending on third party code for something like this vastly outweighs the actual work of the implementation, especially in C++: everything from build issues to package management to people starting discussions about dynamic linking.

  3. We already have this integrated with everything, including the internal memory profiling instrumentation.

  4. It's easier & faster to fix bugs in our own codebase.

andrewrk merged commit 70dca0a into master on Jul 3, 2020
andrewrk mentioned this pull request on Jul 4, 2020
andrewrk deleted the stage1-hash-map branch on July 5, 2020 21:20