stage1 HashMap: store hash & do robin hood hashing #5779

andrewrk · 2020-07-02T22:39:14Z

This adds these two fields to a HashMap Entry:

uint32_t hash
uint32_t distance_from_start_index

Compared to the master branch, the standard library tests compiled 8.4%
faster and used a negligible 0.001% more memory. Memory usage is still
lower than it was before 8b82c40, which moved indexes to be stored
separately from entries.

So, it turns out, keeping robin hood hashing plus separating indexes
did result in a performance improvement. What happened previously is
that the gains from separating indexes balanced out the losses from
removing robin hood hashing, resulting in a wash.

This also serves as an inspiration for adding a benchmark to
std.AutoHashMap and improving the implementation.
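
For readers unfamiliar with the technique, here is a minimal sketch of how the two fields above could drive robin hood probing over a separately stored index array. It is an illustration under stated assumptions, not the stage1 code: the names `SketchMap`, `insert_index`, and `grow`, the string/int key and value types, the 50% load factor, and the "entry index + 1" slot encoding are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Illustration only: names and layout are assumptions, not the stage1 code.
struct Entry {
    uint32_t hash;                       // cached so growing never rehashes keys
    uint32_t distance_from_start_index;  // probe distance from the home slot
    std::string key;
    int value;
};

class SketchMap {
public:
    SketchMap() : indexes_(8, 0) {}

    void put(const std::string &key, int value) {
        uint32_t h = hash_of(key);
        long found = find(key, h);
        if (found >= 0) {
            entries_[static_cast<size_t>(found)].value = value;
            return;
        }
        entries_.push_back(Entry{h, 0, key, value});
        if (entries_.size() * 2 > indexes_.size()) {
            grow();  // reinserts every entry, including the new one
        } else {
            insert_index(static_cast<uint32_t>(entries_.size() - 1));
        }
    }

    bool get(const std::string &key, int &out) const {
        long found = find(key, hash_of(key));
        if (found < 0) return false;
        out = entries_[static_cast<size_t>(found)].value;
        return true;
    }

    size_t size() const { return entries_.size(); }

private:
    static uint32_t hash_of(const std::string &key) {
        return static_cast<uint32_t>(std::hash<std::string>{}(key));
    }

    // Returns the entry index, or -1 if the key is absent.
    long find(const std::string &key, uint32_t h) const {
        size_t pos = h % indexes_.size();
        for (uint32_t dist = 0;; dist++) {
            uint32_t slot = indexes_[pos];
            if (slot == 0) return -1;  // empty slot: key is absent
            const Entry &e = entries_[slot - 1];
            // Robin hood early exit: the key would have displaced this resident.
            if (e.distance_from_start_index < dist) return -1;
            // Compare the cached hash first; it is cheaper than comparing keys.
            if (e.hash == h && e.key == key) return static_cast<long>(slot - 1);
            pos = (pos + 1) % indexes_.size();
        }
    }

    void insert_index(uint32_t entry_index) {
        uint32_t carried = entry_index + 1;  // slot encoding: 0 means empty
        uint32_t dist = 0;
        size_t pos = entries_[entry_index].hash % indexes_.size();
        for (;;) {
            uint32_t &slot = indexes_[pos];
            if (slot == 0) {
                entries_[carried - 1].distance_from_start_index = dist;
                slot = carried;
                return;
            }
            uint32_t resident_dist = entries_[slot - 1].distance_from_start_index;
            if (resident_dist < dist) {
                // The resident is closer to home than we are: take its slot
                // and keep probing on behalf of the displaced entry.
                entries_[carried - 1].distance_from_start_index = dist;
                std::swap(slot, carried);
                dist = resident_dist;
            }
            pos = (pos + 1) % indexes_.size();
            dist++;
        }
    }

    void grow() {
        indexes_.assign(indexes_.size() * 2, 0);
        for (uint32_t i = 0; i < entries_.size(); i++) insert_index(i);
    }

    std::vector<Entry> entries_;     // dense, in insertion order
    std::vector<uint32_t> indexes_;  // sparse table of entry index + 1
};
```

In this sketch the stored `hash` lets `grow` rebuild the index table without touching the keys, and `distance_from_start_index` lets a lookup stop as soon as it meets a resident that sits closer to its home slot than the probe has travelled.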


data-man · 2020-07-02T23:12:38Z

@andrewrk
Why not use third-party libraries?

E.g.:
Tessil's libraries (especially hat-trie)
parallel-hashmap
robin-hood-hashing

Hashmaps Benchmarks - Overview

andrewrk · 2020-07-02T23:20:40Z

This is a prototype of potential improvements to the standard library hash map.
The amount of meta-work associated with depending on third-party code for something like this vastly outweighs the actual work of the implementation, especially in C++: everything from build issues to package management to people starting discussions about dynamic linking.
We already have this integrated with everything, including the internal memory profiling instrumentation.
It's easier & faster to fix bugs in our own codebase.

stage1 HashMap: linear scan for < 16 entries

22f0a10

andrewrk merged commit 70dca0a into master Jul 3, 2020

andrewrk mentioned this pull request Jul 4, 2020

reimplement std.HashMap #5786

Merged

andrewrk deleted the stage1-hash-map branch July 5, 2020 21:20
