std: count hash_map tombstones as available #10337
Conversation
When entries are inserted into and removed from a hash map at an equivalent rate (maintaining a mostly consistent total count of entries), the map should never need to be resized. But `HashMapUnmanaged.available` does not presently count tombstoned slots as "available", so this put/remove pattern eventually panics (assertion failure) when `available` reaches `0`. The solution implemented here is to count tombstoned slots as "available". Another approach (which [hashbrown](https://github.com/rust-lang/hashbrown/blob/b3eaf32e608d1ec4c10963a4f495503d7f8a7ef5/src/raw/mod.rs#L1455-L1542) takes) would be to rehash all entries in place when there are too many tombstones. That is more complex, but it avoids an `O(n)` bad case when the hash map contains many tombstones.
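For illustration, here is a minimal sketch of the put/remove pattern (not the PR's actual test; the capacity, key values, and loop bound are arbitrary), assuming the managed `std.AutoHashMap` wrapper:

```zig
const std = @import("std");

test "steady put/remove should not need more capacity" {
    var map = std.AutoHashMap(u32, u32).init(std.testing.allocator);
    defer map.deinit();

    // Reserve room for a small, steady population of entries.
    try map.ensureTotalCapacity(8);

    // Insert and remove at the same rate: the live count never exceeds 1,
    // but every removal leaves a tombstone behind. Before this change,
    // `available` is only ever consumed (removals do not credit it back),
    // so this loop eventually hits the capacity assertion in
    // `putAssumeCapacity` even though the map is logically near-empty.
    var i: u32 = 0;
    while (i < 1_000_000) : (i += 1) {
        map.putAssumeCapacity(i, i);
        _ = map.remove(i);
    }
}
```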

---
If I remember correctly, the reason why tombstones participate in the load factor (hence are not counted as "available" slots) is that they participate in collision chains. When doing lookups, even if the hashmap has 0.1% used slots and 79.9% tombstones, it will effectively have a high probability of probing past the home slot, and that's what we try to minimize. Hence the fact that we grow even if the hashmap appears almost empty.

In this scenario, resizes can still occur, but their likelihood decreases exponentially.

From my tests it's a good compromise between memory usage and performance (but as always, it depends on the use case). The hashmap performs quite nicely even with this value, but feel free to do measurements on probe lengths to convince yourself. However, I'm quite against lowering it without a substantial amount of actual data to back this claim. Clustering is a matter of hash function quality, and the stdlib uses state-of-the-art, very high quality hash functions (at least last time I checked).
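As a rough, order-of-magnitude illustration of the probe-length claim above (a classical approximation assuming uniform hashing and plain linear probing, which is not exactly the std's probing scheme), Knuth's estimates for the expected number of probes at effective load factor $\alpha$ (counting tombstones toward $\alpha$) are:

$$C_{\text{hit}} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right), \qquad C_{\text{miss}} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$$

At $\alpha = 0.8$ that is roughly 3 probes for a hit and 13 for a miss; if tombstones were excluded from the load factor and the effective $\alpha$ were allowed to creep toward 1, the miss cost would grow without bound.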

---
Thanks for the clarification & feedback! A brief summary of my use case:

Right, but the current behavior seems unsafe, especially because if/when it panics is so unpredictable. Currently, when

---
I don't have much expertise on this, but it sounds like both options can be justified. Maybe it could be exposed via a compile-time option, just like the load ratio is currently configurable? As for which option should be the default, I'm not sure. Playing devil's advocate: in a theoretically endless usage scenario with random data, I don't know whether I'd prefer theoretically boundless memory usage or theoretically boundless runtime. I guess a compromise could be to make it a
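For reference, the load factor is already a comptime parameter of `std.HashMap` (the fourth type-function argument, defaulting to `std.hash_map.default_max_load_percentage`), so a tombstone policy could plausibly be surfaced the same way. A minimal sketch, where the name `LowLoadMap` and the value `60` are purely illustrative, not a recommendation:

```zig
const std = @import("std");

// A map type with a tighter load factor than the default of 80.
const LowLoadMap = std.HashMap(
    u64,
    []const u8,
    std.hash_map.AutoContext(u64),
    60, // max_load_percentage
);

test "custom max_load_percentage" {
    var map = LowLoadMap.init(std.testing.allocator);
    defer map.deinit();
    try map.put(42, "hello");
    try std.testing.expect(map.contains(42));
}
```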

---
@Sahnvour I understand where you're coming from. As luck would have it, there's a recent paper, *Linear Probing Revisited: Tombstones Mark the Death of Primary Clustering* from Stony Brook/Google/MIT, published in July this year, that dives into all of this and comes back out with some counter-intuitive recommendations (my understanding around Swiss and FB's F14 tables was altered on several points in the past hour!). Apologies for the lengthy quotes, but they're such awesome sections and seem to speak right to the heart of what we're wanting to know here:
This point from the paper in particular also helped me to understand better how queries and insertions interact:
Therefore, I think the key is to see this asymmetry, and then deal with queries and insertions separately:
So... from the point of view of performance, I think we're actually good on incrementing the `available` count for tombstones.

And this paper also seems to support the 80% load factor that the std lib arrived at... my thinking on this changed overnight: I used to think that 50% is the safe default for linear/triangular probing, but your numbers have also quantitatively shown that the current default of 80% is reasonable. So we are agreed there.

On the other hand, the early resize definitely carries a very heavy performance cost (rehashing all keys in the table), and impacts the overall amortized runtime. The fewer resizes the better, unless the user explicitly decides on a compact/rehash policy to shorten existing probe lengths, but I don't believe the std lib should expose compact/rehash in the interface or make any decision to handle that internally at all. If minimizing existing probe lengths is really a concern, then there are more recent probing strategies we can adopt that would offer a guaranteed worst case of 1 cache miss 99% of the time, with a 2nd cache miss 1% of the time, and no primary clustering. I believe that's significantly better than linear probing's average case, and certainly its worst case. I would love to implement this in Zig, and hopefully will be able to soon.

However, after performance, from the immediate point of view of correctness and explicitness, the surprising status quo assertion crash is a critical showstopper for embedded environments. Without this PR, TigerBeetle would be unable to rely on the std lib's hash map. I think @sentientwaffle (who's on the TigerBeetle team with me) also makes a strong case when he shows that the status quo of the counter leak breaks the capacity guarantees.

---
Thanks for the paper, it's very interesting (although I can't say I get all the subtleties) 🙂 It's true and intuitive that tombstones help insertions, and that synthetic benchmarks testing only insertions may be quite far from real-world usage. In fact, I believe benchmarking hashmaps is depressingly hard because of the number of different use cases and their performance characteristics. The `gotta go fast` benchmark suite could definitely be expanded and improved.
Yes, this is absolutely an issue.
You're referring to the proposed graveyard hashing, correct?

---
It does seem to be worse perf according to our benchmarks as currently measured:
Some open questions:

---
@andrewrk this PR also introduced a necessary fix to guard against an infinite loop on wraparound (we should have highlighted this more in the discussion), and I believe that's what's responsible for the performance regression: an extra branch in the hot loop, which might be mitigated in the short term with unrolling, or in the long term with a better probing strategy. See https://github.com/ziglang/zig/pull/10337/files#diff-8d3864cfd9fd2f29dd2b0458387c1036858fa8ac78d64517736675fbe3eaf33cR1118

Regarding benchmarks:
Regarding complicating the hash map API, I would keep it simple and not let the current probing implementation leak out, because I don't think the current probing strategy is the best local maximum to begin with. There are slightly better probing strategies that don't use tombstones at all, don't need to handle infinite loops on wraparound, and have tighter guaranteed worst-case bounds. I wouldn't be worried by the regression at this stage; we should rather move to a better maximum.
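To make the wraparound guard concrete, here is a minimal, self-contained sketch; it is not the actual `std.HashMapUnmanaged` code (which uses metadata bytes and triangular probing), just plain linear probing with the kind of explicit bound the fix adds:

```zig
const std = @import("std");

// Once tombstones count as "available", a table may contain no truly free
// slot, so a probe loop needs an explicit bound (the `probed` counter) to
// avoid cycling forever after wrapping around.
fn findFreeSlot(metadata: []const u8, start: usize) ?usize {
    std.debug.assert(std.math.isPowerOfTwo(metadata.len));
    const mask = metadata.len - 1;
    var idx = start & mask;
    var probed: usize = 0;
    // `probed < metadata.len` is the extra branch in the hot loop.
    while (probed < metadata.len) : (probed += 1) {
        if (metadata[idx] == 0) return idx; // 0 = free in this sketch
        idx = (idx + 1) & mask; // linear probing for simplicity
    }
    return null; // every slot is used or tombstoned
}

test "findFreeSlot terminates even when no slot is free" {
    const all_tombstones = [_]u8{1} ** 8; // 1 = tombstone in this sketch
    try std.testing.expect(findFreeSlot(&all_tombstones, 3) == null);

    var one_free = [_]u8{1} ** 8;
    one_free[6] = 0;
    try std.testing.expect(findFreeSlot(&one_free, 3).? == 6);
}
```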

---
On second thought, there's a trick we could use to safely eliminate the expensive extra check in the probe loop. Both linear probing (what we appear to be using at present) and Swiss Table's triangular probing (but then we must always use power-of-two table sizes, which we do at present) will then be guaranteed to probe all slots and find a free slot. Sorry about this, my bad!

---
No need to apologize, I'll count it as a win for our new perf tracking system :-)

---
Gotta go fast!

---
I was just thinking about this yesterday and was about to propose the same thing :)

They do run the same code at each commit and track how the metrics evolve. So unless we change the load factor, for example, the comparison is quite fair.

---

See #10337 for context. In #10337 the `available` tracking fix necessitated an additional condition on the probe loop in both `getOrPut` and `getIndex` to prevent an infinite loop. Previously, this condition was implicit thanks to the guaranteed presence of a free slot. The new condition hurts the `HashMap` benchmarks (#10337 (comment)).

This commit removes that extra condition on the loop. Instead, when probing, first check whether the "home" slot is the target key; if so, return it. Otherwise, save the home slot's metadata to the stack and temporarily "free" the slot (but don't touch its value), then continue with the original loop. Once again, the loop is implicitly broken by the new "free" slot. The original metadata is restored before the function returns. `getOrPut` has one additional gotcha: if the home slot is a tombstone and `getOrPut` misses, then the home slot is written with the new key; that is, its original metadata (the tombstone) is not restored.

Other changes:
- Test hash map misses.
- Test using `getOrPutAssumeCapacity` to get keys at the end (along with `get`).
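A hedged, toy illustration of the commit's approach (this is not the real `std.HashMapUnmanaged` layout; the constants, struct, and helpers below are invented for the sketch): temporarily freeing the home slot plants a sentinel, so the probe loop terminates without an explicit bound. The `getOrPut` gotcha described above, where a tombstoned home slot is rewritten rather than restored, is omitted here.

```zig
const std = @import("std");

// Toy fixed-size table: 0 = free, 1 = tombstone, 2 = used.
const slot_free: u8 = 0;
const slot_tombstone: u8 = 1;
const slot_used: u8 = 2;
const cap = 8; // power of two, as in the real table

const Toy = struct {
    metadata: [cap]u8 = [_]u8{slot_free} ** cap,
    keys: [cap]u32 = [_]u32{0} ** cap,

    fn homeSlot(key: u32) usize {
        return @as(usize, key) & (cap - 1); // stand-in for a real hash
    }

    fn getIndex(self: *Toy, key: u32) ?usize {
        const home = homeSlot(key);
        if (self.metadata[home] == slot_used and self.keys[home] == key) return home;

        // Save the home slot's metadata and temporarily mark it free. Even if
        // every other slot is used or tombstoned, the probe below now stops
        // when it wraps back to `home`. Restore the metadata before returning.
        const saved = self.metadata[home];
        self.metadata[home] = slot_free;
        defer self.metadata[home] = saved;

        var idx = (home + 1) & (cap - 1);
        while (self.metadata[idx] != slot_free) : (idx = (idx + 1) & (cap - 1)) {
            if (self.metadata[idx] == slot_used and self.keys[idx] == key) return idx;
        }
        return null;
    }
};

test "probe terminates with every slot tombstoned, without an explicit bound" {
    var t = Toy{};
    var i: usize = 0;
    while (i < cap) : (i += 1) t.metadata[i] = slot_tombstone;
    try std.testing.expect(t.getIndex(5) == null);
}
```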

---

The assertion failure is shown below; the new test case exercises this behavior. This is the same problem described in #7468.

I'm not quite sure how the `"std.hash_map ensureUnusedCapacity with tombstones"` test should be updated, since it was written with the expectation (and a note) that tombstones count as load.

As an aside, the current `default_max_load_percentage = 80` is rather high and may lead to (or exacerbate) primary clustering issues, so that should be addressed in the future.