Conversation

@sentientwaffle
Contributor

In #10337 the `available` tracking fix necessitated an additional condition on the probe loop in both `getOrPut` and `getIndex` to prevent an infinite loop. Previously, this condition was implicit thanks to the guaranteed presence of a free slot.

The new condition hurts the `HashMap` benchmarks (#10337 (comment)).

This commit removes that extra condition on the loop. Instead, when probing, first check whether the "home" slot is the target key — if so, return it. Otherwise, save the home slot's metadata to the stack and temporarily "free" the slot (but don't touch its value). Then continue with the original loop. Once again, the loop will be implicitly broken by the new "free" slot. The original metadata is restored before the function returns.
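
For illustration, here is a minimal sketch of a lookup probe using this trick. It is not the actual `std.HashMap` code: the `Metadata` enum, the `u64` keys, the linear probing, and the power-of-two capacity are all simplifications assumed for the example.

```zig
const Metadata = enum(u8) { free, tombstone, used };

/// Hypothetical getIndex-style lookup; returns the slot of `key`, or null on a miss.
fn getIndexSketch(metadata: []Metadata, keys: []const u64, key: u64, hash: usize) ?usize {
    const mask = metadata.len - 1; // capacity assumed to be a power of two
    const home = hash & mask;

    // 1. If the home slot already holds the key, return it immediately.
    if (metadata[home] == .used and keys[home] == key) return home;

    // 2. Otherwise stash the home slot's metadata and temporarily mark it
    //    free (its value is untouched), so a wrapping probe is guaranteed
    //    to stop there even when every other slot is used or a tombstone.
    const saved = metadata[home];
    metadata[home] = .free;
    defer metadata[home] = saved; // restore before returning

    // 3. Probe as before; the loop is implicitly broken by a free slot.
    var idx = (home + 1) & mask;
    while (metadata[idx] != .free) : (idx = (idx + 1) & mask) {
        if (metadata[idx] == .used and keys[idx] == key) return idx;
    }
    return null; // miss
}
```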

`getOrPut` has one additional gotcha — if the home slot is a tombstone and `getOrPut` misses, then the home slot is written with the new key; that is, its original metadata (the tombstone) is not restored.
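
The miss path of that gotcha could look roughly like the following, continuing the sketch above. The names (`home`, `saved`, `first_free`) are hypothetical and stand in for the real implementation's state.

```zig
/// Hypothetical tail of a getOrPut-style miss, reusing `Metadata` from the
/// sketch above. `home` is the key's home slot, `saved` its stashed metadata,
/// and `first_free` the free slot the probe stopped at.
fn getOrPutMissSketch(
    metadata: []Metadata,
    keys: []u64,
    home: usize,
    saved: Metadata,
    first_free: usize,
    key: u64,
) usize {
    if (saved == .tombstone) {
        // The home slot was a tombstone: write the new key there and do not
        // restore the original metadata.
        metadata[home] = .used;
        keys[home] = key;
        return home;
    }
    // Otherwise restore the home slot and insert into the free slot found
    // while probing.
    metadata[home] = saved;
    metadata[first_free] = .used;
    keys[first_free] = key;
    return first_free;
}
```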

Other changes:

  • Test hash map misses.
  • Test using `getOrPutAssumeCapacity` to get keys at the end (along with `get`).

This approach is inspired by @jorangreef's suggested optimization (#10337 (comment)).

On second thought, there's a trick we could use to safely eliminate the expensive limit check completely: just set `available` at startup to one less than the actual capacity, so that a free slot always remains and lookups are guaranteed to terminate naturally even in the presence of tombstones, with no infinite loop.

But that doesn't quite work; an insert/remove cycle will eventually convert any free slots to tombstones. Joran and I considered a couple other approaches before discovering the temporarily-free-a-slot tactic:

  • Can trailing tombstones be converted back to free slots? This doesn't always fix the issue. Consider the slot array [A;B;C] with a max load of 2/3: put A, put B, free A, put C, and now it's back to needing a limit condition (a minimal trace of this sequence follows the list). (This may still be worth doing in the future to reduce the likelihood of a miss's O(n) worst case, even though it isn't flawless. Either that, or rehash in place.)
  • Can the slot array include a special always-free slot at the end? But lookups still need to wrap around it to find collisions, so that doesn't help either.
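
As a concrete sketch of the first bullet's counterexample, assuming A, B, and C hash to slots 0, 1, and 2 respectively, and reusing the `Metadata` enum from the sketch above:

```zig
const std = @import("std");

test "trailing tombstones do not guarantee a free slot" {
    // 3 slots with a max load of 2/3, so at most 2 live entries at a time.
    var meta = [3]Metadata{ .free, .free, .free };
    meta[0] = .used;      // put A  -> [A,         free, free]
    meta[1] = .used;      // put B  -> [A,         B,    free]
    meta[0] = .tombstone; // free A -> [tombstone, B,    free]
    meta[2] = .used;      // put C  -> [tombstone, B,    C]
    // No free slot remains, so a missed lookup would wrap forever without
    // either an explicit limit or the temporarily-freed home slot.
    for (meta) |m| try std.testing.expect(m != .free);
}
```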

cc: @Sahnvour

@sentientwaffle force-pushed the hash-map-available-faster branch from c4d6b84 to 3861c96 on December 17, 2021 at 21:01
@sentientwaffle
Contributor Author

Just pushed a fix for the failing test at https://ci.ziglang.org/ziglang/zig/859/1/4 (due to mixing up `usize` and `u64`).

@andrewrk
Member

Thanks for the follow-up. Were you able to observe any perf improvements with these changes?

@sentientwaffle
Contributor Author

I tested locally with just `perf stat`. The branch count was definitely improved, but the instruction count and utime seemed inconclusive.

@andrewrk
Member

Good to know, thanks. I'm fine with merging this and giving it a spin in our perf tracking dashboard. But if it remains inconclusive then it probably makes sense to revert since this is slightly more complex than before.

@sentientwaffle
Contributor Author

Sounds good!

@andrewrk merged commit 11803a3 into ziglang:master on Dec 17, 2021
@andrewrk
Member

andrewrk commented Dec 17, 2021

Perf results are up

My observations:

  • strictly worse in terms of CPU instructions
  • slightly worse wall time (but this can be noisy)
  • sometimes better, sometimes worse for branch predictions

Seems like it should be reverted to me.

@sentientwaffle
Contributor Author

Agreed; I'll dig into it more on Monday.

@andrewrk
Member

reverted in 6d04de7
