Conversation

@sentientwaffle
Contributor

In #10337 the `available` tracking fix necessitated an additional condition on the probe loop in both `getOrPut` and `getIndex` to prevent an infinite loop. Previously, this condition was implicit thanks to the guaranteed presence of a free slot.

The new condition hurts the `HashMap` benchmarks (#10337 (comment)).

This commit removes that extra condition on the loop. Instead, when probing, first check whether the "home" slot is the target key — if so, return it. Otherwise, save the home slot's metadata to the stack and temporarily "free" the slot (but don't touch its value). Then continue with the original loop. Once again, the loop will be implicitly broken by the new "free" slot. The original metadata is restored before the function returns.
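
For illustration, here is a minimal sketch of a lookup probe using this trick. It is not the actual `std.HashMap` code: the `Metadata` enum, the `u64` keys, the linear probing, and the power-of-two capacity are all simplifications assumed for the example.

```zig
const Metadata = enum(u8) { free, tombstone, used };

/// Hypothetical getIndex-style lookup; returns the slot of `key`, or null on a miss.
fn getIndexSketch(metadata: []Metadata, keys: []const u64, key: u64, hash: usize) ?usize {
    const mask = metadata.len - 1; // capacity assumed to be a power of two
    const home = hash & mask;

    // 1. If the home slot already holds the key, return it immediately.
    if (metadata[home] == .used and keys[home] == key) return home;

    // 2. Otherwise stash the home slot's metadata and temporarily mark it
    //    free (its value is untouched), so a wrapping probe is guaranteed
    //    to stop there even when every other slot is used or a tombstone.
    const saved = metadata[home];
    metadata[home] = .free;
    defer metadata[home] = saved; // restore before returning

    // 3. Probe as before; the loop is implicitly broken by a free slot.
    var idx = (home + 1) & mask;
    while (metadata[idx] != .free) : (idx = (idx + 1) & mask) {
        if (metadata[idx] == .used and keys[idx] == key) return idx;
    }
    return null; // miss
}
```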

`getOrPut` has one additional gotcha — if the home slot is a tombstone and `getOrPut` misses, then the home slot is written with the new key; that is, its original metadata (the tombstone) is not restored.
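
The miss path of that gotcha could look roughly like the following, continuing the sketch above. The names (`home`, `saved`, `first_free`) are hypothetical and stand in for the real implementation's state.

```zig
/// Hypothetical tail of a getOrPut-style miss, reusing `Metadata` from the
/// sketch above. `home` is the key's home slot, `saved` its stashed metadata,
/// and `first_free` the free slot the probe stopped at.
fn getOrPutMissSketch(
    metadata: []Metadata,
    keys: []u64,
    home: usize,
    saved: Metadata,
    first_free: usize,
    key: u64,
) usize {
    if (saved == .tombstone) {
        // The home slot was a tombstone: write the new key there and do not
        // restore the original metadata.
        metadata[home] = .used;
        keys[home] = key;
        return home;
    }
    // Otherwise restore the home slot and insert into the free slot found
    // while probing.
    metadata[home] = saved;
    metadata[first_free] = .used;
    keys[first_free] = key;
    return first_free;
}
```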

Other changes:

  • Test hash map misses.
  • Test using `getOrPutAssumeCapacity` to get keys at the end (along with `get`).

This approach is inspired by @jorangreef's suggested optimization (#10337 (comment)).

On second thought, there's a trick we could use to safely eliminate the expensive limit check completely: just set `available` at startup to one less than the actual capacity, so that a free slot always remains and lookups are guaranteed to terminate naturally even in the presence of tombstones, with no infinite loop.

But that doesn't quite work; an insert/remove cycle will eventually convert any free slots to tombstones. Joran and I considered a couple other approaches before discovering the temporarily-free-a-slot tactic:

  • Can trailing tombstones be converted back to free slots? This doesn't always fix the issue. Consider the slot array [A;B;C] with a max load of 2/3: put A, put B, free A, put C, and now it's back to needing a limit condition (a minimal trace of this sequence follows the list). (This may still be worth doing in the future to reduce the likelihood of a miss's O(n) worst case, even though it isn't flawless. Either that, or rehash in place.)
  • Can the slot array include a special always-free slot at the end? But lookups still need to wrap around it to find collisions, so that doesn't help either.
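
As a concrete sketch of the first bullet's counterexample, assuming A, B, and C hash to slots 0, 1, and 2 respectively, and reusing the `Metadata` enum from the sketch above:

```zig
const std = @import("std");

test "trailing tombstones do not guarantee a free slot" {
    // 3 slots with a max load of 2/3, so at most 2 live entries at a time.
    var meta = [3]Metadata{ .free, .free, .free };
    meta[0] = .used;      // put A  -> [A,         free, free]
    meta[1] = .used;      // put B  -> [A,         B,    free]
    meta[0] = .tombstone; // free A -> [tombstone, B,    free]
    meta[2] = .used;      // put C  -> [tombstone, B,    C]
    // No free slot remains, so a missed lookup would wrap forever without
    // either an explicit limit or the temporarily-freed home slot.
    for (meta) |m| try std.testing.expect(m != .free);
}
```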

cc: @Sahnvour

@sentientwaffle force-pushed the hash-map-available-faster branch from c4d6b84 to 3861c96 on December 17, 2021 at 21:01
@sentientwaffle
Contributor Author

Just pushed a fix for the failing test at https://ci.ziglang.org/ziglang/zig/859/1/4 (due to mixing up `usize` and `u64`).

@andrewrk
Member

Thanks for the follow-up. Were you able to observe any perf improvements with these changes?

@sentientwaffle
Contributor Author

I tested locally with just `perf stat`. The branch count was definitely improved, but the instruction count and utime seemed inconclusive.

@andrewrk
Member

Good to know, thanks. I'm fine with merging this and giving it a spin in our perf tracking dashboard. But if it remains inconclusive then it probably makes sense to revert since this is slightly more complex than before.

@sentientwaffle
Contributor Author

Sounds good!

@andrewrk merged commit 11803a3 into ziglang:master on Dec 17, 2021
@andrewrk
Member

andrewrk commented Dec 17, 2021

Perf results are up

My observations:

  • strictly worse in terms of CPU instructions
  • slightly worse wall time (but this can be noisy)
  • sometimes better, sometimes worse for branch predictions

Seems like it should be reverted to me.

@sentientwaffle
Contributor Author

Agreed; I'll dig into it more on Monday.

@andrewrk
Member

reverted in 6d04de7
