
std: hash_map: optimize isFree/isTombstone #10562

Merged

Conversation

sentientwaffle
Contributor

Optimize HashMap.

  • Add a `Metadata.isFree` helper method (not an optimization, just for clarity).
  • Implement `Metadata.isTombstone` and `Metadata.isFree` with a `@bitCast` followed by a comparison against a constant (a standalone sketch follows below). I assume the `@bitCast`-then-compare form is faster than the old method because it involves only one comparison and no bitmasking.
  • Summary of benchmark results (`gotta-go-fast`, run locally, compared to master):
    • 3/4 of the hash map benchmarks use ~10% fewer cycles.
    • The last one (project Euler) uses ~4% fewer cycles.
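
For concreteness, here is a hedged, standalone sketch of the change (not the actual `std/hash_map.zig` source): the field names, the `slot_free` constant, and the `@bitCast(u8, ...)` comparison follow the discussion in this PR, while the fingerprint width and the exact free/tombstone encodings are assumptions. The two-argument `@bitCast` matches the Zig syntax current when this PR was merged.

```zig
const std = @import("std");

// Hypothetical reduction of the per-slot metadata byte; the u7 fingerprint and
// the 0/1 encodings for free/tombstone are assumptions for illustration.
const Metadata = packed struct {
    const FingerPrint = u7;
    const free: FingerPrint = 0;
    const tombstone: FingerPrint = 1;

    fingerprint: FingerPrint = free,
    used: u1 = 0,

    // Whole-byte patterns for the two special slot states.
    const slot_free = @bitCast(u8, Metadata{ .fingerprint = free });
    const slot_tombstone = @bitCast(u8, Metadata{ .fingerprint = tombstone });

    pub fn isUsed(self: Metadata) bool {
        return self.used == 1;
    }

    // Old style: check the fields independently (mask + compare per field).
    pub fn isTombstoneFieldwise(self: Metadata) bool {
        return !self.isUsed() and self.fingerprint == tombstone;
    }

    // New style: reinterpret the whole byte and compare against one constant.
    pub fn isTombstone(self: Metadata) bool {
        return @bitCast(u8, self) == slot_tombstone;
    }

    pub fn isFree(self: Metadata) bool {
        return @bitCast(u8, self) == slot_free;
    }
};

test "free, tombstone, and used states are distinguishable" {
    const empty = Metadata{};
    try std.testing.expect(empty.isFree());
    try std.testing.expect(!empty.isTombstone());

    const dead = Metadata{ .fingerprint = Metadata.tombstone };
    try std.testing.expect(dead.isTombstone());
    try std.testing.expect(!dead.isFree());

    const live = Metadata{ .fingerprint = 42, .used = 1 };
    try std.testing.expect(live.isUsed());
    try std.testing.expect(!live.isFree() and !live.isTombstone());
}
```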
Benchmarks
random-find

| metric | available (ef0566d) | pre-available-fix (d54ba76) | available-faster (3861c96) | is-free-tomb-bitcasts (d1c8fb5) |
| --- | --- | --- | --- | --- |
| task-clock (ms) | 16,534 | 13,772 | 18,384 | 14,945 |
| context-switches | 0 | 0 | 0 | 0 |
| cpu-migrations | 0 | 0 | 0 | 0 |
| page-faults | 467 | 469 | 467 | 467 |
| cycles | 31,104,803,977 | 25,945,119,875 | 34,588,214,725 | 28,155,425,962 |
| instructions | 80,226,182,400 | 70,491,433,560 | 86,563,110,421 | 90,400,354,978 |
| branches | 14,189,529,026 | 11,809,816,759 | 12,769,551,259 | 14,083,913,924 |
| branch misses | 319,005,834 | 229,735,717 | 252,086,595 | 122,622,922 |
| L1-dcache-loads | 8,896,046,933 | 7,907,823,608 | 10,536,298,474 | 10,919,177,972 |
| L1-dcache-load-misses | 174,337,652 | 164,790,077 | 172,284,377 | 148,872,050 |
| LLC-loads | 3,725,037 | 2,245,202 | 3,826,286 | 1,444,452 |
| LLC-load-misses | 343,656 | 93,780 | 354,132 | 51,303 |

random-distinct

| metric | available (ef0566d) | pre-available-fix (d54ba76) | available-faster (3861c96) | is-free-tomb-bitcasts (d1c8fb5) |
| --- | --- | --- | --- | --- |
| task-clock (ms) | 1,892 | 1,860 | 1,993 | 1,700 |
| context-switches | 0 | 0 | 0 | 0 |
| cpu-migrations | 0 | 0 | 0 | 0 |
| page-faults | 66,911 | 66,912 | 66,910 | 66,910 |
| cycles | 3,269,601,475 | 3,163,723,978 | 3,452,727,683 | 2,889,333,627 |
| instructions | 3,426,699,012 | 2,677,609,405 | 3,699,799,236 | 3,342,951,341 |
| branches | 517,835,326 | 433,026,244 | 542,492,094 | 467,979,288 |
| branch misses | 24,678,474 | 24,636,232 | 32,133,887 | 25,161,544 |
| L1-dcache-loads | 552,992,863 | 394,913,076 | 631,125,865 | 575,465,372 |
| L1-dcache-load-misses | 106,111,926 | 109,344,475 | 112,017,648 | 107,277,507 |
| LLC-loads | 51,676,188 | 56,874,928 | 53,674,073 | 52,664,954 |
| LLC-load-misses | 17,796,075 | 22,959,431 | 16,115,924 | 15,356,085 |

insert-10M-int

| metric | available (ef0566d) | pre-available-fix (d54ba76) | available-faster (3861c96) | is-free-tomb-bitcasts (d1c8fb5) |
| --- | --- | --- | --- | --- |
| task-clock (ms) | 3,021 | 2,960 | 4,185 | 2,781 |
| context-switches | 0 | 0 | 0 | 0 |
| cpu-migrations | 0 | 0 | 0 | 0 |
| page-faults | 73,759 | 73,760 | 73,759 | 73,759 |
| cycles | 5,348,995,897 | 5,191,935,672 | 7,583,195,490 | 4,891,625,417 |
| instructions | 3,798,916,797 | 3,144,621,177 | 4,629,854,698 | 3,733,511,249 |
| branches | 619,672,586 | 553,510,387 | 720,072,866 | 502,442,844 |
| branch misses | 32,061,922 | 31,377,103 | 43,435,586 | 31,756,749 |
| L1-dcache-loads | 570,232,081 | 458,894,618 | 711,517,039 | 623,767,204 |
| L1-dcache-load-misses | 179,585,635 | 182,700,348 | 191,241,512 | 180,021,265 |
| LLC-loads | 80,555,904 | 83,639,512 | 81,533,360 | 79,334,076 |
| LLC-load-misses | 32,926,971 | 34,002,117 | 31,937,085 | 31,143,504 |

project-euler-14-main

| metric | available (ef0566d) | pre-available-fix (d54ba76) | available-faster (3861c96) | is-free-tomb-bitcasts (d1c8fb5) |
| --- | --- | --- | --- | --- |
| task-clock (ms) | 325 | 388 | 430 | 315 |
| context-switches | 0 | 0 | 0 | 0 |
| cpu-migrations | 0 | 0 | 0 | 0 |
| page-faults | 17,416 | 17,416 | 17,417 | 17,416 |
| cycles | 534,002,560 | 651,213,437 | 718,199,420 | 511,703,222 |
| instructions | 396,849,402 | 395,656,209 | 467,199,680 | 382,598,844 |
| branches | 71,840,304 | 69,010,106 | 72,189,551 | 60,020,886 |
| branch misses | 5,080,626 | 4,667,034 | 5,949,324 | 4,655,130 |
| L1-dcache-loads | 79,627,594 | 77,122,812 | 95,701,841 | 81,966,220 |
| L1-dcache-load-misses | 20,540,324 | 21,259,274 | 22,035,398 | 20,491,521 |
| LLC-loads | 9,833,842 | 10,622,492 | 10,177,942 | 9,656,479 |
| LLC-load-misses | 4,140,400 | 5,168,282 | 4,763,765 | 3,520,579 |

I also experimented with a few other optimizations that didn't pan out -- see here for all the data.

cc: @jorangreef @Sahnvour

@Sahnvour
Contributor

I would have expected the optimizer to catch this simplification!

@Sahnvour Sahnvour self-requested a review January 10, 2022 21:16
@SpexGuy
Contributor

SpexGuy commented Jan 10, 2022

> I would have expected the optimizer to catch this simplification!

It's not quite equivalent: the old check of `isUsed() or isTombstone()` would stop looping for slots that were marked as unused but had nonzero fingerprint bits. The new version keeps looping in that case, because it doesn't recognize the slot as free. The optimizer can't prove that this case never happens, so it can't make this transformation.
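
A hedged sketch of why the two loop conditions differ, building on the hypothetical `Metadata` layout sketched in the PR description above (linear probing stands in for the real probe sequence, purely for illustration):

```zig
// Old loop shape: probe while the slot is used or a tombstone. A slot with
// used == 0 but nonzero fingerprint bits fails both checks, so probing stops.
fn probeOld(slots: []const Metadata, start: usize) usize {
    var idx = start;
    while (slots[idx].isUsed() or slots[idx].isTombstone()) {
        idx = (idx + 1) % slots.len;
    }
    return idx;
}

// New loop shape: probe until the slot is exactly the free pattern
// (used == 0 and fingerprint == 0). The same odd slot keeps this loop going,
// so the two conditions are only interchangeable if that state never occurs,
// which the optimizer cannot prove on its own.
fn probeNew(slots: []const Metadata, start: usize) usize {
    var idx = start;
    while (!slots[idx].isFree()) {
        idx = (idx + 1) % slots.len;
    }
    return idx;
}
```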

@sentientwaffle
Contributor Author

Restructuring `isUsed or isTombstone` into `!isFree` is a change for clarity, not performance. The actual optimization is `@bitCast(u8, self) == slot_free` (as opposed to checking the `used` and `fingerprint` fields independently).

I too was surprised that the compiler didn't handle that optimization (i.e. combining comparisons of a packed struct's fields) automatically.
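
A hypothetical minimal reduction (not from the PR) of the pattern being discussed; the struct and function names here are illustrative only:

```zig
const Flags = packed struct {
    low: u7 = 0,
    high: u1 = 0,
};

// Field-wise form: each field is extracted (masked/shifted) and compared on
// its own; per the discussion above, the optimizer was not observed to merge
// the two checks into a single compare.
fn isZeroFieldwise(f: Flags) bool {
    return f.low == 0 and f.high == 0;
}

// Bit-cast form: a single comparison of the backing byte against a constant.
fn isZeroBitcast(f: Flags) bool {
    return @bitCast(u8, f) == 0;
}
```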

@andrewrk andrewrk merged commit 4731a6e into ziglang:master Jan 11, 2022
andrewrk pushed a commit that referenced this pull request Jan 12, 2022
scorphus pushed a commit to scorphus/zig that referenced this pull request Jan 15, 2022