Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial ZeroHashMap #2579

Merged
merged 39 commits into from Feb 13, 2023
Merged

Add initial ZeroHashMap #2579

merged 39 commits into from Feb 13, 2023

Conversation

pdogr
Copy link
Contributor

@pdogr pdogr commented Sep 19, 2022

#2532
Adds core functionality of ZeroHashMap.
Perfect hashes are computed using CHD algorithm, with aHash being used as the hashing function.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Praise: This looks like the type of hash map impl that will work well in ICU4X. It looks algorithmically similar to the approach rkyv is taking.

I would like to see benches against ZeroMap before merging, both data size and lookup performance, and code size for bonus points.

utils/zerovec/Cargo.toml Outdated Show resolved Hide resolved
utils/zerovec/src/map/map.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
@pdogr
Copy link
Contributor Author

pdogr commented Sep 19, 2022

Praise: This looks like the type of hash map impl that will work well in ICU4X. It looks algorithmically similar to the approach rkyv is taking.

I would like to see benches against ZeroMap before merging, both data size and lookup performance, and code size for bonus points.

Yes I based my implementation on rkyv's hashmap with a few changes. Both use CHD algorithm with the random hash function approach. There is a practical approach mentioned in Appendix A of the paper which uses a simpler hash but computes key hash only once as compared to twice for the random hash function approach.

Changes from rkyv impl

  1. Different hash function
  2. Normally we would have stored the value inline at the slot, but here we store index to original placement of (k, v).
  3. assignments.contains(&index) in rkyv takes O(bucket_chain_len) which might slow hash building a bit (I haven't benchmarked it) depending on the bucket_chain len. So we use generation trick to get contains fast.

rkyv does have one additional optimization for the case when the chain has only one bucket.
Instead of storing the seed in this case, the original index will be stored in the displacement array and the seeds will have the highest bit set to 1. As this original index will have the highest bit 0, these won't collide.

@sffc
Copy link
Member

sffc commented Sep 19, 2022

Sounds good. When optimizing, please focus on (1) data size and (2) lookup speed. I don't care too much about building speed since it is done offline.

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed the construction algorithm but wanted to leave my comments so far

utils/zerovec/Cargo.toml Outdated Show resolved Hide resolved
fn compute_hash<K: Hash>(seed: u32, k: K, m: usize) -> u32 {
let mut hasher = create_hasher_with_seed(seed.into());
k.hash(&mut hasher);
(hasher.finish() % m as u64) as u32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(hasher.finish() % m as u64) as u32
(hasher.finish() as usize % m) as u32

This avoids doing a u64 mod instruction on 32-bit architectures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to take this back, this does not give the same results on 32 and 64 bit. Assume the hash is 2^32 + 1, then on 32 bit that gets truncated to 1, which is not congruent to 2^32+1 under arbitrary moduli.

I don't think we need 64 bits here, so let's do the arithmetic in 32 bits and then widen to usize.

utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
rename iter -> keys
change return type of compute hash
change container to FlexZeroVec
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
add #[zerovec::derive(Hash)] which derives byte hash of the ule
fix some function signatures
@pdogr
Copy link
Contributor Author

pdogr commented Sep 20, 2022

Added some benchmarks reusing the data from the existing ones.
Results

zeromap/lookup/large    time:   [188.63 ns 188.94 ns 189.27 ns]
                        change: [-2.8188% -2.3026% -1.7709%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

zerohashmap/lookup/large
                        time:   [82.513 ns 82.638 ns 82.766 ns]
                        change: [-1.1974% -0.7730% -0.3681%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

zeromap/lookup/large/hashmap
                        time:   [65.359 ns 65.476 ns 65.601 ns]
                        change: [-1.3054% -0.8122% -0.3220%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

A few tradeoffs and observations

  1. Almost half of the time is used in the hash computation, I'll try "Practical approach" given in the above paper to see if it shows improvement.
  2. FlexZeroVec vs ZeroVec for HashIndex entries (size vs lookup time)
  3. Reverse mapping. Ideally we want to store key, value together as (K, V) which would remove the need for reverse mapping and also better locality.
  4. Hash function. These benches are using ahash for now. I will do some experiments with platform independent hashes.

@sffc
Copy link
Member

sffc commented Sep 20, 2022

@pdogr Good data on the large map. How does the performance compare on the small version of the map?

@pdogr
Copy link
Contributor Author

pdogr commented Sep 20, 2022

@pdogr Good data on the large map. How does the performance compare on the small version of the map?

For smaller keys hash computation time really comes into picture

zeromap/lookup/small    time:   [52.080 ns 52.323 ns 52.614 ns]
                        change: [+5.7258% +6.4265% +7.0869%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

zerohashmap/lookup/small
                        time:   [98.832 ns 99.025 ns 99.231 ns]
                        change: [-0.8245% -0.3353% +0.1726%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

zeromap/lookup/small/hashmap
                        time:   [65.108 ns 65.336 ns 65.587 ns]
                        change: [-5.1179% -4.3311% -3.5866%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

@pdogr
Copy link
Contributor Author

pdogr commented Sep 20, 2022

FlexZeroVec vs ZeroVec

Using ZeroVec improves the lookup time by ~ 11% (99ns -> 87ns) for small lookups and ~25% (82ns -> 64ns) for large lookups with the existing algorithm.

@sffc
Copy link
Member

sffc commented Sep 20, 2022

It seems that another way to architect this could be to make this a standalone type that maps from keys to an index, and then keep a proper ZeroMap as the second stage of lookup.

utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/map.rs Outdated Show resolved Hide resolved
remove duplicate casts
support only one `build_from_exact_iter` function
where
K: Hash,
{
let l1 = compute_hash(0, k, self.displacements.len());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if the map is empty, as compute_hash will do % 0. Add that as a precondition to compute_hash and guard against it here.

utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
@pdogr
Copy link
Contributor Author

pdogr commented Sep 23, 2022

Switched to the practical approach mentioned in that paper with 64 bit (16 bits for g, 24 bits for f0, f1).
Changed the hash function to wyhash.

Benches
zeromap/lookup/small    time:   [50.796 ns 50.913 ns 51.042 ns]
                        change: [+0.1679% +0.7275% +1.3576%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

zeromap/lookup/large    time:   [196.49 ns 197.21 ns 198.13 ns]
                        change: [-7.7555% -6.8637% -5.9877%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

zeromap/lookup/small/hashmap
                        time:   [69.802 ns 70.089 ns 70.439 ns]
                        change: [+2.9376% +3.6548% +4.3437%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  8 (8.00%) high mild

zeromap/lookup/large/hashmap
                        time:   [66.232 ns 66.464 ns 66.749 ns]
                        change: [+0.4782% +1.0997% +1.7342%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  6 (6.00%) high severe

zerohashmap/lookup/small
                        time:   [45.760 ns 45.844 ns 45.928 ns]
                        change: [-1.8354% -1.3624% -0.9334%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild

zerohashmap/lookup/large
                        time:   [43.285 ns 43.356 ns 43.434 ns]
                        change: [+3.1555% +3.8198% +4.4009%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe


utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
utils/zerovec/src/map/hashmap.rs Outdated Show resolved Hide resolved
@pdogr pdogr marked this pull request as ready for review January 12, 2023 10:25
@pdogr pdogr requested a review from a team as a code owner January 12, 2023 10:25
utils/zerovec/derive/src/make_varule.rs Show resolved Hide resolved
utils/zerovec/derive/src/make_varule.rs Show resolved Hide resolved
utils/zerovec/src/hashmap/algorithms.rs Outdated Show resolved Hide resolved
utils/zerovec/src/hashmap/mod.rs Outdated Show resolved Hide resolved
utils/zerovec/src/hashmap/mod.rs Outdated Show resolved Hide resolved
utils/zerovec/src/hashmap/algorithms.rs Show resolved Hide resolved
/// placeholder.
#[cfg_attr(feature = "serde", serde(borrow))]
displacements: ZeroVec<'a, (u32, u32)>,
keys: K::Container,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the next pr: this doesn't borrow keys and values. You'll need a ZeroHashMapBorrowed and all the boilerplate like in ZeroMap...

/// assert_eq!(hashmap.get(&2), Some("c"));
/// assert_eq!(hashmap.get(&4), None);
/// ```
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the next pr: ZeroMap has a special implementation to have readable JSON, you can probably steal that by just pointing it at keys and values.

utils/zerovec/src/lib.rs Outdated Show resolved Hide resolved
utils/zerovec/src/hashmap/mod.rs Outdated Show resolved Hide resolved
pdogr and others added 5 commits January 13, 2023 01:39
utils/zerovec/src/hashmap/mod.rs Show resolved Hide resolved
where
K: ZeroMapKV<'a> + ?Sized,
V: ZeroMapKV<'a> + ?Sized,
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not blocking: The iterator code is duplicated across ZeroMap and ZeroHashMap now. Move the logic to ZeroVecLike to deduplicate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do this in a follow-up.

compute_index in u64
shift back to (usize, u32, u32) from split_hash64
where
S: serde::Serializer,
{
(&self.displacements, &self.keys, &self.values).serialize(serializer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this isn't great for self-describing formats like JSON, but it's fine for now.

@robertbastian robertbastian removed the request for review from a team February 10, 2023 15:30
@pdogr pdogr merged commit 6de6825 into unicode-org:main Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants