Skip to content

Update xxHash and hashing APIs #1905

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Oct 22, 2021
Merged

Update xxHash and hashing APIs #1905

merged 36 commits into from
Oct 22, 2021

Conversation

mavam
Copy link
Member

@mavam mavam commented Oct 14, 2021

📔 Description

This PR switches the default hash function from XXH64 to XXH3, making possible a 2-3x speed improvement.

📝 Checklist

  • Rip out vendored xxHash code
  • Update CMake scaffold (@dominiklohmann)
  • Rewrite hash algorithms
  • Refactor std::hash<T> specializations to use vast::hash
  • Fix unit tests
  • Install dependencies in CI
  • Fix static build (@tobim)
  • Replace vast::uhash with vast::hash where possible
  • Fix integration tests
  • Fix plugin CI checks (@dominiklohmann)
  • Rethink broken is_uniquely_represented traits (address, flow, uuid, integer)
  • Provide a default specialization for std:hash<T> when T is hashable
  • Make sure we can still ready old synopses (address hashing changed)
  • Support seeded hashing
  • Write changelog entries
    • address bugfix in PCAP reader
    • New external dependency

🎯 Review Instructions

Commit-by-commit.

@mavam mavam added performance Improvements or regressions of performance refactoring Restructuring of existing code labels Oct 14, 2021
@mavam mavam force-pushed the story/ch18496/xxh3 branch 6 times, most recently from a385e9a to 1e81f7d Compare October 14, 2021 11:47
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove xxhash.h from LICENSE.3rdparty as well.

@mavam mavam force-pushed the story/ch18496/xxh3 branch 4 times, most recently from 8b9dccd to 1a17e6d Compare October 15, 2021 11:57
mavam and others added 8 commits October 16, 2021 10:37
It turns out that we need this function to be stateless, because a
algorithm that is both oneshot and incremental will always intialize its
state upon construction, which is a pessimization when performing
one-shot hashing.
There is a clash with this overload:

template <class HashAlgorithm, class T>
requires portable_hash<T, HashAlgorithm>
void hash_append(HashAlgorithm& h, const T& x) noexcept {
  h(std::addressof(x), sizeof(x));
}
Provide a find module for xxhash
@mavam mavam force-pushed the story/ch18496/xxh3 branch from e354e1f to c674ec7 Compare October 16, 2021 08:38
@mavam mavam self-assigned this Oct 16, 2021
@mavam mavam force-pushed the story/ch18496/xxh3 branch from c674ec7 to 2e7978d Compare October 16, 2021 14:33
mavam added 5 commits October 16, 2021 16:38
We need the as_bytes() overload. This obviates the need for the member
function data(). Removing that one created a long tail of follow-ups.
The name is not very clear and with constexpr there's now almost no more
reason to not directly inline the actual conditional.
@mavam mavam force-pushed the story/ch18496/xxh3 branch from 7c6b45c to 57d1a4c Compare October 17, 2021 11:55
@mavam
Copy link
Member Author

mavam commented Oct 17, 2021

@dominiklohmann could you take a look at the current CI error? Seems to be a CMake thingy where I am a snail.

CMake Error at /__w/vast/vast/_install/lib/cmake/vast/VASTRegisterPlugin.cmake:464 (add_library):
  Target "example-analyzer-static" links to target
  "xxhash::xxhash_header_only" but the target was not found.  Perhaps a
  find_package() call is missing for an IMPORTED target, or an ALIAS target
  is missing?
Call Stack (most recent call first):
  CMakeLists.txt:16 (VASTRegisterPlugin)

@dominiklohmann
Copy link
Member

@dominiklohmann could you take a look at the current CI error? Seems to be a CMake thingy where I am a snail.


CMake Error at /__w/vast/vast/_install/lib/cmake/vast/VASTRegisterPlugin.cmake:464 (add_library):

  Target "example-analyzer-static" links to target

  "xxhash::xxhash_header_only" but the target was not found.  Perhaps a

  find_package() call is missing for an IMPORTED target, or an ALIAS target

  is missing?

Call Stack (most recent call first):

  CMakeLists.txt:16 (VASTRegisterPlugin)

That should be easy to fix, you likely just need to install xxhash in another place in CI. Let's chat about this tomorrow.

@mavam
Copy link
Member Author

mavam commented Oct 18, 2021

@dominiklohmann I tried fixing this in adfbd6d, but no luck by adding extra dependencies.

Just providing the find module alongside installations of libvast does
not suffice; we need to make use of it as well.
@mavam mavam force-pushed the story/ch18496/xxh3 branch 3 times, most recently from 496b7b8 to 8bed7a7 Compare October 19, 2021 18:30
mavam added 2 commits October 20, 2021 11:57
One deficiency of hash_append is that it doesn't support runtime
seeding. This commit offers an interface for this use case, yielding the
following two user-facing interfaces:

   hash<H>(x, y, z);
   seeded_hash<H>{a, b, c}(x, y, z);

Here, H is the algorithm, x, y, z objects to hash, and a, b, c seeds
given to the constructor of H. The implementation chooses the optimal
code path to perform oneshot or incremental hashing.
@mavam mavam requested a review from lava October 20, 2021 10:10
@mavam mavam marked this pull request as ready for review October 20, 2021 10:46
@mavam mavam added this to the Hashing Revamp milestone Oct 20, 2021
Copy link
Member

@lava lava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already had a few video calls over this, and I only have some minor comments.

We need to create a follow-up story to measure the speed-up from using xxh64_3, so we can double-check its not an accidental regression, and use the numbers for the release blogpost.

@mavam mavam force-pushed the story/ch18496/xxh3 branch from 8bac64b to 2bbb7cd Compare October 22, 2021 18:09
@mavam mavam requested a review from lava October 22, 2021 18:09
@mavam mavam enabled auto-merge October 22, 2021 18:09
@mavam mavam merged commit dfa0b7c into master Oct 22, 2021
@mavam mavam deleted the story/ch18496/xxh3 branch October 22, 2021 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Improvements or regressions of performance refactoring Restructuring of existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants