
Make data predicate evaluation column-major #2730

Merged

Conversation

dominiklohmann
Member

@dominiklohmann dominiklohmann commented Nov 24, 2022

This is a major performance optimization for expression evaluation that essentially fills in one of the last TODOs from #2440: Making data predicate evaluation itself column-major, i.e., moving as much as possible of the evaluation out of the hot loop.

Here's how the new expression evaluation algorithm works, roughly, with (3b) and (4) being the new and improved steps:

  1. Normalize the selection bitmap from the dense index result to the length of the batch plus offset.
  2. Determine whether the expression is empty, a connective of some sort, or a predicate. For connectives, resolve them recursively and combine the resulting bitmaps accordingly.
  3. Evaluate predicates:
    a) If it's a meta extractor, operate on the batch metadata. In case of a match, the selection bitmap directly becomes the result.
    b) If it's a data predicate, access the desired array, and lift the resolved types for both sides of the predicate into a compile-time context for the column evaluator.
  4. The column evaluator has specializations based on the three-tuple of lhs type, relational operator, and rhs view. The generic fallback case iterates over all fields per the selection bitmap, evaluating each cell with the cell evaluator, which can be specialized per relational operator.

We've seen some pretty great results. When running a query with multiple substring searches (for which we have neither sparse nor dense index support) against a database of 6M events, the full scan of the database is now over 2x faster than before.

Concretely, we measured the following numbers using Hyperfine with three warmup runs and ten measured runs, comparing this commit to its merge-base with master, `de86af3fca`.

New:

Time (mean ± σ):     11.038 s ±  0.360 s    [User: 28.600 s, System: 4.430 s]
Range (min … max):   10.808 s … 11.569 s    10 runs

Old:

Time (mean ± σ):     22.805 s ±  0.616 s    [User: 45.060 s, System: 4.826 s]
Range (min … max):   22.042 s … 23.568 s    10 runs
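For reference, a Hyperfine invocation along these lines produces output in the format above. The exact query and binary paths are not part of the PR, so the commands below are placeholders:

```shell
# Hypothetical invocation; substitute the actual binaries and query.
hyperfine --warmup 3 --runs 10 \
  './vast-new export json "<query with substring searches>"' \
  './vast-old export json "<query with substring searches>"'
```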

This was a Hackathon group effort with @patszt, @dispanser, and @dominiklohmann.

@dominiklohmann dominiklohmann added the performance Improvements or regressions of performance label Nov 24, 2022
dominiklohmann added a commit that referenced this pull request Nov 24, 2022
I noticed this when working on #2730; I doubt this makes a big
difference overall, but we still shouldn't do unnecessary work to infer a schema
when we already have it.
@dominiklohmann dominiklohmann requested a review from a team November 24, 2022 11:53
@dominiklohmann dominiklohmann added the blocked Blocked by an (external) issue label Nov 24, 2022
Member

@mavam mavam left a comment


Looks good at a high level. I only left some minor questions/comments. If CI passes, this would work for me.

@dominiklohmann dominiklohmann force-pushed the story/sc-35030/columnar-data-predicate-evaluation branch from c584a14 to dd4d0b5 Compare November 25, 2022 17:17
@dominiklohmann dominiklohmann added rc and removed blocked Blocked by an (external) issue labels Dec 6, 2022
dispanser and others added 7 commits December 6, 2022 18:29
Co-authored-by: Patryk Sztyglic <patryk.sztyglic@gmail.com>
Co-authored-by: Thomas Peiselt <pi@kulturguerilla.org>
Co-authored-by: Dominik Lohmann <mail@dominiklohmann.de>
Co-authored-by: Matthias Vallentin <matthias@vallentin.net>
@dominiklohmann dominiklohmann force-pushed the story/sc-35030/columnar-data-predicate-evaluation branch from f4c5bd4 to 98add3e Compare December 6, 2022 17:30
@dominiklohmann dominiklohmann merged commit 8972422 into master Dec 6, 2022
@dominiklohmann dominiklohmann deleted the story/sc-35030/columnar-data-predicate-evaluation branch December 6, 2022 19:05