Implement support for :string == /pattern/ queries #2769

Merged: 11 commits merged into master from story/sc-28660/pattern-search on Dec 9, 2022

dominiklohmann
Member

This adds support to VAST for queries that compare field extractors against patterns. It also removes support for the redundant match operator, which was broken in many ways.

A pattern query always does a full scan of every partition for which the field extractor bound. There is still room for optimization: the current implementation consults the relevant sparse and dense indexes instead of skipping them outright. Skipping them would require a larger refactoring so that dense and sparse indexes are accessed based on the three-tuple (lhs, op, rhs) of a normalized predicate rather than on lhs alone.

This removes a noisy warning for queries like `http.hostname ~ /.*\.com$/`,
which do work as expected despite being _really_ slow, because we
instantiate the regular expression for every individual element instead of
doing that only once.

The match operator was something we never fully implemented, and it also
just doesn't make sense to have: `:string == /pattern/` and
`:string in /pattern/` already cover a superset of its functionality.

Since it was broken anyway, this doesn't warrant a deprecation period or
rewriting expressions dynamically.
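
To illustrate why the two remaining operators can cover what a dedicated match operator would do, here is a minimal Python sketch. It rests on an assumption that this conversation does not spell out: that `==` against a pattern requires the whole string to match, while `in` accepts a match anywhere in the string. The helper names `pattern_equals` and `pattern_in` are hypothetical and are not part of VAST.

```python
import re


def pattern_equals(value: str, pattern: str) -> bool:
    # Assumed semantics of `:string == /pattern/`:
    # the pattern must match the entire string.
    return re.fullmatch(pattern, value) is not None


def pattern_in(value: str, pattern: str) -> bool:
    # Assumed semantics of `:string in /pattern/`:
    # the pattern may match anywhere inside the string.
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    host = "example.com"
    print(pattern_equals(host, r".*\.com"))  # whole-string match
    print(pattern_equals(host, r"\.com"))    # only a suffix matches
    print(pattern_in(host, r"\.com"))        # match anywhere
```

Under these assumed semantics, any anchored or unanchored match can be expressed with one of the two operators, which is consistent with the commit message calling the match operator redundant.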
@dominiklohmann dominiklohmann added the feature New functionality label Dec 7, 2022
@dominiklohmann dominiklohmann requested review from mavam and tobim December 7, 2022 16:24
Member

@mavam mavam left a comment

I did not check for completeness but reviewed the diff thoroughly.

This change instantiates regex objects only once per batch instead of
once per row, which yields a massive speedup even for simple pattern
searches.
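
The difference between per-row and per-batch instantiation can be sketched in Python as follows. This is an illustration only, not VAST's C++ implementation: the `Matcher` class, its construction counter, and both filter functions are hypothetical names introduced here. The counter stands in for the cost of building a regex object, since Python's `re` module caches compiled patterns internally and would hide most of that cost.

```python
import re


class Matcher:
    """Hypothetical regex wrapper that counts how often it is constructed."""

    constructions = 0

    def __init__(self, pattern: str):
        Matcher.constructions += 1
        self._regex = re.compile(pattern)

    def matches(self, value: str) -> bool:
        return self._regex.search(value) is not None


def filter_per_row(batch, pattern):
    # Builds a new matcher for every row in the batch.
    return [value for value in batch if Matcher(pattern).matches(value)]


def filter_per_batch(batch, pattern):
    # Builds the matcher once and reuses it for all rows in the batch.
    matcher = Matcher(pattern)
    return [value for value in batch if matcher.matches(value)]


if __name__ == "__main__":
    batch = ["example.com", "example.org", "tenzir.com"]
    for fn in (filter_per_row, filter_per_batch):
        Matcher.constructions = 0
        hits = fn(batch, r".*\.com$")
        print(fn.__name__, hits, Matcher.constructions)
```

Both functions return the same rows. Only the number of matcher constructions differs: it grows with the number of rows in the first variant and stays at one per batch in the second.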
@dominiklohmann dominiklohmann added blocked Blocked by an (external) issue and removed blocked Blocked by an (external) issue labels Dec 8, 2022
@dominiklohmann dominiklohmann merged commit b1c0e96 into master Dec 9, 2022
@dominiklohmann dominiklohmann deleted the story/sc-28660/pattern-search branch December 9, 2022 20:37