Fix query pruning in the catalog #2264
Merged
This fixes several issues with the query pruner, which was aggressively reducing string lookups in compound statements using the fact that all strings in a partition share the same bloom filter. First, the relational operator was ignored, leading to invalid transformations like `http.hostname == "foo" || http.hostname != "foo"` being "optimized" to just `http.hostname == "foo"`, which is clearly not the same. Second, a query like `http.hostname == "foo" || dns.rrname == "foo"` would also be reduced to just `http.hostname == "foo"`, which returns false for a partition that contains only suricata.dns events, regardless of whether the second half of the disjunction would have produced a candidate match. Finally, with the introduction of configurable false-positive rates for individual fields in VAST 2.0, even a correct application of this optimization may lead to a lower-precision type-level synopsis being used for the candidate check.
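The first two failure modes can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: a partition is modeled as a dictionary of exact value sets rather than bloom filters, and `matches`, `candidate`, and `old_prune` are hypothetical names that do not come from the VAST code base. `old_prune` is a guess at the faulty behavior described above, not the original implementation.

```python
# Toy model (assumption): a partition maps each field name to the set of
# string values it holds. A real catalog consults bloom filters instead;
# exact sets keep the example deterministic.

def matches(partition, predicate):
    """Return True if the partition may contain events matching the predicate."""
    field, op, value = predicate
    if field not in partition:
        return False  # no such field in this partition, so no candidate
    if op == "==":
        return value in partition[field]
    # op == "!=": any value other than `value` makes the partition a candidate
    return any(v != value for v in partition[field])

def candidate(partition, disjunction):
    """A disjunction is a candidate match if any of its predicates is."""
    return any(matches(partition, p) for p in disjunction)

def old_prune(disjunction):
    """Hypothetical reconstruction of the faulty pruner: keep only the first
    predicate per string literal, ignoring both field and operator."""
    seen, kept = set(), []
    for field, op, value in disjunction:
        if value in seen:
            continue
        seen.add(value)
        kept.append((field, op, value))
    return kept

# Failure 1: the relational operator is ignored.
http_only = {"http.hostname": {"bar"}}
q1 = [("http.hostname", "==", "foo"), ("http.hostname", "!=", "foo")]
print(candidate(http_only, q1), candidate(http_only, old_prune(q1)))

# Failure 2: the surviving field does not exist in a DNS-only partition.
dns_only = {"dns.rrname": {"foo"}}
q2 = [("http.hostname", "==", "foo"), ("dns.rrname", "==", "foo")]
print(candidate(dns_only, q2), candidate(dns_only, old_prune(q2)))
```

In both cases the unpruned query marks the partition as a candidate while the pruned one does not, so the pruned lookup would silently miss results.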
lava force-pushed the story/sc-33307/correct-pruning branch 2 times, most recently from 419341c to 03104ad on May 6, 2022 08:20
This is done to work around invalid transformations such as 'Cause == "Stop"' -> ':string == "Stop"' where `Cause` is an enum field. While this patch does not completely fix the issue, it ensures that the issue is at least no more likely to appear than before.
This is done to further optimize queries like `:string == "foo" || :string == "foo"`, which are likely to occur as a result of optimization after the first round of pruning.
lava force-pushed the story/sc-33307/correct-pruning branch from f823f2d to 0cca247 on May 6, 2022 09:41
dominiklohmann approved these changes on May 6, 2022
The testbed metrics dashboard shows a significant reduction in catalog lookup time, so this is working nicely.
I've looked at the code and don't have any complaints either, but please add a changelog entry.
The previous version of pruning would still optimize expressions like `foo == "asdf" || bar == "asdf"` with `foo` being an unprunable field, because the condition was only checked once a duplicate was detected, i.e., for all string fields except the first.
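The ordering problem can be sketched as follows. This is a hypothetical reconstruction based only on the description above: `buggy_prune`, `fixed_prune`, and the `(field, value)` pair representation are illustrative assumptions and not the actual pruner code.

```python
def buggy_prune(predicates, unprunable):
    """Assumed old behavior: the unprunable check runs only for duplicates,
    so the first field with a given literal is always rewritten."""
    seen, out = set(), []
    for field, value in predicates:
        if value not in seen:
            seen.add(value)
            out.append((":string", value))  # rewritten without any check
        elif field in unprunable:
            out.append((field, value))  # check happens only on duplicates
        # otherwise: duplicate of a prunable field, dropped
    return out

def fixed_prune(predicates, unprunable):
    """Check the unprunable set before considering any rewrite."""
    seen, out = set(), []
    for field, value in predicates:
        if field in unprunable:
            out.append((field, value))  # keep the dedicated lookup intact
            continue
        if value not in seen:
            seen.add(value)
            out.append((":string", value))
    return out

query = [("foo", "asdf"), ("bar", "asdf")]
print(buggy_prune(query, unprunable={"foo"}))
print(fixed_prune(query, unprunable={"foo"}))
```

With `foo` marked unprunable, the buggy variant collapses the whole query into a single `:string` lookup, while the fixed variant keeps the `foo` predicate and only generalizes `bar`.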
lava force-pushed the story/sc-33307/correct-pruning branch from e194970 to 4a415b3 on May 9, 2022 13:08
Updates the catalog to rewrite queries like `http.hostname == "foo" || dns.rrname == "foo"` to `:string == "foo"` (as opposed to rewriting them to `http.hostname == "foo"`, as was done previously).

Additionally, introduces a set of field names that have separately configured bloom filters (presumably with a higher precision than the default one), which should not be touched by the pruner.
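A minimal sketch of the rewrite described here, assuming a flat disjunction of predicates. The `Predicate` type, the `prune` function, and the `unprunable_fields` parameter are hypothetical names chosen for illustration. The sketch conservatively rewrites only equality predicates and leaves every other operator untouched; the actual patch may treat other operators differently.

```python
from typing import NamedTuple

class Predicate(NamedTuple):
    lhs: str  # field name such as "http.hostname", or a type extractor such as ":string"
    op: str   # relational operator, e.g. "==" or "!="
    rhs: str  # string literal

def prune(disjunction, unprunable_fields):
    """Generalize string equality lookups to `:string` and drop duplicates.

    Fields listed in `unprunable_fields` (assumed to have their own
    bloom filter configuration) are left untouched.
    """
    result = []
    for p in disjunction:
        is_field = not p.lhs.startswith(":")
        if p.op == "==" and is_field and p.lhs not in unprunable_fields:
            p = Predicate(":string", p.op, p.rhs)
        if p not in result:  # identical predicates collapse into one
            result.append(p)
    return result

query = [
    Predicate("http.hostname", "==", "foo"),
    Predicate("dns.rrname", "==", "foo"),
]
print(prune(query, unprunable_fields=set()))
print(prune(query, unprunable_fields={"dns.rrname"}))
```

Because the operator is part of the predicate, `http.hostname == "foo" || http.hostname != "foo"` keeps both halves in this sketch instead of collapsing into one.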
📝 Checklist
🎯 Review Instructions
Review this pull request commit-by-commit.