Add support for type-level synopses and a string synopsis #1214
Why do we need the class `string_synopsis`? Can't we just use a `buffered_bloom_filter_synopsis`, or `buffered_bloom_filter_synopsis<T>` if we want to restrict it to a single type? I would argue the lookup function can have a precondition that it was already type-checked and that it can be fully polymorphic.
A related question: how does that `buffered_string_synopsis` behave differently from the `buffered_address_synopsis`?
934e5bc to 8e94807
fef1295 to 77c4119
ec6412a to 3599ba6
We reviewed this together in a Slack call. Looks and works great!
c3f73c5 to 8bc3f4b
8bc3f4b to 56ee0c8
Previously, when faced with a query like `zeek.dns.query == "example.org"` and a database that does not contain any `zeek.dns` records, the meta index would schedule *all* partitions since no synopsis existed that could rule out any particular partition. With this change, we also ignore partitions that have no potential matches at all.
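The pruning rule described in this commit message can be illustrated with a minimal sketch. It is not the VAST implementation: `string_synopsis`, `partition_synopsis`, and `candidates` are hypothetical names, and an exact set stands in for a probabilistic synopsis.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a synopsis: an exact set instead of a Bloom filter.
struct string_synopsis {
  std::unordered_set<std::string> values;
  bool may_contain(const std::string& x) const {
    return values.count(x) > 0;
  }
};

// Hypothetical partition summary: one synopsis per fully qualified field name.
struct partition_synopsis {
  std::map<std::string, string_synopsis> fields;
};

// Returns the IDs of all partitions that may contain `field == value`.
std::vector<int> candidates(const std::map<int, partition_synopsis>& meta_index,
                            const std::string& field,
                            const std::string& value) {
  assert(!field.empty());
  std::vector<int> result;
  for (const auto& [id, partition] : meta_index) {
    auto it = partition.fields.find(field);
    // The partition has no such field, so it has no potential match: skip it
    // instead of scheduling it because nothing could rule it out.
    if (it == partition.fields.end())
      continue;
    if (it->second.may_contain(value))
      result.push_back(id);
  }
  return result;
}
```

With a meta index that holds no `zeek.dns` fields at all, `candidates` returns an empty vector rather than every partition ID.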
* Unify `buffered_address_synopsis` and `buffered_string_synopsis` into a single `buffered_synopsis<T>`.
* String attributes in meta index before checking type equality.
* Rename `fprate` to `fp-rate` globally.
* Correctly apply past tense of `shrink` where necessary.
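To make the idea of one generic buffered synopsis concrete, here is a rough sketch under stated assumptions. Only the name `buffered_synopsis<T>` and the `shrink` operation come from the commit message; `compact_synopsis`, the member functions, and the single-hash bit vector are hypothetical simplifications, not the actual VAST classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical compact synopsis: a Bloom-style bit vector with a single hash.
template <class T>
class compact_synopsis {
public:
  explicit compact_synopsis(std::size_t bits) : bits_(bits, false) {
    assert(bits > 0);
  }

  void add(const T& x) {
    bits_[std::hash<T>{}(x) % bits_.size()] = true;
  }

  // May report false positives, never false negatives.
  bool may_contain(const T& x) const {
    return bits_[std::hash<T>{}(x) % bits_.size()];
  }

private:
  std::vector<bool> bits_;
};

// Hypothetical buffered synopsis, generic over the value type, so strings and
// addresses would not need separate classes.
template <class T>
class buffered_synopsis {
public:
  void add(const T& x) {
    buffer_.insert(x);
  }

  // Exact while the values are still buffered.
  bool may_contain(const T& x) const {
    return buffer_.count(x) > 0;
  }

  std::size_t size() const {
    return buffer_.size();
  }

  // Builds a compact synopsis sized from the number of distinct values seen.
  compact_synopsis<T> shrink(std::size_t bits_per_value = 16) const {
    auto bits = std::max<std::size_t>(1, buffer_.size() * bits_per_value);
    compact_synopsis<T> result(bits);
    for (const auto& x : buffer_)
      result.add(x);
    return result;
  }

private:
  std::unordered_set<T> buffer_;
};
```

Buffering first means the compact structure can be sized once the number of distinct values is known, instead of being guessed up front.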
56ee0c8 to 67eaf64
@@ -12,6 +12,10 @@ Every entry has a category for which we use the following visual abbreviations:

## Unreleased

- 🎁 A new type-level synopsis structure in the meta-index now massively
  speeds up string queries with very few results.
  [#1214](https://github.com/tenzir/vast/pull/1214)
This is another case where I don't see the need for a "feature" changelog. It's a non-functional change.
It would only make sense to introduce a performance-related category where we document performance-affecting changes.
📔 Description
This adds support for type-level synopses to the meta index. For each field in a partition, it is possible to decide whether to use a specific synopsis, share one with the other fields of the same type, or not to use a synopsis at all.
A new string synopsis and a buffered string synopsis are also introduced.
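The three per-field options in the description (a field-specific synopsis, a synopsis shared per type, or none) could be modeled roughly as follows. This is only an illustration: `synopsis_kind`, `synopsis_policy`, `choose`, and `synopsis_key` are hypothetical names, and the precedence of field settings over type settings is an assumption.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

enum class synopsis_kind { none, per_field, per_type };

// Hypothetical policy: settings for individual fields take precedence over
// settings for whole types (an assumption made for this sketch).
struct synopsis_policy {
  std::map<std::string, synopsis_kind> field_overrides; // keyed by field name
  std::map<std::string, synopsis_kind> type_defaults;   // keyed by type name
};

// Decides which kind of synopsis a field of the given type gets.
synopsis_kind choose(const synopsis_policy& policy, const std::string& field,
                     const std::string& type) {
  if (auto i = policy.field_overrides.find(field);
      i != policy.field_overrides.end())
    return i->second;
  if (auto i = policy.type_defaults.find(type);
      i != policy.type_defaults.end())
    return i->second;
  return synopsis_kind::none;
}

// Returns the key under which the synopsis would be stored: the field name
// for a field-specific synopsis, the type name for a shared one, and nothing
// if the field gets no synopsis.
std::optional<std::string> synopsis_key(const synopsis_policy& policy,
                                        const std::string& field,
                                        const std::string& type) {
  assert(!field.empty() && !type.empty());
  switch (choose(policy, field, type)) {
    case synopsis_kind::per_field:
      return field;
    case synopsis_kind::per_type:
      return type;
    case synopsis_kind::none:
      break;
  }
  return std::nullopt;
}
```

In this model, all string fields without their own setting map to the same key, which is what lets a single type-level synopsis serve them together.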
📝 Checklist
🎯 Review Instructions
By commit.