Summarize operator with pluggable aggregation functions #2417
This rewrites the `summarize` pipeline operator to support pluggable aggregation functions through the new `aggregation_function_plugin` plugin type. Each such plugin implements a simple virtual `aggregation_function` interface for incremental aggregation.

In addition, this converts both the summarize plugin and the existing aggregation functions to native plugins, making them always available when using VAST.

The configuration for the aggregation also changed: users can now specify output field names explicitly and aggregate multiple columns into one. This is the new configuration that I primarily used for testing with `suricata.flow` events:

```yaml
summarize:
  group-by:
    - timestamp
    - proto
    - event_type
  time-resolution: 1 hour
  aggregate:
    timestamp_min:
      min: timestamp
    timestamp_max:
      max: timestamp
    pkts_toserver: sum
    pkts_toclient: sum
    bytes_toserver: sum
    bytes_toclient: sum
    start: min
    end: max
    alerted: any
    ips:
      distinct:
        - src_ip
        - dest_ip
```
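To make the idea of a virtual interface for incremental aggregation more concrete, here is a minimal, self-contained C++ sketch. It is not the code from this PR: the member functions `add` and `finish`, the `int64_t` value type, and the `make_aggregation_function` factory (a stand-in for looking up a plugin by name) are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the interface; the real names and signatures
// in the PR may differ.
class aggregation_function {
public:
  virtual ~aggregation_function() = default;
  // Fold one more value into the running state.
  virtual void add(std::int64_t value) = 0;
  // Return the result for everything seen so far (empty if nothing was added).
  virtual std::optional<std::int64_t> finish() const = 0;
};

class sum_function final : public aggregation_function {
public:
  void add(std::int64_t value) override {
    total_ = total_.value_or(0) + value;
  }
  std::optional<std::int64_t> finish() const override {
    return total_;
  }

private:
  std::optional<std::int64_t> total_;
};

class min_function final : public aggregation_function {
public:
  void add(std::int64_t value) override {
    current_ = current_ ? std::min(*current_, value) : value;
  }
  std::optional<std::int64_t> finish() const override {
    return current_;
  }

private:
  std::optional<std::int64_t> current_;
};

// Hypothetical factory standing in for plugin lookup by function name.
std::unique_ptr<aggregation_function>
make_aggregation_function(const std::string& name) {
  if (name == "sum")
    return std::make_unique<sum_function>();
  if (name == "min")
    return std::make_unique<min_function>();
  return nullptr;
}

// Feed values batch by batch; only the small running state is kept, so the
// batches themselves never need to be held in memory together.
std::optional<std::int64_t>
aggregate(const std::string& name,
          const std::vector<std::vector<std::int64_t>>& batches) {
  auto function = make_aggregation_function(name);
  if (!function)
    return std::nullopt;
  for (const auto& batch : batches)
    for (auto value : batch)
      function->add(value);
  return function->finish();
}
```

Under these assumptions, a new aggregation function only has to derive from the interface and be registered under a name; the operator that drives `add` and `finish` does not change.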
Just wondering: should
I don't think so, to be honest, at least not for the foreseeable future while we still use the YAML-based configuration. It's hard to fit additional parametrization for the aggregation functions into it, and I'd want to hold off on doing so until we have an actual use case or request for it.
Looked over this together, minor comments inline
This did concatenation, which was certainly not expected.
📝 Checklist

- `aggregation_function_plugin` plugin type.

🎯 Review Instructions

I recommend going file-by-file, and testing locally with compaction. The following order is ideal:

1. `aggregation_function_plugin`.
2. `aggregation_function`
3. `summarize` pipeline operator implementation.
4. `summarize` configuration, and how it's bound to a schema to form an `aggregation`.

This PR also includes the work from #2391, which I've closed accordingly.
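
For reviewers who want a mental model of what binding the configuration to a schema could look like, here is a rough C++ sketch. It is an illustration only, not the implementation in this PR: the `aggregate_entry` and `bound_aggregate` types, the `bind_to_schema` function, and the policy of dropping entries whose inputs are all missing from the schema are hypothetical. It also assumes that a shorthand such as `pkts_toserver: sum` is equivalent to an entry whose single input has the same name as its output.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// One entry under `aggregate:` in the configuration (hypothetical model).
struct aggregate_entry {
  std::string output;              // output field name, e.g. "ips"
  std::string function;            // aggregation function, e.g. "distinct"
  std::vector<std::string> inputs; // input field names, e.g. src_ip, dest_ip
};

// The same entry after resolving input names to column positions.
struct bound_aggregate {
  std::string output;
  std::string function;
  std::vector<std::size_t> columns;
};

// Resolve every input field name against the schema's field list. Inputs
// that do not exist in the schema are skipped, and entries without any
// resolvable input are dropped (an assumed policy for this sketch).
std::vector<bound_aggregate>
bind_to_schema(const std::vector<aggregate_entry>& entries,
               const std::vector<std::string>& schema_fields) {
  std::vector<bound_aggregate> result;
  for (const auto& entry : entries) {
    bound_aggregate bound{entry.output, entry.function, {}};
    for (const auto& input : entry.inputs) {
      auto it = std::find(schema_fields.begin(), schema_fields.end(), input);
      if (it != schema_fields.end())
        bound.columns.push_back(static_cast<std::size_t>(
          std::distance(schema_fields.begin(), it)));
    }
    if (!bound.columns.empty())
      result.push_back(std::move(bound));
  }
  return result;
}
```

Resolving names once per schema, rather than once per event, keeps the per-batch work down to indexing columns by position.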