-
-
Notifications
You must be signed in to change notification settings - Fork 98
Expose pipeline operator metrics in execution node and pipeline executor #3376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0ecd287
to
999e9d7
Compare
999e9d7
to
09d6939
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only looked at the exec node changes, which look great to me.
Co-authored-by: Dominik Lohmann <mail@dominiklohmann.de>
Co-authored-by: Dominik Lohmann <mail@dominiklohmann.de>
4debde3
to
aace5e1
Compare
aace5e1
to
1f80add
Compare
ae1b478
to
8346fc8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out by printing the metrics locally. Seems to work just fine! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to merge this now and to have any follow-up implemented in #3390. For example, that already contains an improvement upon the exit handler usage from this PR.
Let's ship it.
This PR is based on @Dakostu's excellent #3376 that laid the groundwork for metrics. It comes with the following metrics-related changes: - We now calculate the approximate number of bytes in execution nodes when they handle events. The calculated number is an underestimate and (1) never includes the byte size of the schema, (2) excludes some array data buffers depending on the Arrow version used. - `tenzir --dump-metrics '<pipeline>'` prints metrics on stderr after the pipeline finished. - A set of bug fixes regarding metrics collection that were still in #3376. I also snuck in the following change that is unrelated to metrics, but that I stumbled over while developing this PR: - Sorting now works for `ip` and `enum` fields. This is what `--dump-metrics` looks like: ``` ❯ # Using the M57 trace from the Get Started guide: ❯ cat Zeek/*.log | tenzir --dump-metrics 'read zeek-tsv | to file /dev/null' operator #1 (load) elapsed: 4.02s scheduled: 553.13ms (13.75%) running: 484.85ms (12.05%) outbound: bytes: 199,898,907 at a rate of 49677650.77/s batches: 12,201 (16383.81 bytes/batch) operator #2 (read) elapsed: 6.06s scheduled: 1.84s (30.34%) running: 1.78s (29.39%) inbound: bytes: 199,898,907 at a rate of 32995890.01/s batches: 12,201 (16383.81 bytes/batch) outbound: events: 953,677 at a rate of 157416.68/s bytes: 1,997,471 at a rate of 329708.32/s (estimate) batches: 54 (17660.69 events/batch) operator #3 (write) elapsed: 7.96s scheduled: 7.67s (96.45%) running: 7.66s (96.24%) inbound: events: 953,677 at a rate of 119873.34/s bytes: 1,997,471 at a rate of 251074.02/s (estimate) batches: 54 (17660.69 events/batch) outbound: bytes: 541,837,222 at a rate of 68106745.97/s batches: 570 (950591.62 bytes/batch) operator #4 (save) elapsed: 7.96s scheduled: 6.31ms (0.08%) running: 3.94ms (0.05%) inbound: bytes: 541,837,222 at a rate of 68102061.06/s batches: 570 (950591.62 bytes/batch) ```
This change implements exposure of pipeline metrics via each running operator in the execuction node.
Right now this will be needed to propagate metrics to the
pipeline-manager
via https://github.com/tenzir/tenzir-plugins/pull/86.