Skip to content

Expose pipeline operator metrics in execution node and pipeline executor #3376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Jul 24, 2023

Conversation

Dakostu
Copy link

@Dakostu Dakostu commented Jul 19, 2023

This change implements exposure of pipeline metrics via each running operator in the execuction node.
Right now this will be needed to propagate metrics to the pipeline-manager via https://github.com/tenzir/tenzir-plugins/pull/86.

@Dakostu Dakostu added feature New functionality engine Core pipeline and storage engine labels Jul 19, 2023
@Dakostu Dakostu requested a review from a team July 19, 2023 09:57
@Dakostu Dakostu force-pushed the topic/expose-pipeline-metrics branch from 0ecd287 to 999e9d7 Compare July 19, 2023 14:07
@Dakostu Dakostu marked this pull request as ready for review July 20, 2023 06:36
@Dakostu Dakostu force-pushed the topic/expose-pipeline-metrics branch from 999e9d7 to 09d6939 Compare July 20, 2023 11:26
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked at the exec node changes, which look great to me.

Daniel Kostuj and others added 2 commits July 21, 2023 16:31
Co-authored-by: Dominik Lohmann <mail@dominiklohmann.de>
Co-authored-by: Dominik Lohmann <mail@dominiklohmann.de>
@Dakostu Dakostu force-pushed the topic/expose-pipeline-metrics branch 2 times, most recently from 4debde3 to aace5e1 Compare July 21, 2023 15:09
@Dakostu Dakostu force-pushed the topic/expose-pipeline-metrics branch from ae1b478 to 8346fc8 Compare July 24, 2023 14:25
Copy link
Contributor

@jachris jachris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out by printing the metrics locally. Seems to work just fine! 👍

Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to merge this now and to have any follow-up implemented in #3390. For example, that already contains an improvement upon the exit handler usage from this PR.

Let's ship it.

@Dakostu Dakostu enabled auto-merge July 24, 2023 14:52
@Dakostu Dakostu merged commit 1669932 into main Jul 24, 2023
@Dakostu Dakostu deleted the topic/expose-pipeline-metrics branch July 24, 2023 16:49
dominiklohmann added a commit that referenced this pull request Jul 24, 2023
This PR is based on @Dakostu's excellent #3376 that laid the groundwork
for metrics.

It comes with the following metrics-related changes:

- We now calculate the approximate number of bytes in execution nodes
when they handle events. The calculated number is an underestimate and
(1) never includes the byte size of the schema, (2) excludes some array
data buffers depending on the Arrow version used.
- `tenzir --dump-metrics '<pipeline>'` prints metrics on stderr after
the pipeline finished.
- A set of bug fixes regarding metrics collection that were still in
#3376.

I also snuck in the following change that is unrelated to metrics, but
that I stumbled over while developing this PR:
- Sorting now works for `ip` and `enum` fields.

This is what `--dump-metrics` looks like:

```
❯ # Using the M57 trace from the Get Started guide:
❯ cat Zeek/*.log | tenzir --dump-metrics 'read zeek-tsv | to file /dev/null'
operator #1 (load)
  elapsed: 4.02s
  scheduled: 553.13ms (13.75%)
  running: 484.85ms (12.05%)
  outbound:
    bytes: 199,898,907 at a rate of 49677650.77/s
    batches: 12,201 (16383.81 bytes/batch)
operator #2 (read)
  elapsed: 6.06s
  scheduled: 1.84s (30.34%)
  running: 1.78s (29.39%)
  inbound:
    bytes: 199,898,907 at a rate of 32995890.01/s
    batches: 12,201 (16383.81 bytes/batch)
  outbound:
    events: 953,677 at a rate of 157416.68/s
    bytes: 1,997,471 at a rate of 329708.32/s (estimate)
    batches: 54 (17660.69 events/batch)
operator #3 (write)
  elapsed: 7.96s
  scheduled: 7.67s (96.45%)
  running: 7.66s (96.24%)
  inbound:
    events: 953,677 at a rate of 119873.34/s
    bytes: 1,997,471 at a rate of 251074.02/s (estimate)
    batches: 54 (17660.69 events/batch)
  outbound:
    bytes: 541,837,222 at a rate of 68106745.97/s
    batches: 570 (950591.62 bytes/batch)
operator #4 (save)
  elapsed: 7.96s
  scheduled: 6.31ms (0.08%)
  running: 3.94ms (0.05%)
  inbound:
    bytes: 541,837,222 at a rate of 68102061.06/s
    batches: 570 (950591.62 bytes/batch)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
engine Core pipeline and storage engine feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants