metrics: Processing Engine #26228

Open
6 tasks
peterbarnett03 opened this issue Apr 6, 2025 · 3 comments

@peterbarnett03
Contributor

peterbarnett03 commented Apr 6, 2025

Processing Engine Metrics

Right now we expose little information on Processing Engine performance, so this ticket proposes some basic metrics to track.

Update the /metrics endpoint to serve the following metrics:

New Processing Engine Metrics:

  • plugin_execution_duration_seconds_bucket: Time spent executing a plugin, per plugin, per trigger, with trigger type, bucketed into 0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 10, +Inf seconds
    --- The plugin name should be the plugin's file name without the .py extension.
  • plugin_execution_duration_seconds_sum: Total amount of time spent executing a plugin, per plugin, per trigger, with trigger type
  • plugin_execution_duration_seconds_count: Total number of times a plugin is executed, per plugin, per trigger, with trigger type
  • processing_engine_memory_size_bytes: Total size of Processing Engine memory in bytes
    --- If this can be broken down further into threads, that'd be great, but not required
  • processing_engine_plugin_errors: Total number of errors, per plugin, per trigger

E.g.

...
plugin_execution_duration_seconds_bucket{plugin="sample_plugin",trigger="sample_trigger",type="on_request",le="0.05"} 2
plugin_execution_duration_seconds_bucket{plugin="sample_plugin",trigger="sample_trigger",type="on_request",le="0.1"} 3
plugin_execution_duration_seconds_bucket{plugin="sample_plugin",trigger="sample_trigger",type="on_request",le="0.25"} 5
...
plugin_execution_duration_seconds_sum{plugin="sample_plugin",trigger="sample_trigger",type="on_request"} 0.68
plugin_execution_duration_seconds_count{plugin="sample_plugin",trigger="sample_trigger",type="on_request"} 5
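
As a rough illustration of how the three histogram series relate, here is a minimal pure-Python sketch that renders them in the text exposition format. It is not the Processing Engine implementation; the `DurationHistogram` class and its methods are hypothetical names, and the observations are chosen only so that the output reproduces the sample lines above.

```python
from bisect import bisect_left
from pathlib import Path

# Finite upper bounds listed in the issue; the +Inf bucket is appended on render.
BUCKETS = [0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 10]
NAME = "plugin_execution_duration_seconds"


class DurationHistogram:
    """Hypothetical histogram for one (plugin, trigger, type) label set."""

    def __init__(self, plugin_file, trigger, trigger_type):
        self.labels = {
            "plugin": Path(plugin_file).stem,  # file name without .py
            "trigger": trigger,
            "type": trigger_type,
        }
        self.counts = [0] * (len(BUCKETS) + 1)  # last slot is +Inf
        self.total = 0.0

    def observe(self, seconds):
        # bisect_left finds the first bucket whose bound is >= seconds.
        self.counts[bisect_left(BUCKETS, seconds)] += 1
        self.total += seconds

    def render(self):
        base = ",".join(f'{k}="{v}"' for k, v in self.labels.items())
        lines, running = [], 0
        for bound, n in zip(BUCKETS + ["+Inf"], self.counts):
            running += n  # bucket values are cumulative
            lines.append(f'{NAME}_bucket{{{base},le="{bound}"}} {running}')
        # Sum is rounded here only to keep the printed value tidy.
        lines.append(f"{NAME}_sum{{{base}}} {round(self.total, 6)}")
        lines.append(f"{NAME}_count{{{base}}} {running}")
        return lines


hist = DurationHistogram("sample_plugin.py", "sample_trigger", "on_request")
for seconds in (0.05, 0.05, 0.1, 0.24, 0.24):
    hist.observe(seconds)
print("\n".join(hist.render()))
```

Each label set produces 13 `_bucket` lines plus one `_sum` and one `_count` line.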
@hiltontj
Contributor

Labelling the metrics per plugin and per trigger may result in too high a cardinality, especially for the duration histograms. As a starting point, would you consider a db label to group them by database, as we have done for other metrics?

Trigger type is safe to use as a label because its cardinality is bounded (there are 5 or so types).
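
To make the trade-off concrete, the sketch below counts distinct label sets under the two labelling schemes. The trigger list, the `Trigger` tuple, and the `type_b` trigger type are hypothetical placeholders, not real configuration.

```python
from collections import namedtuple

# Hypothetical trigger definitions, for illustration only.
Trigger = namedtuple("Trigger", "db plugin name type")

triggers = [
    Trigger("db1", "plugin_a", "trigger_1", "on_request"),
    Trigger("db1", "plugin_a", "trigger_2", "on_request"),
    Trigger("db1", "plugin_b", "trigger_3", "type_b"),
    Trigger("db2", "plugin_a", "trigger_4", "on_request"),
]


def series_count(items, label_fn):
    """Number of distinct label sets, i.e. series per metric name."""
    return len({label_fn(t) for t in items})


# Scheme from the issue: plugin, trigger, and type labels.
per_trigger = series_count(triggers, lambda t: (t.plugin, t.name, t.type))
# Alternative scheme: db and type labels only.
per_db = series_count(triggers, lambda t: (t.db, t.type))

print(per_trigger, per_db)
```

With per-trigger labels the count grows with every trigger added, while the db and type scheme is capped at the number of databases times the number of trigger types.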

@pauldix
Member

pauldix commented May 16, 2025

We don't expect that many triggers to be defined. What does it look like if they have 1k triggers (an exceptionally high number)?

@hiltontj
Contributor

I guess cardinality would be N_triggers in this case, regardless of how many plugins or databases or trigger types (each trigger has a single type).

There are 15 lines emitted by the /metrics API for each duration histogram, so there would be worst case 15,000 lines.
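
The figure of 15 lines is consistent with the bucket list proposed in the issue: 13 `_bucket` lines (12 finite bounds plus the infinite bucket) plus one `_sum` and one `_count` line. A small sketch of the arithmetic, with hypothetical constant and function names:

```python
# 12 finite bounds from the issue plus the infinite bucket.
BUCKET_LINES = 13
# Each histogram also emits one _sum and one _count line.
LINES_PER_HISTOGRAM = BUCKET_LINES + 2


def histogram_lines(n_triggers):
    """Lines emitted for the duration histogram, one label set per trigger."""
    return n_triggers * LINES_PER_HISTOGRAM


print(histogram_lines(1000))  # 15000
```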

I'm not very familiar with the limitations of Prometheus or what is considered high cardinality, only that they recommend against unbounded cardinality for labels. If this is acceptable, then I won't block it.


3 participants