metrics: Processing Engine #26228

peterbarnett03 · 2025-04-06T17:56:04Z

Processing Engine Metrics

Right now we serve little information on Processing Engine performance, so this ticket looks to add some basic metrics to track.

Update the /metrics endpoint to serve the following metrics:

New Processing Engine Metrics:

plugin_execution_duration_seconds_bucket: Time spent executing a plugin, per plugin, per trigger, with trigger type, bucketed into 0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 10, and +Inf seconds
--- Plugin name should be entered as the file name without .py on the end.
plugin_execution_duration_seconds_sum: Total amount of time spent executing a plugin, per plugin, per trigger, with trigger type
plugin_execution_duration_seconds_count: Total number of times a plugin is executed, per plugin, per trigger, with trigger type
processing_engine_memory_size_bytes: Total size of Processing Engine memory in bytes
--- If this can be broken down further into threads, that'd be great, but not required
processing_engine_plugin_errors: Total number of errors, per plugin, per trigger

E.g.

...
plugin_execution_duration_seconds_bucket{plugin="sample_plugin",trigger="sample_trigger",type="on_request",le="0.05"} 2
plugin_execution_duration_seconds_bucket{plugin="sample_plugin",trigger="sample_trigger",type="on_request",le="0.1"} 3
plugin_execution_duration_seconds_bucket{plugin="sample_plugin",trigger="sample_trigger",type="on_request",le="0.25"} 5
...
plugin_execution_duration_seconds_sum{plugin="sample_plugin",trigger="sample_trigger",type="on_request"} 0.68
plugin_execution_duration_seconds_count{plugin="sample_plugin",trigger="sample_trigger",type="on_request"} 5
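
As a rough illustration of how the example lines above fit together, here is a minimal Python sketch that renders one histogram in Prometheus text format. It is not the Processing Engine implementation: the helper names (`plugin_label`, `render_histogram`) and the sample durations are hypothetical, chosen only so the output matches the example values. The bucket bounds are copied from the list above.

```python
from pathlib import Path

# Upper bounds in seconds, copied from the proposal; +Inf is appended when rendering.
BUCKETS = [0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 10]
METRIC = "plugin_execution_duration_seconds"


def plugin_label(filename: str) -> str:
    """Hypothetical helper: the plugin label is the file name without .py."""
    return Path(filename).stem


def render_histogram(plugin_file, trigger, trigger_type, durations):
    """Hypothetical helper: render one histogram as text exposition lines."""
    labels = (
        f'plugin="{plugin_label(plugin_file)}",'
        f'trigger="{trigger}",type="{trigger_type}"'
    )
    lines = []
    for bound in BUCKETS:
        # Buckets are cumulative: each counts every observation <= its bound.
        count = sum(1 for d in durations if d <= bound)
        lines.append(f'{METRIC}_bucket{{{labels},le="{bound}"}} {count}')
    lines.append(f'{METRIC}_bucket{{{labels},le="+Inf"}} {len(durations)}')
    # Round the sum to hide floating-point noise in the rendered value.
    lines.append(f"{METRIC}_sum{{{labels}}} {round(sum(durations), 6)}")
    lines.append(f"{METRIC}_count{{{labels}}} {len(durations)}")
    return lines


# Made-up durations (seconds) that reproduce the example counts above.
sample = [0.03, 0.05, 0.1, 0.25, 0.25]
for line in render_histogram("sample_plugin.py", "sample_trigger", "on_request", sample):
    print(line)
```

Each plugin/trigger/type combination produces 15 lines under these assumptions: 13 bucket lines (12 finite bounds plus +Inf), one _sum, and one _count.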


hiltontj · 2025-05-16T14:27:56Z

Labelling the metrics per plugin and per trigger may lead to too high a cardinality, especially for the duration histograms. As a starting point, would you consider a db label to group them by database, as we have done for other metrics?

Trigger type is a label we can safely add because its cardinality is bounded (there are 5 or so types).

pauldix · 2025-05-16T14:44:38Z

We don't expect that many triggers to be defined. What does it look like if they have 1k triggers (an exceptionally high number)?

hiltontj · 2025-05-16T16:13:22Z

I guess cardinality would be N_triggers in this case, regardless of how many plugins or databases or trigger types (each trigger has a single type).

There are 15 lines emitted by the /metrics API for each duration histogram, so the worst case would be 15,000 lines.
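
To make that arithmetic explicit, here is a small Python sketch. It assumes one histogram series per trigger and the 13 bucket bounds from the proposal (12 finite bounds plus +Inf); the function name is made up for illustration.

```python
# 12 finite bucket bounds plus +Inf, per the bucket list in the proposal.
BUCKET_LINES = 13
# One _sum line and one _count line per histogram.
SUM_AND_COUNT_LINES = 2


def histogram_lines(n_triggers: int) -> int:
    """Lines emitted by /metrics for the duration histogram, one series per trigger."""
    return n_triggers * (BUCKET_LINES + SUM_AND_COUNT_LINES)


print(histogram_lines(1))     # a single trigger
print(histogram_lines(1000))  # the 1k-trigger case raised above
```

The count grows linearly with the number of triggers, so the total depends only on how many triggers are defined.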

I'm not very familiar with the limitations of Prometheus or with what is considered high cardinality, only that they recommend against unbounded cardinality for labels. If this is acceptable, then I won't block it.

peterbarnett03 added the v3 label Apr 6, 2025
