Hello. We have an execution model that follows a typical producer-consumer pattern with a queue/topic in the middle. The queue currently holds work of the same type from multiple customers/tenants. The consumers/workers are non-HTTP applications that pop a message from the queue and execute the work. These consumers are Kubernetes pods spun up by a Deployment and configured to autoscale based on the amount of work available on the queue. We would like to capture at least two metrics about worker performance and backlog burn-up (a rough sketch of what we mean follows this list):
number of executions processed for each customer/tenant
number of successes and failures for each customer/tenant
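To make this concrete, here is a minimal sketch of the counters we have in mind, assuming a Python worker and the prometheus_client library (the metric names, label names, and the do_work helper are placeholders, not our real code):

```python
# Minimal sketch: per-tenant execution and outcome counters (names are placeholders).
from prometheus_client import Counter

EXECUTIONS_TOTAL = Counter(
    "worker_executions_total",
    "Messages processed by this worker",
    ["tenant"],
)
EXECUTION_RESULTS_TOTAL = Counter(
    "worker_execution_results_total",
    "Execution outcomes per tenant",
    ["tenant", "result"],  # result is "success" or "failure"
)

def handle_message(msg):
    EXECUTIONS_TOTAL.labels(tenant=msg.tenant_id).inc()
    try:
        do_work(msg)  # placeholder for the actual work
        EXECUTION_RESULTS_TOTAL.labels(tenant=msg.tenant_id, result="success").inc()
    except Exception:
        EXECUTION_RESULTS_TOTAL.labels(tenant=msg.tenant_id, result="failure").inc()
        raise
```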
What is the best way to publish metrics from these ephemeral workers? We were trying to push them through the Prometheus Pushgateway, but its design philosophy seems to mean that, for us, it
a) can result in metric overwrites from multiple workers/pods if they all use the same job/grouping key, or
b) can result in garbage building up if the job/grouping key is based on the instance/pod name, since pods come and go over longer periods of time.
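For reference, this is roughly the push pattern we experimented with (again only a sketch using prometheus_client; the gateway address, job name, and pod-name grouping key are assumptions about our setup). The two grouping choices below are exactly where we hit (a) and (b):

```python
# Sketch of the Pushgateway approach we tried (gateway address and job name assumed).
import os
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

registry = CollectorRegistry()
executions = Counter(
    "worker_executions_total", "Messages processed", ["tenant"], registry=registry
)

# Choice (a): every pod pushes under the same grouping key, so pods
# overwrite each other's values in the same metric group.
push_to_gateway("pushgateway:9091", job="queue-worker", registry=registry)

# Choice (b): a per-pod grouping key avoids overwrites, but leaves one
# metric group per pod behind on the Pushgateway unless we delete it ourselves.
push_to_gateway(
    "pushgateway:9091",
    job="queue-worker",
    grouping_key={"pod": os.environ.get("HOSTNAME", "unknown")},
    registry=registry,
)
```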
We could instead run a mini HTTP server inside each consumer and expose a metrics endpoint for Prometheus to scrape. That is possibly a bit of overkill, but it would work; a rough sketch of that option is below. Please suggest the best approach.
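Here is what we mean by the mini-server option, again only a sketch assuming a Python worker with prometheus_client (the port, metric names, and the pop_from_queue/process helpers are placeholders):

```python
# Sketch of the "mini HTTP server" alternative: expose /metrics from the worker itself.
from prometheus_client import Counter, start_http_server

executions = Counter("worker_executions_total", "Messages processed", ["tenant"])

def main():
    # Expose /metrics on port 8000 so Prometheus can scrape this pod directly.
    start_http_server(8000)
    while True:
        msg = pop_from_queue()  # placeholder for our queue client
        executions.labels(tenant=msg.tenant_id).inc()
        process(msg)            # placeholder for the actual work

if __name__ == "__main__":
    main()
```

As far as we understand, Prometheus would still need to discover these pods (e.g. via Kubernetes pod service discovery or a PodMonitor), and very short-lived pods could disappear between scrape intervals, which is part of why we are unsure this is the right fit.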