Hello. We have an execution model that follows a typical producer-consumer pattern with a queue/topic in the middle. The queue currently holds work of the same type from multiple customers/tenants. The consumers/workers are non-HTTP applications that pop a message from the queue and execute the work. These consumers are Kubernetes pods spun up by a Deployment and configured to autoscale based on the amount of work available on the queue. We would like to capture at least two metrics about worker performance and backlog burn-up (a rough sketch of what we mean follows this list):
number of executions processed for each customer/tenant
number of successes and failures for each customer/tenant
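To make this concrete, here is a minimal sketch of the counters we have in mind, assuming a Python worker and the prometheus_client library (the metric names, label names, and the do_work helper are placeholders, not our real code):

```python
# Minimal sketch: per-tenant execution and outcome counters (names are placeholders).
from prometheus_client import Counter

EXECUTIONS_TOTAL = Counter(
    "worker_executions_total",
    "Messages processed by this worker",
    ["tenant"],
)
EXECUTION_RESULTS_TOTAL = Counter(
    "worker_execution_results_total",
    "Execution outcomes per tenant",
    ["tenant", "result"],  # result is "success" or "failure"
)

def handle_message(msg):
    EXECUTIONS_TOTAL.labels(tenant=msg.tenant_id).inc()
    try:
        do_work(msg)  # placeholder for the actual work
        EXECUTION_RESULTS_TOTAL.labels(tenant=msg.tenant_id, result="success").inc()
    except Exception:
        EXECUTION_RESULTS_TOTAL.labels(tenant=msg.tenant_id, result="failure").inc()
        raise
```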
What is the best way to publish metrics from these ephemeral workers? We were trying to push them through the Prometheus Pushgateway, but its design philosophy seems to mean that, for us, it
a) can result in metric overwrites from multiple workers/pods if they all use the same job/grouping key, or
b) can result in garbage building up if the job/grouping key is based on the instance/pod name, since pods come and go over longer periods of time.
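For reference, this is roughly the push pattern we experimented with (again only a sketch using prometheus_client; the gateway address, job name, and pod-name grouping key are assumptions about our setup). The two grouping choices below are exactly where we hit (a) and (b):

```python
# Sketch of the Pushgateway approach we tried (gateway address and job name assumed).
import os
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

registry = CollectorRegistry()
executions = Counter(
    "worker_executions_total", "Messages processed", ["tenant"], registry=registry
)

# Choice (a): every pod pushes under the same grouping key, so pods
# overwrite each other's values in the same metric group.
push_to_gateway("pushgateway:9091", job="queue-worker", registry=registry)

# Choice (b): a per-pod grouping key avoids overwrites, but leaves one
# metric group per pod behind on the Pushgateway unless we delete it ourselves.
push_to_gateway(
    "pushgateway:9091",
    job="queue-worker",
    grouping_key={"pod": os.environ.get("HOSTNAME", "unknown")},
    registry=registry,
)
```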
We could instead run a mini HTTP server inside each consumer and expose a metrics endpoint for Prometheus to scrape. That is possibly a bit of overkill, but it would work; a rough sketch of that option is below. Please suggest the best approach.
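Here is what we mean by the mini-server option, again only a sketch assuming a Python worker with prometheus_client (the port, metric names, and the pop_from_queue/process helpers are placeholders):

```python
# Sketch of the "mini HTTP server" alternative: expose /metrics from the worker itself.
from prometheus_client import Counter, start_http_server

executions = Counter("worker_executions_total", "Messages processed", ["tenant"])

def main():
    # Expose /metrics on port 8000 so Prometheus can scrape this pod directly.
    start_http_server(8000)
    while True:
        msg = pop_from_queue()  # placeholder for our queue client
        executions.labels(tenant=msg.tenant_id).inc()
        process(msg)            # placeholder for the actual work

if __name__ == "__main__":
    main()
```

As far as we understand, Prometheus would still need to discover these pods (e.g. via Kubernetes pod service discovery or a PodMonitor), and very short-lived pods could disappear between scrape intervals, which is part of why we are unsure this is the right fit.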