We have successfully operated all our Deckard instances with a single housekeeper pod for years. To enhance scalability of the housekeeper tasks, I propose the following improvements for the housekeeper feature:
Implement a distributed locking mechanism for each task to support running multiple housekeeper pods simultaneously. While most tasks can run concurrently due to their atomic nature, running the same task in parallel can lead to resource waste.
Address potential issues, such as Prometheus metrics duplication. Currently, we expose numerous queue metrics in the /metrics endpoint of a Deckard instance with the housekeeper enabled. Since the housekeeper is responsible for measuring many of these metrics, duplication can occur if we deploy multiple housekeeper pods with the /metrics endpoint enabled. We can consider deploying a dedicated metrics pod or explore alternative solutions to mitigate this issue.
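To make the per-task locking idea more concrete, here is a minimal sketch in Python. It is not Deckard code: `InMemoryLockStore`, `run_task_once`, the task name, and the pod identifiers are all hypothetical names chosen for illustration. The in-memory store only stands in for a shared backend that offers an atomic "set if absent, with expiry" operation; a real deployment would need such a backend so that every housekeeper pod sees the same locks.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class InMemoryLockStore:
    """Hypothetical stand-in for a shared store with atomic set-if-absent plus expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._mutex = threading.Lock()
        # task name -> (owner id, expiry timestamp)
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, task: str, owner: str, ttl_seconds: float) -> bool:
        """Take the lock for `task` unless another owner holds an unexpired one."""
        with self._mutex:
            now = self._clock()
            holder = self._locks.get(task)
            if holder is not None and holder[1] > now:
                return False
            self._locks[task] = (owner, now + ttl_seconds)
            return True

    def release(self, task: str, owner: str) -> None:
        """Drop the lock only if `owner` still holds it."""
        with self._mutex:
            holder = self._locks.get(task)
            if holder is not None and holder[0] == owner:
                del self._locks[task]


def run_task_once(
    store: InMemoryLockStore,
    task: str,
    owner: str,
    ttl_seconds: float,
    job: Callable[[], None],
) -> bool:
    """Run `job` only if this pod wins the lock; return whether it ran."""
    if not store.acquire(task, owner, ttl_seconds):
        return False  # another pod is already running this task
    try:
        job()
    finally:
        store.release(task, owner)
    return True
```

Each housekeeper pod would call something like `run_task_once(store, "example-task", pod_id, 30, job)` on its schedule. The expiry matters for fault tolerance: if a pod dies while holding a lock, the task becomes available to other pods once the TTL elapses, so the TTL should be longer than the task's expected run time.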
By incorporating these enhancements, we aim to achieve better scalability, improved fault tolerance, and better overall performance in our distributed Deckard setup.