Open
Description
What would you like to be added:
KSM should be able to set an upper limit on the number of objects it ingests.
Why is this needed:
We observed an incident in which an autoscaler accidentally created 10k+ ReplicaSets, all of which KSM tried to report on. KSM ran out of memory, and we lost visibility into the cluster.
I know we can already limit this on the scraping end in Prometheus. This request is only about preventing KSM from running out of resources and about providing another signal of what is going on in the cluster.
Describe the solution you'd like
- Have a generic and a resource-level command-line option that KSM uses to limit the number of items read from the Kubernetes API.
- Expose the metrics `kube_objects_watched{group="foo", kind="bar", version="baz"}` and `kube_objects_watched_max`, where the latter shows the configured limit, to allow alerting when the threshold is hit.
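To make the proposal concrete, here is a minimal sketch of how such a limit could be tracked. It is not taken from the kube-state-metrics code base: `objectLimiter`, `TryAdd`, `Remove`, and `Metrics` are hypothetical names, and attaching the same labels to `kube_objects_watched_max` is an assumption. Only the two metric names come from this issue.

```go
package main

import (
	"fmt"
	"sync"
)

// gvk identifies a resource type; it mirrors the group/kind/version
// labels proposed in this issue.
type gvk struct {
	Group, Version, Kind string
}

// objectLimiter is a hypothetical helper (not an existing KSM type) that
// counts watched objects per resource type and enforces an upper limit.
type objectLimiter struct {
	mu         sync.Mutex
	defaultMax int         // generic limit; 0 means unlimited
	perKindMax map[gvk]int // resource-level overrides of the generic limit
	watched    map[gvk]int // current number of tracked objects
}

func newObjectLimiter(defaultMax int, perKindMax map[gvk]int) *objectLimiter {
	return &objectLimiter{
		defaultMax: defaultMax,
		perKindMax: perKindMax,
		watched:    make(map[gvk]int),
	}
}

// limitFor returns the resource-level limit if one is set,
// otherwise the generic limit.
func (l *objectLimiter) limitFor(k gvk) int {
	if limit, ok := l.perKindMax[k]; ok {
		return limit
	}
	return l.defaultMax
}

// TryAdd records one more object of type k. It returns false, without
// recording anything, when the limit for k has already been reached.
func (l *objectLimiter) TryAdd(k gvk) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if limit := l.limitFor(k); limit > 0 && l.watched[k] >= limit {
		return false
	}
	l.watched[k]++
	return true
}

// Remove records that one object of type k is no longer tracked.
func (l *objectLimiter) Remove(k gvk) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.watched[k] > 0 {
		l.watched[k]--
	}
}

// Metrics renders the two proposed series for k as Prometheus text lines.
func (l *objectLimiter) Metrics(k gvk) []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	labels := fmt.Sprintf("{group=%q,kind=%q,version=%q}", k.Group, k.Kind, k.Version)
	return []string{
		fmt.Sprintf("kube_objects_watched%s %d", labels, l.watched[k]),
		fmt.Sprintf("kube_objects_watched_max%s %d", labels, l.limitFor(k)),
	}
}

func main() {
	rs := gvk{Group: "apps", Version: "v1", Kind: "ReplicaSet"}
	limiter := newObjectLimiter(0, map[gvk]int{rs: 2})
	for i := 0; i < 3; i++ {
		fmt.Println("accepted:", limiter.TryAdd(rs))
	}
	for _, line := range limiter.Metrics(rs) {
		fmt.Println(line)
	}
}
```

With both series exposed, an alert could compare `kube_objects_watched` against `kube_objects_watched_max`. What KSM should do with objects beyond the limit (drop them, stop the watch, or something else) is left open here.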
Activity
rexagod commented on Mar 4, 2025
/triage accepted
If this is planned further down the line, would you prefer if I moved this issue to https://github.com/rexagod/resource-state-metrics?
mrueg commented on Mar 4, 2025
I'd rather duplicate it. I think the unbounded number of objects needs to be addressed for both CRs and native resources.
dgrisonnet commented on Mar 5, 2025
+1 for that feature
Having new metrics and an alert that tells us when KSM reaches object limits could definitely help. For a more in-depth investigation, we could document using the `apiserver_storage_objects` metric from the kube-apiserver, as well as the audit log, to tell what is happening and which rogue client is creating the objects.