Cap / Limit number of objects ingested for native and Custom Resource Metrics #2622

Open
@mrueg

Description

What would you like to be added:
KSM should be able to enforce an upper limit on the number of objects it ingests.
Why is this needed:
We observed an incident in which an autoscaler accidentally created 10k+ ReplicaSets, all of which KSM tried to report on. This caused KSM to run out of memory, and we lost visibility into the cluster.
I know we can already limit this on the scraping side in Prometheus; this is just to keep KSM from running out of resources and to give another signal on what's going on in the cluster.
Describe the solution you'd like

  • Have a generic and a resource-level command-line option that KSM uses to limit the number of items read from the Kubernetes API.
  • Expose the metrics kube_objects_watched{group="foo", kind="bar", version="baz"} and kube_objects_watched_max, the latter showing the configured limit, to allow alerting when the threshold is hit.

Additional context

Activity

added kind/feature (Categorizes issue or PR as related to a new feature.) on Mar 3, 2025
added needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) on Mar 3, 2025
rexagod (Member) commented on Mar 4, 2025

/triage accepted

If this is planned further down the line, would you prefer if I moved this issue to https://github.com/rexagod/resource-state-metrics?

added triage/accepted (Indicates an issue or PR is ready to be actively worked on.) and removed needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) on Mar 4, 2025
mrueg (Member, Author) commented on Mar 4, 2025

I'd rather duplicate it. I think the unbounded number of objects needs to be addressed for both CRs and native resources.

dgrisonnet (Member) commented on Mar 5, 2025

+1 for that feature

> give another signal on what's going on in the cluster

Having new metrics and an alert that tells us when KSM reaches its object limits could definitely help. For a more in-depth investigation, we could document how to use the apiserver_storage_objects metric from the kube-apiserver, together with the audit log, to tell what is happening and which rogue client is creating the objects.
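
As a rough illustration of that investigation step, the sketch below ranks resources by object count from a kube-apiserver metrics dump. It assumes `apiserver_storage_objects` samples carry a single `resource` label; the `topResources` helper and the sample values in `main` are made up for illustration. In practice a Prometheus query over the same metric would likely be the more direct route.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// resourceCount pairs a resource name with its stored-object count.
type resourceCount struct {
	Resource string
	Count    float64
}

// sampleRe matches one sample line, assuming a single `resource` label.
var sampleRe = regexp.MustCompile(`^apiserver_storage_objects\{resource="([^"]+)"\}\s+(\S+)$`)

// topResources (hypothetical helper) returns the n resources with the most objects.
func topResources(exposition string, n int) []resourceCount {
	var out []resourceCount
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		m := sampleRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // comment line or a different metric
		}
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue // skip samples whose value does not parse
		}
		out = append(out, resourceCount{Resource: m[1], Count: v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Count > out[j].Count })
	if len(out) > n {
		out = out[:n]
	}
	return out
}

func main() {
	// Illustrative input only; these numbers are not from a real cluster.
	dump := `# TYPE apiserver_storage_objects gauge
apiserver_storage_objects{resource="pods"} 420
apiserver_storage_objects{resource="replicasets.apps"} 10500
apiserver_storage_objects{resource="configmaps"} 75`

	for _, rc := range topResources(dump, 2) {
		fmt.Printf("%s %.0f\n", rc.Resource, rc.Count)
	}
}
```

Once the outlier resource is known, the audit log can be filtered for create requests on that resource to identify the client responsible.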

linked a pull request that will close this issue on Mar 10, 2025

        Cap / Limit number of objects ingested for native and Custom Resource Metrics · Issue #2622 · kubernetes/kube-state-metrics