
KEP 127: add a metric, describe an error kubelet will return, and target one more beta #5413


Merged: 1 commit merged into kubernetes:master on Jun 16, 2025


@haircommander (Contributor)

  • One-line PR description:
  • Other comments:

@k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jun 12, 2025
@k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 12, 2025
@haircommander (Contributor, Author)

cc @wojtek-t

@rata (Member) left a comment


@haircommander thanks! Left two simple comments, feel free to ignore the one about the metric :)

If that is the case, checking the pod events to see if they are failing for user namespaces reasons
(like the errors shown in this KEP) is advised, in which case it is recommended to rollback or
disable the feature gate.
If there are no successfully created user namespaced pods (but are pods that have been attempted to be created), then there may be an issue with user namespaces on that node.
@rata (Member), Jun 13, 2025


What if we just name the metrics? Like the old text, but using the metrics

Suggested change
If there are no successfully created user namespaced pods (but are pods that have been attempted to be created), then there may be an issue with user namespaces on that node.
If the kubelet metric `started_user_namespaced_pods_errors_total` has a value close to `started_user_namespaced_pods_total`, it means most pods started with user namespaces are failing. If that is the case, checking the pod events to see if they are failing for user namespaces reasons (like the errors shown in this KEP) is advised, in which case it is recommended to roll back or disable the feature gate.
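The heuristic in the suggested change boils down to comparing two counters. Below is a minimal Go sketch, assuming the two metric values have already been scraped from the kubelet. The helper names `failureRatio` and `mostlyFailing`, the 0.8 threshold, and the sample values are hypothetical and not part of the KEP.

```go
package main

import "fmt"

// failureRatio returns the fraction of user-namespaced pod starts that failed,
// given the values of started_user_namespaced_pods_errors_total and
// started_user_namespaced_pods_total. It returns 0 when nothing has started yet.
// (Hypothetical helper, not a kubelet API.)
func failureRatio(errorsTotal, startedTotal float64) float64 {
	if startedTotal <= 0 {
		return 0
	}
	return errorsTotal / startedTotal
}

// mostlyFailing reports whether the failure ratio meets the given threshold.
// The threshold is an assumed operator choice; the KEP only says "close to".
func mostlyFailing(errorsTotal, startedTotal, threshold float64) bool {
	return startedTotal > 0 && failureRatio(errorsTotal, startedTotal) >= threshold
}

func main() {
	// Hypothetical sample values, as if scraped from one node's kubelet metrics.
	errs, started := 45.0, 50.0
	fmt.Printf("failure ratio: %.2f\n", failureRatio(errs, started))
	if mostlyFailing(errs, started, 0.8) {
		fmt.Println("most user-namespaced pods are failing; check pod events")
	}
}
```

Because both metrics are cumulative counters, comparing their increase over a recent window (for example with PromQL's `rate()`) is likely more informative than comparing lifetime totals.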

@haircommander (Contributor, Author)


updated!

- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
-->
- Kubernetes 1.34: Feature goes GA
Member


The kep.yaml says 1.35

Contributor Author


oop thanks!

@wojtek-t (Member)

+1 to adding those metrics - they are definitely useful, very easy to reason about, and sound reasonably straightforward to add

/approve PRR

@wojtek-t self-assigned this Jun 13, 2025
Signed-off-by: Peter Hunt <pehunt@redhat.com>
@rata (Member) left a comment


LGTM, thanks a lot @haircommander !

@mrunalp (Contributor) left a comment


/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 16, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, mrunalp, rata, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 16, 2025
@k8s-ci-robot merged commit c64eaed into kubernetes:master Jun 16, 2025
4 checks passed
@k8s-ci-robot added this to the v1.34 milestone Jun 16, 2025

5 participants