Skip to content

Conversation

stlaz
Copy link
Member

@stlaz stlaz commented Jun 4, 2025

  • One-line PR description: This PR looks into the metrics of the KEP and additionally bumps it to beta for 1.34.
  • Other comments: The original KEP suggested histograms but I am not sure how I would wire those up. I think counts and gauges might do the job.

/cc enj liggitt mikebrow haircommander

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 4, 2025
Copy link
Member

@BenTheElder BenTheElder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we consider splitting the cache directory into subdirectories based on a partial key?

This approach is common for on-disk content addressed caches to avoid having too many items in a single directory (see for example, git, ls -la .git/objects).

That might be good to change before locking in these details and having users depending on the layout to debug.

Copy link
Member

@aramase aramase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stlaz stlaz force-pushed the ensure-metrics branch from 870d272 to 8387057 Compare June 5, 2025 10:09
@stlaz
Copy link
Member Author

stlaz commented Jun 5, 2025

Did we consider splitting the cache directory into subdirectories based on a partial key?

We were briefly discussing this during design sessions but chose the flat design in the end. We shouldn't be hitting any directory limits.

alpha:
approver: "@jpbetz"
beta:
approver: "@jpbetz"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpbetz 👋
I chose you because you were doing PRR for alpha, please let me know in case you would like me to pick somebody else 🙏

@jpbetz jpbetz mentioned this pull request Jun 5, 2025
21 tasks
@jpbetz
Copy link
Contributor

jpbetz commented Jun 5, 2025

/approve
For PRR

(Thanks @BenTheElder for the shadow review!)

@aramase
Copy link
Member

aramase commented Jun 16, 2025

/assign @haircommander @mikebrow

for SIG Node review

@stlaz stlaz force-pushed the ensure-metrics branch 2 times, most recently from 8d61177 to eb2ddfc Compare June 16, 2025 17:13
@haircommander
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 16, 2025
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see suggestions for clarification and additional useful metrics based on kubelet policy vs image pull policy

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 17, 2025

This will allow an admin to see how many reverification checks are being requested for existing images and how
many requests make it all the way to the persistent cache.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need some way to show the difference between a successful anonymous access request .. and having to try N different sets of credentials.... one image verification might take a large number of tries when a number of private sets of credentials are used on a large number of containers in a pod.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that perhaps that may not matter as much for observability of this feature but rather for observability of image pulls, which today is not great - we don't seem to track anything besides how long an image pull took, and that is also only tracked in case of a successful pull..

If we're to track the number of pulls per image until success, I think we can do that even outside this KEP. Would you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a needed metric that exists without the ensure image kep. More specifically the number of attempted image pulls that fail due to incorrect credentials anonymous(pull 1) key ring entry 1(pull 2) private key 1(pull 3)

Failed credential looping on pull requests could cost more than actually pulling the images.

The other issue that could use a metric is cost of using :latest to connect and recheck manifests again and again...

@stlaz stlaz force-pushed the ensure-metrics branch 2 times, most recently from d603818 to bf26b21 Compare June 18, 2025 08:53
stlaz added 5 commits June 18, 2025 17:44
Signed-off-by: Stanislav Láznička <slznika@microsoft.com>
Signed-off-by: Stanislav Láznička <slznika@microsoft.com>
This is now possible under the new template.

Signed-off-by: Stanislav Láznička <slznika@microsoft.com>
Signed-off-by: Stanislav Láznička <slznika@microsoft.com>
Signed-off-by: Stanislav Láznička <slznika@microsoft.com>
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2025
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, mikebrow, mrunalp, stlaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit e740dad into kubernetes:master Jun 18, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants