
[WIP] KEP 5105: kubectl top pvc #5106

Open
wants to merge 1 commit into base: master

Conversation

tsmetana
Member

  • One-line PR description: Add a new kubectl top persistentvolumeclaim subcommand that would allow displaying the volume usage of PVCs
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Jan 30, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tsmetana
Once this PR has been reviewed and has the lgtm label, please assign eddiezane for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 30, 2025
@tsmetana
Member Author

/cc @ardaguclu
/cc @gmeghnag

@k8s-ci-robot
Contributor

@tsmetana: GitHub didn't allow me to request PR reviews from the following users: gmeghnag.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @ardaguclu
/cc @gmeghnag

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@googs1025
Member

/cc

@tsmetana tsmetana force-pushed the kep-5105-kubectl-top-pvc branch from 5c3233a to 15f2fec Compare January 30, 2025 11:50
Comment on lines 293 to 295
There are the `kubelet_volume_stats_used_bytes` and
`kubelet_volume_stats_capacity_bytes` Prometheus metrics which can be used to
compute the volume used space percentage.
Member

IIUC, kubectl top pod and kubectl top node obtain their data via the Metrics API served by metrics-server, not directly from Prometheus. If we want to support kubectl top pvc, do we want to get it directly from Prometheus? 🤔

Member Author

IIUC, kubectl top pod and kubectl top node obtain their data via the Metrics API served by metrics-server, not directly from Prometheus. If we want to support kubectl top pvc, do we want to get it directly from Prometheus? 🤔

By using a Prometheus query we can get the "most interesting" value (usage percentage) with just one call -- it is simple. Though I understand the Prometheus dependency is a disadvantage.
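For illustration, a minimal sketch of that single call, assuming a hypothetical Prometheus endpoint (the address is made up); the query simply divides the two kubelet metrics quoted above:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Hypothetical Prometheus service address; adjust for your cluster.
	promURL := "http://prometheus.monitoring.svc:9090/api/v1/query"

	// Used-space percentage per PVC, computed from the two kubelet metrics.
	query := `100 * kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes`

	resp, err := http.Get(promURL + "?query=" + url.QueryEscape(query))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// JSON vector: one sample per namespace/persistentvolumeclaim pair.
	fmt.Println(string(body))
}
```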

Member

As @googs1025 mentioned, the implementation needs to be the same as top node and top pod. A direct Prometheus query would be very non-standard (and does not work in some cases, such as exec credential plugins).

The good part is that following the same pattern as these two commands would make the implementation easier.
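For context, a minimal sketch of the pattern kubectl top pod follows today: listing from metrics.k8s.io via the generated metrics client. A PVC variant built the same way would need an analogous metrics resource that does not exist yet, so treat this only as an illustration of the existing pattern:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	// Build a client config from the default kubeconfig, as kubectl would.
	config, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{},
	).ClientConfig()
	if err != nil {
		panic(err)
	}

	mc, err := metricsclient.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// kubectl top pod path: list pod metrics from metrics.k8s.io (served by metrics-server).
	pods, err := mc.MetricsV1beta1().PodMetricses("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pm := range pods.Items {
		for _, c := range pm.Containers {
			fmt.Printf("%s/%s cpu=%s mem=%s\n", pm.Name, c.Name, c.Usage.Cpu(), c.Usage.Memory())
		}
	}
	// A "kubectl top pvc" built the same way would need an analogous PVC
	// metrics resource in metrics.k8s.io, which does not exist today.
}
```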

Member

@tsmetana We have discussed similar questions before; you can also refer to: 🤔
FYI: kubernetes/kubectl#1696
FYI: kubernetes-sigs/metrics-server#1623

Member Author

After going through the documentation, I wonder what would be the benefit of using the metrics API. The metrics-server README states:

Metrics Server is meant only for autoscaling purposes. For example, don't use it to forward metrics to monitoring solutions, or as a source of monitoring solution metrics. In such cases please collect metrics from Kubelet /metrics/resource endpoint directly.

And what we're trying to achieve sounds more like that "for example": just monitoring. Reading the metrics directly from the kubelet would remove the necessity of running the metrics-server altogether. Yes, it's a lighter dependency than Prometheus, but still an additional dependency that doesn't look necessary. In light of the above quote I even wonder whether the existing kubectl top implementation is doing the right thing by depending on the metrics API for simple resource usage querying, and whether adding APIs for volume usage metrics would not be too heavyweight a solution -- autoscalers have no use for them at all.
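As a rough illustration (not a committed design) of reading the stats straight from the kubelet: its Prometheus-format /metrics output, which includes the kubelet_volume_stats_* series, can be fetched through the API server's nodes/proxy subresource. The node name here is hypothetical and RBAC access to nodes/proxy is assumed:

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Fetch one node's kubelet metrics via the API server proxy;
	// kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes
	// are part of this output.
	raw, err := cs.CoreV1().RESTClient().Get().
		Resource("nodes").Name("node-1"). // hypothetical node name
		SubResource("proxy").Suffix("metrics").
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```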


A similar KEP had been proposed in the past at #497, but it was closed without being implemented. Eventually, someone created a kubectl plugin (yashbhutwala/kubectl-df-pv) instead. Implementing it as a kubectl plugin might be one option.

If this feature depends on the Metrics API (where the metrics-server retrieves metrics from the kubelet and exposes them via the Metrics API), I think we need to discuss it with the SIG-Instrumentation (owner of the Metrics API and metrics-server subprojects) and get their agreement. See kubernetes-sigs/metrics-server#1318.

Here are some points I’m personally curious about:

  • What are the required permissions to execute kubectl top pvc? Read access to Pods and PVCs?
  • Not all PVs are mounted as file systems on Nodes. In the case of PVs using a FUSE CSI driver, what would be the expected result of kubectl top pvc?

Member

  • Not all PVs are mounted as file systems on Nodes. In the case of PVs using a FUSE CSI driver, what would be the expected result of kubectl top pvc?

This is a good point. 🤔 There may be multiple types of PVs, and third parties can provide their own, which may make this implementation more complicated. In addition: are kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes available and reliable for the current mainstream PV types?

Member Author

  • What are the required permissions to execute `kubectl top pvc`? Read access to Pods and PVCs?

I guess that would depend on the implementation. The user would have to be able to get to the kubelet stats somehow, either directly or potentially through the metrics-server. But I would really keep the implementation confined to presenting statistics that are already available in the cluster.

  • Not all PVs are mounted as file systems on Nodes. In the case of PVs using a FUSE CSI driver, what would be the expected result of kubectl top pvc?

This is a good point. 🤔 There may be multiple types of PVs, and third parties can provide their own, which may make this implementation more complicated. In addition: are kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes available and reliable for the current mainstream PV types?

The volume stats are being collected by cAdvisor AFAIK, so whatever it supports would work. That's also where support for potential new volume types should come from, which is another reason I'd like to stick to using the existing metrics. I assume (I have some data from an OpenShift cluster, so I'm reasonably certain about it) they'd cover the vast majority of use cases.


The volume stats are being collected by cAdvisor AFAIK

As summarized in the table here, it seems that volume stats are exposed by Kubelet.

I assume (I have some data from OpenShift cluster so I'm reasonably certain about it) they'd cover vast majority of use cases.

Basically, that's the case, but for CSI drivers that mount object storage via FUSE (e.g., awslabs/mountpoint-s3-csi-driver), the kubelet_volume_stats_used_bytes metric did not contain any values for such PVCs. If usage cannot be retrieved from kubelet_volume_stats_used_bytes, I think we could simply omit the usage for those PVCs.
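A minimal sketch of that fallback (all names here are illustrative, not from any existing code): when no used-bytes sample exists for a PVC, print a placeholder instead of a number:

```go
package main

import "fmt"

// pvcUsage holds the two kubelet volume stats for one PVC; Used is nil when
// kubelet_volume_stats_used_bytes reported no sample (e.g. a FUSE-backed volume).
type pvcUsage struct {
	Name     string
	Capacity int64  // bytes
	Used     *int64 // bytes, nil if unknown
}

func formatRow(u pvcUsage) string {
	used := "<unknown>"
	if u.Used != nil {
		used = fmt.Sprintf("%d", *u.Used)
	}
	return fmt.Sprintf("%s\t%s\t%d", u.Name, used, u.Capacity)
}

func main() {
	n := int64(512 * 1024 * 1024)
	fmt.Println(formatRow(pvcUsage{Name: "data-pvc", Capacity: 1 << 30, Used: &n}))
	fmt.Println(formatRow(pvcUsage{Name: "s3-pvc", Capacity: 1 << 30})) // no usage metric
}
```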

@tsmetana tsmetana force-pushed the kep-5105-kubectl-top-pvc branch from 15f2fec to e863b37 Compare February 3, 2025 14:10
@tsmetana tsmetana changed the title KEP 5105: kubectl top pvc [WIP] KEP 5105: kubectl top pvc Feb 6, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2025
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Contributor

Nit: make sure to check all the appropriate boxes.

[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->

Provide a simple command to display a PersistentVolumeClaim capacity usage.
Contributor

Suggested change
Provide a simple command to display a PersistentVolumeClaim capacity usage.
Provide a simple command to display a PersistentVolumeClaim capacity usage, similar to `kubectl top node`.

This might be a good place to talk about core concepts and how they relate.
-->

The feature depends on the metric endpoints to be accessible by the user. That
Contributor

Are you talking about metrics.k8s.io? If so, let's make it clear.

extending the production code to implement this enhancement.
-->

- `<package>`: `<date>` - `<test coverage>`
Contributor

This is needed.

https://storage.googleapis.com/k8s-triage/index.html
-->

- <test>: <link to test coverage>
Contributor

This is required.

- [ ] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name:
- Components depending on the feature gate:
- [X] Other
Contributor

Like most kubectl enhancements, we'll introduce an env var, for example KUBECTL_TOP_PVC, to guard access to this functionality. It will go through the usual cycle: alpha (off by default), beta (on by default), GA (locked).
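A minimal sketch of such an env-var guard, using the example name KUBECTL_TOP_PVC from the comment above (the real gating would go through kubectl's feature-gate plumbing, not ad-hoc checks like this):

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func newTopCmd() *cobra.Command {
	top := &cobra.Command{Use: "top"}

	// Only register the subcommand while the alpha env-var gate is on.
	if os.Getenv("KUBECTL_TOP_PVC") == "true" {
		top.AddCommand(&cobra.Command{
			Use:   "persistentvolumeclaim",
			Short: "Display volume usage of PersistentVolumeClaims (alpha)",
			RunE: func(cmd *cobra.Command, args []string) error {
				fmt.Println("not implemented yet")
				return nil
			},
		})
	}
	return top
}

func main() {
	if err := newTopCmd().Execute(); err != nil {
		os.Exit(1)
	}
}
```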


###### Are there any tests for feature enablement/disablement?

No, there's no way to disable the feature in a single `kubectl` version.
Contributor

See my comment about the env var.

status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
creation-date: 2025-01-30
reviewers:
- TBD
Contributor

Put Arda's and my names as reviewers and approvers here.

owning-sig: sig-cli
participating-sigs:
- sig-storage
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
Contributor

Status should be implementable.

# The target maturity stage in the current dev cycle for this KEP.
# If the purpose of this KEP is to deprecate a user-visible feature
# and a Deprecated feature gates are added, they should be deprecated|disabled|removed.
stage: alpha
Contributor

Are you still targeting 1.33? This feature hasn't been opted in yet.

@soltysh
Contributor

soltysh commented Feb 12, 2025

I almost forgot: you're going to need PRR, i.e. https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/template/nnnn.yaml filled in with the matching number. We can ask John to handle that one.

in their cluster.

## Alternatives

Contributor

Could this be a custom plugin to kubectl? If so, write an alternative here for that.

Member Author

Yes, and I believe it already exists: https://github.com/yashbhutwala/kubectl-df-pv. I will have to basically rewrite the KEP, so I'll make sure to include this in the alternatives, as well as the previous work.

Since the volume usage values are read from the metrics endpoints, the feature
is only useful to users who have the metrics server configured and running
in their cluster.

Contributor

Does this report the metrics for all storage, or only for storage in-use within the cluster?

PVCs can exist without being mounted.

Member Author

Since cAdvisor gets the metrics from the statfs syscall, it can only get data about mounted filesystems, i.e., in the case of persistent volumes, the ones in use by pods.
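For illustration, a minimal Linux-only sketch of a statfs-based computation like the one described (the path is illustrative; this only yields meaningful numbers for a filesystem that is actually mounted there):

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Path where a PVC would be mounted inside a pod; illustrative only.
	path := "/var/lib/kubelet"

	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		panic(err) // fails (or is meaningless) if nothing is mounted at path
	}

	capacity := st.Blocks * uint64(st.Bsize)
	used := (st.Blocks - st.Bfree) * uint64(st.Bsize)
	fmt.Printf("capacity=%d bytes used=%d bytes (%.1f%%)\n",
		capacity, used, 100*float64(used)/float64(capacity))
}
```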

Contributor

That sounds like a big limitation; to be honest, I think we should aim to cover all PVCs and report at least capacity. If we can't also report space used across all PVCs, maybe we should leave this as a plugin.

Member Author

I don't know of any (usable) way to get the usage of an unmounted filesystem -- it requires knowledge of the filesystem internals. Even the capacity of unmounted PVCs is rather difficult to report. The numbers in the API objects might be quite far from reality. There are volume types where the capacity reported in the PVC status is only guaranteed to be lower than the actual volume size, and nothing much else (e.g. NFS, which is quite a popular one). I'm not sure how having this as a plugin would help.

Contributor

If we (Kubernetes) put something in-tree, it has to be kind of "right". For example, people might expect kubectl top pvc to do something that's not feasible. We (Kubernetes) would have to manage expectations there.

With such a large user base, it's actually really likely that end users would complain about the missing behavior, or that the subcommand is confusing.

However, if it's instead a community plugin and you install it and it's not as good as you hoped, this project doesn't have to handle the feedback there. A community plugin can provide half a solution and still be useful.

Member

Perhaps providing a plugin example would be less risky than adding it directly, considering that the current solution may not be universal, or that making it universal would take longer (after all, the metrics-server would need to be changed).

Member

As a plugin example, it would give users (third parties) their own command-line tool. Of course, it is possible to do both: provide a plugin example in the short term, and then follow up in-tree if there is a long-term plan. 🤔 Just my opinion.

@tsmetana
Member Author

I think it's clear that this feature is out of scope for 1.33. I was overly optimistic. If I'm to go with an implementation using the metrics API, it would also need a new feature in metrics-server. And I would have to get more familiar with it, as well as with all the previous work, since I seem to be repeating some well-known mistakes. And of course, we already have the plugin, so perhaps this whole effort is not as useful as I thought it would be.
