volumebinding: scheduler queueing hints - CSIStorageCapacity #124961
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: bells17. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.

This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/cc @sanposhiho @utam0k

/kind feature
Force-pushed 20b3c2b to 411d7ae

Force-pushed bedc76c to 74f80bf
(Resolved review threads on pkg/scheduler/framework/plugins/volumebinding/volume_binding.go and volume_binding_test.go)
```go
	}
	return events
}

func (pl *VolumeBinding) isSchedulableAfterCSIStorageCapacityChange(logger klog.Logger, pod *v1.Pod, oldObj, newObj interface{}) (framework.QueueingHint, error) {
```
How often is CSIStorageCapacity updated in the real world?
If it's generally infrequent, can we just return Queue when a CSIStorageCapacity is created/updated and the Pod has specific .spec.volumes?
e.g.,

```go
func (pl *VolumeBinding) isSchedulableAfterCSIStorageCapacityChange(logger klog.Logger, pod *v1.Pod, oldObj, newObj interface{}) (framework.QueueingHint, error) {
	// ...
	for _, vol := range pod.Spec.Volumes {
		if vol.PersistentVolumeClaim != nil || vol.Ephemeral != nil {
			// This Pod might have become unschedulable due to CSIStorageCapacity in a past scheduling cycle.
			return framework.Queue, nil
		}
	}
	return framework.QueueSkip, nil
}
```
Thank you for your comment.
> How often is CSIStorageCapacity updated in the real world?
CSIStorageCapacity is updated by the external-provisioner of each CSI Driver. The relevant code can be found here:
https://github.com/kubernetes-csi/external-provisioner/blob/v5.0.0/pkg/capacity/capacity.go#L592-L669
There are several events that trigger the update, but with the default settings, each CSIStorageCapacity for a single CSI Driver is updated at least every minute:
https://github.com/kubernetes-csi/external-provisioner/blob/v5.0.0/pkg/capacity/capacity.go#L512-L526
https://github.com/kubernetes-csi/external-provisioner/blob/v5.0.0/cmd/csi-provisioner/csi-provisioner.go#L111
The number of CSIStorageCapacity objects depends on the NodeTopology configuration of the target CSI Driver.
For example, when using network volumes like the gcp-compute-persistent-disk-csi-driver, it can be on a per-zone basis:
https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/v999.1.0/pkg/gce-pd-csi-driver/node.go#L377-L392
When using node-local LVM, as with topolvm, CSIStorageCapacity is created on a per-node basis:
https://github.com/topolvm/topolvm/blob/main/internal/driver/node.go#L557-L566
> If it's generally infrequent, can we just return Queue when a CSIStorageCapacity is created/updated and the Pod has specific .spec.volumes?
Yes, that approach alone can help skip queueing Pods unrelated to CSIStorageCapacity, reducing unnecessary overhead.
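For illustration, the decision discussed above can be turned into a tiny runnable sketch. The `QueueingHint`, `Pod`, and `Volume` types below are simplified stand-ins for the real k8s.io/api and scheduler framework types, not the actual API:

```go
package main

import "fmt"

// QueueingHint is a simplified stand-in for framework.QueueingHint; the Pod
// and Volume types below are illustrative only, not the real k8s.io API.
type QueueingHint int

const (
	QueueSkip QueueingHint = iota
	Queue
)

type Volume struct {
	PersistentVolumeClaim *struct{}
	Ephemeral             *struct{}
}

type Pod struct {
	Volumes []Volume
}

// isSchedulableAfterCSIStorageCapacityChange mirrors the decision above:
// requeue only Pods that mount a PVC or a generic ephemeral volume, since
// only those could have been rejected because of storage capacity.
func isSchedulableAfterCSIStorageCapacityChange(pod *Pod) QueueingHint {
	for _, vol := range pod.Volumes {
		if vol.PersistentVolumeClaim != nil || vol.Ephemeral != nil {
			return Queue
		}
	}
	return QueueSkip
}

func main() {
	withPVC := &Pod{Volumes: []Volume{{PersistentVolumeClaim: &struct{}{}}}}
	fmt.Println(isSchedulableAfterCSIStorageCapacityChange(withPVC) == Queue)    // true
	fmt.Println(isSchedulableAfterCSIStorageCapacityChange(&Pod{}) == QueueSkip) // true
}
```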
Force-pushed 5346abf to be636e8

Force-pushed be636e8 to 362a702
@sanposhiho @toVersus Thank you for your review. I have made the necessary changes, so please take another look.
Only some nits, other than that LGTM.
(Resolved review threads on pkg/scheduler/framework/plugins/volumebinding/volume_binding.go and volume_binding_test.go)
```go
	for _, vol := range pod.Spec.Volumes {
		if vol.PersistentVolumeClaim != nil || vol.Ephemeral != nil {
			// This Pod might have got unschedulable due to CSIStorageCapacity in a past scheduling cycle.
			logger.V(5).Info("CSIStorageCapacity was created or updated, but the pod doesn't mount any persistent volumes and generic ephemeral storage volumes. It doesn't make this pod schedulable.")
```
It makes me wonder: in the first place, could a Pod without any PVC or ephemeral volume get unschedulable by the VolumeBinding plugin?
If No, we likely end up having no QHint for CSIStorageCapacity (for now, at least until we build some observability like I said).
I'm sorry, I made a mistake in correcting the log message. (This has been fixed.)

> It makes me wonder: in the first place, could a Pod without any PVC or ephemeral volume get unschedulable by the VolumeBinding plugin?

I think the answer to this is No.

> If No, we likely end up having no QHint for CSIStorageCapacity (for now, at least until we build some observability like I said).

Regarding this point: if we don't return QueueSkip in the QHint, scheduling will be retried on CSIStorageCapacity changes even for Pods that don't have a volume. Wouldn't that cause any issues, in your opinion?
> Regarding this point: if we don't return QueueSkip in the QHint, scheduling will be retried on CSIStorageCapacity changes even for Pods that don't have a volume. Wouldn't that cause any issues, in your opinion?

Only the QHints of plugins that have rejected a Pod are run. For example, if a Pod was rejected by the resource fit plugin and the PodAffinity plugin in a previous scheduling cycle, then only those plugins' QHints are run for the Pod.
So, if the answer to my first question ("could a Pod without any PVC or ephemeral volume get unschedulable by the VolumeBinding plugin?") is No, filtering Pods by volume in the QHint is pointless, since only Pods with volumes would reach this QHint in the first place.
Thank you for your input. I understand.
I'll proceed to close the unnecessary PRs.
Force-pushed 82b1102 to 8a942f3
(Resolved review thread on pkg/scheduler/framework/plugins/volumebinding/volume_binding_test.go)
/lgtm
@toVersus: changing LGTM is restricted to collaborators.
/close

@bells17: Closed this PR.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR implements a queueing hint for the VolumeBinding plugin in kube-scheduler.
The queueing hint lets the scheduler decide whether to retry or skip scheduling a Pod based on changes to the CSIStorageCapacity resources referenced by the plugin.
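To show how a hint plugs in, here is a self-contained sketch of pairing the CSIStorageCapacity Add/Update event with a queueing hint. The type names are modeled loosely on the scheduler framework's ClusterEventWithHint, but all shapes here (including the one-boolean hint signature) are illustrative simplifications, not the real framework API:

```go
package main

import "fmt"

// Simplified stand-ins modeled on the framework's event-registration types;
// the names and shapes are illustrative, not the real scheduler framework API.
type ActionType int

const (
	Add ActionType = 1 << iota
	Update
)

type ClusterEvent struct {
	Resource   string
	ActionType ActionType
}

// QueueingHintFn is reduced to a single boolean input here; the real hint
// receives the Pod and the old/new objects of the event.
type QueueingHintFn func(podHasVolumes bool) string

type ClusterEventWithHint struct {
	Event          ClusterEvent
	QueueingHintFn QueueingHintFn
}

// eventsToRegister sketches how the plugin pairs the CSIStorageCapacity
// Add/Update event with its queueing hint.
func eventsToRegister() []ClusterEventWithHint {
	return []ClusterEventWithHint{
		{
			Event: ClusterEvent{Resource: "CSIStorageCapacity", ActionType: Add | Update},
			QueueingHintFn: func(podHasVolumes bool) string {
				if podHasVolumes {
					return "Queue"
				}
				return "QueueSkip"
			},
		},
	}
}

func main() {
	events := eventsToRegister()
	fmt.Println(events[0].Event.Resource)       // CSIStorageCapacity
	fmt.Println(events[0].QueueingHintFn(true)) // Queue
}
```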
Which issue(s) this PR fixes:
Part of #118893
KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/4247-queueinghint/README.md
Base PR: #124939
Special notes for your reviewer:
Fields Impacting QueueingHintFn
PersistentVolume (PV) is not included in this table because it can undergo extensive changes when a conversion is performed by csi-translation-lib.
ref(ja): https://zenn.dev/bells17/scraps/65bd6891012bdc
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: