volumebinding: scheduler queueing hints - StorageClass #124958

Closed

bells17
Contributor

@bells17 bells17 commented May 20, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

kube-scheduler implements scheduling hints for the VolumeBinding plugin.
The scheduling hints allow the scheduler to determine whether to retry or skip scheduling a Pod based on the changes made to the StorageClass resource referenced by the plugin.

Which issue(s) this PR fixes:

Part of #118893
KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/4247-queueinghint/README.md
Base PR: #124939

Special notes for your reviewer:

Fields Impacting QueueingHintFn

PersistentVolume (PV) is not included in this table because it can undergo extensive changes when a conversion is performed by csi-translation-lib.

| Resource | Field | Referenced in PreFilter+Filter? | Admission Overwrite Prevention Config | Need to Check Changes in QHint? |
|---|---|---|---|---|
| StorageClass | .metadata.labels | x | x | x |
| StorageClass | .metadata.annotations | x | x | x |
| StorageClass | .provisioner | o | o | x |
| StorageClass | .parameters | x | o | x |
| StorageClass | .reclaimPolicy | x | o | x |
| StorageClass | .mountOptions | x | o | x |
| StorageClass | .allowVolumeExpansion | x | x | x |
| StorageClass | .volumeBindingMode | o | x | o |
| StorageClass | .allowedTopologies | o | x | o |

ref(ja): https://zenn.dev/bells17/scraps/65bd6891012bdc
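
To illustrate how the table above could translate into a hint, here is a minimal sketch, not the code in this PR. It assumes only the two fields marked "o" in the last column (`.volumeBindingMode` and `.allowedTopologies`) need comparing. `QueueingHint`, `StorageClass`, `TopologyTerm`, and `hintForStorageClassChange` are local stand-ins invented for this example, not the real framework or API types.

```go
package main

import (
	"fmt"
	"reflect"
)

// QueueingHint is a local stand-in for the scheduler framework's hint type.
type QueueingHint int

const (
	QueueSkip QueueingHint = iota // the event cannot make the Pod schedulable
	Queue                         // the event may make the Pod schedulable
)

// TopologyTerm is a simplified, hypothetical stand-in for one allowedTopologies entry.
type TopologyTerm struct {
	Key    string
	Values []string
}

// StorageClass carries only a few fields from the table; it is not the real API type.
type StorageClass struct {
	Name              string
	Parameters        map[string]string
	VolumeBindingMode string
	AllowedTopologies []TopologyTerm
}

// hintForStorageClassChange returns Queue when the StorageClass was created
// (oldSC == nil) or when one of the two compared fields differs, and
// QueueSkip otherwise. Note that reflect.DeepEqual distinguishes a nil slice
// from an empty one.
func hintForStorageClassChange(oldSC, newSC *StorageClass) QueueingHint {
	if oldSC == nil {
		return Queue
	}
	if oldSC.VolumeBindingMode != newSC.VolumeBindingMode {
		return Queue
	}
	if !reflect.DeepEqual(oldSC.AllowedTopologies, newSC.AllowedTopologies) {
		return Queue
	}
	return QueueSkip
}

func main() {
	before := &StorageClass{Name: "fast", VolumeBindingMode: "WaitForFirstConsumer"}
	after := *before
	after.AllowedTopologies = []TopologyTerm{
		{Key: "topology.kubernetes.io/zone", Values: []string{"zone-a"}},
	}
	fmt.Println("queue on topology change:", hintForStorageClassChange(before, &after) == Queue)
}
```

Changes to fields such as `.parameters` fall through to `QueueSkip` in this sketch, which matches the "x" entries in the table's last column.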

Does this PR introduce a user-facing change?

kube-scheduler implements scheduling hints for the VolumeBinding plugin.
The scheduling hints allow the scheduler to retry scheduling a Pod that was previously rejected by the VolumeBinding plugin only if a new resource referenced by the plugin was created or an existing resource referenced by the plugin was updated.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 20, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bells17
Once this PR has been reviewed and has the lgtm label, please assign saad-ali for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels May 20, 2024
@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 20, 2024
@bells17
Contributor Author

bells17 commented May 20, 2024

/cc @sanposhiho @utam0k

@bells17
Contributor Author

bells17 commented May 20, 2024

/retest

@sanposhiho
Member

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 20, 2024
@bells17 bells17 force-pushed the qhint-volume-binding-storageclass branch from 651ccab to e4fd29b Compare May 20, 2024 12:41
@bells17
Contributor Author

bells17 commented May 20, 2024

/retest

@bells17 bells17 force-pushed the qhint-volume-binding-storageclass branch 2 times, most recently from 837ce6f to c9d523a Compare May 21, 2024 04:35
@bells17 bells17 force-pushed the qhint-volume-binding-storageclass branch 2 times, most recently from e3ed38e to ee91dca Compare May 21, 2024 14:11
@bells17 bells17 force-pushed the qhint-volume-binding-storageclass branch from ee91dca to c4507f6 Compare May 21, 2024 18:36
@bells17 bells17 changed the title WIP: volumebinding: scheduler queueing hints - StorageClass volumebinding: scheduler queueing hints - StorageClass May 22, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 22, 2024
@bells17
Contributor Author

bells17 commented May 22, 2024

/cc @carlory

@k8s-ci-robot k8s-ci-robot requested a review from carlory May 22, 2024 08:55
@bells17
Contributor Author

bells17 commented May 22, 2024

/cc @jsafrane @xing-yang

if err != nil {
	return framework.Queue, err
}

Member

Updating the volume binding mode of a StorageClass is forbidden. If a StorageClass's binding mode is not VolumeBindingWaitForFirstConsumer, creating or updating it does not by itself make the Pod schedulable. If some PVCs are then bound to PVs because the StorageClass was (re-)created, that should be reflected in a PVC update event. So in this function we should only handle StorageClasses with the VolumeBindingWaitForFirstConsumer binding mode.
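
One possible reading of this suggestion, as a hedged sketch rather than the code that was committed: the hint ignores any StorageClass whose binding mode is not `WaitForFirstConsumer` and leaves those cases to PVC events. The types and the `hintByBindingMode` function are hypothetical stand-ins; only the two binding mode string values come from the Kubernetes API.

```go
package main

import "fmt"

// QueueingHint is a local stand-in for the scheduler framework's hint type.
type QueueingHint int

const (
	QueueSkip QueueingHint = iota // the event cannot make the Pod schedulable
	Queue                         // the event may make the Pod schedulable
)

// Binding mode values as they appear in the StorageClass API.
const (
	bindingImmediate            = "Immediate"
	bindingWaitForFirstConsumer = "WaitForFirstConsumer"
)

// StorageClass is a trimmed stand-in that keeps only the fields used here.
type StorageClass struct {
	Name              string
	VolumeBindingMode string
}

// hintByBindingMode returns Queue only for StorageClasses that delay binding
// until a consumer Pod is scheduled. For every other mode it returns
// QueueSkip, on the assumption that a later PVC event will requeue the Pod.
func hintByBindingMode(sc *StorageClass) QueueingHint {
	if sc == nil || sc.VolumeBindingMode != bindingWaitForFirstConsumer {
		return QueueSkip
	}
	return Queue
}

func main() {
	classes := []*StorageClass{
		{Name: "delayed", VolumeBindingMode: bindingWaitForFirstConsumer},
		{Name: "eager", VolumeBindingMode: bindingImmediate},
	}
	for _, sc := range classes {
		fmt.Printf("%s: queue=%t\n", sc.Name, hintByBindingMode(sc) == Queue)
	}
}
```

This only works if a separate hint handles PVC events, which is the dependency raised in the next comment.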

Member

If we only consider VolumeBindingWaitForFirstConsumer, this PR will depend on the other hint function, which handles PVC events.

So let's merge that PR first, once it is ready to be merged. @sanposhiho

Contributor Author

Thank you for your suggestion. In that case, would you mind reviewing the following PR first, which adds QHint for PVC?
#124959

if err != nil {
	if pinfo.isEphemeral && apierrors.IsNotFound(err) {
		err = fmt.Errorf("waiting for ephemeral volume controller to create the persistentvolumeclaim %q", pinfo.pvcName)
		return framework.Queue, err
Member

Why return an error? If we do, the Pod will be put into the backoff queue (BackOffQ); it doesn't make the Pod schedulable.

Contributor Author

Thank you for your comment. After considering your feedback, I think it might be better to make the following changes:

for _, vol := range pod.Spec.Volumes {
	var pvc *v1.PersistentVolumeClaim
	switch {
	case vol.PersistentVolumeClaim != nil:
		pvcName := vol.PersistentVolumeClaim.ClaimName
		pvc, err = pl.PVCLister.PersistentVolumeClaims(pod.Namespace).Get(pvcName)
		if err != nil {
			return framework.Queue, err
		}
	case vol.Ephemeral != nil:
		pvc = &v1.PersistentVolumeClaim{
			ObjectMeta: vol.Ephemeral.VolumeClaimTemplate.ObjectMeta,
			Spec:       vol.Ephemeral.VolumeClaimTemplate.Spec,
		}
	default:
		continue
	}

	if pvc.Spec.VolumeName != "" {
		// Skipping the check for CSIStorageCapacity as the PVC is configured
		// to be bound to an existing PV.
		continue
	}

	className := volume.GetPersistentVolumeClaimClass(pvc)
	if className == newSC.Name {
		if oldSC == nil {
			logger.V(4).Info("StorageClass was created")
			return framework.Queue, nil
		}

		if !apiequality.Semantic.DeepEqual(newSC.AllowedTopologies, oldSC.AllowedTopologies) {
			logger.V(4).Info("StorageClass was updated and AllowedTopologies changed", "AllowedTopologies", newSC.AllowedTopologies)
			return framework.Queue, nil
		}
	}
}

What do you think about this approach?
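
For readers who want to try the per-volume matching outside the scheduler, the proposal above can be approximated with plain Go types. This is a sketch under stated assumptions: `PVC`, `Volume`, `claimClass`, and `podUsesStorageClass` are hypothetical stand-ins, a map replaces the PVC lister, and claims missing from the map are simply skipped, which is a choice made for this example and not something the thread settles.

```go
package main

import "fmt"

// PVC is a hypothetical, trimmed stand-in for a PersistentVolumeClaim.
type PVC struct {
	Name             string
	VolumeName       string  // non-empty when the claim targets an existing PV
	StorageClassName *string // nil when no class is set
}

// Volume is a stand-in for a Pod volume; at most one field is expected to be set.
type Volume struct {
	ClaimName         string // name of an existing PVC
	EphemeralTemplate *PVC   // inline claim template of an ephemeral volume
}

// claimClass is a simplified stand-in for the class lookup helper used in the
// proposal; it reads only the spec field and ignores any legacy annotation.
func claimClass(pvc *PVC) string {
	if pvc.StorageClassName != nil {
		return *pvc.StorageClassName
	}
	return ""
}

// podUsesStorageClass reports whether any claim of the Pod that is not tied
// to an existing PV names scName as its StorageClass.
func podUsesStorageClass(volumes []Volume, claims map[string]*PVC, scName string) bool {
	for _, vol := range volumes {
		var pvc *PVC
		switch {
		case vol.EphemeralTemplate != nil:
			pvc = vol.EphemeralTemplate
		case vol.ClaimName != "":
			pvc = claims[vol.ClaimName] // nil when the claim is unknown
		}
		if pvc == nil || pvc.VolumeName != "" {
			continue
		}
		if claimClass(pvc) == scName {
			return true
		}
	}
	return false
}

func main() {
	fast := "fast"
	claims := map[string]*PVC{
		"data": &PVC{Name: "data", StorageClassName: &fast},
	}
	volumes := []Volume{{ClaimName: "data"}}
	fmt.Println("pod references class fast:", podUsesStorageClass(volumes, claims, "fast"))
}
```

A hint built on this check would return Queue only when `podUsesStorageClass` is true, at the cost of one claim lookup per volume for each StorageClass event.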

Contributor Author

I've committed the code from the comment above.

@bells17 bells17 force-pushed the qhint-volume-binding-storageclass branch 2 times, most recently from 7cc8646 to 537975d Compare May 25, 2024 00:55
@bells17 bells17 requested a review from carlory May 25, 2024 01:05
…e's PVC Template when a Pod is using an Ephemeral Volume
@bells17 bells17 force-pushed the qhint-volume-binding-storageclass branch from 537975d to dbbaa06 Compare May 25, 2024 02:11
@@ -123,6 +127,60 @@ func (pl *VolumeBinding) EventsToRegister() []framework.ClusterEventWithHint {
return events
}

func (pl *VolumeBinding) isSchedulableAfterStorageClassChange(logger klog.Logger, pod *v1.Pod, oldObj, newObj interface{}) (framework.QueueingHint, error) {
Member

Similar to #124961 (comment): if we can assume that StorageClass updates are infrequent, fine-grained but time-consuming filtering (using PVCLister.PersistentVolumeClaims().Get()) is not worth it.

Member

As a first step, we should keep it minimal, at least until we have some observability around QHint and can ensure that this level of fine-grained filtering doesn't negatively impact large clusters (#124566).

@bells17 bells17 force-pushed the qhint-volume-binding-storageclass branch from 5acaacf to 843435f Compare May 26, 2024 13:37
@bells17 bells17 requested a review from sanposhiho May 26, 2024 13:48
@bells17
Contributor Author

bells17 commented May 26, 2024

@sanposhiho Thank you for your review. I have made the necessary changes, so please take another look.

@bells17
Contributor Author

bells17 commented May 31, 2024

/close
#124961 (comment)

@k8s-ci-robot
Contributor

@bells17: Closed this PR.


Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants