[KEP-5254] DRA: Constraints with CEL #5391
Conversation
aws-ruhip commented Jun 10, 2025
- One-line PR description: Add support for new CEL expression constraint in DRA
- Issue link: DRA: Constraints with CEL #5254
- Other comments:
Welcome @aws-ruhip!

Hi @aws-ruhip. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Force-pushed from 409e0f7 to f803450
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aws-ruhip. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from b6c73a0 to a97c185
Force-pushed from e4440e1 to b392c23
/wg device-management
I haven't looked at the current KEP yet; I only checked whether we have the right reviewers assigned.
- "@klueska" | ||
- "@pohly" | ||
approvers: | ||
- "@alculquicondor" # SIG Scheduling |
I will be OOO starting from tomorrow, so I cannot promise the review. @macsko will be OOO from Thursday and @sanposhiho may have limited bandwidth as well.
- "@pohly" | ||
approvers: | ||
- "@alculquicondor" # SIG Scheduling | ||
- "@dchen1107" # SIG Node |
You can remove SIG node entirely. This does not affect the kubelet at all.
- "@geetasg" | ||
owning-sig: sig-scheduling | ||
participating-sigs: | ||
- sig-node |
You can remove SIG node entirely. This does not affect the kubelet at all.
@@ -0,0 +1,3 @@
kep-number: 5254
alpha:
  approver: "@johnbelamaric"
I'm not sure whether @johnbelamaric has time - I doubt it.
Let's see who from the PRR reviewers takes this, if it's not too late.
@@ -0,0 +1,3 @@
kep-number: 5254
alpha:
  approver: "@johnbelamaric"
approver: "@johnbelamaric" | |
approver: "@deads2k" |
I'm not sure of the context, but David is already assigned on the PRR board.
Some general comments and some PRR comments.
The kube-scheduler is responsible for selecting devices to allocate based on this new CELDeviceConstraint. The scheduler already supports selecting devices according to MatchAttribute - this KEP will extend that to also support CELDeviceConstraint.
#### Non-monotonic constraint evaluation |
Does this mean that we have to evaluate the CEL against every possible set of devices to determine if they match?
Yes, the CEL expression cannot be evaluated until the required number of devices is present in the device set. If the devices do not match, a new device set is evaluated. This process repeats until a device set is found that satisfies the CEL expression.
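(As an aside, here is a rough sketch of that search, not the actual allocator code; `device`, `evalConstraint`, and `findDeviceSet` are hypothetical names. The allocator enumerates candidate sets of the required size and evaluates the expression against each complete set.)

```go
package main

import "fmt"

// device stands in for a candidate device and its attributes (hypothetical type).
type device struct {
	name  string
	attrs map[string]string
}

// evalConstraint stands in for evaluating the compiled CEL expression against a
// complete device set; only a set of the required size can be evaluated at all.
func evalConstraint(set []device) bool {
	// Stand-in rule: all selected devices must share the same "domain" attribute.
	for _, d := range set {
		if d.attrs["domain"] != set[0].attrs["domain"] {
			return false
		}
	}
	return true
}

// findDeviceSet tries combinations of `required` devices out of `candidates` and
// returns the first set that satisfies the constraint, if any.
func findDeviceSet(candidates []device, required int) ([]device, bool) {
	var recurse func(start int, current []device) ([]device, bool)
	recurse = func(start int, current []device) ([]device, bool) {
		if len(current) == required {
			if evalConstraint(current) {
				return append([]device(nil), current...), true
			}
			return nil, false // reject the whole set and keep searching
		}
		for i := start; i < len(candidates); i++ {
			if set, ok := recurse(i+1, append(current, candidates[i])); ok {
				return set, true
			}
		}
		return nil, false
	}
	return recurse(0, nil)
}

func main() {
	candidates := []device{
		{name: "gpu-0", attrs: map[string]string{"domain": "a"}},
		{name: "gpu-1", attrs: map[string]string{"domain": "b"}},
		{name: "gpu-2", attrs: map[string]string{"domain": "a"}},
	}
	set, ok := findDeviceSet(candidates, 2)
	fmt.Println(ok, set)
}
```

The important property is that nothing can be pruned per device: only a complete set can be accepted or rejected, which is what makes the constraint non-monotonic.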
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, through the feature gate. Pods that are already running will continue unaffected. New workloads trying to utilize the feature will not work. Admission will fail due to API error.
kubelet admission or kube-apiserver admission? kube-apiserver admission plugins are optional. Additionally, I don't see any description of admission in the KEP above, so I'm not clear on how this would happen.
Since we are modifying the resource claim template API spec to add this new cel constraint, any workloads trying to use this cel constraint while the feature gate is disabled will see an error like "cel.expression is not a field"
Since we are modifying the resource claim template API spec to add this new cel constraint, any workloads trying to use this cel constraint while the feature gate is disabled will see an error like "cel.expression is not a field"
That's not a common approach and extremely uncommon in the admission stage of the apiserver flow. Can you link to where you're familiar with an existing feature that behaves this way?
@deads2k We will not be relying on admission plugins. We will update the KEP to clarify the feature gate behavior. Thanks for the feedback!
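(For readers following the thread, a minimal sketch of the API shape being discussed; field and type names are hypothetical placeholders and may not match the final KEP. The new CEL constraint would sit next to the existing MatchAttribute on the constraint type and be guarded by the feature gate.)

```go
// Hypothetical sketch of the constraint type discussed in this thread; the actual
// field names and types in the KEP/API may differ.
package resource

// CELDeviceConstraint would hold a CEL expression evaluated against a candidate
// set of devices (placeholder shape).
type CELDeviceConstraint struct {
	// Expression is a CEL expression that must evaluate to true for the
	// selected set of devices.
	Expression string `json:"expression"`
}

// DeviceConstraint already exists with MatchAttribute; the new field would be
// added alongside it and dropped or rejected when the feature gate is disabled.
type DeviceConstraint struct {
	// Requests limits the constraint to the listed requests (existing field).
	Requests []string `json:"requests,omitempty"`

	// MatchAttribute requires that all selected devices share the same value
	// for this attribute (existing field; simplified to a string here).
	MatchAttribute *string `json:"matchAttribute,omitempty"`

	// CEL is the new, feature-gated constraint (hypothetical field name).
	CEL *CELDeviceConstraint `json:"cel,omitempty"`
}
```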
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
scheduler throughput and the scheduler latency both seem like valuable metrics here.
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

Merely enabling the feature has no resource impact. When in use, resource consumption will scale with the number of device combinations being evaluated, varying based on specific allocation patterns.
- Merely enabling the feature has no resource impact. When in use, resource consumption will scale with the number of device combinations being evaluated, varying based on specific allocation patterns.
+ Resource consumption will scale with the number of device combinations being evaluated, varying based on specific allocation patterns.
Assume the feature is going to be used when writing PRR answers. How will we check the actual cost in a running cluster?
For CEL selectors, we use cost estimation from the cel k8s library: https://github.com/kubernetes/kubernetes/blob/2cc3dbf2250177c60b2967e3e8481ac3f7e33829/staging/src/k8s.io/dynamic-resource-allocation/cel/compile.go#L191
We will leverage the same mechanism for CEL constraints.
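(As an illustrative sketch only, assuming cel-go's `EstimateCost` with a trivial fallback estimator; the linked `compile.go` uses a richer, size-aware estimator and a different environment. Validation could reject an expression whose worst-case estimate exceeds a configured limit.)

```go
package main

import (
	"fmt"

	"github.com/google/cel-go/cel"
	"github.com/google/cel-go/checker"
)

// defaultEstimator falls back to cel-go's built-in cost model by returning nil
// from both methods (the real DRA code plugs in a size-aware estimator).
type defaultEstimator struct{}

func (defaultEstimator) EstimateSize(element checker.AstNode) *checker.SizeEstimate { return nil }
func (defaultEstimator) EstimateCallCost(function, overloadID string, target *checker.AstNode, args []checker.AstNode) *checker.CallEstimate {
	return nil
}

// maxCost is a placeholder for the per-expression limit (the KEP references the
// existing CELSelectorExpressionMaxCost mechanism for the real value).
const maxCost = 1_000_000

func main() {
	// "device" as a dyn variable is a placeholder; the real DRA CEL environment differs.
	env, err := cel.NewEnv(cel.Variable("device", cel.DynType))
	if err != nil {
		panic(err)
	}
	ast, issues := env.Compile(`device.driver == "example.com"`)
	if issues != nil && issues.Err() != nil {
		panic(issues.Err())
	}
	est, err := env.EstimateCost(ast, defaultEstimator{})
	if err != nil {
		panic(err)
	}
	if est.Max > maxCost {
		fmt.Printf("rejected: estimated worst-case cost %d exceeds limit %d\n", est.Max, maxCost)
		return
	}
	fmt.Printf("accepted: estimated cost range [%d, %d]\n", est.Min, est.Max)
}
```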
Are you going to expose that as a metric? Also, I remember this as an estimate, not the actual. This new API is going to result in repeated evaluations and seems pertinent to track its (potentially significant) expense.
cc @wojtek-t
Are you going to expose that as a metric? Also, I remember this as an estimate, not the actual. This new API is going to result in repeated evaluations and seems pertinent to track its (potentially significant) expense.
Did we have any metric once we introduced CEL for validation in the apiserver?
I don't have any reasonable suggestion off the top of my head:
- what we really care about is CPU itself, but I think data about how much is used here in particular can only be read from profiles
- the only thing that comes to mind is trying to report how many combinations are evaluated (and then using some average CEL-evaluation cost to estimate the overall cost); but that's super hand-wavy
Summarizing a slack thread in here. Permutations take smaller numbers and make really big ones. On a cluster with 10^3 nodes and 10^3 pods using it, an expression that is P(16,8) becomes a potential worst case of 10^14 checks (compared with 10^6 without this change).
On the surface this appears to be a big change. Agreeing beforehand on what parameters make a good benchmark, and thinking through acceptable thresholds for changes to that benchmark with sig-scalability, needs to happen. I haven't yet thought of a different way to be comfortable with the PRR scaling questions here.
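(For concreteness, a back-of-the-envelope check of those numbers, assuming the worst case scales as nodes × pods × P(16,8) as in the comment above:)

```go
package main

import "fmt"

// permutations returns P(n, k) = n! / (n-k)!, the number of ordered
// selections of k devices out of n candidates.
func permutations(n, k int) uint64 {
	result := uint64(1)
	for i := 0; i < k; i++ {
		result *= uint64(n - i)
	}
	return result
}

func main() {
	perAllocation := permutations(16, 8) // 518,918,400 ≈ 5.2e8 orderings per allocation
	nodes, pods := uint64(1000), uint64(1000)

	fmt.Printf("P(16,8) = %d (≈ %.1e)\n", perAllocation, float64(perAllocation))
	// Without this change the worst case is roughly nodes * pods ≈ 1e6 checks;
	// with it, every pod on every node may explore the full permutation space.
	fmt.Printf("worst case checks ≈ %.1e\n", float64(nodes*pods*perAllocation))
}
```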
Also, I remember this as an estimate, not the actual.
CEL can also track actual cost during execution, which is indicative of cpu usage: https://pkg.go.dev/github.com/google/cel-go/cel#EvalDetails.ActualCost
This is used in CEL selector evaluation here: https://github.com/kubernetes/kubernetes/blob/b0b52f4fb29223f829b50ae58323c20a7764dfc9/staging/src/k8s.io/dynamic-resource-allocation/structured/allocator.go#L970
We can surface a metric that tracks evaluation cost for CEL constraint workloads.
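(A minimal sketch of that runtime tracking with cel-go; the environment, variable name, and expression here are placeholders rather than the actual DRA CEL environment.)

```go
package main

import (
	"fmt"

	"github.com/google/cel-go/cel"
)

func main() {
	env, err := cel.NewEnv(cel.Variable("device", cel.DynType))
	if err != nil {
		panic(err)
	}
	ast, issues := env.Compile(`device.driver == "example.com"`)
	if issues != nil && issues.Err() != nil {
		panic(issues.Err())
	}
	// OptTrackCost makes Eval report the actual cost spent on this evaluation,
	// which could be accumulated into a metric per constraint evaluation.
	prg, err := env.Program(ast, cel.EvalOptions(cel.OptTrackCost))
	if err != nil {
		panic(err)
	}
	out, details, err := prg.Eval(map[string]any{
		"device": map[string]any{"driver": "example.com"},
	})
	if err != nil {
		panic(err)
	}
	if details != nil {
		if cost := details.ActualCost(); cost != nil {
			fmt.Printf("result=%v actualCost=%d\n", out, *cost)
		}
	}
}
```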
cc @pohly @johnbelamaric do we have rough scaling targets for DRA devices currently?
Force-pushed from b392c23 to a10403e
A malicious or buggy workload can specify CEL expressions that degrade the performance of constraint evaluation and scheduling. We will specify a limit on evaluation cost for the expression. There is already a mechanism to cap this today with CEL selectors that we can reuse. [Reference](https://github.com/kubernetes/kubernetes/blob/6188e5cb7b2f106b047493b7b498c1882723cab4/pkg/apis/resource/types.go#L910-L933)

#### Performance at scale
I strongly feel that just adding a CEL expression is a simple solution, but it significantly (even exponentially) increases the complexity of the filtering phase in all scenarios not supported by MatchAttribute. If we could group use cases and implement faster algorithms or, at least, reduce the problem space for them, that would be beneficial.
Also, keep workload-aware scheduling in mind, which will be introduced in upcoming releases. In some scenarios, it might need to run the filtering multiple times, and relying on really heavy algorithms could completely slow down scheduling.
But, we could leave this to the beta stage or next revision of alpha, so please document it accordingly in the KEP (and say that this has to be revisited before beta).
@macsko Thank you for the review. We will update the KEP as per the feedback. Can you please link the issue/info on workload-aware scheduling? cc @aws-ruhip
The issue: kubernetes/kubernetes#132192
After enabling the feature, if the scheduler_pending_pods metric in the kube-scheduler suddenly increases, then perhaps scheduling no longer works as intended.
Or when no new pods are created in the cluster but the metric stays constant, indicating that no pods have been scheduled.
Force-pushed from a10403e to e5a7fb4