KEP-5229: Asynchronous API calls during scheduling #5249

Conversation

macsko
Member

@macsko macsko commented Apr 16, 2025

  • One-line PR description: Add KEP-5229
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 16, 2025
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Apr 16, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Scheduling Apr 16, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 16, 2025
@macsko
Member Author

macsko commented Apr 16, 2025

/cc @dom4ha @sanposhiho

I have written some possible approaches to these API calls to start the discussion. I will not be visibly active for the next three weeks, but feel free to comment.

- `nominatedNodeName` scenario support would require more effort in (1.1) or (1.2).


#### 2.2: Make the API calls queued
Member Author

I feel like queueing API calls might be a good direction long term and probably won't be that hard to implement.

Member

@sanposhiho sanposhiho left a comment

Great summary! I'll take time reading through it and put some comments after that.

This would require implementing advanced logic for queueing API calls in the kube-scheduler and migrating **all** pod-based API calls done during scheduling to this method,
potentially including the binding API call. The new component should be able to resolve any conflicts in the incoming API calls as well as parallelize them properly,
e.g., don't parallelize two updates of the same pod. This requires [making the API calls queued](#22-make-the-api-calls-queued) or
[sending API calls through a kube-scheduler's cache](#23-send-api-calls-through-a-kube-schedulers-cache) to be implemented.
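As a rough illustration only (hypothetical names, not the KEP's design), such a component could serialize calls per pod while still parallelizing across pods:

```go
package apicallqueue

import "sync"

// call is a hypothetical record of one pending API call for a single pod.
type call struct {
	podUID string
	send   func() error // the actual request to the apiserver
}

// Dispatcher keeps at most one call per pod in flight, while calls for
// different pods run in parallel, so two updates of the same pod are never
// sent concurrently.
type Dispatcher struct {
	mu       sync.Mutex
	inFlight map[string]bool   // pod UID -> a call is currently executing
	pending  map[string][]call // pod UID -> calls waiting behind the in-flight one
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{inFlight: map[string]bool{}, pending: map[string][]call{}}
}

// Enqueue starts the call immediately, or parks it behind the call that is
// already in flight for the same pod.
func (d *Dispatcher) Enqueue(c call) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.inFlight[c.podUID] {
		d.pending[c.podUID] = append(d.pending[c.podUID], c)
		return
	}
	d.inFlight[c.podUID] = true
	go d.run(c)
}

func (d *Dispatcher) run(c call) {
	_ = c.send() // retries and error handling are omitted in this sketch
	d.mu.Lock()
	defer d.mu.Unlock()
	if waiting := d.pending[c.podUID]; len(waiting) > 0 {
		d.pending[c.podUID] = waiting[1:]
		go d.run(waiting[0])
		return
	}
	delete(d.inFlight, c.podUID)
	delete(d.pending, c.podUID)
}
```

Conflict resolution (merging or dropping queued calls that became irrelevant) would sit on top of such a structure, and that is where most of the real complexity lives.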
Member

Conceptually what you need here is something pretty similar to DeltaFIFO we use in client-go:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/delta_fifo.go

We wouldn't operate on objects but on their updates; still, it's conceptually exactly what you need:

  • tracking changes to a given object all together
  • a concept of deduping (if I already scheduled a pod but didn't yet send the failures, don't send them; if I have multiple failures not yet sent, is only the last one valid?, etc.) - a rough sketch of this follows below
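A possible shape of that dedupe, with illustrative names only:

```go
package apicallqueue

import "sync"

// statusUpdate is an illustrative stand-in for a queued, not-yet-sent update.
type statusUpdate struct {
	reason, message string
}

// pendingUpdates collapses queued updates per object key, DeltaFIFO-style:
// if several failures queue up before anything is sent, only the last one
// remains, and a failure that became irrelevant can be dropped entirely.
type pendingUpdates struct {
	mu     sync.Mutex
	latest map[string]statusUpdate // object key -> newest queued update
}

func newPendingUpdates() *pendingUpdates {
	return &pendingUpdates{latest: map[string]statusUpdate{}}
}

func (p *pendingUpdates) enqueue(key string, u statusUpdate) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.latest[key] = u // overwrites anything older queued for the same object
}

func (p *pendingUpdates) drop(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.latest, key) // e.g. the pod got scheduled, so the failure is moot
}
```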

Member

Sounds promising, but after a brief look it's not super clear to me how it works and whether it addresses all the problems.

The logic on the scheduler side may still be quite complex depending on which type of update is pending. For instance, I suspect some of them may block re-scheduling, while others may not.

Member

I think my comment could have been misleading.
It was supposed to be a meta-comment - I don't think we can really reuse any code from DeltaFIFO.
What I was trying to say is that it's effectively the same conceptual pattern that DeltaFIFO uses.

Member

The logic on the scheduler side may still be quite complex depending on which type of update is pending. For instance, I suspect some of them may block re-scheduling, while others may not.

That's interesting - in particular:

  1. what are the examples where an API call should prevent further attempts to do something with a given pod?
    [The only one I can see is preemption, but it's not even for a given pod, so it doesn't belong to this category]

  2. is there any case where, if I have queued two different API calls for a given pod P, we actually want to send both, not just the last one?

Member

  1. what are the examples where an API call should prevent further attempts to do something with a given pod?
    [The only one I can see is preemption, but it's not even for a given pod, so it doesn't belong to this category]
  2. is there any case where, if I have queued two different API calls for a given pod P, we actually want to send both, not just the last one?

I don't know the answer off the top of my head, but it's not obvious to me yet that all updates can be skipped and that we don't need to wait for updates to be persisted. That's why adding an analysis section and a categorization of the API calls is needed.

One example that comes to my mind is preemption of a just-bound pod. It would be safer to let the binding call happen before we start the preemption process, to avoid a situation where we preempt something that is not bound yet.

Member

One example that comes to my mind is preemption of a just-bound pod. It would be safer to let the binding call happen before we start the preemption process, to avoid a situation where we preempt something that is not bound yet.

That's an interesting scenario - I just don't think that we should let the binding happen there. We probably need smarter deduping logic.
If we know that we will preempt the pod, there is no point in doing the binding. We need to think about what it means from the "pre-binding" perspective (how to ensure we will not leak something), but directly moving the pod to a failed state probably seems better than letting kubelet start it and bring it down immediately after.

Member

One example that comes to my mind is preemption of a just-bound pod. It would be safer to let the binding call happen before we start the preemption process, to avoid a situation where we preempt something that is not bound yet.

I guess it actually happens today? Because we're running the binding cycle asynchronously.
Let's say pod-1 goes to the binding cycle (after "assume"), pod-2 triggers the preemption, and pod-2 could preempt pod-1 before pod-1's binding cycle completes.
And, what happens today (I guess) is that, because pod-1 is deleted, the binding cycle for pod-1 would just fail.

But, that is a good example showing the difficulty of this KEP: possibly, not only API calls for the same object, but also API calls for different objects could depend on one another...

Member Author

@macsko macsko May 20, 2025

  1. what are the examples where an API call should prevent further attempts to do something with a given pod?

I haven't found any such API call. Even preemption should not be a problem.

Another thing to consider is how to update the Pod's status in the scheduler's memory. Now, since the API calls are blocking, we don't need to persist the status change before the call, because the event handlers will soon receive the Pod/Update event with the current status and update the Pod object there.

  1. is there any case where, if I have queued two different API calls for a given pod P, we actually want to send both, not just the last one?

We have three kinds of API calls for a Pod in the scheduler: deletion (preemption), binding, and status update (unschedulable or NNN). There is no case where we would want to send two of these calls for a single Pod (see the API calls categorization section).
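To make the categorization concrete, an illustrative dedupe rule (hypothetical names, not the KEP's types) could look like this:

```go
package apicallqueue

// podCallKind sketches the three kinds of Pod-based API calls listed above.
type podCallKind int

const (
	podStatusUpdate podCallKind = iota // unschedulable condition or nominatedNodeName
	podBinding
	podDeletion // preemption
)

// supersedes says whether an incoming call makes a still-queued call for the
// same Pod irrelevant: deletion by preemption cancels anything queued for the
// Pod, a binding makes a stale status update pointless, and a newer status
// update simply replaces an older one. This illustrates the rule discussed in
// the thread; it is not the KEP's final policy.
func supersedes(incoming, queued podCallKind) bool {
	switch incoming {
	case podDeletion:
		return true
	case podBinding, podStatusUpdate:
		return queued == podStatusUpdate
	}
	return false
}
```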

@wojtek-t wojtek-t self-assigned this Apr 25, 2025
Making one universal approach to handling API calls in the kube-scheduler could make these calls consistent, as well as allow better control over the number of dispatched goroutines. Asynchronous preemption could also be migrated to this approach.

### Goals
Member

Looking from the high level, I think we may have the following goals (although it's not clear we're able to achieve all of them):

  1. Make the scheduling cycle free of blocking api-calls (any async option is fine)
  2. Skip some types of updates if they would soon become irrelevant due to consecutive updates
  3. Prioritize high-importance updates (binding) over low-importance ones if updates to the api-server get throttled

Maybe we have more, but I think agreeing first on our goals is important before we can select a solution. One of my doubts here is that we may not know yet all the requirements we may have once we start designing workload-aware scheduling and reservations. More api calls may appear at that time.

Member

+1 to it, but I think there is also a question about priorities. In my mental model:

  1. eliminating blocking calls is P0
  2. skipping soon-to-be-irrelevant updates is P1
  3. prioritization of updates is nice to have [I would like that, but it's not necessary if it adds a lot of complexity]

Member

I'm feeling like the third one is the next step, and we can just scope it out for now, at least from this KEP. Of course, we should keep it in mind when discussing this KEP though.
I agree that all three are important, but we don't have to solve all of them at once in one KEP here; the third one looks complex enough to deserve being discussed in another KEP.

Member

which seems to be roughly aligned with what I wrote above

Member

Agree with the above; the order was also on purpose.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 20, 2025
@macsko
Member Author

macsko commented May 20, 2025

I added a section with API calls categorization and adjusted the goals. PTAL

To go further, we could probably agree on some high-level concept (queuing without blocking a pod?).

if the newest status is stored in-memory.
- API calls for non-Pod resources (5 - 10) should be further analyzed as they are not likely to consider the Pod-based API calls,
hence implementing those shouldn't block making (1 - 3) calls asynchronous.

Member

Is there any relevance order between calls for different pods?
I think there isn't (i.e. we don't require NNN to be set before preempting the first victim).

But even if there isn't - it would make sense to call it out explicitly here.

Member

Is there any relevance order between calls for different pods?

I think there is because we need to protect the space freed up by preempted pods for the nominated pod. Sure, pods don't disappear immediately, but we shouldn't rely on it. That obviously puts into question whether we can prioritize some updates over others... which could change the order.

Alternatively we may still want to treat a preemption as one async operation that includes setting nomination and removing pods in the right sequence.

Member

@sanposhiho sanposhiho May 28, 2025

because we need to protect the space freed up by preempted pods for the nominated pod.

But, we store NNN in the nominator when we update NNN. I.e., even if we make the API call async, as long as we report NNN to the nominator synchronously, the space should be reserved and not stolen by other pods' scheduling cycles.
So, I believe we can simply make this NNN API call in the preemption asynchronous.

Member

@sanposhiho sanposhiho May 28, 2025

Or, if we go with the scheduler cache idea, we don't have to be worried at all about that kind of scenario. Any updates are reflected in the scheduler cache synchronously, and things that happen later can immediately refer to the update in the scheduler cache (regardless of whether the API call is done asynchronously or not).

Member

Unless the scheduler restarts in the meantime and we lose the in-memory state.

Alternatively we may still want to treat a preemption as one async operation that includes setting nomination and removing pods in the right sequence.

which effectively introduces the ordering
The question is whether in-memory state is good enough, at least initially.

Member

I referred only to the ordering of api calls, not cache updates, which in turn can become fully synchronous again (we achieve the non-blocking goal by async api calls only).

Member Author

Is there any relevance order between calls for different pods?

I'd say there isn't. Any two API calls made by the scheduler for different pods are independent, and I don't see any counter-example.

I believe we can make this NNN API call in the preemption asynchronous, simply.

That's right.

The question is whether in-memory state is good enough, at least initially.

WDYM by "initially" here?

- Updating Pod status (1, 2) could be less important and called if there is space for it.
It's worth considering if setting `nominatedNodeName` (3) should have the same priority or higher,
because the higher delay might affect other components like Cluster Autoscaler.
- API calls for non-Pod resources (5 - 10) could be analyzed case by case, but are likely less important than (5) and (4).
Member

nit: 6-10

Also - I'm not sure I agree with it. As an example - volume binding is on a critical path for pod binding - so delaying volume binding delays pod binding - which affects overall pod startup. Which is probably what we should actually optimize for.

Member

@sanposhiho sanposhiho May 28, 2025

+1. I actually rather think API calls for non-Pod resources are likely equally important to 4 and 5 (at least for the current default scheduler) because, as described in 6-10 above, most of them are in the binding cycles, which are on the critical path, or in PostFilter (similar to preemption - making space for pending pods).

Member

But, the difficulty here is that we cannot just say "the API calls for non-Pod resources are likely important -> let's prioritize them", because obviously there will be exceptions, especially when it comes to a scheduler with custom plugins.
So, I basically agree with the KEP saying that this is actually case by case.

Member

I guess it leads to another question: should we consider API calls being made within binding cycles? They're already async, so can we just treat them as a single async group of multiple API calls? And, in this KEP, we would only need to add canceling functionality to the binding cycles.

Member

I guess it leads to another question: should we consider API calls being made within binding cycles?

Maybe I misunderstood your intention here, but if we don't consider them, we will face races due to these calls not being coordinated with the others. If the pod was earlier unschedulable, but the call was queued for too long, we need to cancel it once binding starts. So we need coordination between these. From that perspective, having a single place to handle everything sounds better.

Member Author

I guess it leads to another question: should we consider API calls being made within binding cycles?

I think we need to consider pod-based API calls (status update, binding, preemption) to be able to effectively cancel/skip them. Another aspect is the 6 - 10 calls, which I think don't need to be considered, at least initially. The only concern might be the DRA's PostFilter, which can make a blocking API call (but then we would probably need to make all ResourceClaim API calls asynchronous, for reasons similar to the Pod-based ones).

- Simplifies introducing new API calls to the kube-scheduler if the collision handling logic is configured correctly.

Cons:
- Requires implementing complex, advanced queueing logic.
Member

I would say it's debatable how complex that really is - my subjective claim is that it's actually not that complex.

Member

It really depends on what we're doing with this queueing logic. If it's just going to be a queue, it's obviously simple.
But, as mentioned at 2.2, if we are going to merge multiple operations into one, things will get complicated.

- Cannot be used for the `nominatedNodeName` scenario, requiring additional effort and separate handling.


#### 1.3: Use advanced queue and don't block the pod from being scheduled in the meantime
Member

I'm personally strongly for this option - because it allows handling all calls the same way and optimizes for latency.
I think the additional upfront complexity (which, btw, will be well encapsulated in a single place) is a cost that we should be willing to pay for the fact that it becomes unified and simple to use.

Member

I also agree that this should be the way to go. If we reach consensus here, I'd move the other options to Alternatives considered and present one proposal.

- `nominatedNodeName` scenario support would require more effort in (1.1) or (1.2).


#### 2.2: Make the API calls queued
Member

I believe that this should be the path forward - the primary reason to prefer it over 2.3 is that it allows encapsulating the logic in a single place, so the burden is not spread across different (arguably much harder to reason about) places. And it allows avoiding races and enables optimizations (compared to 2.1).

Member

On the other hand, how do we handle external changes? The obvious case is the disappearance of a pod that was supposed to be preempted. It's probably quite an easy case, as we can skip our update, but what if the nominated node name changes once we allow external actors to change it?

Member

Things may change behind the scenes - so there might be races. But for a particular object, relying on optimistic concurrency should just work.

If you're worried about the scheduler preempting something and an external actor changing NNN in the meantime - that has an inherent race in it no matter what we do - we may try to minimize the window, but it will always be there, and I'm not sure I would actually try to optimize for it.

Member

Update - after thinking about it I changed my mind and we may prefer 2.3 actually: #5249 (comment)

Both of the above API calls could be migrated to the new mechanism.

In-tree plugins' operations that involve non-pod API calls during scheduling and could be made asynchronous,
but not necessarily in the first place:
Member

I think we may want to consider them from the very beginning, because when canceling pod binding, we probably want to cancel any pre-binding calls as well.

Member

Moreover, we need to consider the changes planned in the Extended Resource KEP, so it will be an additional challenge to put them together.

Member

because when canceling pod binding, we probably want to cancel any pre-binding calls as well

When do we cancel pod binding?

Member

@sanposhiho sanposhiho May 28, 2025

Never mind, I can now guess you meant what is described later in this KEP (↓):

  • Pod deletion caused by preemption (4) should cancel all Pod-based API calls for such a Pod.

Member Author

I think we may want to consider them from the very beginning, because when canceling pod binding, we probably want to cancel any pre-binding calls as well.

Yes, that could be an optimization, but it is not necessary.

Btw, currently when pod Y is in the PreBind stage and pod X wants to preempt pod Y, a delete API call for pod Y is sent to the apiserver. If pod Y were at WaitOnPermit, preemption would just cancel it, without sending the API call. Shouldn't we optimize similarly for PreBind, given that the number of use cases for PreBind (DRA) is growing? Do we have any reason against it?


@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 9, 2025
@macsko macsko requested review from dom4ha and sanposhiho June 10, 2025 07:11
Member

@dom4ha dom4ha left a comment

Overall looks good. I think the proposal description is mainly missing a statement that some procedures will still be separate goroutines with the ability to wait for the API calls to finish. This way we don't have to implement dependencies between calls that depend on each other.

We should also make it clear which option we want to follow. IIUC, option C is the most reasonable one and easy to implement.


This proposal could be implemented as a second step extension of proposal A.

### Summary of API call management
Member

I think it's worth describing what happens on update failures, either due to conflicts or just reaching the max number of retries. For instance, what happens when binding fails? Is it expected that a goroutine waits for binding completion and reacts to a failure? Are failed updates for which nobody waits ignored (like status updates)?

Member Author

Added the examples to the Retrying API calls section.

@macsko
Member Author

macsko commented Jun 16, 2025

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 16, 2025
Member

@wojtek-t wojtek-t left a comment

Just some minor comments from me - other than that PRR lgtm.

and if it's not expected, custom logic could always be added using an `APICall` interface.

However, to support other potential use cases and have the newest object possible in the cache (proposals B and C, and optionally A),
merging the object received by event handlers with API call details should also be added.
Member

Note that merging with objects received from event handlers isn't a full solution. Given those are eventually consistent, there will always be a period of time when the object has already changed but the change is not yet observed by the cache.

Member

I assume our update will fail due to such a conflict, so this is why we should describe how to react to this and other types of failures.

Member

Agree - but propagating objects from handlers isn't the full solution - this may still happen given their eventually consistent nature.

Member Author

@macsko macsko Jun 17, 2025

Yes, I know that there will be such a window when we would still have to merge it, even after executing the call.

But I'm now wondering: what if we execute the update API call and just write the returned updated object to the cache? Then, when we process the updates in the event handlers, we could ignore those that have an older ResourceVersion (as we already have a newer object in the cache).
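For illustration, a minimal sketch of that check, assuming ResourceVersion can be compared numerically (it is formally an opaque string; the assumption holds for the default etcd storage) and using hypothetical helper names:

```go
package apicallqueue

import (
	"strconv"

	v1 "k8s.io/api/core/v1"
)

// isStale reports whether a pod delivered by the event handlers is older than
// the object the scheduler already wrote to its cache after a successful
// update call; such events could simply be ignored, as described above.
func isStale(cached, incoming *v1.Pod) bool {
	c, errC := strconv.ParseUint(cached.ResourceVersion, 10, 64)
	i, errI := strconv.ParseUint(incoming.ResourceVersion, 10, 64)
	if errC != nil || errI != nil {
		return false // when in doubt, accept the incoming object
	}
	return i <= c
}
```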

Member

Yes - we should do that.
But still that isn't a full solution - we can still face conflicts etc. that we need to handle.

Member Author

What conflicts do you mean? If we make an update based on an older object?

Member

Yeah. I mean that no matter what we do, the actual version committed in etcd can be different from what the scheduler thinks it is.

Member Author

Yes, the scheduler could indeed operate on an older version of an object, but we face similar problems with the current implementation. The key is to ensure this is considered when handling an API call. However, it doesn't have any impact on the current (Pod-based) API calls that are (or will be) made by the scheduler.

Member

One potential conflict that we should not ignore is when a pod gets assigned manually by a user. I think that we should have some rules about which fields the scheduler can blindly override and which require pulling them in first and issuing rescheduling of the conflicting changes. NodeName (assignment) is one of them, but possibly there are others, like ReservedFor in ResourceClaims.

Member Author

I think that we should have some rules about which fields the scheduler can blindly override and which require pulling them in first and issuing rescheduling of the conflicting changes. NodeName (assignment) is one of them

I don't think there is much we need to do about that. For the actual use cases, overriding will be enough (maybe with handling external binding/deletion separately). We could defer this discussion to the implementation phase and see what exactly will be needed then.

#### Enqueueing an API call while a previous one is in-flight

One other possible scenario is when an API call is executing (is in-flight) and a new API call for the same object wants to be added.
If both have the same call type, standard merging logic could be used, i.e., merge the new API call with the API call in flight.
Member

I don't understand this - if the old one is already in-flight, how can we merge those?

Member Author

We need to keep the details of the API call in the map, even if the call is in-flight.

Let's say a status update API call is in-flight and wants to set the Pod condition PodScheduled to false with reason foo.
If the next status update is then queued for the same Pod, it can verify via merge whether it actually changes anything. If it doesn't, it can be skipped. If it does, the merge should ignore changes that duplicate the call that is in-flight.
However, the skip is the more important use case.
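For illustration, that check could look roughly like this, with a simplified, hypothetical shape of a status-update call:

```go
package apicallqueue

import v1 "k8s.io/api/core/v1"

// podSchedulingStatus is a hypothetical, simplified view of what a queued
// status-update call wants to write: the PodScheduled condition plus a reason.
type podSchedulingStatus struct {
	scheduled v1.ConditionStatus
	reason    string
}

// shouldEnqueue compares a newly requested status update with the one already
// in flight for the same Pod: if the new call would not change anything beyond
// what is being written right now, it is skipped; otherwise it is queued (and
// only needs to carry what actually differs from the in-flight call).
func shouldEnqueue(newCall, inFlight podSchedulingStatus) bool {
	return newCall != inFlight
}
```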

Member

If it does, the merge should ignore changes that duplicate the call that is in-flight.

This assumes that the request succeeds (or if not - will be retried).

But anyway - I think I get your point - the wording is misleading though.
It's not really that we merge the new incoming request with the old one. It's that we adjust the new incoming one to reflect changes that are already in flight.

It's a phrasing issue - merging for me means that we combine these two into a single one, which is not what we're doing.

Member Author

or if not - will be retried.

I'd say that this assumes that the request failure will be handled, if needed, as explained in the Retrying API calls section.

It's not really the we merge the new incoming request with the old one. It's that we adjust the new incoming one to reflect changes that are already in flight.

Yes, that's right. The wording is probably not that precise, as you said. I'll refine it to make it clearer.

@wojtek-t
Member

This looks fine from PRR perspective.

/approve PRR

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2025
@dom4ha
Member

dom4ha commented Jun 17, 2025

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dom4ha, macsko, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dom4ha
Member

dom4ha commented Jun 17, 2025

/hold
/lgtm

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 17, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 17, 2025
Member

@sanposhiho sanposhiho left a comment

Left some comments, but LGTM overall.
I have several unknowns around APIQueue, but I agree with the overall direction, and we can clarify such details in the implementation part.

This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations
Member

We should consider "custom plugins might start to use this for use cases that we haven't considered" as a risk. There could be many questions like "what if (custom) plugins do this?".
We need to clarify and document a solid recommendation for plugin developers to guide them properly: what kinds of API calls are supported gracefully by this async API mechanism, and what API calls are not supported or don't make sense.
Or, as a first step, maybe we can even start off saying "we don't yet support custom plugins" until we're fully ready.

Member Author

Good point, I added the section about it. I think we could partially support these plugins in the first version, but we have to carefully document allowed use cases.


// APICall describes the API call to be made and stores all required data to make the call,
// e.g. fields that should be updated or the object to be added/removed.
type APICall interface {
Member

@sanposhiho sanposhiho Jun 17, 2025

So, should all the API calls for the same APICallType use the same APICall implementation? What if two different APICall implementations are used to make an API call for a certain APICallType? How would/should APIQueue behave - how are those two Merge()d? Or will we somehow try to prevent such a situation in the first place?

Member Author

@macsko macsko Jun 18, 2025

Yes, we start with the assumption that we have one implementation per CallType, but we might consider extending this behavior in the future. Added it explicitly.

Member

@sanposhiho sanposhiho Jun 18, 2025

At the implementation stage, we need to figure out:

  • how to prevent two different APICall implementations from being used for the same APICallType.
  • how to replace a certain APICall that everyone is using, e.g., custom plugins may want to change how status_update is handled and hence the status_update APICall implementation that is used by everyone.
    • meaning, no one should directly depend on a specific APICall implementation. Everyone should use each APICall implementation indirectly so that we can change it.

I guess the APICall implementations for each type should be stored properly in a single place that is accessible to everyone (a rough sketch of such a registry follows below).
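One possible shape of that, sketched with hypothetical names (the KEP does not define such a registry yet):

```go
package apicallqueue

import (
	"fmt"
	"sync"

	v1 "k8s.io/api/core/v1"
)

// APICallType and APICall stand in for the KEP's types; their exact shapes
// here are assumptions made only for this sketch.
type APICallType string

type APICall interface {
	Execute() error
}

// callRegistry is the "single place accessible to everyone": it maps each
// APICallType to the constructor of its APICall implementation, so nothing
// depends on a concrete implementation and a type's implementation can be
// swapped (e.g. by a custom plugin) in exactly one spot.
type callRegistry struct {
	mu           sync.Mutex
	constructors map[APICallType]func(pod *v1.Pod) APICall
}

// Register refuses a second implementation for the same APICallType, which is
// one way to prevent two different APICall implementations from coexisting.
func (r *callRegistry) Register(t APICallType, c func(pod *v1.Pod) APICall) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.constructors == nil {
		r.constructors = map[APICallType]func(pod *v1.Pod) APICall{}
	}
	if _, ok := r.constructors[t]; ok {
		return fmt.Errorf("APICallType %q already has an implementation", t)
	}
	r.constructors[t] = c
	return nil
}

// New builds the APICall for the given type, or reports that none is registered.
func (r *callRegistry) New(t APICallType, pod *v1.Pod) (APICall, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	c, ok := r.constructors[t]
	if !ok {
		return nil, false
	}
	return c(pod), true
}
```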

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025
@wojtek-t
Member

/lgtm

Unholding based on #5249 (review)

/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 18, 2025
@k8s-ci-robot k8s-ci-robot merged commit 3d9a053 into kubernetes:master Jun 18, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 18, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in SIG Scheduling Jun 18, 2025