KEP-5075: DRA: Consumable Capacity #5104

sunya-ch · 2025-01-30T01:57:59Z

One-line PR description: Enable DRA API and scheduling to support a shared device allocation with consumable capacity.

Issue link: DRA: Consumable Capacity #5075

Other comments:
- WIP forked repo: https://github.com/sunya-ch/kubernetes/tree/kep-5075
- WIP PoC demo: https://github.com/sunya-ch/dra-example-driver/tree/consumable-capacity-poc
- PoC test: https://github.com/sunya-ch/dra-example-driver/tree/consumable-capacity-poc/demo/kep-5075
- Document for technical discussion: https://docs.google.com/document/d/1U0u2uErpYcf-RooPEws5oDMiJ9kT2uDoWNjcWrLH224/edit?usp=sharing

k8s-ci-robot · 2025-01-30T01:58:08Z

Welcome @sunya-ch!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-01-30T01:58:09Z

Hi @sunya-ch. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

pohly · 2025-02-04T16:51:30Z

/wg device-management

johnbelamaric

Thanks, this is a good start. After reading this thoroughly, I think we can meet all the use cases with "per-device allocatable resources", and do not need to expose the "device provisioning" aspect to the K8s control plane. I think the driver may still do that, but it's actually not something the scheduler needs to care about.

I think if we repurpose this to just "per-device allocatable resources", we will not only meet these use cases, but we can meet some of the ones for modeling standard resources in DRA.

Please take a look and if you want to meet to discuss it, let me know.

keps/sig-node/5075-dra-dynamic-provisioning-support/kep.yaml

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

johnbelamaric · 2025-02-05T23:46:58Z

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

+-->
+- Introduce an ability to allocating on-demand provisioned devices via DRA. This should cover the use case of macvlan, ipvlan in DRA driver for CNI and virtual accelerator devices with on-demand memory fraction. 
+- Enhance a capability of secondary networks to dynamically allocate secondary networks based on present availabilities such as bandwidth.
+- Leverage the enhancement of capacity consumption from [KEP-4815](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4815-dra-partitionable-devices).


I want to do something distinct from the capacity consumption in there. That consumption is 100% internal to the resource slice, and is about shared resources. In this case, we are talking about consumption by the claims. The reason capacity is separate from attributes was exactly to support this kind of consumption model.

@johnbelamaric I updated the third goal to

- Enable capacity field to be consumable.

johnbelamaric · 2025-02-05T23:58:54Z

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

+required) or even code snippets. If there's any ambiguity about HOW your
+proposal will be implemented, this is the place to discuss them.
+-->
+This enhancement system design introduces the terms `device source`, `provision limit`, `provision request`, and `provisioned device`.  In addition to the existing device instance, `device source` represents the root device or device specifiction as a source to provision or generate a new device allocated to the Pod. The number of provision per `device source` can be limited by the `provision limit`. Device sources are exposed by device vendors or provisioners via Publish API.


Ok, I like this KEP, but I don't think it's necessary to introduce all this machinery. I also am not sure there is a need to "provision" a new device. Instead, I would suggest we can allocate a partial device and share the underlying device. That is, we can move toward "platform mediated sharing of devices", rather than just the current mechanism of pointing multiple containers or pods add the same resource claim ("user mediated sharing of devices").

In platform mediated sharing, we do not share resource claims, but instead we have separate resource claims which hold a partial allocation of the same underlying device from the same resource slice. Each of those resource claims contains the amount of allocation. This is analogous to how resources are consumed by pods on nodes; in this case, resources are consumed by claims on devices.

For some more background see this (outdated) document, and the prototype linked in there.

Consider if the capacity values in a device were considered allocatable. Then, you could have a resource claim for a portion of the device. The scheduler would be able to reconstruct "how much" of the device is consumed by aggregating all resource claims that are attached to that device.

Rather than a specific "provisioning limit" on the number of devices, the limit is hit when there are insufficient resources in capacity to satisfy the claim. This allows the limit to be hit based upon specific resources consumed.

With this, I think in fact we should be able to model "compute devices" that represent the CPU, memory, and other standard node resources. I think this would satisfy the requirements @catblade has for aligning CPU with GPU with NIC, for example.

WDYT? Would you be willing/able to rework this a bit and see if this idea can satisfy your needs? Pretty much, we need to be able to write claims that "allocate" from the capacity fields. In the minimal implementation, we wouldn't actually need a change in the ResourceSlice API, I don't think, only in the ResourceClaim API. In the case of things like virtual interfaces, the name of the created interface could be stored in the ResourceClaim.Status data.

For a more robust implementation, we would want the DeviceCapacity entries in the Capacities to be able to do things like block allocation ("you asked for 1 byte, but I can only allocate in 1Mi increments" - see the doc above). We may also need to be able to model different segments of resources, tagged with different topology, and be able to identify which types of resources are aggregatable across those topologies (for example, "8Gi is in numa0, 8Gi is in numa1 - if you ask for numa alignment and 4Gi, we give you a CPU and memory numa aligned, but if you ask for 10Gi we have to aggregate across the boundary"). Those are harder, maybe we start by writing it up without those, then wait until @catblade complains :)

(by the way, there may well be use cases for "provisioning", I just think the ones listed so far don't need it...if you have more use cases, please do add them)

@johnbelamaric I think your comment is apt, and agree.

In platform mediated sharing, we do not share resource claims, but instead we have separate resource claims which hold a partial allocation of the same underlying device from the same resource slice. Each of those resource claims contains the amount of allocation. This is analogous to how resources are consumed by pods on nodes; in this case, resources are consumed by claims on devices.

I totally agree with this direction.

The objective of this KEP is to distinguish the case when the device which is allocated to the pod is actually a new device provisioned dynamically based on the master device. For instance, macvlan and ipvlan are a new virtual network device (i.e., net1) which will be provisioned, configured, and removed by the cni-dra-driver based on the selected master device (i.e., eth1).

However, at the same time, I agree that we can consider provision limit as one capacity, and leave all provisioned data (specific to the new device) to the resourceclaim status.

Is it possible to have an explicit way to clarify the type of sharing in the ResourceClaims status such as for provisioned device, pre-sliced device like MIG, dynamically-slice like virtual GPU ?

The objective of this KEP is to distinguish the case when the device which is allocated to the pod is actually a new device provisioned dynamically based on the master device. For instance, macvlan and ipvlan are a new virtual network device (i.e., net1) which will be provisioned, configured, and removed by the cni-dra-driver based on the selected master device (i.e., eth1).

While at the actual node layer, this is what is happening, it's not clear to me that the K8s control plane needs to model this. It's not clear to me there needs to be awareness at the ResourceSlice/ResourceClaim level of these devices that are provisioned just to contain the allocation of a shared underlying device. It might be necessary, I am just not 100% sure yet and want to explore the options.

Is it possible to have an explicit way to clarify the type of sharing in the ResourceClaims status such as for provisioned device, pre-sliced device like MIG, dynamically-slice like virtual GPU ?

I think we are probably going to need a way to indicate that a device can be shared by platform-mediated sharing. Earlier version of DRA had a Shareable flag to indicate that a device can be shared by user-mediated sharing, but we decided it was unnecessary (since the user controls this anyway). But if the platform needs to manage it, then the driver author may need to let K8s know if a device is shareable. More to explore!

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

johnbelamaric · 2025-02-07T16:25:43Z

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

+  devices:
+    provisions:
+    - name: net1
+      deviceClassName: vlan-cni.networking.x-k8s.io


I think you'll also need to attach some config to specify the VLAN ID, for example

Yes, the CNI configuration will be also required for ResourceClaim as @LionelJouin's design in cni-dra-driver.
https://github.com/kubernetes-sigs/cni-dra-driver/blob/eb94de8c61835de1f13eefe700de9d9d89615531/docs/design/device-configuration-api.md#design

I will add reference in this KEP too.

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

catblade · 2025-02-07T22:04:26Z

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

+List the specific goals of the KEP. What is it trying to achieve? How will we
+know that this has succeeded?
+-->
+- Introduce an ability to allocating on-demand provisioned devices via DRA. This should cover the use case of macvlan, ipvlan in DRA driver for CNI and virtual accelerator devices with on-demand memory fraction. 


Suggestions for grammar/clarity: Introduce an ability to allocating on-demand provisioned devices via DRA. This should cover the use cases of macvlan or ipvlan in a DRA driver for CNI and virtual accelerator devices with on-demand memory fraction.

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

sunya-ch · 2025-02-12T04:40:39Z

@johnbelamaric @catblade Thank you so much for taking care of this KEP.
I have updated the KEP to address some comments such as filling KEP information which I can fill and applying rewrite suggestion first.
To reduce the number of notification, I have put 🚀 icon to the comment that I think I have addressed. Please consider click resolve button if you are agree.

For the comment related to design change, I will also update the KEP correspondingly next.

sunya-ch · 2025-02-12T06:18:30Z

@johnbelamaric I have tried drafting your suggestion below. Please correct my understanding.

kind: ResourceSlice
...
spec:
  driver: cni.dra.networking.x-k8s.io
  devices:
  - name: eth1
    basic:
      attributes:
        name:
          string: "eth1"
      capacity:
        vlan: # <- define provision limit here
          quantity: 1000 
        bandwidth:
          quantity: 10Gi
---
kind: DeviceClass
metadata:
  name: provisionable.networking.x-k8s.io

kind: ResourceClaim
...
spec:
  devices: # <- move back the provision request to device request
    requests:
    - name: macvlan
      deviceClassName: provisionable.networking.x-k8s.io
      allocationMode: ExactCount
      count: 1
      resources:  # <- add resource in request (like PodCapacityClaim?). This field does not exist yet; however, I think you also have an idea to have this available based on your suggestion. 
      # Discussion point 1: should the following resource requests be per each count? 
        vlan:
          quantity: 1 # <- I guess that you are expect this resource consumption defined in resource claim. 
          #  Discussion point 2: do we need explicit vlan quantity here while it should not be other than 1?
        bandwidth:
          quantity: 1Gi
    config:
    - requests:
      - macvlan
      opaque:
        driver: cni.dra.networking.x-k8s.io
        parameters: # CNIParameters with the GVK, interface name and CNI Config (in YAML format).
          apiVersion: cni.networking.x-k8s.io/v1alpha1
          kind: CNI
          ifName: "net1"
          config:
            cniVersion: 1.0.0
            name: net1
            plugins:
            - type: macvlan
              master: eth0
              mode: bridge
              ipam:
                type: host-local
                ranges:
                - - subnet: 10.10.1.0/24

For the resource claim status, I still would like to propose the way that we can distinguish the way device is shared there. Can we instead add some optional field in the AllocatedDeviceStatus? For example,

type AllocatedDeviceStatus struct {
...
	// ProvisionData contains provisioning-related information specific to the provisioned device.
	//
	// This data may include provisioner-specific information.
	//
	// +optional
       *ProvisionData
}

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md

johnbelamaric · 2025-02-12T17:08:13Z

@johnbelamaric I have tried drafting your suggestion below. Please correct my understanding.

Yes, what you propose in this comment is very much what I am suggesting. And your "discussion" points are definitely things I agree need more discussion and thought.

For the resource claim status, I still would like to propose the way that we can distinguish the way device is shared there. Can we instead add some optional field in the AllocatedDeviceStatus? For example,

Maybe yes? Let's ground it in the specific use cases.

aojea · 2025-02-18T12:07:01Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+    requests:
+    - name: macvlan
+      deviceClassName: vlan-cni.networking.x-k8s.io
+      allocationMode: Shared
+      resources:
+        requests:
+          bandwidth: "1Gi"


I do not understand how is this going to be related to eth1, what if there is eth2 and eth3 too?
config is opaque so we should not depend on opaque values, is what this is suggesting?

User can specify the device via the selection (such as specify device's attribute name = eth1). Additionally, DRA allows pod to dynamically allocate device based on its available resource such as bandwidth.The selected device information will be also sent to the driver then the network driver and configure the selected device.

I assume the idea here is to underspecify when possible, but be more specific if needed. So if there are also eth2 and eth3, then leaving it like this means that any one of eth1-3 will work. If that is not the case, then a selector is needed to further narrow down the eligible devices.

Yes exactly, Morten. We can "schedule" onto any of the underlying eth* devices. If we want a specific one - or one with a specific attribute (like network or trunking) - then that should be specified like any other selector.

aojea · 2025-02-18T12:13:19Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+and its attributes match the request's selectors and constraints.
+"share" refers to amount of resources reserved or allocated to the claim request.
+The newly added `allocatedShare` field in the `DeviceRequestAllocationResult` will be set when the allocation is successful.
+A device can be shared (allocated) to different pods. 


in the same node?

Yes. A device in the same node can be shared to multiple pods. There will be multiple resource claim with the same device allocated but with different resource share.
I will update and add "in the same node" in the sentence for clarification.

I think multi-node resources could have those pods split across nodes. I don't think there is a need to nail it down here.

pohly

My comments so far have been addressed, but I need to do another full pass.

pohly · 2025-06-10T09:57:12Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+    //
+    // +optional
+    // +listType=atomic
+    Options []resource.Quantity


The KEP now lists proposals for the maximum size of slices. Those look good enough for merging the KEP. During the implementation, we need verify that the overall size does not get too large and perhaps do some more tweaks (let's note under test plan). Then before beta we need to collect feedback whether the resulting limits are sufficient for real-world use (covered by beta criteria).

pohly · 2025-06-10T09:58:35Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+
+// CapacitySharingPolicyRange defines a valid range for consumable capacity values.
+//
+// - If the requested amount is less than Minimum, it is rounded up to the Minimum value.


Agreed. Counting usage by device class is very coarse and remains so with this KEP, so neither worse nor better 😐

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

BenTheElder

bah, I thought these review comments were submitted last week >.<

keps/prod-readiness/sig-scheduling/5075.yaml

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

BenTheElder · 2025-06-06T22:05:28Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+-->
+
+Yes, this feature can be disabled once it has been enabled.
+The `allowMultipleAllocations` flag, `sharingPolicy`, `capacityRequests`, and `consumedCapacities` fields will be dropped.


dropped, or ignored?

These fields are dropped by the existing implementing strategy. Should it be ignored instead?

These fields are dropped by the existing implementing strategy. Should it be ignored instead?

There is per-feature flexibility to choose the best option, but often new usages of the fields are disallowed, but existing usages are allowed to remain. that allows cleanup. I think you've anticipated this in the next sentence about status retaining the information.

BenTheElder · 2025-06-06T22:12:06Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+      - Impact of its degraded performance or high-error rates on the feature:
+-->
+
+This feature depends on the DRA structured parameters feature being enabled, and on DRA drivers being deployed.


drivers that support the feature? or just drivers with pre-existing support?
(IIUC the former)

@BenTheElder Yes, the driver must support this feature. I will add the clarification.

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

[PRR was filled out]

dom4ha · 2025-06-11T11:59:25Z

It looks good. Note that the sig-scheduling approval should be already granted.

deads2k · 2025-06-11T20:22:05Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+Will consider in the beta timeframe.


While not required for alpha, I do always encourage people to spend time considering the metrics they'll need for debugging early. I suspect the existing metric for successful claims will be useful.

@deads2k Thank you. I added metrics apiserver_request and scheduler_plugin_execution_duration_seconds in the section.

deads2k · 2025-06-11T20:25:41Z

PRR lgtm, approving PRR

/approve

k8s-ci-robot · 2025-06-11T20:25:58Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, dom4ha, sunya-ch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/prod-readiness/OWNERS~~ [deads2k]
~~keps/sig-scheduling/OWNERS~~ [dom4ha]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pohly

Complete pass, including API review.

Some nits, but no blockers. Remaining open questions are all things that we can still decide later during the API review of the implementation.

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

pohly · 2025-06-17T08:57:17Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+  This would move the device from the host into the user's pod, preventing other users from accessing the shareable device.
+
+  **Mitigation:**
+  - Administrators should define clear device classes for shareable and unshareable devices to prevent such misallocations.


Just for my understanding: the user allocated a shared device, but then did something in their pod which makes the device exclusive to the pod?

I don't see how different device classes help here. It does not prevent a malicious user from doing the above, unless there are additional constraints in place, like limiting the quota for the shareable device class to zero for such a user.

This is primarily an issue for the DRA driver managing shareable devices, not this KEP. In this KEP we can assume that a DRA driver enforces proper sharing and that users cannot circumvent this.

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

pohly · 2025-06-17T09:33:06Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+	//
+	// +optional
+	// +featureGate=DRAConsumableCapacity
+	ShareID *string


Nit: indent with the same amount of spaces as elsewhere.

I'm undecided whether we want to enforce this as a hex value. This seems a bit unusual from an API perspective and, as presented here, doesn't specify a maximum length of the string.

Perhaps just go with the usual "must be a DNS label"? Doing it as a short hex string then becomes an implementation detail, without being part of the API.

Let's first clarify that before re-indenting, otherwise this comment thread is immediately "obsolete"...

If this is a random string that correlates to nothing else, and whose only purpose is to be unique, maybe use ShareUID instead? can decide / tweak during impl

This seems a little fuzzy / underspecified currently. I think it's ok to figure out specifics at implementation time and update here, but they do need to be figured out:

format / min-length / max-length restrictions

how uniqueness is validated

whether we just trust writers to generate unique values correctly ... what happens if they don't?

xref: #5104 (comment)

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

pohly · 2025-06-17T16:30:39Z

Status check...

SIG Scheduling approval by @dom4ha: KEP-5075: DRA: Consumable Capacity #5104 (comment), KEP-5075: DRA: Consumable Capacity #5104 (comment)
PRR by @deads2k: KEP-5075: DRA: Consumable Capacity #5104 (comment)
preliminary API review by @liggitt, some more by myself as aspiring API approver: KEP-5075: DRA: Consumable Capacity #5104 (review)
SIG Network folks have commented (@LionelJouin, @aojea).

There are some opens regarding the API:

Those can be revisited during the actual API review of the implementation.

I think the PR is in a good enough shape to merge it. @sunya-ch is traveling this week and I'll be out starting Thursday, so let me propose merging this tomorrow (Wednesday) unless some new concerns are brought up.

/lgtm
/hold

pohly · 2025-06-17T16:34:43Z

/label tide/merge-method-squash

@sunya-ch: I prefer one manually squashed commit and can re-add LGTM if needed in this repo (not all do), but don't worry if you don't get to it.

liggitt · 2025-06-17T16:45:15Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+	// If the allocation result includes a ShareID, the Device field is extended with the ShareID,
+	// formatted as `<device name>/<share id>`.


mixing these into the same field is a little surprising... multiple possible formats / meanings seems like it will make it harder and more error-prone to consume, especially if consumers get used to this always matching the device name currently, then it surprisingly starts appending /<share id> for shared devices in the future

is there a good reason not to have the device name and share id as separate fields here?

We had it as a separate field initially, but it turned out that you were right about warning that separate field as a new key in the status list map.

Encoding more information in an existing string is the lesser evil - not nice conceptually, but it works.

@sunya-ch: this one of those aspects that need to be added as additional source code comment in the types.go (i.e. outside of the doc comment), because that is where future readers will find it when wondering about the API design.

oh, I didn't realize this was an existing field.... then I really don't think we should redefine it to sometimes smush in another value / change the meaning and format ... do existing consumers of this field use the value to look up the device? I would assume they do

we're on the cusp of promoting to v1, this seems really important to pin down... if we can't add new optional fields, AND give them defaults, AND make them key fields to allow for otherwise-duplicate items in []AllocatedDeviceStatus, we're in a sticky spot

did we over-specify the map-list aspects of Devices []AllocatedDeviceStatus in ways we can't evolve? do we need to tweak anything there or is there anything we can tweak?

// +listType=map // +listMapKey=driver // +listMapKey=device // +listMapKey=pool // +featureGate=DRAResourceClaimDeviceStatus Devices []AllocatedDeviceStatus `json:"devices,omitempty" protobuf:"bytes,4,opt,name=devices"`

do existing consumers of this field use the value to look up the device? I would assume they do.

Some consumers will simply show the string as part of the driver/pool/device triplet. Those are fine. It will even look nice if they use the recommended (but not required) "concatenate with slash" format.

Some consumers might try to look up the device in the ResourceSlice or correlate. Those consumers need to be aware of the additional share ID. Primarily these are schedulers. They need to be aware of this feature either way to behave correctly. Most consumers shouldn't need to do this, partly because the ResourceSlice with the device definition isn't even guaranteed to be available.

Finally, DRA drivers need to be aware if they support this feature. Nothing changes for other DRA drivers.

did we over-specify the map-list aspects of Devices []AllocatedDeviceStatus in ways we can't evolve?

Having it as a +listType=map is useful because different DRA drivers own different entries in it and then can use SSA.

That the API machinery behind it doesn't fully support adding a new map key is a bummer. I think we should try to fix this in 1.34. If we can, then the "device" string can remain as it is and "shareID" can be stored as a new field. Downgrades to 1.33 with the extra field will be problematic, but that's "downgrading after using an alpha feature", so it's not guaranteed to work smoothly.

Figuring this out is a blocker for the GA promotion and the implementation PR of this KEP. I'm sufficiently motivated to tackle this 😅 But it's not an area that I know, so help would be needed.

cc @jpbetz

Small correction: Devices []AllocatedDeviceStatus is not going to GA in 1.34 because it is not part of core DRA. It's part of #4817, which remains in beta.

Still important to figure out, of course!

liggitt · 2025-06-17T16:47:49Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+	//
+	// +optional
+	// +featureGate=DRAConsumableCapacity
+	ShareID *string


If this is a random string that correlates to nothing else, and whose only purpose is to be unique, maybe use ShareUID instead? can decide / tweak during impl

This seems a little fuzzy / underspecified currently. I think it's ok to figure out specifics at implementation time and update here, but they do need to be figured out:

format / min-length / max-length restrictions

how uniqueness is validated

whether we just trust writers to generate unique values correctly ... what happens if they don't?

liggitt · 2025-06-17T16:52:48Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+  // making it less reliable than ShareID.
+
+  // ConsumedCapacities tracks the amount of capacity consumed per device as part of the claim request.
+  // The consumed amount may differ from the requested amount: it is rounded up to the nearest valid


may differ from the requested amount

but may not be less than the requested amount?

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

liggitt · 2025-06-17T17:02:24Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+
+// DeviceCapacity describes a quantity associated with a device.
+type DeviceCapacity struct {
+    // Value defines how much of a certain device capacity is available.


non-blocking nit to tweak during impl or in a follow-up: I'd probably drop or rephrase "available" from the doc here, since that is ambiguous whether it is real-time / updated as the device is allocated and then used

liggitt · 2025-06-17T17:15:34Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+//
+// If the requested amount is not listed in the options, it cannot be allocated to this device.
+type CapacitySharingPolicyDiscrete struct {
+    // Options defines a list of additional valid capacity values that can be requested.


Suggested change

// Options defines a list of additional valid capacity values that can be requested.

// Options defines the list of valid capacity values that can be requested.

liggitt · 2025-06-17T17:19:42Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+    // Minimum + chunk size must be less than or equal to the capacity value.
+    //
+    // +optional
+    ChunkSize *resource.Quantity


non-blocking question: is there a more intuitive / standard name for this? Can think about it on the way to implementation, maybe StepSize or UnitSize?

liggitt · 2025-06-17T17:32:28Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+   // DiscreteValues defines a set of acceptable quantity values in consuming requests.
+   //
+   // +optional
+   // +oneOf=ValidSharingValues
+   DiscreteValues *CapacitySharingPolicyDiscrete


non-blocking: on the way to implementation, consider if we need the nesting levels or if this could simplify / flatten to ValidValues []resource.Quantity

liggitt · 2025-06-17T17:33:00Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+   //
+   // +optional
+   // +oneOf=ValidSharingValues
+   ValueRange *CapacitySharingPolicyRange


non-blocking, can think about this as we implement: if we rename the above option to ValidValues, this could rename to ValidRange or something similar

liggitt · 2025-06-17T17:33:51Z

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

+    // Minimum specifies the minimum capacity allowed for a consumption request.
+    //
+    // Minimum must be less than or equal to the capacity value.
+    // Default must be more than or equal to the minimum.
+    //
+    // +required
+    Minimum resource.Quantity
+
+    // Maximum defines the upper limit for capacity that can be requested.
+    //
+    // Maximum must be less than or equal to the capacity value.
+    // Minimum and default must be less than or equal to the maximum.
+    //
+    // +optional
+    Maximum *resource.Quantity


non-blocking, but make a note to decide whether to use Minimum/Maximum or match LimitRange and use Min/Max

keps/sig-scheduling/5075-dra-consumable-capacity/README.md

Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>

pohly · 2025-06-18T13:33:52Z

/lgtm

I'm still inclined to merge this KEP today.

How to add the share ID (#5104 (comment)) is under discussion. We need to figure out the best way forward before merging the implementation.

pohly · 2025-06-18T19:49:00Z

I can't get hold of @liggitt on Slack for confirmation and will be out for the rest of the week now, so I am going ahead and will merge this KEP as it is now. Future discussion points:

/hold cancel

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 30, 2025

sunya-ch marked this pull request as draft January 30, 2025 01:58

k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jan 30, 2025

k8s-ci-robot requested review from dchen1107 and derekwaynecarr January 30, 2025 01:58

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 30, 2025

k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 30, 2025

sunya-ch mentioned this pull request Jan 30, 2025

DRA: Consumable Capacity #5075

Open

4 tasks

k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Feb 4, 2025

johnbelamaric reviewed Feb 6, 2025

View reviewed changes

johnbelamaric mentioned this pull request Feb 6, 2025

KEP-4815 DRA Partitionable devices design update #5066

Merged

johnbelamaric reviewed Feb 7, 2025

View reviewed changes

catblade reviewed Feb 7, 2025

View reviewed changes

sunya-ch force-pushed the dra-dynamic-provision branch from 6a57299 to 72f9f82 Compare February 12, 2025 04:34

pohly reviewed Feb 12, 2025

View reviewed changes

keps/sig-node/5075-dra-dynamic-provisioning-support/README.md Outdated Show resolved Hide resolved

sunya-ch force-pushed the dra-dynamic-provision branch from c861fb3 to 32e1c9c Compare February 18, 2025 03:09

k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Feb 18, 2025

sunya-ch force-pushed the dra-dynamic-provision branch from 32e1c9c to d0eb088 Compare February 18, 2025 03:11

sunya-ch changed the title ~~KEP-5075: DRA: Dynamic Device Provisioning Support~~ KEP-5075: DRA: Consumable Capacity Feb 18, 2025

sunya-ch force-pushed the dra-dynamic-provision branch from d0eb088 to 3e4b3b4 Compare February 18, 2025 08:58

aojea reviewed Feb 18, 2025

View reviewed changes

pohly reviewed Jun 10, 2025

View reviewed changes

BenTheElder reviewed Jun 10, 2025

View reviewed changes

deads2k reviewed Jun 11, 2025

View reviewed changes

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2025

pohly reviewed Jun 17, 2025

View reviewed changes

k8s-ci-robot assigned pohly Jun 17, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 17, 2025

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 17, 2025

liggitt reviewed Jun 17, 2025

View reviewed changes

sunya-ch force-pushed the dra-dynamic-provision branch from be65f28 to 8bb378b Compare June 18, 2025 06:24

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025

sunya-ch force-pushed the dra-dynamic-provision branch from 8bb378b to a33d33d Compare June 18, 2025 06:47

pohly reviewed Jun 18, 2025

View reviewed changes

keps/sig-scheduling/5075-dra-consumable-capacity/README.md Outdated Show resolved Hide resolved

KEP 5075: DRA Consumable Capacity with allow multiple allocations

dcc2aaa

Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>

sunya-ch force-pushed the dra-dynamic-provision branch from a33d33d to dcc2aaa Compare June 18, 2025 10:54

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 18, 2025

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2025

k8s-ci-robot merged commit 52abef7 into kubernetes:master Jun 18, 2025
4 checks passed

github-project-automation bot moved this from Needs Review to Done in SIG Scheduling Jun 18, 2025

pohly moved this from 👀 In review to ✅ Done in Dynamic Resource Allocation Jun 23, 2025

sunya-ch mentioned this pull request Jun 25, 2025

DRA: Implementation of Consumable Capacity (KEP-5075) kubernetes/kubernetes#132522

Open

pohly mentioned this pull request Jun 25, 2025

API: support adding new keys to listType=map kubernetes/kubernetes#132524

Open


		###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

		Will consider in the beta timeframe.

		// If the allocation result includes a ShareID, the Device field is extended with the ShareID,
		// formatted as `<device name>/<share id>`.

	// Options defines a list of additional valid capacity values that can be requested.
	// Options defines the list of valid capacity values that can be requested.

KEP-5075: DRA: Consumable Capacity #5104

KEP-5075: DRA: Consumable Capacity #5104

Conversation

sunya-ch commented Jan 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k8s-ci-robot commented Jan 30, 2025

Uh oh!

k8s-ci-robot commented Jan 30, 2025

Uh oh!

pohly commented Feb 4, 2025

Uh oh!

johnbelamaric left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sunya-ch Feb 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sunya-ch commented Feb 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sunya-ch commented Feb 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

johnbelamaric commented Feb 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aojea Feb 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pohly left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

BenTheElder left a comment

sunya-ch commented Jan 30, 2025 •

edited

Loading

sunya-ch Feb 12, 2025 •

edited

Loading

sunya-ch commented Feb 12, 2025 •

edited

Loading

sunya-ch commented Feb 12, 2025 •

edited

Loading

johnbelamaric commented Feb 12, 2025 •

edited

Loading

aojea Feb 18, 2025 •

edited

Loading