kep-3695-beta update #5346
base: master
Conversation
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Welcome @guptaNswati!
Hi @guptaNswati. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Co-authored-by: Kevin Klues <klueska@gmail.com>
/ok-to-test
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Signed-off-by: Swati Gupta <swatig@nvidia.com>
LGTM assuming the existing comments (e.g. KEP template) are addressed
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.
PTAL to check whether kubernetes/kubernetes#132028 is relevant for the `Get` endpoint. I think no, this is why I didn't change the impl. of `Get`, but it is worth a look in the context of this work.
Mmm, we do List all the pods and get this from the client bindings of k8s.io/kubelet/pkg/apis/podresources/v1.GetPodResourcesResponse.GetPodResources, but your change is conditional on `useActivePods:bool`, so it should not impact this work.
Thanks for checking. So, if `Get` is used to get info about a non-running pod, maybe because of a race or because of user error, what would happen? I can think of:

- `Get` returns the state without error as long as the pod is known to the kubelet (non-deleted). Are the clients of the API OK with that?
- `Get` returns an error, like `pod not found` or anything equivalent (`pod not running`)?

What I'm trying to do here is to prevent us from fixing the same bug twice, once in `List`, once in `Get`.
I didn't find anything in the dcgm-exporter issues related to this. Our default metric collection interval is 30 seconds, where we call the podresources API to list any running pods using the GPUs and collect the GPU metrics associated with a given GPU pod. So if a pod is not running or is stale and reports older metrics within this time interval, I think it's okay.
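For context, here is a minimal sketch of that kind of collection loop against the kubelet pod-resources socket. It is an illustration only, not dcgm-exporter's actual code; the socket path, the 30-second interval, and printing device IDs instead of emitting metrics are all assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func main() {
	// Dial the kubelet pod-resources socket (default path; may differ per setup).
	conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	client := podresourcesv1.NewPodResourcesListerClient(conn)

	// Poll on a fixed interval, mirroring the 30-second collection loop described above.
	for range time.Tick(30 * time.Second) {
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		resp, err := client.List(ctx, &podresourcesv1.ListPodResourcesRequest{})
		cancel()
		if err != nil {
			fmt.Println("podresources List failed:", err)
			continue
		}
		// Walk the reported pods and containers; a real exporter would match the
		// device IDs against the GPUs it monitors before emitting metrics.
		for _, pod := range resp.GetPodResources() {
			for _, ctr := range pod.GetContainers() {
				for _, dev := range ctr.GetDevices() {
					fmt.Printf("%s/%s container %s uses %s: %v\n",
						pod.GetNamespace(), pod.GetName(), ctr.GetName(),
						dev.GetResourceName(), dev.GetDeviceIds())
				}
			}
		}
	}
}
```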
What's a non-running pod in this case? A pod that never started, was deleted, or had errors?
> What's a non-running pod in this case? A pod that never started, was deleted, or had errors?

A non-running pod is a pod bound to the kubelet (thus not `pending`) which is not in the `running` phase. In practice, this maps to terminated but non-deleted pods, which in turn maps to pods whose `restartPolicy` is not `always`. Probably the easiest example is pods belonging to Kubernetes `jobs`.
It turns out this is a relatively easy combination to get, yet not too often used. This is basically one of the major reasons why it took us roughly 2 years to figure out.
I'm glad you confirmed the `Get` endpoint is not affected; this is what my research also suggested. Nevertheless, I think we should fully specify in the KEP the semantics of calling `Get` with non-running pods as defined above. The reason I'm writing is that the last time I looked at the KEP (admittedly some months ago) this was not fully specified. It should be an easy fix and totally in scope for the beta stage.
Insufficiently detailed specification bit us pretty badly (kubernetes/kubernetes#119423 (comment) - this was quite complex to figure out, requiring a lot of ancient history research) and I'm submitting a doc PR/KEP PR fix ASAP.
I believe this holds.
I will add this. I ran this pr-test pod, which terminated and is in CrashLoopBackOff. But in my test, List() is also not returning it. Is it an ephemeral bug? I will test a bit more.
$ kubectl get pod | grep pr-test
pr-test 0/1 CrashLoopBackOff 14 (4m39s ago) 52m
$ kubectl describe pod pr-test
Warning BackOff 36s (x225 over 50m) kubelet Back-off restarting failed container busybox in pod pr-test_default(8931c1c9-7bc6-4721-98fe-e6193dc10a33)
$ sudo grpcurl -plaintext -import-path . -import-path vendor -proto staging/src/k8s.io/kubelet/pkg/apis/podresources/v1/api.proto -d '{"podName":"pr-test","podNamespace":"default"}' unix:///var/lib/kubelet/pod-resources/kubelet.sock v1.PodResourcesLister/Get
ERROR:
Code: Unknown
Message: pod pr-test in namespace default not found
Clarification, at the risk of being redundant: `Get` itself and the changes pertaining to this KEP had no bugs reported recently, to the best of my knowledge. From that perspective we're good.
But since we discovered a `List` bug largely caused by insufficient specification over the life cycle of the API, I think this is a good chance to fully clarify and detail the semantics of `Get` (and the changes covered by this KEP), solidifying what is otherwise mostly volatile knowledge. I think this should be quick and easy and documents behavior we already have and want, just adding details, so it's a perfect fit for beta and won't detract from the goal nor cause any slowdown. Again, sorry if I'm redundant; I just wanted to fully explain my PoV.
Thank you, this clarifies things a bit more.
@soltysh @SergeyKanzhelev what else is needed to get this merged? I see it's set to
Please update to the latest template according to #3695 (comment). My understanding is @soltysh hasn't started reviews as the metadata was not correct. Now it should be correct; we need to work on the document.
It is using the latest template. Sorry, I am confused; are you referring to something else?
You need to check that the KEP layout of this PR matches the layout in the latest KEP template available. When folks update the KEP template, some sections may move/be created/merged/removed, some new PRR questions can arise, and so forth.
As @ffromani mentioned, some template sections might have moved or been added. I am relying on this comment: #3695 (comment), which says the KEP README is not following the latest template.
Got it, thanks both. I see I updated the implementation history, but that is within the format.
There are multiple questions not answered, which are a requirement for beta promotion.
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:
Earlier in the doc:
- Please make sure to check appropriate boxes in the `## Release Signoff Checklist`.
- Missing links in the integration tests section, see template, and in the e2e section as well, see template. Either of the two is required for beta promotion, and it looks like you had a requirement for e2e during alpha, so I expect those to be completed.
> Please make sure to check appropriate boxes in the `## Release Signoff Checklist`.

This was addressed - thank you.

This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.
This still holds. I see Francesco mentioned several tests that were added; can we make sure they are explicitly linked in this document?
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.
I believe this holds.
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.

#### GA
Further in the document:
- In `###### Are there any tests for feature enablement/disablement?` can you please link to those tests?
- In `###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?` why is that question not applicable, can you explain?
- In `###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?` N/A is not correct. You're using explicit metrics `pod_resources_endpoint_requests_total`, `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get`. What is the SLO for those?
- In `###### Will enabling / using this feature result in increasing size or count of the existing API objects?` No is not correct. You're expanding the return types from the `Get` call with additional information coming from DRA, so I'd expect that information to be expressed in that answer.
- In `###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?` your answer is not one I'd expect. What you're writing there is probably more suited to the Troubleshooting or Risks and Mitigations sections. In your particular case I'm expecting a rough estimation of whether CPU/RAM/disk/IO will be affected by this feature. Based on my understanding you're likely to increase the memory consumption to hold the DRA information, no?
- You're missing the `###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?` question from the template.
- In `###### How does this feature react if the API server and/or etcd is unavailable?` I'm expecting a concrete answer.
- In `###### What are other known failure modes?` can you please use the proposed template? I.e. Detection -> Mitigations -> Diagnostics, etc.
- In `###### What steps should be taken if SLOs are not being met to determine the problem?` you're missing an answer; N/A is not acceptable.
- What about Drawbacks and Alternatives considered? I'm missing both sections; have we not considered any?
@soltysh Most of these have similar answers in the KEP this one extends: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments. Are these blocking updates?
And for this particular KEP, nothing has changed in the API from Alpha to Beta.
@ffromani where can I find the e2e test? The test PR is this: kubernetes/kubernetes#116846
I do see some tests here: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-kubernetes-node-kubelet-serial-containerd/1932604229172596736 but that does not seem to be the right link.
We need to review which lane the podresources API (not pod-level resources) tests run on, which has changed multiple times over the months. Let's find and fix the configuration if needed.
Looking at the config in k/test-infra, the tests should indeed run in the `pull-kubernetes-node-kubelet-serial-containerd` lane, but it's high time we add an explicit lane and/or call out the tests explicitly in some lane. We can do something similar to the somewhat confusing `pull-kubernetes-node-kubelet-serial-podresources` lane.
> @soltysh Most of these have similar answers in the KEP this one extends: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments. Are these blocking updates?

Yes, those are blocking changes. It doesn't matter that the answers are similar; we need them in this document. If they are matching, it's fine to just copy them over.
Signed-off-by: Swati Gupta <swatig@nvidia.com>
sig-node review: provisional LGTM. Two outstanding items from my POV.
- (nonblocking, in scope for Beta, "just" extra work) It turns out the CI config is obsolete and the podresources API tests run incidentally. We should add them explicitly (like we do, for example, for cpumanager) and we should have a presubmit lane (like we do, for example, for pod-level resources). I think it is unfair to account all this work to this KEP, but some of it makes sense to be done in the context of the beta graduation. Already begun, PRs linked. We may need to fix or add some e2e tests, but I think this is totally fair in the context of the beta graduation.
- (blocking) Clarify the `Get` semantics as discussed. Because of how the API works, there is an unavoidable TOCTOU issue. There is no atomicity or consistency guarantee between `List` and `Get`, nor a concept of a session (which I would push back on, btw). So it is possible, albeit probably unlikely, that `Get` is called with stale information (e.g. the pod becomes terminated between calls). We need to clarify and document the error paths.

Once the above are addressed, LGTM.
All of my previous comments still hold, and are blocking the merging of this KEP.
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:
> Please make sure to check appropriate boxes in the `## Release Signoff Checklist`.

This was addressed - thank you.

This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.
@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed,

###### Are there any tests for feature enablement/disablement?

e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code.
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846)
The linked PR isn't testing feature enablement/disablement, or am I misreading it? The closest place where you test this feature gate is https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/podresources/server_v1_test.go, but there you only turn it on; I don't see the requested on/off test.
We have an on/off test scattered across the existing tests: https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L977 and https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L1066
We can use a PR to make the tests more explicit, and some changes are needed if the FG goes to default-on: the FG status should be set explicitly.
Great, can we make sure this is listed here?
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.

#### GA
> @soltysh Most of these have similar answers in the KEP this one extends: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments. Are these blocking updates?

Yes, those are blocking changes. It doesn't matter that the answers are similar; we need them in this document. If they are matching, it's fine to just copy them over.
Thanks, we need one more update please.
To fix a long-standing desync where `List()` would return allocations for terminated pods because it iterated over the full kubelet pod list instead of the resource-manager’s active pods, PR #132028 adds a new `activePodsOnly` boolean flag to the v1 `List` RPC.

- **Default behavior** (`activePodsOnly=false`): backward-compatible, returns all pods as before.
- **With `activePodsOnly=true`**: filters results to only those pods currently tracked as “active” by the resource managers.
I would omit this part; it is not relevant for this KEP. I will submit a separate update to both the KEP and the docs later on.
Okay. I was also not sure if this is the right place, but I still wanted to add a bit of context that Get() is based on List().
Once this and the comment right below are addressed (basically once this section is rephrased), LGTM.
Note that even though the current most common usage pattern is `List` + a series of calls to `Get`, the two calls are independent from each other. It is perfectly legal for a client to call `Get` with a namespace+name pair obtained in a different way than from the output of `List`.
**Race and inconsistency between `List()` and `Get()`:**
A pod’s phase may change between a successful `List()` and a subsequent `Get()`. Because there is no session or atomic snapshot between a `List()` and a later `Get()`, it’s possible (though uncommon) for a pod to transition to a non-Running phase in between calls. Clients must therefore be prepared for “stale” or “missing” data and handle the documented error paths when `Get()` is invoked on a pod that is no longer in the Running phase.
We need to fully specify the behavior of `Get` in this case, or if, because of a mistake or a race or any other issue, it is fed with inconsistent data.

Currently the code does this:

```go
pod, exist := p.podsProvider.GetPodByName(req.PodNamespace, req.PodName)
if !exist {
	metrics.PodResourcesEndpointErrorsGetCount.WithLabelValues("v1").Inc()
	return nil, fmt.Errorf("pod %s in namespace %s not found", req.PodName, req.PodNamespace)
}
```

It seems to me the most logical extension is to take into account the phase of the pod and either
- return empty data and swallow the error, or
- return an error like `pod not found`.

The API in general (we determined during the fixing and review process of 132028) wants to return data pertaining to running pods, which thus actively hold resources. Therefore, my take is that `Get` should also error out in the unlikely but possible case it is asked about non-running pods. I'm open to the other approach or to any other sufficiently motivated approach.
@ffromani How about this:

`Get()` returns a valid `GetPodResourcesResponse` with its current `PodResources` when the pod is in the `Running` state. If, due to some race condition or other inconsistency, the specified pod has transitioned to terminated or its `Status.Phase` is not `Running` (possible, though uncommon), the call must return a `NotFound` error.

Do you think adding an explicit check on the pod's `Status.Phase` can help? Something like:

```go
pod, exist := p.podsProvider.GetPodByName(req.PodNamespace, req.PodName)
if !exist || pod.Status.Phase != v1.PodRunning {
	metrics.PodResourcesEndpointErrorsGetCount.WithLabelValues("v1").Inc()
	return nil, status.Errorf(codes.NotFound,
		"pod %s in namespace %s not found or not running",
		req.PodName, req.PodNamespace)
}
```
Or we may be overthinking this. As you pointed out, it's possible that Get() mostly (not 100%) returns an error when asked about a non-running pod. I think just calling it out in the spec should be enough. Digging a bit deeper in the code, Get() should not report a removed pod except in a rare race condition: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L198
Your fix in List() to filterOutInactivePods() makes sense: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1065
If you still think it's better to improve Get(), we can add a similar check like I proposed above.
I surely don't want to overcomplicate. The problem here is the insufficient validation and lack of checking, albeit in a corner case. TL;DR: we need to decide what `Get` does when the phase is not `Running`. A one-liner is sufficient for me, but it has to be there :)

E.g.
- "Get() never fails if the kubelet knows about the pod" (but note this likely misreports resources of terminated pods)
- "Get() returns an error if the pod is known to the kubelet, but is terminated"

That's it.
@soltysh PTAL, I tried to answer these: #5346 (comment). If these are satisfactory, then the only pending items:
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Signed-off-by: Swati Gupta <swatig@nvidia.com>
Force-pushed from a804a04 to ae728ef.
/lgtm
sig-node review: my comment about `Get` was addressed. We may need to review again in the next cycle (beta2? GA?), but the current changes are sufficient.
My 2c about e2e tests (k/k/test/e2e_node/podresources_test.go):
- My PR kubernetes/kubernetes#132028 does a bit of cleanup in the e2e tests, so it can be wise to wait for it to merge (it's a fix we need anyway) and base the work on top of it.
- Fixing the test-infra lane to better handle the e2e tests is already ongoing (kubernetes/kubernetes#132345) and can proceed in parallel.
- My understanding is that doing all of the above is fair in the same cycle in which we attempt to graduate to beta.
There are still answers missing inside the document itself, but since they were provided as comments, I'm willing to conditionally approve the PRR section as is. Please make sure to either add the missing links before merging this PR, or shortly after.
/approve
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.
Thank you, this clarifies things a bit more.
@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed,

###### Are there any tests for feature enablement/disablement?

e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code.
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846)
Great, can we make sure this is listed here?
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:
This still holds. I see Francesco mentioned several tests that were added; can we make sure they are explicitly linked in this document?
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: guptaNswati, soltysh
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Thanks @soltysh. Given the holiday in the US today, I don't think @guptaNswati will have time to make the update before tomorrow. I will work with her early next week to ensure all of the information in the comments gets incorporated. @mrunalp, @haircommander, @SergeyKanzhelev can we please get a sig-node approval to move this forward? Thanks!
I think they are all based in the US. I'd support (FWIW) an exception if the KEP slips just because there is no time left for an approval review because of the holidays.
One-line PR description: Update KEP to prepare for beta in 1.34
Issue link: DRA: Extend PodResources to include resources from Dynamic Resource Allocation #3695
/wg device-management
/assign @moshe010
/assign @klueska
/assign @johnbelamaric for PRR