-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Target NodeSwap GA in v1.34 #5387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
iholder101
commented
Jun 8, 2025
- One-line PR description: Target NodeSwap GA in v1.34.
- Issue link: Node memory swap support #2400.
- Other comments:
Skipping CI for Draft Pull Request. |
/hold I believe the ability to disable swap per-Pod is a GA blocker with this KEP. As discussed in a few meetings, one of the requested feature for the node-level swap enablement is an ability to disable swap per-Pod. We already offer a few cases when swap is not enabled in this KEP. However, there are still cases when customer may want to opt-out from swap. Moreover, if this is a security requirement, the ability to disable swap on Pod must be done per-Pod using mechanisms like admission web hooks. Some alternatives were discussed:
There might be other alternatives, but in my opinion, they ultimately not worth the effort, keeping in ming these great benefits of enable per-Pod swap. So the proposal is to assume that pod without any value configured is non-swappable. And allow to either enable
As for the timeline, I would argue that introducing this field may keep the pod in beta for another release and then GA after validation. @thockin to keep me honest on the requirements here. |
I'm having difficulty putting the threads together into a single picture. Can you clarify the question?
This is certainly the SAFE answer |
So the #2400 was in beta for a while. And the graduation criteria we set is to have an API for future per-Pod swap approved. So we know that we have a way forward. I was advocating for including per-Pod enablemend/disablement in #2400 and not wait for the full support of per-Pod swap enablement, limits, etc. And I am thinking that adding a way to enable/disable swap per-Pod is not resetting #2400 to alpha. And it can progress to GA soon as long as we can demonstrate it was used. Just wanted to confirm that this API addition is not resetting the KEP to alpha. |
we are on the same page. I think disabling/enabling per-Pod will be an adoption blocker for this feature for many users. |
Further to our conversation offline, there had seemed to be a miscommunication in sig-node meetings. I'm sorry about that. However, I just want to re-iterate this KEP's scope, from the merged KEP design:
As can be seen, we've agreed and merged to the design to focus on basic swap enablement with no pod-level API changes. This is mentioned specifically on the KEP and captures most of the Since (after the KEP design was merged already..) folks were insisting on providing clarity for the pod-level APIs, I've opened #5359, although it's clearly not part of the KEP's scope AFAICT. |
From slack from @iholder101:
Can you please clarify why it will be unusable for people who wants swap? Is it because they do not own the app? Or it is too much of a burden to install the webhook? I feel like there may be some confirmation bias when asking somebody who wants swap. People who want swap would wish for minimal friction enablement. When people who doesn't want swap will now need to deal with labels, taints, etc. I think the opposite should be better when people who doesn't want swap should spend less effort than people who wants it |
IMHO, for swap as a whole feature, ideal default would be 'disabled' by default. Most production use-cases prefer predictability, and when pods don't explicitly declare swap preferences, shouldn't cause runtime uncertainty. but unfortunately it would break the (implicit) behavior for existing users. The trade-off question is API consistency vs not disrupting existing users. |
I want to stress again that, AFAICT, the vast majority of app owners do not want to use swap, nor are they trying to avoid using it. Usually, they just don't care, and treat it as a node-level admin configuration that should be dealt with by someone else. Just like with regular Linux processes, in rare cases, app owners know they want to opt-out, and processes can ask the kernel to opt-out of swap in such cases, but a process never opts-in or asks to use swap. I also remind that the whole swap feature is opt-in, on a node-level basis (just like with any Linux host).
Are we talking short or long term? which is more important? As I've emphasized here, the whole idea of this KEP was to enable basic swap while opening the door for further extensions, that would be made incrementally. The idea was to deliver minimal functionality now, gather real-world feedback, and then gradually evolve the feature, adding APIs, eviction behavior, scheduling mechanisms, and more. However, the current suggestion seems to set aside that vision in favor of short-term concerns. Since swap-aware scheduling hasn't been designed yet, supporting it still requires manual intervention from administrators. Rather than investing time in a more thoughtful and extensible design, we risk rushing into a quick fix that may constrain us down the line. In other words, the proposed API offers modest short-term gains at the expense of long-term flexibility. A better approach, in my view, would be to state clearly: "This feature is production-usable, but incomplete. Today, certain aspects - like scheduling properties - must be managed manually. Only enable it if you're comfortable taking on that responsibility". That framing leaves the door open for robust, future-proof solutions, rather than locking us into a slightly safer setup now that could hinder extensibility later. Looking ahead, even once we support proper scheduling, eviction policies, and swap-limit controls, we'll remain bound to an API that requires admins to configure To summarize, I think it would be best to stick to the KEP's original plan, GA 2400 and discuss KEP-5359 in parallel. Saying that, TBH, after so many releases I'm investing so much in this, and after so many times things were agreed, merged to the KEP, then were completely revisited - I'm worn down. If people think that an opt-in approach is the right way to go - let's do it - as long as we GA swap in the shorter term possible. @dchen1107 @haircommander @thockin @tallclair I'd appreciate your thoughts on this. |
Signed-off-by: Itamar Holder <iholder@redhat.com>
af2c841
to
4a0a2f9
Compare
I'm going to stay out of the sig conversation of "should this be replaced by something else". The PRR itself was updated during the various betas and looks to still be in good shape. This approval is only for the PRR. /approve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
based on #5360 (comment)
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, iholder101, mrunalp The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/hold cancel |