KEP-5307 Initial KEP for container restart exceptions #5308
Conversation
Welcome @yuanwang04!
Hi @yuanwang04. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the appropriate command; once the patch is verified, the new status will be reflected on the PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 6cb2b80 to 0cbfc18.
/ok-to-test
Force-pushed from 0cbfc18 to 040ffef.
Force-pushed from bee6cce to 67df630.
@yuanwang04 thank you for the work, I like this proposal. AFAIK this approach is fully compatible with the Job's podFailurePolicy (at least if I'm not missing something), because when the Pod's restartPolicy is Never, the Job's podFailurePolicy only analyzes pods that reach the "Failed" phase. Here, the pods avoid reaching the Failed phase; once they do reach it, they will be matched against podFailurePolicy, which may decide to recreate the entire pod.
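To make the interplay concrete, here is a hedged sketch of a Job combining podFailurePolicy with container-level restart rules. The `podFailurePolicy` fields are the existing batch/v1 API; the container-level `restartPolicyRules` field and its sub-fields follow the direction of this proposal and may differ from the final API.

```yaml
# Sketch: a Job whose pod retries exit code 42 in place, so podFailurePolicy
# only ever evaluates pods that actually reach the Failed phase.
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-in-place-demo
spec:
  backoffLimit: 3
  podFailurePolicy:
    rules:
    - action: FailJob              # give up only on a "real" failure
      onExitCodes:
        operator: In
        values: [1]
  template:
    spec:
      restartPolicy: Never         # pod-level: failures are terminal...
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "do-work"]   # placeholder command
        restartPolicyRules:        # ...except for these proposed per-container rules
        - action: Restart          # restart in place on the retriable code
          exitCodes:
            operator: In
            values: [42]
```

With this shape, exit code 42 never surfaces as a Failed pod, so the Job's podFailurePolicy rules are only consulted for exit codes the container-level rules do not absorb.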
Force-pushed from 67df630 to 5bbe5d6.
Capturing discussion: What I WANT is a design that ties pod restartPolicy, container restartPolicy, and this new idea of rules together into a coherent story. If we end up in a place where … We talked through options which add a new value to RestartPolicy, but given that RestartPolicy a) doesn't say "clients must handle unknown values" and b) overloads the meaning to include "this is a sidecar", I think we simply CAN'T add a new enum value. So I accept that I am going to be grumpy. I think we CAN get to a place where container … Is it worth adding something to status to represent the result of these multiple fields, so we can encourage clients to infer less?
Force-pushed from cac3f70 to cbcb7b2.
API LGTM overall - all my comments are minor and this doesn't have to be final API
/approve
This KEP introduces restart rules for a container so the kubelet can apply those rules on container failure. This will allow users to configure special exit codes of the container to be treated as non-failure and to restart the container in-place even if the Pod has restartPolicy=Never. This scenario is …
@haircommander we can lament our past mistakes, but we can't always fix them. I grilled these guys for hours this week and the conclusion is that none of us can find a way that is "safe enough". :(
The initial proposal supports only exit codes as the requirement for the rules.
The proposed API is as follows:
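A minimal Pod-level sketch of what such a spec could look like. The `restartPolicyRules` field name and its sub-fields (`action`, `exitCodes` with `operator`/`values`) reflect the direction discussed in this PR and are not guaranteed to match the final API:

```yaml
# Hedged sketch of the proposed container-level restart rules.
apiVersion: v1
kind: Pod
metadata:
  name: restart-rules-demo
spec:
  restartPolicy: Never             # pod-level: normally no restarts on failure
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "exit 42"]
    restartPolicyRules:            # proposed field, evaluated on container exit
    - action: Restart              # restart the container in place...
      exitCodes:
        operator: In               # ...when the exit code matches this set
        values: [42]
```

Any exit code not matched by a rule would fall through to the pod-level restartPolicy, here Never, and the pod would fail as it does today.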
Should we consider adding something to status which is less layered?
Force-pushed from cbcb7b2 to 2c44788.
Force-pushed from 2c44788 to 7aaf999.
/approve PRR
/approve
/lgtm I still have some concerns, but those concerns shouldn't block moving this alpha feature forward.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dchen1107, mrunalp, thockin, wojtek-t, yuanwang04. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing …
/unhold