RFC: Handle affinity assistant deadlock for node maintenance #6584

Closed
lbernick wants to merge 1 commit

Member

@lbernick lbernick commented Apr 26, 2023

Prior to this commit, cordoning a node for maintenance can deadlock the affinity assistant. If the placeholder pod is scheduled to a node that is then marked unschedulable, new TaskRun pods can neither schedule nor trigger a scale-up by the cluster autoscaler, because they have inter-pod affinity for the placeholder pod on the unschedulable node.
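
To make the deadlock concrete, here is a rough Go sketch of the feasibility check described above. It is a deliberately simplified model, not the real Kubernetes scheduler: the `Node` type and the `feasibleNodes` function are hypothetical names introduced only for illustration, and the only rules modeled are "cordoned nodes accept no new pods" and "the TaskRun pod must land on the node hosting the placeholder pod".

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for a Kubernetes node.
type Node struct {
	Name          string
	Unschedulable bool // true once the node has been cordoned
}

// feasibleNodes returns the nodes on which a TaskRun pod with required
// inter-pod affinity for the placeholder pod could be placed in this
// simplified model: the node must be schedulable AND must be the node
// hosting the placeholder pod.
func feasibleNodes(nodes []Node, placeholderNode string) []string {
	var out []string
	for _, n := range nodes {
		if n.Unschedulable {
			continue // cordoned nodes accept no new pods
		}
		if n.Name != placeholderNode {
			continue // affinity requires co-location with the placeholder
		}
		out = append(out, n.Name)
	}
	return out
}

func main() {
	nodes := []Node{{Name: "node-a"}, {Name: "node-b"}}
	fmt.Println("before cordon:", feasibleNodes(nodes, "node-a"))

	// Cordon the node that hosts the placeholder pod.
	nodes[0].Unschedulable = true
	fmt.Println("after cordon:", feasibleNodes(nodes, "node-a"))

	// Scaling up does not help: the new node does not host the placeholder.
	nodes = append(nodes, Node{Name: "node-c"})
	fmt.Println("after scale-up:", feasibleNodes(nodes, "node-a"))
}
```

In this model no amount of added capacity makes the TaskRun pod schedulable while the placeholder stays on the cordoned node, which is why the placeholder itself has to move.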

This commit adds a new controller that watches nodes. When a node becomes unschedulable, the controller deletes any affinity assistant pods running on it (but leaves any TaskRun pods). The affinity assistant StatefulSet then recreates the placeholder pod, which is scheduled to an available node (or triggers scale-up if there's a volume node affinity conflict). Pods for existing TaskRuns cannot be scheduled until the placeholder pod is rescheduled.
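
The selection step of such a controller could look roughly like the following Go sketch. This is not the code from this PR: `Node`, `Pod`, and `placeholdersToDelete` are hypothetical, simplified stand-ins for the Kubernetes API types and the reconciler logic, and the label key and value used to recognize affinity assistant pods are an assumption.

```go
package main

import "fmt"

// Node and Pod are simplified, hypothetical stand-ins for the Kubernetes
// API types; a real controller would work with the core/v1 types instead.
type Node struct {
	Name          string
	Unschedulable bool // true once the node has been cordoned
}

type Pod struct {
	Name     string
	NodeName string // node the pod is currently bound to
	Labels   map[string]string
}

// Assumed label identifying affinity assistant placeholder pods.
const (
	componentLabel    = "app.kubernetes.io/component"
	affinityAssistant = "affinity-assistant"
)

// placeholdersToDelete returns the affinity assistant pods running on
// unschedulable nodes. Pods without the assumed label (for example
// TaskRun pods) are never returned.
func placeholdersToDelete(nodes []Node, pods []Pod) []Pod {
	cordoned := make(map[string]bool)
	for _, n := range nodes {
		if n.Unschedulable {
			cordoned[n.Name] = true
		}
	}
	var out []Pod
	for _, p := range pods {
		if cordoned[p.NodeName] && p.Labels[componentLabel] == affinityAssistant {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	aa := map[string]string{componentLabel: affinityAssistant}
	nodes := []Node{{Name: "node-a", Unschedulable: true}, {Name: "node-b"}}
	pods := []Pod{
		{Name: "affinity-assistant-0", NodeName: "node-a", Labels: aa},
		{Name: "taskrun-pod", NodeName: "node-a"},
		{Name: "affinity-assistant-1", NodeName: "node-b", Labels: aa},
	}
	for _, p := range placeholdersToDelete(nodes, pods) {
		fmt.Println("delete", p.Name)
	}
}
```

Only the placeholder on the cordoned node is selected; the TaskRun pod on the same node and the placeholder on the healthy node are left alone, matching the behavior described above.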

Prerequisite for #6543.

This commit only handles situations where nodes are unschedulable. It doesn't handle situations where nodes run out of resources or reach their cap on the number of pods (e.g. #4699).

Tested locally, it appears to work.
Closes #6586.

/kind bug

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

NONE

@tekton-robot
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@tekton-robot tekton-robot added labels Apr 26, 2023:

  • release-note-none: Denotes a PR that doesn't merit a release note.
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • kind/bug: Categorizes issue or PR as related to a bug.

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from lbernick after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 26, 2023
@lbernick
Member Author

@skaegi @pritidesai hoping this could address your concerns? This feels a bit hacky but would like to hear your thoughts.

@jlpettersson you might have thoughts as well?

@jlpettersson
Member

This sounds good to me. It is an improvement from the current situation. 👍

@lbernick
Member Author

lbernick commented May 9, 2023

Closing in favor of @pritidesai's alternate approach #6596, thanks Priti!

@lbernick lbernick closed this May 9, 2023
Labels
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Affinity assistant deadlock during node maintenance
7 participants