-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fix CalicoNodeStatus update stuck in conflict #10555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CalicoNodeStatus update stuck in conflict #10555
Conversation
KubeConfig: apiconfig.KubeConfig{ | ||
Kubeconfig: os.Getenv("KUBECONFIG"), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this change is really needed, but on my local environment, without this change make fv
would silently skip whole nodestatus test suite, giving a false-positive success, while actually 0 tests were executed. This happens due to panic in scope of Describe
, which ginkgo treats as success for some reason
/sem-approve |
Removing "merge-when-ready" label due to new commits |
@caseydavenport , Hi, some tests have failed, one of them (related to linter) I have fixed, but for others I can not see any error information or artifacts to look at. Can you please help to understand why those tests failed |
@theboringstuff I think the test failure was due to a bug in master that has since been fixed - you should be able to pull in the latest master to your PR and I believe those tests will pass now. |
@caseydavenport , done, but semaphore is pending, can you please approve it |
/sem-approve |
@caseydavenport , check is still failing, I tried to reproduce locally one of them, but I am not sure I get the same error. Do I understand correctly that PR can not be merged until tests are passing? |
@caseydavenport , Hi, do you have any suggestions what we should do to proceed? I would like to fix CI somehow, but without access to any informative output I can only guess what is the problem |
@theboringstuff sorry - I have been out on PTO. I will take a look at this today! |
I think the test failure is not related to your changes, so we can merge this. |
@caseydavenport thank you! Can we also back-port it to v3.30 and v3.29? Should I create additional PRs to those release branches? |
@theboringstuff yep! Feel free to create cherry-pick PRs to |
Description
CalicoNodeStatus reporter may get stuck in a conflict trying to update object in k8s, thus leaving CalicoNodeStatus in outdated state. This conflict may happen due to syncer not sending object updates to reporter. Syncer may miss object updates if these updates happened during apiserver/etcd performance drops (e.g. due to slow disk), as described in the issue #8715. It is hard to trigger manually in real-world scenarios, but we frequently see this issue post-factum on our envs. This issue could be easily reproduced synthetically by making k8s objects updates directly through etcd, avoiding apiserver
We would like this also to be backported to calico 3.30/3.29
Related issues/PRs
fixes #8715
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one
docs-*
label.docs-pr-required
: This change requires a change to the documentation that has not been completed yet.docs-completed
: This change has all necessary documentation completed.docs-not-required
: This change has no user-facing impact and requires no docs.Every PR needs one
release-note-*
label.release-note-required
: This PR has user-facing changes. Most PRs should have this label.release-note-not-required
: This PR has no user-facing changes.Other optional labels:
cherry-pick-candidate
: This PR should be cherry-picked to an earlier release. For bug fixes only.needs-operator-pr
: This PR is related to install and requires a corresponding change to the operator.