Randomise wait to reclaim IPs on node deletion, and fix flaky test 870 #3716

Merged
merged 5 commits into from
Oct 15, 2019


bboreham
Contributor

Fixes #3690 - we should shut down Kubernetes before deleting the node, otherwise Weave Net can restart.

While debugging this I found a few other things wrong with the kube-utils program, notably that the randomised wait would always take 3.081 seconds, so we got a thundering herd every time.

Also reduce the wait in the test script: now that we have a callback when a node is removed and IPAM broadcasts the reclaim, there should be no reason to wait 90 seconds.

Without this, each run gets the same random numbers so all nodes are
racing to remove the deleted node.

So we know when it came in, also which node.

Otherwise we see all the node addresses in the middle of the log.

Now we have a callback when a node is removed, there should be no
reason to wait 90 seconds.

The reclaim does wait up to 5 seconds before starting, so allowing 10
seconds in the script should be sufficient.
bboreham changed the title from "Randomise wait to reclaim IPs on node deletion, and fix flaky test" to "Randomise wait to reclaim IPs on node deletion, and fix flaky test 870" on Oct 11, 2019
bboreham added this to the 2.6 milestone on Oct 12, 2019
@murali-reddy
Contributor

LGTM

Development

Successfully merging this pull request may close these issues.

Test flake in 870_weave_recovers_unreachable_ips