This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

weave-kube doesn't run reset on exit #2807

Closed
pmcq opened this issue Feb 16, 2017 · 4 comments


pmcq commented Feb 16, 2017

We have a cluster that does autoscaling (EC2 instances that are stopped, started, or restarted as needed), and it appears that on occasion a restarted machine cannot connect to its peers, in part because those peers have hit their conn-limit with connections to machines that are no longer alive. I bumped up the conn-limit from its default, but I don't know whether running weave reset before shutting down a machine would fix this issue (and reclaim the IP addresses used by that node).

It does not appear that running kubectl delete pod <weave-pod> runs weave reset (per Slack discussion with @brb). I was wondering whether adding a preStop hook (https://kubernetes.io/docs/user-guide/container-environment/#hook-details) to the weave DaemonSet to run weave reset would be sufficient here. I also know it would require mounting /var/run/docker.sock as a hostPath volume in the weave container in order to run weave reset.
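
As a rough sketch of the idea above (not something confirmed to work in this thread), the snippet below builds a strategic-merge patch fragment that adds a preStop hook and a docker.sock hostPath mount to the DaemonSet's pod template. The container name `weave`, the volume name `docker-sock`, and the assumption that a `weave` script is on the container's PATH are all hypothetical and would need checking against the actual weave-kube manifest and image.

```python
import json

# Assumptions for illustration only (not taken from the weave-kube manifest):
# the container name, the volume name, and that `weave` is on the container's PATH.
CONTAINER_NAME = "weave"
VOLUME_NAME = "docker-sock"
RESET_COMMAND = ["weave", "reset"]
DOCKER_SOCK = "/var/run/docker.sock"


def build_prestop_patch(container_name=CONTAINER_NAME, command=RESET_COMMAND):
    """Return a patch fragment for the DaemonSet pod template.

    It adds a preStop exec hook to the named container and mounts the
    host's Docker socket into it via a hostPath volume.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            # Kubernetes runs this hook before stopping the container.
                            "lifecycle": {
                                "preStop": {"exec": {"command": list(command)}}
                            },
                            "volumeMounts": [
                                {"name": VOLUME_NAME, "mountPath": DOCKER_SOCK}
                            ],
                        }
                    ],
                    "volumes": [
                        {"name": VOLUME_NAME, "hostPath": {"path": DOCKER_SOCK}}
                    ],
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment as JSON so it can be reviewed before applying it.
    print(json.dumps(build_prestop_patch(), indent=2))
```

Note that a preStop hook only runs when Kubernetes itself terminates the container (for example on pod deletion), so it would likely not help when an EC2 instance is stopped abruptly.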

Partially related: #2797

The Allocating IP addresses section of the docs mentions that weave "also releases that address when the container exits", but I'm not sure that happens if it doesn't do a weave reset:
https://www.weave.works/docs/net/latest/ipam/

Contributor

brb commented Feb 17, 2017

Thanks for the report.

Re weave reset: let's move the discussion to #2797, as your issue duplicates it.

Re: the Multiple connections errors you reported on Slack (https://weave-community.slack.com/archives/general/p1487196446001096):

  1. Could you provide the full logs of the weave-kube container that logged the error?
  2. Does waiting ~1 minute make the failing connection self-heal?

brb removed their assignment Feb 17, 2017
Author

pmcq commented Feb 17, 2017

Okay, sounds good, thanks. I think it is safe to close this issue then.

I believe the multiple connections issue came from not waiting long enough between restarts of the container. I haven't seen it reliably in our normal setup, only when I was doing manual reboots and container restarts to investigate. The conn-limit and hosts connecting to old instances seemed to be the real issue.

Author

pmcq commented Mar 2, 2017

Re: Multiple connections: For anyone who stumbles here: after more digging on the next node failure, I believe the root issue we were seeing was #2617, as the weave-kube pod was occasionally coming up with "exit status 1: iptables: No chain/target/match by that name." on v1.8.0 after a node reboot. We bumped versions and don't believe the issue is recurring any longer (we rebooted nodes a number of times in a row, but will need time to be sure).

Contributor

brb commented Mar 3, 2017

@pmcq Thanks for letting us know.
