weave-kube doesn't run reset on exit #2807

pmcq · 2017-02-16T18:41:19Z

We have an autoscaling cluster (EC2 instances that are stopped, started, or restarted as needed). Occasionally a restarted machine cannot connect to its peers, in part because those peers have hit their conn-limit with connections to machines that are no longer alive. I bumped the conn-limit up from the default, but I don't know whether running weave reset before shutting down a machine would fix this issue (and reclaim the IP addresses used by that node).

It does not appear that running kubectl delete pod <weave-pod> runs weave reset (per slack discussion with @brb). I was wondering if adding a preStop hook (https://kubernetes.io/docs/user-guide/container-environment/#hook-details) to the weave daemon set to run weave reset would be sufficient here. I also know it would require adding /var/run/docker.sock as a hostPath mounted volume to the weave container in order to run weave reset.
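To make the proposal concrete, here is a rough sketch (untested against a real cluster) that builds a strategic merge patch adding a preStop hook and the docker.sock hostPath mount to the weave container. The container name `weave`, the volume name `docker-sock`, and the assumption that a `weave` script is on the PATH inside the container are all guesses for illustration, not confirmed details of the weave-kube image.

```python
import json


def build_prestop_patch(container="weave", reset_command=("weave", "reset")):
    """Return a strategic merge patch (as a dict) for the weave DaemonSet.

    Assumptions (hypothetical, not taken from the weave-kube manifest):
      - the container is named `container`
      - `reset_command` is runnable inside that container
      - the volume name "docker-sock" is free to use
    """
    sock = "/var/run/docker.sock"
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            # Run the reset just before the container is stopped.
                            "lifecycle": {
                                "preStop": {"exec": {"command": list(reset_command)}}
                            },
                            # weave reset needs to talk to the Docker daemon.
                            "volumeMounts": [
                                {"name": "docker-sock", "mountPath": sock}
                            ],
                        }
                    ],
                    "volumes": [
                        {"name": "docker-sock", "hostPath": {"path": sock}}
                    ],
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the patch as JSON so it can be reviewed before being applied.
    print(json.dumps(build_prestop_patch(), indent=2))
```

The printed JSON could then be reviewed and applied to the daemon set by hand. Note that a preStop hook only runs when the kubelet terminates the container, so it would likely not help when an instance is stopped abruptly.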

Partially related: #2797

The Allocating IP addresses section of the docs mentions that weave "also releases that address when the container exits", but I'm not sure that happens if it doesn't do a weave reset:
https://www.weave.works/docs/net/latest/ipam/


brb · 2017-02-17T13:58:37Z

Thanks for the report.

Re weave reset: let's move the discussion to #2797, as your issue duplicates it.

Re: the Multiple connections errors you reported on Slack (https://weave-community.slack.com/archives/general/p1487196446001096):

Could you provide the full logs of the weave-kube container that logged the error?
Does waiting ~1 minute make the failing connection self-heal?

pmcq · 2017-02-17T15:01:46Z

Okay, sounds good, thanks. I think it is safe to close this issue then.

I believe the multiple connections issue came from not waiting long enough between restarts of the container. I haven't seen it reliably in our normal setup, only when I was doing manual reboots and container restarts to investigate. The conn-limit and hosts connecting to old instances seemed to be the real issue.

pmcq · 2017-03-02T20:03:12Z

Re: Multiple connections: For anyone who stumbles here: after more digging on the next node failure, I believe the root issue we were seeing was #2617. On v1.8.0, the weave-kube pod was occasionally coming up after a node reboot with exit status 1: iptables: No chain/target/match by that name. We bumped versions and don't believe the issue is recurring any longer (we rebooted nodes a number of times in a row, but will need time to be sure).

brb · 2017-03-03T11:01:53Z

@pmcq Thanks for letting us know.

marccarre assigned brb Feb 17, 2017

marccarre added the state/investigating label Feb 17, 2017

brb mentioned this issue Feb 17, 2017

Remove deleted k8s nodes from Weave Net #2797

Closed

brb removed their assignment Feb 17, 2017

marccarre assigned pmcq Feb 17, 2017

pmcq closed this as completed Feb 17, 2017

marccarre added this to the n/a milestone Feb 17, 2017

marccarre added resolution/duplicate and removed state/investigating labels Feb 17, 2017

marccarre unassigned pmcq Feb 17, 2017
