Improper iptables configuration in case of concurrent iptables access #2998

yannrouillard opened this issue Jun 1, 2017 · 10 comments

@yannrouillard

We currently use weave on our Kubernetes cluster to provide the networking layer, and from time to time we encounter networking issues with weave at container startup time.

Our non-production clusters are automatically stopped at night and restarted in the morning.
On several occasions, we noticed that the Kubernetes network stack did not work correctly in the morning.

The symptoms were that containers were not able to access resources outside of the cluster.
This generally also impacted internal access, as kube-dns was not working properly because its container was not able to reach external DNS servers.

After investigation, we noticed the following things:

  • the problem appeared at the node level rather than the cluster level: some of the nodes were not impacted, and pods hosted on those nodes could access external resources,
  • restarting the weave pods on a network-unhealthy node didn't solve the issue,
  • the packets sent outside of the cluster weren't properly masqueraded by iptables,
  • and indeed the WEAVE rule set was not present in the iptables nat table (a quick check for this is sketched below),
  • the weave container initially failed its health check at startup and was restarted at least once on the affected nodes.
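For reference, a quick way to check for this broken state on a node is to look for the WEAVE chain in the nat table directly; a minimal diagnostic sketch, using only the chain and table names that appear later in this thread:

# The WEAVE chain should exist in the nat table on a healthy node;
# on an impacted node this listing fails because the chain is missing.
iptables -w -t nat -nL WEAVE

# Alternatively, dump the nat table and look for weave-related rules.
iptables-save -t nat | grep -i weave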
@yannrouillard
Author

After having a look at the weave shell script that is launched at container start time (by launch.sh),
we noticed that the iptables WEAVE ruleset configuration is performed by try_create_bridge, but only if the bridge is not already present.

Our theory is that the weave container was restarted after the bridge was created but before all the iptables rules were in place. Upon subsequent weave restarts, the rules were not added again as the bridge was already present, and hence the iptables configuration was left in an improper state.
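A simplified sketch of the pattern we believe is at play (hypothetical helper names, not the actual weave code; the bridge is assumed to be the interface named weave):

# Sketch only: iptables setup happens on the same path that creates the bridge.
if ! ip link show weave >/dev/null 2>&1; then
    create_bridge            # hypothetical helper
    setup_iptables_rules     # hypothetical helper: WEAVE chain + POSTROUTING jump
fi
# If the container is restarted after create_bridge succeeds but before
# setup_iptables_rules completes, the next run sees the bridge and skips
# the iptables setup entirely, leaving the nat table half-configured.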

We don't know why this situation happens only from time to time, nor why it usually impacts a lot of nodes at the same time. There may be an external condition that slows down our weave container startup.

@yannrouillard yannrouillard changed the title Weave container Improper iptables configuration when weave container is restarted before full initialization Jun 1, 2017
@yannrouillard
Author

The best solution would be to make the weave start script more resilient in case of failure.
It might also be that the weave liveness probe threshold defined in the default daemon set yaml file is too low and causes unnecessary restarts.
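If the probe threshold does turn out to be the trigger, it can be relaxed without replacing the whole manifest; a hedged sketch, assuming the daemon set is called weave-net in kube-system and the weave container is the first one in the pod spec:

# Assumed names and paths; adjust to match the actual daemon set manifest.
kubectl -n kube-system patch daemonset weave-net --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds", "value": 60}]'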

@bboreham bboreham added the bug label Jun 1, 2017
@bboreham
Contributor

bboreham commented Jun 1, 2017

Thanks @yannrouillard; I think your analysis of the situation is very good.

Currently the code starts from scratch and does actions A, B, C, D, E to achieve the target state.
It would be better to compare the actual state to the target state and decide that only actions C and D (say) are needed to get there. We recently moved all that code from shell script to Go, which makes it far easier to contemplate such a change.
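For the iptables part of the setup, this roughly amounts to making each step idempotent; a sketch of the equivalent shell logic (not the actual Go code; create_bridge is a hypothetical placeholder):

# Reconcile-style sketch: test each piece of the target state and only
# perform the missing steps, so a partially-initialised node converges.
ip link show weave >/dev/null 2>&1 || create_bridge   # hypothetical helper
iptables -w -t nat -nL WEAVE >/dev/null 2>&1 || iptables -w -t nat -N WEAVE
iptables -w -t nat -C POSTROUTING -j WEAVE 2>/dev/null || iptables -w -t nat -A POSTROUTING -j WEAVE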

Re the liveness probe, it is configured to allow 30 seconds, and the network set-up typically takes less than one second. So I'm very interested in any clues you can give as to what would stretch it out that much.

@yannrouillard
Author

OK, some news about this issue.
This might not be caused by a restart at the wrong moment (or else there are several ways to trigger the issue).

We had a similar issue again, but this time we got more information as we had enabled debug logging.
We saw that the WEAVE target creation failed because of a 'Resource temporarily unavailable' error.

This caused an improper iptables configuration that is never repaired afterwards.
Here is the log snippet showing the issue:

+ run_iptables -t nat -N WEAVE
+ [ -z 1 ]
+ iptables -w -t nat -N WEAVE
iptables: Resource temporarily unavailable.
+ true
+ add_iptables_rule nat POSTROUTING -j WEAVE
+ IPTABLES_TABLE=nat
+ shift 1
+ run_iptables -t nat -C POSTROUTING -j WEAVE
+ true
+ run_iptables -t nat -A POSTROUTING -j WEAVE
+ [ -z 1 ]
+ iptables -w -t nat -A POSTROUTING -j WEAVE
iptables v1.6.0: Couldn't load target `WEAVE':No such file or directory

Try `iptables -h' or 'iptables --help' for more information.
+ [ 2 != 4 ]
+ return 1

Currently looking at how this could happen, but one question: any failure in the WEAVE target creation is ignored and error messages are redirected to /dev/null; what was the reason for that?
run_iptables -t nat -N WEAVE >/dev/null 2>&1 || true

We had to remove the >/dev/null 2>&1 to be able to see the proper error message.
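One way to keep the convenience of tolerating "chain already exists" without hiding real failures such as the 'Resource temporarily unavailable' one above would be to check why -N failed; a sketch, not a proposed patch:

# Tolerate "chain already exists", but surface any other failure instead of
# discarding everything with "|| true".
if ! iptables -w -t nat -N WEAVE 2>/dev/null; then
    # -N failed: acceptable only if the chain actually exists.
    iptables -w -t nat -nL WEAVE >/dev/null 2>&1 || {
        echo "failed to create WEAVE chain in nat table" >&2
        exit 1
    }
fi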

@yannrouillard
Author

yannrouillard commented Jun 5, 2017

@bboreham I didn't understand why the -w option didn't prevent this issue, but I wonder if we are running into the problem mentioned in moby/moby#30379:

Iptables binaries on the host have a lock that they try to get (/run/xtables.lock or a unix socket)
and will wait until it's grabbed. However, inside of a container that lock will be different, so
iptables on the host and the container will both attempt to run at the same time, causing this
issue.

From what I see, iptables on my host and inside the weave container are indeed both using /run/xtables.lock, and unless I'm mistaken, /run/xtables.lock is not mounted from the host into the container.

Shouldn't we mount /run/xtables.lock into the weave container?
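For what it's worth, one way to check from the host whether a running container already shares the host's lock file is to inspect its mounts; a small sketch (the plain container name weave is an assumption, under Kubernetes the generated docker container name will differ):

# Does the container have the host's /run/xtables.lock bind-mounted?
docker inspect weave | grep -q xtables.lock \
    && echo "host /run/xtables.lock is mounted into the container" \
    || echo "lock not shared: host and container iptables can still race"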

@yannrouillard yannrouillard changed the title Improper iptables configuration when weave container is restarted before full initialization Improper iptables configuration in case of concurrent iptables access Jun 5, 2017
@bboreham
Contributor

bboreham commented Jun 5, 2017

@yannrouillard yes; that is under discussion at #2980. Note we have to ensure the file exists on the host before running a container that mounts it.

That moby issue you linked to is closed as a duplicate, but I updated the open one, moby/moby#12547.

@yannrouillard
Author

For now we mounted /run/xtables.lock into the weave container, as we know this file will be present by the time the weave container is started on our hosts.

So far the problem hasn't appeared again, but we are waiting a while longer before being sure.

I will update this ticket with the outcome.

@chrislovecnm

@bboreham can you provide more information, and possibly an example manifest? Should we add the /run directory to the weave pods?

@bboreham
Contributor

@chrislovecnm the problem with just doing a mount is that, on a freshly-booted machine where the lock file doesn't exist, Docker will create a directory of the same name, which will then break everything.
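This is easy to reproduce outside of weave; a small sketch of the failure mode (alpine is used purely as a throwaway image):

# On a freshly-booted host where /run/xtables.lock does not exist yet,
# bind-mounting the path makes Docker create a directory there:
docker run --rm -v /run/xtables.lock:/run/xtables.lock alpine true
ls -ld /run/xtables.lock    # now a directory, so iptables' locking is broken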

Mounting the parent directory, /run, is problematic because Docker's container trees are under there, which means every volume mount is now recursive, and that makes things break inside the kernel.

There is an upcoming feature, kubernetes/kubernetes#46597, which will allow you to say you want a file and not a directory, so we could safely mount /run/xtables.lock. Sadly we can't rely on that until some future version of Kubernetes (1.8, probably).

Failing that, you need to arrange on the host that the file exists before starting the Weave pod, which may be straightforward for kops. @yannrouillard could you share your manifest change as an example?
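The host-side preparation amounts to creating the lock file before the pod starts; a minimal sketch (where to hook it in, e.g. node provisioning or a boot-time unit, depends on the setup):

# Make sure the lock exists as a regular file before the weave pod starts,
# so a mount of /run/xtables.lock picks up a file rather than creating a directory.
[ -e /run/xtables.lock ] || touch /run/xtables.lock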

@bboreham
Contributor

bboreham commented Oct 5, 2017

Fixed by #3134

@bboreham bboreham closed this as completed Oct 5, 2017