Improper iptables configuration in case of concurrent iptables access #2998
We currently use weave on our Kubernetes cluster to provide the networking layer, and from time to time we encounter networking issues with weave at container startup time.

Our non-production clusters are automatically stopped at night and restarted in the morning. On several occasions we noticed that the Kubernetes network stack did not work correctly in the morning. The symptoms were that containers were not able to reach resources outside of the cluster. This generally broke internal access as well, because kube-dns was not working properly: its container could not reach the external DNS servers.

After investigation, we noticed the following things:
After having a look at the weave shell script that is launched at container start time (by launch.sh), our theory is that the weave container was restarted after the bridge was created but before all the iptables rules were created. Upon subsequent weave restarts, the iptables rules were not added again, as the bridge was already present, and the iptables configuration was therefore left in an improper state. We don't know why this situation only happens from time to time, nor why it usually impacts a lot of nodes at the same time; there may be an external condition that slows down our weave container startup.
The best solution would be to make the weave start script more resilient in case of failure.
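A minimal sketch of what such resilience could look like, assuming POSIX shell and simplified names (the WEAVE chain name is from this issue; the FORWARD rule is purely illustrative, not weave's actual rule): each step checks the current state before acting, so the script can be re-run safely after a partial failure.

```sh
#!/bin/sh
# Illustrative sketch, not the actual weave code.

ensure_chain() {
    # Create the chain only if it does not exist yet.
    iptables -t filter -n -L "$1" >/dev/null 2>&1 || iptables -t filter -N "$1"
}

ensure_rule() {
    # 'iptables -C' tests whether the rule exists; append only when missing.
    iptables -C "$@" 2>/dev/null || iptables -A "$@"
}

# Safe to re-run: each step converges on the target state instead of
# assuming a clean slate.
ensure_chain WEAVE
ensure_rule FORWARD -o weave -j WEAVE
```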
Thanks @yannrouillard; I think your analysis of the situation is very good. Currently the code starts from scratch and does actions A, B, C, D, E to achieve the target state. Regarding the liveness probe: it is configured to allow 30 seconds, and the network set-up typically takes less than one second, so I am very interested in any clues as to what could stretch it out that much.
OK, some news about this issue. We had a similar problem again, but this time we got more information as we had enabled debug logging. The container was again restarted in the middle of the network setup, which caused an improper iptables configuration that was never repaired afterwards.
Currently looking at how this could happen, but one question: any failure in WEAVE target creation is ignored and the error messages are redirected to /dev/null; what was the reason for that? We had to remove the redirection to see the actual errors.
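For illustration, the questioned pattern presumably looks like the first line below (a guess at the shape, not a quote from the weave source); the variant after it only swallows the expected "chain already exists" case and surfaces everything else:

```sh
# All errors discarded: a real failure looks the same as
# "chain already exists" (illustrative, not the actual weave code).
iptables -N WEAVE 2>/dev/null

# Alternative: treat only an existing chain as benign, and fail
# loudly on any other error so a broken setup shows up in the logs.
if ! iptables -n -L WEAVE >/dev/null 2>&1; then
    iptables -N WEAVE || { echo "creating WEAVE chain failed" >&2; exit 1; }
fi
```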
@bboreham I didn't understand at first why the host and the weave container would contend on iptables, but from what I see, iptables on my host and inside the weave container are indeed both using the same xtables lock file. Shouldn't we mount /run/xtables.lock inside the weave container?
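As background on the lock: recent iptables versions serialize concurrent invocations through /run/xtables.lock, and the -w flag makes an invocation wait for the lock instead of failing. A hedged illustration (the rule shown is an example, not weave's actual rule):

```sh
# Without -w, a concurrent invocation can fail immediately with
# "Another app is currently holding the xtables lock".
# With -w, it blocks until the lock is free.
iptables -w -A FORWARD -o weave -j WEAVE

# Note: the lock only serializes callers that see the *same*
# /run/xtables.lock, hence the idea of mounting it into the container.
```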
@yannrouillard yes; that is under discussion in #2980. Note we have to ensure the file exists on the host before running a container that mounts it. The moby issue you linked to is closed as a duplicate, but I updated the open one: moby/moby#12547.
For now we mounted the /run/xtables.lock file inside the weave container. The problem hasn't appeared again, but we are waiting a while longer before calling it fixed. I will update this ticket with the outcome.
@bboreham can you provide more information, and possibly an example manifest? Should we add the lock file mount to the weave manifest?
@chrislovecnm the problem with just doing a mount is that, for a freshly-booted machine where the lock file doesn't exist, Docker will create a directory of the same name, which will then break everything. Mounting the parent directory, /run, would avoid that, at the cost of exposing much more of the host to the container.

There is an upcoming feature, kubernetes/kubernetes#46597, which will allow you to say you want a file and not a directory, so we could then safely mount /run/xtables.lock itself.

Failing that, you need to arrange on the host that the file exists before starting the Weave pod, which may be straightforward for kops. @yannrouillard could you share your manifest change as an example?
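As a concrete sketch of the "make sure the file exists first" approach (the paths are real; the docker invocation is purely illustrative of what a Kubernetes hostPath mount does):

```sh
# On the host, before the weave container starts: create the lock
# file if it is missing, so the bind mount below sees a file instead
# of Docker silently creating a directory of the same name.
[ -f /run/xtables.lock ] || touch /run/xtables.lock

# Illustrative plain-Docker equivalent of a hostPath file mount:
docker run -v /run/xtables.lock:/run/xtables.lock ... <weave image> ...
```

If the Kubernetes feature linked above lands as described, the hostPath volume itself could guarantee a file rather than a directory, making this host-side preparation unnecessary.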
Fixed by #3134 |