'weave ps' fails with "Cannot find weave bridge: Link not found" #2388
Comments
I looked at this the other day. The WEAVE_DEBUG output shows the error halfway through iterating over the containers, which is very mysterious indeed. |
@2opremio Are you able to reproduce the issue? Maybe you have an output of From inspecting the netlink library, I can see that |
I can't reproduce with my computer in Spain (different Docker/Weave versions). I can reproduce systematically with my laptop in London but it would have to wait until Monday. In the meantime, I think @jml can (he shows exactly the same error in weaveworks/scope#1599 ) |
As I've already mentioned on Slack, the root cause is the WithNetNS function. During execution of In the Linux kernel, each netdev is associated with exactly one network namespace. Therefore, the "weave" bridge cannot be seen from a container whose network namespace is not the host network namespace. The error you get is due to https://github.com/weaveworks/weave/blob/v1.6.0/common/utils.go#L108 being executed on a thread which has entered a container network namespace that is not associated with the "weave" bridge netdev. As evidence, please check the following strace output (full log):
The 3365 process does the |
To reproduce the issue (on my machine - 4.6 / 1.11.2), I start 32 containers ( |
This is basically the same as moby/libnetwork#1113, which any Go program calling |
Please see #2388 (comment) for more details.
I found this via https://www.weave.works/blog/linux-namespaces-and-go-don-t-mix. This snippet:

    runtime.LockOSThread()
    defer runtime.UnlockOSThread()
    hostNetNs, _ := netns.Get()
    netns.Set(ns)
    links, _ := netlink.LinkList()
    fmt.Println(filterAttachedLinks(links, indexes))
    // Return to the host network namespace
    netns.Set(hostNetNs)

is a perfect illustration of erroneous usage of
|
@bcmills we think the most likely explanation is the one we observed: a new OS thread is started inside the locked region, which inherits the changed namespace, and when other goroutines are then scheduled onto that thread they are in that other namespace. |
... even if the weave bridge exists:
This happened when running a local instance of the scope service (kubernetes anywhere).
Running with bash -x I get:
Output when running the docker command above with WEAVE_DEBUG=1: https://gist.github.com/2opremio/cfe5c7fa236854e9963dda0f8afc5754
Some details about my system: