This repository has been archived by the owner on Jun 20, 2024. It is now read-only.
Zombie apocalypse with kubernetes and weave-kube #2836
Comments
I found it! Attention to node weave-1.9.2:
weave-1.9.1:
weave-1.9.0:
weave-1.8.2:
Tadaaa!
Many thanks for reporting this @Bregor!
marccarre added a commit that referenced this issue on Mar 13, 2017
Reproduces issue #2836. Sample output:
test #5 "run_on mct-0.us-central1-a.weave-net ps aux | grep 'launch.sh' | grep defunct | wc -l" failed: expected "0" got "1"
test #6 "run_on mct-1.us-central1-a.weave-net ps aux | grep 'launch.sh' | grep defunct | wc -l" failed: expected "0" got "1"
test #7 "run_on mct-2.us-central1-a.weave-net ps aux | grep 'launch.sh' | grep defunct | wc -l" failed: expected "0" got "1"
3 of 8 tests failed in 61.780s.
marccarre added a commit that referenced this issue on Mar 13, 2017
This:
1. prevents defunct (a.k.a. zombie) processes from being generated, and therefore fixes #2836; but also
2. reintroduces running more than one process and the issues related to signal forwarding, effectively reverting #2688 / reopening #2684.
A proper fix would leverage something like Tini.
See also:
- github.com/krallin/tini
- github.com/krallin/tini/issues/8
- github.com/docker-library/official-images#init
marccarre added a commit that referenced this issue on Mar 13, 2017
Fixes #2836, i.e. prevents defunct (a.k.a. zombie) processes from being generated, and does so without reopening #2684.
In Docker, the ENTRYPOINT process runs as PID 1 and is therefore responsible for reaping processes and forwarding signals to child processes, which launch.sh does not do. This change leverages tini to add such behaviour.
marccarre added a commit that referenced this issue on Mar 13, 2017
Fixes #2836, i.e. prevents defunct (a.k.a. zombie) launch.sh processes from being generated, and does so without reopening #2684.
Why: In Docker, the ENTRYPOINT process runs as PID 1 and is therefore responsible for reaping processes and forwarding signals to child processes, which launch.sh does not do. This change leverages tini to bake such behaviour in, as recommended by Docker: https://github.com/docker-library/official-images#init
marccarre added a commit that referenced this issue on Mar 13, 2017
Fixes #2836, i.e. prevents defunct (a.k.a. zombie) launch.sh processes from being generated, and does so without reopening #2684.
Why: In Docker, the ENTRYPOINT process runs as PID 1 and is therefore responsible for reaping processes and forwarding signals to child processes, which launch.sh does not do. This change leverages tini to bake such behaviour in, as recommended by Docker: https://github.com/docker-library/official-images#init
See also: github.com/krallin/tini/issues/8
marccarre added a commit that referenced this issue on Mar 14, 2017
Fixes #2836, i.e. prevents defunct (a.k.a. zombie) launch.sh processes from being generated, and also propagates signals from Docker to our processes (i.e. does not reopen #2684).
Why: In Docker, the ENTRYPOINT process runs as PID 1 and is therefore responsible for reaping processes and forwarding signals to child processes, which launch.sh currently does not do. This change leverages tini to bake such behaviour in, as recommended by Docker.
See also:
- github.com/docker-library/official-images#init
- github.com/krallin/tini/issues/8
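The reaping half of the PID 1 responsibility described in this commit message can be illustrated with a minimal sketch. It is not how tini or launch.sh is implemented: `reap_children` is a hypothetical helper name, and the sketch only assumes a POSIX system where `os.waitpid` is available.

```python
import os


def reap_children():
    """Collect every already-exited child without blocking.

    Returns a dict mapping each reaped PID to its exit code, or to the
    negated signal number if the child was terminated by a signal.
    (Hypothetical helper, for illustration only.)
    """
    reaped = {}
    while True:
        try:
            # -1 means "any child"; WNOHANG makes the call return immediately.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        elif os.WIFSIGNALED(status):
            reaped[pid] = -os.WTERMSIG(status)
    return reaped
```

A child that has exited but has not yet been collected by a call like this is what `ps` reports as `<defunct>`. A process running as PID 1 that never makes such a call leaves those entries behind for as long as it runs.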
marccarre added a commit that referenced this issue on Mar 14, 2017
Fixes #2836, i.e. prevents defunct (a.k.a. zombie) launch.sh processes from being generated, and also propagates signals from Docker to our processes (i.e. does not reopen #2684).
Why: In Docker, the ENTRYPOINT process runs as PID 1 and is therefore responsible for reaping processes and forwarding signals to child processes, which launch.sh currently does not do. This change leverages tini to bake such behaviour in, as recommended by Docker.
See also:
- github.com/docker-library/official-images#init
- github.com/krallin/tini/issues/8
Sample output:
- During initialisation:
```
$ ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
[...]
root 1380 0.5 1.9 879660 74864 ? Ssl Mar13 5:42 /usr/bin/docker daemon -H fd:// -H unix:///
root 1664 0.0 0.4 502116 18644 ? Ssl Mar13 0:05 \_ docker-containerd -l /var/run/docker/li
root 9716 0.0 0.1 134960 5412 ? Sl 11:00 0:00 \_ docker-containerd-shim 4946a0467c5a
root 9734 0.0 0.0 736 4 ? Ss 11:00 0:00 | \_ /sbin/tini -s -- /home/weave/la
root 9738 6.0 1.5 483756 59948 ? Sl 11:00 0:00 | \_ /home/weave/weaver --port=6
root 10020 0.0 0.0 1524 64 ? S 11:00 0:00 | \_ /bin/sh /home/weave/launch.
root 10110 0.0 0.0 1772 1264 ? S 11:00 0:00 | \_ /bin/sh /home/weave/wea
root 10135 0.0 0.0 1772 324 ? S 11:00 0:00 | \_ /bin/sh /home/weave
root 10136 0.0 0.0 14656 2912 ? S 11:00 0:00 | \_ curl -o /tmp/we
[...]
```
- Once initialised successfully:
```
$ ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
[...]
root 1380 0.5 1.9 879660 74864 ? Ssl Mar13 5:42 /usr/bin/docker daemon -H fd:// -H unix:///
root 1664 0.0 0.4 502116 18644 ? Ssl Mar13 0:05 \_ docker-containerd -l /var/run/docker/li
root 9716 0.0 0.1 134960 5412 ? Sl 11:00 0:00 \_ docker-containerd-shim 4946a0467c5a
root 9734 0.0 0.0 736 4 ? Ss 11:00 0:00 | \_ /sbin/tini -s -- /home/weave/la
root 9738 4.2 1.5 491952 59948 ? Sl 11:00 0:00 | \_ /home/weave/weaver --port=6
[...]
```
This was referenced Mar 14, 2017
marccarre added a commit that referenced this issue on Mar 14, 2017
Reproduces issue #2836. Sample output:
test #5 "run_on mct-0.us-central1-a.weave-net ps aux | grep -c '[d]efunct'" failed: expected "0" got "1"
test #6 "run_on mct-1.us-central1-a.weave-net ps aux | grep -c '[d]efunct'" failed: expected "0" got "1"
test #7 "run_on mct-2.us-central1-a.weave-net ps aux | grep -c '[d]efunct'" failed: expected "0" got "1"
3 of 8 tests failed in 61.780s.
marccarre added a commit that referenced this issue on Mar 14, 2017
Reproduces issue #2836, i.e.:
- during initialisation:
```
$ ps auxf
[...]
root 1380 0.4 1.6 879660 60732 ? Ssl Mar13 5:54 /usr/bin/docker daemon -H fd:// -H unix:///
root 1664 0.0 0.4 502116 18540 ? Ssl Mar13 0:06 \_ docker-containerd -l /var/run/docker/li
root 12615 0.0 0.1 200496 5424 ? Sl 14:38 0:00 \_ docker-containerd-shim c5637e5bbdcb
root 12629 6.2 1.6 296400 62312 ? Ssl 14:38 0:00 | \_ /home/weave/weaver --port=6783
root 12910 0.0 0.0 1524 68 ? S 14:38 0:00 | \_ /bin/sh /home/weave/launch.
root 13002 0.0 0.0 1772 1232 ? S 14:38 0:00 | \_ /bin/sh /home/weave/wea
root 13027 0.0 0.0 1772 320 ? S 14:38 0:00 | \_ /bin/sh /home/weave
root 13028 0.0 0.0 14656 2708 ? S 14:38 0:00 | \_ curl -o /tmp/we
[...]
```
- after initialisation:
```
$ ps auxf
[...]
root 1380 0.4 1.6 879660 60732 ? Ssl Mar13 5:54 /usr/bin/docker daemon -H fd:// -H unix:///
root 1664 0.0 0.4 502116 18540 ? Ssl Mar13 0:06 \_ docker-containerd -l /var/run/docker/li
root 12615 0.0 0.1 200496 5424 ? Sl 14:38 0:00 \_ docker-containerd-shim c5637e5bbdcb
root 12629 3.5 1.6 297460 63340 ? Ssl 14:38 0:00 | \_ /home/weave/weaver --port=6783
root 12910 0.0 0.0 0 0 ? Z 14:38 0:00 | \_ [launch.sh] <defunct>
[...]
```
Sample output:
```
test #5 "run_on mct-0.us-central1-a.weave-net ps aux | grep -c '[d]efunct'" failed: expected "0" got "1"
test #6 "run_on mct-1.us-central1-a.weave-net ps aux | grep -c '[d]efunct'" failed: expected "0" got "1"
test #7 "run_on mct-2.us-central1-a.weave-net ps aux | grep -c '[d]efunct'" failed: expected "0" got "1"
3 of 8 tests failed in 62.012s.
```
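The check in this test greps `ps aux` for the string `defunct`. As an alternative sketch (not the project's test code), the same count can be taken from the `STAT` column, where a zombie shows up as `Z`. The function name `count_defunct` is hypothetical, and the sketch assumes the input starts with the usual `ps aux` header line containing a `STAT` column.

```python
def count_defunct(ps_output):
    """Count processes whose STAT column starts with 'Z' in `ps aux` text.

    Hypothetical helper for illustration; raises ValueError if the header
    line has no STAT column.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return 0
    header = lines[0].split()
    # Locate STAT from the header instead of hard-coding its position.
    stat_index = header.index("STAT")
    count = 0
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > stat_index and fields[stat_index].startswith("Z"):
            count += 1
    return count


sample = """USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 12629 3.5 1.6 297460 63340 ? Ssl 14:38 0:00 /home/weave/weaver --port=6783
root 12910 0.0 0.0 0 0 ? Z 14:38 0:00 [launch.sh] <defunct>
"""
print(count_defunct(sample))
```

Matching on the state column avoids counting a process that merely has `defunct` somewhere in its command line, which is the problem the `[d]efunct` bracket trick works around for the `grep` process itself.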
marccarre added a commit that referenced this issue on Mar 14, 2017
This:
1. prevents defunct (a.k.a. zombie) processes from being generated, and therefore fixes #2836; but also
2. reintroduces #2684/#2688, as shells do not forward signals and we are still running more than one process.
A proper fix would leverage something like tini.
See also:
- github.com/krallin/tini
- github.com/krallin/tini/issues/8
- github.com/docker-library/official-images#init
Sample output:
- during initialisation:
```
$ ps auxf
[...]
root 1380 0.4 1.9 879660 74660 ? Ssl Mar13 6:02 /usr/bin/docker daemon -H fd:// -H unix:///
root 1664 0.0 0.4 502116 18528 ? Ssl Mar13 0:06 \_ docker-containerd -l /var/run/docker/li
root 15482 0.0 0.0 200496 3364 ? Sl 15:04 0:00 \_ docker-containerd-shim 3d1f5eb6e090
root 15496 0.0 0.0 1524 996 ? Ss 15:04 0:00 | \_ /bin/sh /home/weave/launch.sh
root 15780 0.0 0.0 1524 64 ? S 15:04 0:00 | \_ /bin/sh /home/weave/launch.
root 15872 0.0 0.0 1772 1272 ? S 15:04 0:00 | | \_ /bin/sh /home/weave/wea
root 15897 0.0 0.0 1772 324 ? S 15:04 0:00 | | \_ /bin/sh /home/weave
root 15898 0.0 0.0 14656 2948 ? S 15:04 0:00 | | \_ curl -o /tmp/we
root 15781 10.0 1.5 484556 60296 ? Sl 15:04 0:00 | \_ /home/weave/weaver --port=6
[...]
```
- after initialisation:
```
$ ps auxf
[...]
root 1380 0.4 1.9 879660 74660 ? Ssl Mar13 6:02 /usr/bin/docker daemon -H fd:// -H unix:///
root 1664 0.0 0.4 502116 18592 ? Ssl Mar13 0:06 \_ docker-containerd -l /var/run/docker/li
root 15482 0.0 0.0 200496 3364 ? Sl 15:04 0:00 \_ docker-containerd-shim 3d1f5eb6e090
root 15496 0.0 0.0 1524 996 ? Ss 15:04 0:00 | \_ /bin/sh /home/weave/launch.sh
root 15781 2.3 1.6 503064 64320 ? Sl 15:04 0:00 | \_ /home/weave/weaver --port=6
[...]
```
bboreham added a commit that referenced this issue on Mar 14, 2017
Remove exec from weaver command; Fixes #2836.
Kubernetes:
Weave:
Zombies:
uname -a:
Maybe it is connected with the following issue in Kubernetes: kubernetes/kubernetes#39334
We experience this cluster-wide.