weave not removing dns entries #3432

Closed
MikeMichel opened this issue Oct 19, 2018 · 10 comments
@MikeMichel

What you expected to happen?

Weave removes the DNS entry after the container is killed.

What happened?

If you try often enough, a DNS entry stays, and we found no way to remove it.

How to reproduce it?

  • start multiple containers with the same FQDN
  • kill some
  • repeat
  • check with weave status dns
  • at some point there will be more DNS entries than running containers
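The steps above can be scripted; a minimal sketch, assuming the proxy socket path used elsewhere in this report (`start_containers` and `repro` are illustrative helper names, not weave commands):

```shell
#!/bin/sh
# Sketch of the reproduction loop described above. DOCKER can be overridden
# (e.g. for a dry run); the socket path is the one used throughout this report.
: "${DOCKER:=docker -H unix:///var/run/weave/weave.sock}"

# Start $1 containers that all register the same FQDN.
start_containers() {
    i=0
    while [ "$i" -lt "$1" ]; do
        $DOCKER run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
        i=$((i + 1))
    done
}

# Repeat start/kill rounds; after enough rounds `weave status dns` should
# show more entries than running containers.
repro() {
    for round in 1 2 3; do
        start_containers 4
        # Kill two of the nginx containers (filter so the weave router
        # container itself is never stopped).
        $DOCKER stop $($DOCKER ps -q --filter ancestor=nginx | head -n 2)
        weave status dns
    done
}
```

Comparing the `weave status dns` output after each round against `docker ps` is how the mismatch shows up.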

Anything else we need to know?

Versions:

$ weave version 
2.4.0
$ docker version
Client:
 Version:           18.06.0-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        0ffa825
 Built:             Wed Jul 18 19:08:18 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.0-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       0ffa825
  Built:            Wed Jul 18 19:10:42 2018
  OS/Arch:          linux/amd64
  Experimental:     false

$ uname -a
Linux node1 3.10.0-862.11.6.el7.x86_64

Reproduced with a 4-node Vagrant setup:

[root@node1 ~]# weave status dns
Starting 4 containers
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
46e8c17644cb62ece159b159c6651e3d3827971e776b78a1699743b2a8815ec5
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
b483e0bea8f230ae52ca98cc8b795b98e3b38690e8607cf59437da55e8c258bf
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
a7f4a8fb692d34dc30c90d3e5a7d54e3a6f17196f50f3b51538ffdedb9abf184
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
d5a6f8c03d45861670b8baee74d5c5296e288487d168eba634d8978a5c0f2ca4
[root@node1 ~]# weave status dns
web1         10.1.0.1        46e8c17644cb 00:00:00:00:00:01
web1         10.1.0.3        a7f4a8fb692d 00:00:00:00:00:01
web1         10.1.0.2        b483e0bea8f2 00:00:00:00:00:01
web1         10.1.0.4        d5a6f8c03d45 00:00:00:00:00:01
4 entries = fine
[root@node1 ~]# docker stop b483e0bea8f2 a7f4a8fb692d
b483e0bea8f2
a7f4a8fb692d
[root@node1 ~]# weave status dns
web1         10.1.0.1        46e8c17644cb 00:00:00:00:00:01
web1         10.1.0.4        d5a6f8c03d45 00:00:00:00:00:01
2 entries = fine
starting another 2 containers
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
502dd7c1304beb42515bea1c23b2f3ce740a768b06df7a547b79a3b3ad820577
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
8826109a5bb05aa628c0d1274a1cd4f3bc834b0b03cd5ba990283305a7acdf10
[root@node1 ~]# weave status dns
web1         10.1.0.1        46e8c17644cb 00:00:00:00:00:01
web1         10.1.0.5        502dd7c1304b 00:00:00:00:00:01
web1         10.1.0.6        8826109a5bb0 00:00:00:00:00:01
web1         10.1.0.4        d5a6f8c03d45 00:00:00:00:00:01
kill 2
[root@node1 ~]# docker stop d5a6f8c03d45 8826109a5bb0
d5a6f8c03d45
8826109a5bb0
[root@node1 ~]# weave status dns
web1         10.1.0.1        46e8c17644cb 00:00:00:00:00:01
web1         10.1.0.5        502dd7c1304b 00:00:00:00:00:01
2 entries = still fine
starting 3
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
0277bcd9b7b782c70096a4e966cbbf4f2a8416113ae577f2fd7bf5a9a8add9a8
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
ca0d145216ccccbbf66b6dabcc6275c83c15b0411e705dd624de8900ad2580ae
[root@node1 ~]# weave status dns
web1         10.1.0.3        0277bcd9b7b7 00:00:00:00:00:01
web1         10.1.0.2        38ce8a451d71 00:00:00:00:00:01
web1         10.1.0.1        46e8c17644cb 00:00:00:00:00:01
web1         10.1.0.5        502dd7c1304b 00:00:00:00:00:01
web1         10.1.0.7        ca0d145216cc 00:00:00:00:00:01
killing 4 of 5
[root@node1 ~]# docker stop 0277bcd9b7b7 38ce8a451d71 46e8c17644cb 502dd7c1304b
0277bcd9b7b7
38ce8a451d71
46e8c17644cb
502dd7c1304b
[root@node1 ~]# weave status dns
web1         10.1.0.2        38ce8a451d71 00:00:00:00:00:01
web1         10.1.0.7        ca0d145216cc 00:00:00:00:00:01
oh, this should be only 1. what about 38ce8a451d71 ?
waiting some secs
[root@node1 ~]# weave status dns
web1         10.1.0.2        38ce8a451d71 00:00:00:00:00:01
web1         10.1.0.7        ca0d145216cc 00:00:00:00:00:01
nope still 2 entries but only 1 container is running
[root@node1 ~]# docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED              STATUS              PORTS               NAMES
ca0d145216cc        nginx                    "/w/w nginx -g 'daem…"   About a minute ago   Up About a minute   80/tcp              pedantic_bartik
9eac57defb22        weaveworks/weave:2.4.0   "/home/weave/weaver …"   2 minutes ago        Up 2 minutes                            weave
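A stale entry like 38ce8a451d71 can be spotted mechanically by diffing the short container IDs in the two listings; a small sketch over the command output as plain text (`stale_dns_ids` is an illustrative helper, not a weave command):

```shell
#!/bin/sh
# Print container IDs present in `weave status dns` output but absent from
# `docker ps` output. Both commands print 12-character short IDs.
stale_dns_ids() {   # $1 = `weave status dns` text, $2 = `docker ps` text
    dns_ids=$(mktemp); run_ids=$(mktemp)
    # Column 3 of the DNS listing is the container ID.
    printf '%s\n' "$1" | awk '{print $3}' | sort -u > "$dns_ids"
    # Column 1 of `docker ps`, skipping the header line.
    printf '%s\n' "$2" | awk 'NR > 1 {print $1}' | sort -u > "$run_ids"
    comm -23 "$dns_ids" "$run_ids"   # lines only in the DNS listing
    rm -f "$dns_ids" "$run_ids"
}

# Typical use against the live commands:
# stale_dns_ids "$(weave status dns)" "$(docker ps)"
```

Fed the two listings above, this prints 38ce8a451d71, the entry weave failed to remove.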

Logs:

$ docker logs weave 2>&1|grep 38ce8a451d71

INFO: 2018/10/19 15:43:47.028606 POST /v1.38/containers/38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d/wait?condition=next-exit
INFO: 2018/10/19 15:43:47.035069 POST /v1.38/containers/38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d/start
DEBU: 2018/10/19 15:43:47.426027 Wait for start of container 38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d
DEBU: 2018/10/19 15:43:47.430344 [net] ContainerStarted: 38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d
DEBU: 2018/10/19 15:43:47.431526 [net] ContainerStarted: 38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d
INFO: 2018/10/19 15:43:47.439148 Attaching container 38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d with WEAVE_CIDR "net:10.1.0.0/24" to weave network
DEBU: 2018/10/19 15:43:47.439186 weave POST to http://127.0.0.1:6784/ip/38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d/10.1.0.0/24 with map[check-alive:[true]]
DEBU: 2018/10/19 15:43:47.440234 [http] POST /ip/38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d/10.1.0.0/24
DEBU: 2018/10/19 15:43:47.443663 [allocator 00:00:00:00:00:01]: Allocated 10.1.0.2 for 38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d in 10.1.0.0/24
DEBU: 2018/10/19 15:43:47.446763 Running image "weaveworks/weaveexec:2.4.0"; entrypoint=["sh"]; cmd="[-c echo '# created by Weave - BEGIN\n# container hostname\n10.1.0.2        we]"; binds=["/var/lib/docker/containers/38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d:/container"]
DEBU: 2018/10/19 15:43:48.029190 weave PUT to http://127.0.0.1:6784/name/38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d/10.1.0.2 with map[fqdn:[web1.node.intern.]]
DEBU: 2018/10/19 15:43:48.029884 [http] PUT /name/38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d/10.1.0.2
INFO: 2018/10/19 15:43:48.030138 [nameserver 00:00:00:00:00:01] adding entry for 38ce8a451d7152fe37e600ae7dd921d680e051f26bcf7eb883a283c3263b920d: web1.node.intern. -> 10.1.0.2

It seems weave did not receive/notice the kill of container 38ce8a451d71.

Now I am not able to remove the dead entry, even after killing the other remaining container.

[root@node1 ~]# docker stop ca0d145216cc
ca0d145216cc
[root@node1 ~]# weave status dns
web1         10.1.0.2        38ce8a451d71 00:00:00:00:00:01
[root@node1 ~]# weave dns-remove 10.1.0.2 -h web1
[root@node1 ~]# weave status dns
web1         10.1.0.2        38ce8a451d71 00:00:00:00:00:01
[root@node1 ~]# weave dns-remove 10.1.0.2 -h web1.node.intern
[root@node1 ~]# weave status dns
web1         10.1.0.2        38ce8a451d71 00:00:00:00:00:01

Only a weave restart helps now. A force flag for weave dns-remove that ignores the fact that container 38ce8a451d71 is not running anymore and just removes the DNS entry would be helpful.

@rade
Member

rade commented Oct 19, 2018

A force flag for weave dns-remove that ignores the fact that container 38ce8a451d71 is not running anymore and just removes the DNS entry would be helpful.

That's #1976.

@MikeMichel
Author

Is there any workaround for now, other than a weave restart?

@jzaefferer

The reference to #1976 isn't helping much, since that is a much older (stale) discussion.

Any chance to get this bug addressed?

@murali-reddy
Contributor

@MikeMichel I tried to reproduce this issue, but was not able to after several tries (with a mix of container starts and stops). Are you able to reproduce it consistently just by following the steps mentioned?

@MikeMichel
Author

@murali-reddy yep, already on the first try:

[root@node1 ~]# weave status

        Version: 2.4.0 (version 2.5.1 available - please upgrade!)

        Service: router
       Protocol: weave 1..2
           Name: 00:00:00:00:00:01(node1)
     Encryption: enabled
  PeerDiscovery: enabled
        Targets: 3
    Connections: 4 (3 established, 1 failed)
          Peers: 4 (with 12 established connections)
 TrustedSubnets: none

        Service: ipam
         Status: ready
          Range: 10.0.0.0/8
  DefaultSubnet: 10.1.1.0/24

        Service: dns
         Domain: node.intern.
       Upstream: 10.0.2.3
            TTL: 1
        Entries: 0

        Service: proxy
        Address: unix:///var/run/weave/weave.sock

        Service: plugin (legacy)
     DriverName: weave

[root@node1 ~]# docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS               NAMES
29c4d2902389        weaveworks/weave:2.4.0   "/home/weave/weaver …"   45 minutes ago      Up 45 minutes                           weave
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
2f507c5ef0072bfb2d98acbdb0acee6be666140bf6b56d9289d8770e927ff820
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
5efc78343247b08402cc78091aa3501ce261b76202f89954a405858e4fdc179d
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
36213f6d786f2ab45be547e267b3a43cb4b8334c565eefef281e6861e8e65ae9
[root@node1 ~]# docker -H unix:///var/run/weave/weave.sock run -d -e WEAVE_CIDR=net:10.1.0.0/24 -h web1.node.intern nginx
53ef40002d317ef370d7d92fa7229e6a58fdc92eb82a2bfa94bd2e71af076b20
[root@node1 ~]# weave status dns
web1         10.1.0.1        2f507c5ef007 00:00:00:00:00:01
web1         10.1.0.3        36213f6d786f 00:00:00:00:00:01
web1         10.1.0.4        53ef40002d31 00:00:00:00:00:01
web1         10.1.0.2        5efc78343247 00:00:00:00:00:01
[root@node1 ~]# docker stop 2f507c5ef007 36213f6d786f 53ef40002d31
2f507c5ef007
36213f6d786f
53ef40002d31
[root@node1 ~]# weave status dns
web1         10.1.0.1        2f507c5ef007 00:00:00:00:00:01
web1         10.1.0.2        5efc78343247 00:00:00:00:00:01
[root@node1 ~]# date
Fri Feb  8 10:41:47 UTC 2019
[root@node1 ~]# docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED              STATUS              PORTS               NAMES
5efc78343247        nginx                    "/w/w nginx -g 'daem…"   About a minute ago   Up About a minute   80/tcp              adoring_lamport
29c4d2902389        weaveworks/weave:2.4.0   "/home/weave/weaver …"   About an hour ago    Up About an hour                        weave
[root@node1 ~]# weave status dns
web1         10.1.0.1        2f507c5ef007 00:00:00:00:00:01
web1         10.1.0.2        5efc78343247 00:00:00:00:00:01

@MikeMichel
Author

@murali-reddy adding my config:

[root@node1 ~]# cat /etc/sysconfig/weave
# Ansible managed
PEERS="20.20.20.21 20.20.20.22 20.20.20.23"
SEED="::1,::2,::3"
WEAVE_NO_FASTDP=true
WEAVE_MTU=1414


[root@node1 ~]# cat /etc/systemd/system/weave.service
[Unit]
Description=Weave Network
Documentation=http://docs.weave.works/weave/latest_release/
Requires=docker.service
After=docker.service

[Service]
EnvironmentFile=-/etc/sysconfig/weave

ExecStartPre=/usr/local/bin/weave launch --log-level=debug --name ::1 --no-restart --dns-domain="node.intern." --ipalloc-range 10.0.0.0/8 --ipalloc-default-subnet 10.1.1.0/24 --password hjoo7VourohpeiChiepai8piakoh --rewrite-inspect -H unix:///var/run/weave/weave.sock --no-default-ipalloc --ipalloc-init seed=::1,::2,::3 $PEERS
ExecStart=/usr/bin/docker attach weave
ExecStop=/usr/local/bin/weave stop
Restart=always

[Install]
WantedBy=multi-user.target

@murali-reddy
Contributor

Thanks @MikeMichel, will give it a try with your config.

@jzaefferer

@murali-reddy we're currently testing a downgrade of weave to avoid this issue, and so far it looks like 2.3 doesn't have it. I've put my more automated setup for testing this in a small repo: https://github.com/jzaefferer/weave-test-setup - hopefully that helps with reproducing this issue on your end and eventually fixing it.

I could also use this to help test #3570, though I need some instructions for how to install that patch. Basically a replacement for https://github.com/weaveworks/weave/releases/download/latest_release/weave that will also use the correct Docker images.

@murali-reddy
Contributor

@jzaefferer yes, the problem manifests only from 2.4 onwards. Unfortunately the fix in #3570 does not work consistently. There is some latent problem, after that change, in how network namespaces are switched. I need to rework a proper fix.

@bboreham
Contributor

bboreham commented Nov 4, 2019

Fixed by #3705

@bboreham bboreham closed this as completed Nov 4, 2019