Docker provider stops working when dockerd is restarted, requires traefik restart to fix #5833
Comments
I tried to reproduce, and it looks like it's not caused by bind mounts. My Traefik runs on the host, connecting via a UNIX socket, and it fails when I restart the Docker service. |
Is there a chance to at least detect this error condition via a failing health ping and/or a Prometheus metric? At the moment Traefik starts logging the following error, but no metric hints that something is fundamentally broken:
|
Same issue
|
Could not reproduce. Traefik at
api:
  dashboard: true
  insecure: true
log:
  level: DEBUG
entryPoints:
  http:
    address: ":3080"
  https:
    address: ":3443"
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
|
This is the key difference. With live restore off, when dockerd is restarted, so is every container, including Traefik. Enable live restore to reproduce. |
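To confirm whether a host actually has live restore on before trying to reproduce, a small check like the following can help. It is only a sketch: it assumes the daemon is configured through the default `/etc/docker/daemon.json` (not through command-line flags), and the function name `live_restore_enabled` is made up for illustration.

```python
import json


def live_restore_enabled(config_path="/etc/docker/daemon.json"):
    """Return True if the given dockerd config file sets "live-restore": true."""
    try:
        with open(config_path) as f:
            config = json.load(f)
    except FileNotFoundError:
        # No config file: dockerd runs with defaults, where live restore is off.
        return False
    return config.get("live-restore") is True


if __name__ == "__main__":
    print("live-restore enabled:", live_restore_enabled())
```

If this prints `False`, a dockerd restart will also restart the Traefik container, which hides the problem described in this issue.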
Yes, but when I explored this back in 2019 it didn't work this way either, so it is partially fixed (and the issue probably needs renaming). |
Weird, that works as well. Maybe I have some difference in setup?
|
We see this issue in production with the following setup; hopefully it helps you replicate it. We run Docker in live-restore mode so that we can update the Docker daemon without restarting the running containers (especially the more complex Java-based containers, which have start-up penalties of several minutes). We have Debian-based VMs running Docker, with Traefik running in a container to route traffic to the other containers on the server. Debian is configured to run unattended-upgrades between 12 AM and 1 AM, and unattended-upgrades is configured to auto-update the Docker components. When unattended-upgrades runs and updates Docker, we see the following errors in /var/log/traefik/traefik.log.json:
Since this runs in the middle of the night, our Elastic Stack picks up this error via Filebeat and floods our e-mail with alerts. Recovering from this post-upgrade state requires a restart of the Traefik container. I think a graceful resolution would be for Traefik to try to reconnect to the Docker socket (unless this is a limitation of Docker and/or docker-compose). Or would a better practice be to connect to /var/run/docker.sock via a proxy service? If so, would the socket proxy handle this edge case of Docker restarts with live-restore and a containerized Traefik instance? We have the following environment: libvirt/qemu running Debian 11.1 as the virtual machine. Debian 11.1
docker-compose version 1.29.2, build 5becea4c # cat /etc/apt/sources.list.d/docker.list
# cat /etc/apt/apt.conf.d/50unattended-upgrades
# cat /etc/default/docker
# cat /opt/docker-compose/traefik/docker-compose.yml
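The "try to reconnect" idea from this comment can be sketched as a watch loop with exponential backoff. This is not how Traefik's Docker provider is implemented (Traefik is written in Go); it is a language-neutral illustration in Python, and `connect`, `handle_event`, and `watch_with_reconnect` are hypothetical names. `connect` is assumed to return an iterable of events and to raise `OSError` when the socket is unavailable.

```python
import time


def watch_with_reconnect(connect, handle_event, max_backoff=30.0,
                         sleep=time.sleep, max_attempts=None):
    """Consume events from connect(); reconnect with backoff when the stream breaks.

    max_attempts=None retries forever; a number bounds the loop (useful in tests).
    """
    backoff = 1.0
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            for event in connect():
                backoff = 1.0  # the stream is delivering events, so reset the delay
                handle_event(event)
        except OSError:
            pass  # socket missing, refused, or reset: fall through and retry
        # Reached both on error and when the stream ends (for example, dockerd stopped).
        sleep(backoff)
        backoff = min(backoff * 2, max_backoff)
```

Note that a loop like this only helps if each new `connect()` call can reach the recreated socket, which depends on how the socket is made visible inside the container.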
|
Is this maybe fixable by connecting to dockerd via IP instead of a socket file? |
Since the Docker daemon has no built-in authentication, by doing that you are essentially giving root access to any process that can connect to the localhost IP on that host. |
That's not true; you can authenticate via server-client TLS. However, there is definitely less authorization than with a ro mount. Maybe it is solvable in a different way. I don't think this is a Traefik issue, but maybe one with dockerd and how mounts work. |
We are experiencing the same issue with a setup like that of @jhowe-uw (VMs, Docker with live-restore enabled). We update Docker via Ansible and restart the Traefik container along with it. This is just a workaround and not very satisfactory. We tried to use the ping healthcheck endpoint, but when the problem occurs the healthcheck still succeeds, so we cannot detect the problem with it. I also think the problem lies in how Docker handles the volume mounts. |
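Since Traefik's own ping stays green, one possible workaround is an external probe that asks the Docker daemon directly. The sketch below calls the Docker Engine API's `GET /_ping` endpoint over the UNIX socket using only the Python standard library. The names `UnixHTTPConnection` and `docker_socket_alive` are made up for this example, and the probe is only meaningful if it runs where Traefik runs (same container or same mounts), so that it sees the same socket path Traefik sees.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection variant that connects to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=2.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def docker_socket_alive(socket_path="/var/run/docker.sock", timeout=2.0):
    """Return True if the daemon behind socket_path answers GET /_ping with 200."""
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", "/_ping")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # A non-zero exit code lets this double as a container healthcheck command.
    raise SystemExit(0 if docker_socket_alive() else 1)
```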
I investigated a little further. It looks like Docker handles file mounts and directory mounts differently. I think I found a solution:
Mount the
I cannot say much about the consequences of mounting the whole My test setup looks like this: https://gist.github.com/michaelkebe/a1fd64c5d31aaca5b092aa2b7409bf6d
If you want to try the different mount options, edit the With the not working option (mount
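If the file-versus-directory difference is indeed the cause, one plausible mechanism is that a bind mount of a single file stays attached to the inode that existed when the container started, while a directory mount resolves the name again on every access. The sketch below is an analogy only, not a reproduction with Docker: it uses an open file descriptor to stand in for the file bind mount and a plain file to stand in for the socket, and `inode_after_replace` is a made-up name.

```python
import os
import tempfile


def inode_after_replace():
    """Return (inode before, inode seen via held handle, inode seen via path)."""
    with tempfile.TemporaryDirectory() as run_dir:
        path = os.path.join(run_dir, "docker.sock")  # plain file standing in for the socket
        with open(path, "w") as f:
            f.write("old daemon")

        # Holding a descriptor pins the original inode, much like a file bind mount.
        held = os.open(path, os.O_RDONLY)
        try:
            old_inode = os.fstat(held).st_ino

            # "Restart": a new file takes over the same path.
            new_path = path + ".new"
            with open(new_path, "w") as f:
                f.write("new daemon")
            os.replace(new_path, path)

            # The held handle still sees the old inode; looking the name up
            # through the directory (like a directory mount) finds the new one.
            return old_inode, os.fstat(held).st_ino, os.stat(path).st_ino
        finally:
            os.close(held)


if __name__ == "__main__":
    before, via_handle, via_path = inode_after_replace()
    print("handle still on old inode:", via_handle == before)
    print("path now points elsewhere:", via_path != before)
```

Under this reading, anything that keeps referring to the old inode ends up talking to a socket nobody is listening on, which would match the symptoms reported here.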
|
Here is a discussion exactly about this problem. |
What did you do?
Updated Docker OS package, which caused Docker daemon (dockerd) to be restarted.
Simply restarting dockerd service has the same effect.
What did you expect to see?
Traefik Docker provider keeps working as it was prior to dockerd restart
What did you see instead?
Traefik Docker provider stops picking up changes until traefik is restarted.
This is likely related to #5589 - dockerd socket is bind-mounted into traefik container and when dockerd is restarted that socket is recreated.
Edit: it looks like it isn't - see #5833 (comment)
Output of traefik version: (What version of Traefik are you using?)
What is your environment & configuration (arguments, toml, provider, platform, ...)?
Traefik running in a docker container, using docker provider
Our dockerd config contains
"live-restore": true
which results in dockerd restarts not causing container restarts.