Memory leak using the Docker provider (Swarm mode) #8872

Closed
2 tasks done
kgeri opened this issue Mar 23, 2022 · 3 comments

Comments

@kgeri

kgeri commented Mar 23, 2022

Welcome!

  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

Configured Traefik as a service in Docker Swarm, with command-line options and labels, relying mostly on the defaults.

I must note here that I have an unorthodox setup: the swarm consists of two manager nodes, one of which regularly goes down. As long as it is down, Traefik constantly logs errors like:

frontend_traefik.1.qsk3rquh4y98@srvu    | time="2022-03-22T13:12:11Z" level=error msg="Failed to list services for docker swarm mode, error Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online." providerName=docker
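The "more than half of the managers" wording in that error is the usual majority rule for a Raft-based cluster. A minimal Go sketch of the arithmetic, with hypothetical helper names `quorum` and `tolerated` that are not part of Docker or Traefik, shows why a two-manager swarm loses its leader as soon as one manager goes down:

```go
package main

import "fmt"

// quorum returns how many managers must be reachable for a strict majority.
// Hypothetical helper for illustration only.
func quorum(managers int) int {
	return managers/2 + 1
}

// tolerated returns how many managers can be lost while a majority remains.
func tolerated(managers int) int {
	return managers - quorum(managers)
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("managers=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), tolerated(n))
	}
}
```

With two managers the quorum is two, so no failure is tolerated; three managers would tolerate one.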

That in itself wouldn't be a big deal, however...

What did you see instead?

...I noticed memory usage steadily increasing, too.
I'm not sure whether the excessive logging and the memory usage are related, but in just one day it grew from ~25MB to ~250MB.
As soon as it reached its resource limits, it went into periodic CPU bursts, which I suspect is Go's GC kicking in (?). Once restarted, memory usage went back to ~25MB.

I enabled debug mode and took these snapshots after about a day:
allocs
heap
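The snapshot names match two of Go's standard runtime profiles. As a rough sketch of what such a snapshot contains, the following program captures both from its own process with `runtime/pprof`. It does not show how Traefik exposes them in debug mode, and `captureProfile` is a hypothetical helper name:

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// captureProfile writes the named runtime profile (for example "heap" or
// "allocs") into a buffer, in the compressed format that `go tool pprof` reads.
func captureProfile(name string) (*bytes.Buffer, error) {
	p := pprof.Lookup(name)
	if p == nil {
		return nil, fmt.Errorf("unknown profile %q", name)
	}
	var buf bytes.Buffer
	// debug=0 selects the binary (gzip-compressed protobuf) output.
	if err := p.WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	for _, name := range []string{"allocs", "heap"} {
		buf, err := captureProfile(name)
		if err != nil {
			fmt.Println("capture failed:", err)
			continue
		}
		fmt.Printf("%s profile: %d bytes\n", name, buf.Len())
	}
}
```

Comparing two heap snapshots taken some hours apart (for instance with `go tool pprof -base`) is usually more telling than a single one, since a leak shows up as growth between them.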

I noticed most memory is consumed in data structures related to the Docker provider and HTTP request processing, so I first tried setting:
--providers.docker.httpClientTimeout=60
but that didn't help.

Setting this, however, fixed the issue!
--providers.docker.watch=false

The docs aren't exactly wordy on this one, but I suppose this means if my services bounce/get relocated, then Traefik will be none the wiser. Like I said, my setup is weird, so I can live with this... but wanted to let you know in case it uncovers some deeper issue.

What version of Traefik are you using?

Version: 2.6.1
Codename: rocamadour
Go version: go1.17.7
Built: 2022-02-14T16:50:25Z
OS/Arch: linux/amd64

What is your environment & configuration?

--entrypoints.web.address=:80
--providers.docker.swarmMode=true
--providers.docker.exposedbydefault=false

If applicable, please paste the log output in DEBUG level

No response

mpl added the kind/bug/possible and area/provider/docker/swarm labels and removed the status/0-needs-triage label on Mar 25, 2022
@aeburriel

aeburriel commented Aug 24, 2022

I've been bitten by the same problem on Traefik 2.8.3.
The leak happens when the swarm status can't be retrieved.
It can easily be reproduced by demoting the Swarm node that Traefik uses as its Docker provider endpoint.

My relevant configuration is:

--providers.docker=true
--providers.docker.endpoint=unix:///var/run/docker.sock
--providers.docker.swarmMode=true
--providers.docker.watch=true

The open file descriptor count is steady until the Docker node is demoted. From then on, Docker's control socket returns errors and file descriptors start leaking:
traefik-file-descriptors-leak
The leak stops once the node is promoted back:
traefik-file-descriptors-leak-end

Latest Traefik log entries while the leak is happening:

time="2022-08-24T13:21:13Z" level=error msg="Failed to list services for docker swarm mode, error Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager." providerName=docker
time="2022-08-24T13:21:13Z" level=error msg="Provider connection error Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager., retrying in 5.45179408s" providerName=docker
time="2022-08-24T13:21:18Z" level=error msg="Failed to list services for docker swarm mode, error Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager." providerName=docker
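These entries show a retry loop: each failed attempt is followed by another one a few seconds later. One common way such a loop leaks descriptors in Go is an error path that returns before the client is closed. The sketch below illustrates that general pattern only. It is not taken from Traefik's code and is not confirmed as the cause here; `fakeClient`, `attemptLeaky`, `attemptFixed` and `retry` are all hypothetical, and the backoff delay is omitted:

```go
package main

import (
	"errors"
	"fmt"
)

// openConns stands in for the process's open file descriptor count.
var openConns int

// fakeClient is a hypothetical stand-in for an API client holding a socket.
type fakeClient struct{}

func newFakeClient() *fakeClient { openConns++; return &fakeClient{} }

func (c *fakeClient) Close() { openConns-- }

// listServices always fails, like a daemon that is no longer a swarm manager.
func (c *fakeClient) listServices() error {
	return errors.New("this node is not a swarm manager")
}

// attemptLeaky returns on error without closing the client.
func attemptLeaky() error {
	c := newFakeClient()
	if err := c.listServices(); err != nil {
		return err // the client is never closed on this path
	}
	c.Close()
	return nil
}

// attemptFixed closes the client on every path.
func attemptFixed() error {
	c := newFakeClient()
	defer c.Close()
	return c.listServices()
}

// retry runs attempt n times and returns the number of failures.
func retry(n int, attempt func() error) int {
	failures := 0
	for i := 0; i < n; i++ {
		if err := attempt(); err != nil {
			failures++
		}
	}
	return failures
}

func main() {
	retry(10, attemptLeaky)
	fmt.Println("open after leaky retries:", openConns) // prints 10
	openConns = 0
	retry(10, attemptFixed)
	fmt.Println("open after fixed retries:", openConns) // prints 0
}
```

In the leaky variant the count grows by one per failed attempt, which has the same shape as a descriptor count that climbs only while the endpoint keeps returning errors.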

List of file descriptors opened by the Traefik process:
traefik-file-descriptors-leak.txt

The following set of file descriptors stays open for each unsuccessful provider endpoint request:

lrwx------ 1 root root 64 Aug 24 13:20 9 -> 'socket:[362439]'
lrwx------ 1 root root 64 Aug 24 13:20 90 -> 'socket:[511925]'
lrwx------ 1 root root 64 Aug 24 13:20 91 -> 'socket:[512472]'
lrwx------ 1 root root 64 Aug 24 13:20 92 -> 'socket:[511948]'
lrwx------ 1 root root 64 Aug 24 13:20 93 -> 'socket:[512483]'
lrwx------ 1 root root 64 Aug 24 13:20 94 -> 'socket:[511974]'
lrwx------ 1 root root 64 Aug 24 13:20 95 -> 'socket:[511991]'
lrwx------ 1 root root 64 Aug 24 13:20 96 -> 'socket:[512520]'
lrwx------ 1 root root 64 Aug 24 13:20 97 -> 'socket:[512584]'
lrwx------ 1 root root 64 Aug 24 13:20 98 -> 'socket:[512608]'
lrwx------ 1 root root 64 Aug 24 13:20 99 -> 'socket:[512617]'
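A listing like the one above comes from the `/proc/<pid>/fd` directory on Linux. To track the count over time, a small Go sketch along these lines could be used. It assumes a Linux `/proc` filesystem, the helper names `isSocket`, `countSockets` and `fdTargets` are hypothetical, and the example inspects its own process ("self") rather than Traefik:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isSocket reports whether a /proc/<pid>/fd link target names a socket,
// for example "socket:[512617]".
func isSocket(target string) bool {
	return strings.HasPrefix(target, "socket:[") && strings.HasSuffix(target, "]")
}

// countSockets counts the socket entries among the given link targets.
func countSockets(targets []string) int {
	n := 0
	for _, t := range targets {
		if isSocket(t) {
			n++
		}
	}
	return n
}

// fdTargets reads the symlink targets under /proc/<pid>/fd (Linux only).
func fdTargets(pid string) ([]string, error) {
	dir := filepath.Join("/proc", pid, "fd")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var targets []string
	for _, e := range entries {
		t, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			continue // descriptor was closed between ReadDir and Readlink
		}
		targets = append(targets, t)
	}
	return targets, nil
}

func main() {
	targets, err := fdTargets("self")
	if err != nil {
		fmt.Println("cannot read fd table:", err)
		return
	}
	fmt.Printf("%d descriptors, %d sockets\n", len(targets), countSockets(targets))
}
```

Passing the Traefik process ID instead of "self" requires permission to read that process's `/proc` entries, typically root or the same user.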

@mpl
Collaborator

mpl commented Aug 26, 2022

@aeburriel thanks for the tip, I can now reproduce it.
It's barely noticeable in my case, so I'm not 100% sure, but I'm fairly convinced.

mpl added the kind/bug/confirmed label and removed the kind/bug/possible label on Aug 26, 2022
@traefiker
Contributor

Closed by #9288.

traefiker added this to the 2.8 milestone on Aug 31, 2022
traefik locked and limited conversation to collaborators on Oct 1, 2022