
Still a memory leak with k8s - 1.1 RC4 #844
Closed
rrichardson opened this issue Nov 14, 2016 · 21 comments

@rrichardson

I know this was addressed in #387

I also note that somebody reported an existing leak in the new solution (back in May).

I am running 1.1 RC4

Here is what I'm seeing:

(chart: memory usage of the traefik instances)

The 2nd item in the list hit the memory limit and was killed.

@emilevauge
Member

Ouch, can you give us more details? Which Kubernetes version are you using? Were you running traefik 1.0 before? Any difference?
/cc @containous/traefik

@rrichardson
Author

k8s version : Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:10:32Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

I haven't run a prior version of Traefik (at least not while monitoring and using the cluster much)

Are there any debug/gc metrics I can pull out of Traefik to help narrow this down?
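For reference, here is a minimal sketch of the kind of runtime metrics in question. It assumes nothing about Traefik's own debug endpoints; any Go process can report heap usage, GC cycles, and goroutine counts via the runtime package and net/http/pprof:

```go
// Generic illustration only, not Traefik's debug API: periodically print the
// runtime statistics relevant to a leak hunt and expose the standard pprof
// endpoints on localhost:6060 (e.g. /debug/pprof/goroutine?debug=1).
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"runtime"
	"time"
)

func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	var m runtime.MemStats
	for {
		runtime.ReadMemStats(&m)
		log.Printf("heap_inuse=%dKiB num_gc=%d goroutines=%d",
			m.HeapInuse/1024, m.NumGC, runtime.NumGoroutine())
		time.Sleep(30 * time.Second)
	}
}
```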

@emilevauge
Member

Anything in the logs?

@rrichardson
Author

rrichardson commented Nov 14, 2016

Here's an anonymized debug log; I'm not really sure what I'm looking for...

traefik.log.gz

There are a couple of errors involving broken connections and failures to parse:

time="2016-11-14T16:43:45Z" level=error msg="Kubernetes connection error failed to decode watch event: GET "https://100.64.0.1:443/apis/extensions/v1beta1/ingresses\" : unexpected EOF, retrying in 605.782735ms"

time="2016-11-14T16:45:58Z" level=error msg="Kubernetes connection error failed to decode watch event: GET "https://100.64.0.1:443/api/v1/endpoints\" : invalid character 'o' looking for beginning of value, retrying in 276.215364ms"

Other than that, I can't find anything that might lead to a leak.
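As a side note (an illustration, not Traefik code), both messages are standard encoding/json errors: "unexpected EOF" when the API server closes the watch mid-object, and "invalid character 'o' looking for beginning of value" when the response body is not JSON at all. On their own they point at the watch connection being cut rather than at a leak:

```go
// Reproduces the two error strings above with encoding/json alone.
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	var v interface{}

	// Stream cut off in the middle of an object -> "unexpected EOF"
	err := json.NewDecoder(strings.NewReader(`{"type":"ADDED","object":`)).Decode(&v)
	fmt.Println(err)

	// Non-JSON body -> "invalid character 'o' looking for beginning of value"
	err = json.NewDecoder(strings.NewReader("oops")).Decode(&v)
	fmt.Println(err)
}
```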

@errm
Contributor

errm commented Nov 14, 2016

Are you running an HA master with the --leader-elect flag set on kube-scheduler and/or kube-controller-manager?

@rrichardson
Author

rrichardson commented Nov 14, 2016

I am running HA with leader-elect=true on the controller-manager.
3 masters.

@emilevauge
Member

There was an old issue with event spamming, but it did not lead to a memory leak...
#449
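For background (a generic sketch, not what Traefik actually does): with --leader-elect, the active kube-scheduler and kube-controller-manager renew their lease every few seconds by updating an Endpoints object in kube-system, so anything watching all endpoints sees a steady stream of modification events even when nothing user-visible changes. A consumer can filter those out, for example:

```go
// Illustrative only: drop the kube-system leader-election Endpoints updates
// so lease renewals do not trigger work. The Event type is a stand-in for a
// Kubernetes watch event, not a real client-go type.
package main

import "fmt"

type Event struct {
	Kind      string // "Endpoints", "Ingress", ...
	Namespace string
	Name      string
}

func interesting(e Event) bool {
	if e.Kind == "Endpoints" && e.Namespace == "kube-system" &&
		(e.Name == "kube-scheduler" || e.Name == "kube-controller-manager") {
		return false // leader-election lease renewal, not a real change
	}
	return true
}

func main() {
	for _, e := range []Event{
		{"Endpoints", "kube-system", "kube-controller-manager"},
		{"Endpoints", "default", "my-service"},
	} {
		fmt.Printf("%+v handle=%v\n", e, interesting(e))
	}
}
```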

@emilevauge
Member

I tried to reproduce this with Kubernetes v1.4.4 and have not run into a memory leak on our side, despite the leader-election event spam.
Another way to investigate would be to kill -SIGABRT $PID_OF_TRAEFIK (while traefik is using a lot of memory) and send us the stack trace.
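For anyone following along, this works because a Go binary that has not overridden the default signal handling exits with a stack trace of every live goroutine when it receives SIGABRT. A tiny demonstration, unrelated to Traefik itself:

```go
// Run this, then `kill -SIGABRT <pid>`: the Go runtime prints "SIGABRT: abort"
// followed by the stack of main and of each blocked worker goroutine.
package main

import (
	"fmt"
	"os"
	"time"
)

func worker() {
	select {} // blocks forever; shows up in the SIGABRT dump
}

func main() {
	fmt.Println("pid:", os.Getpid())
	for i := 0; i < 3; i++ {
		go worker()
	}
	time.Sleep(time.Hour)
}
```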

@emilevauge emilevauge added this to the 1.1 milestone Nov 15, 2016
@rrichardson
Author

rrichardson commented Nov 15, 2016

I wonder if it is a function of the number of services/pods. We currently have about 70 pods across 30 services.

Either way, attached is the stack trace. (with 6 million goroutines :) )

traefik-abrt.log.gz
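Six million goroutines means something is being started per retry or per event and never torn down. Purely as an illustration of that shape (not the actual bug, which #845 addresses), a retry loop that spawns a new watcher without stopping the previous one grows without bound:

```go
// Illustrative sketch of a goroutine leak in a watch/retry loop, not Traefik's
// real code: every "reconnect" starts a watcher, and none are ever stopped.
package main

import "time"

func watch(stop chan struct{}) {
	for {
		select {
		case <-stop:
			return
		case <-time.After(time.Second):
			// ... read one watch event ...
		}
	}
}

func main() {
	for {
		stop := make(chan struct{})
		go watch(stop) // new watcher on every reconnect
		time.Sleep(time.Second)
		// BUG: close(stop) is never called, so each watcher lives forever
		// and the goroutine count climbs on every iteration.
	}
}
```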

@emilevauge
Member

emilevauge commented Nov 15, 2016

@rrichardson thanks for investigating this, it will help us a lot :)

@emilevauge
Member

Hey @rrichardson, it would be awesome if you could test with this Docker image, containous/traefik:k8s; it has been built with the fix from #845.

@rrichardson
Author

The new image has been running for 1 hour and so far the results are encouraging. Each pod is using about 20MB of RAM. I don't have any pretty charts yet, but things look good so far.

@emilevauge
Member

@rrichardson I love what you are saying 👍

@rrichardson
Author

(chart: traefik memory usage with the new image)

@rrichardson
Author

It hasn't leveled off yet, which is a bit of a concern. However, the previous build would increase at a rate of 35MB/hour. The current build went from 20MB to 40MB in 2.5 hours, or 8MB/hr. I'll check again this evening to see if it has stopped increasing.

@emilevauge
Member

@rrichardson

It hasn't leveled off yet, which is a bit of a concern

Indeed...
Could you kill -SIGABRT $PID_OF_TRAEFIK again?

@rrichardson
Author

traefik.log.gz

@emilevauge
Member

emilevauge commented Nov 17, 2016

I updated the Docker image containous/traefik:k8s with another fix.
Goroutines and memory allocation are pretty stable on my laptop:
(screenshot, 2016-11-17: goroutine count and memory allocation graphs)

@rrichardson
Author

I've been breaking and fixing prometheus all day, so I don't have much historical data, but I have been running the new image for almost an hour and so far it looks good.

With the previous build the instances were at 25MB by this point; these seem to be holding steady at 14MB. I'll report back tomorrow.

(chart: traefik memory usage)

@rrichardson
Author

The latest fix definitely worked. All 3 instances are sitting at 14MB after almost 24 hours.

@emilevauge
Member

@rrichardson Awesome news :) I really would like to thank you for your help on this 👍

@traefik traefik locked and limited conversation to collaborators Sep 1, 2019