Still a memory leak with k8s - 1.1 RC4 #844

I know this was addressed in #387. I also note that somebody reported an existing leak in the new solution (back in May).

I am running 1.1 RC4. Here is what I'm seeing:

The 2nd item in the list hit the memory limit and was killed.

Comments
Ouch, can you give us more details? Which Kubernetes version are you using? Were you running Traefik 1.0 before? Any difference?
k8s version: Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.0", GitCommit:"a16c0a7f71a6f93c7e0f222d961f4675cd97a46b", GitTreeState:"clean", BuildDate:"2016-09-26T18:10:32Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"} I haven't run a prior version of Traefik (at least not while monitoring and using the cluster much). Are there any debug/GC metrics I can pull out of Traefik to help narrow this down?
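For anyone wanting to dig into this themselves: Traefik is a Go program, so the generic way to pull heap and goroutine data out of a Go process is the standard library's `net/http/pprof` handlers. A minimal sketch of wiring them up on a side port (this is plain Go, not Traefik's actual debug configuration; the port is arbitrary):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoints on a separate port so a leaking process
	// can be inspected without touching the ports that carry real traffic.
	// Heap profile:   go tool pprof http://localhost:6060/debug/pprof/heap
	// Goroutine dump: curl http://localhost:6060/debug/pprof/goroutine?debug=2
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```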
Anything in the logs?
Here's an anonymized debug log; I'm not really sure what I'm looking for. There are a couple of errors involving broken connections and failures to parse:
time="2016-11-14T16:43:45Z" level=error msg="Kubernetes connection error failed to decode watch event: GET \"https://100.64.0.1:443/apis/extensions/v1beta1/ingresses\" : unexpected EOF, retrying in 605.782735ms"
time="2016-11-14T16:45:58Z" level=error msg="Kubernetes connection error failed to decode watch event: GET \"https://100.64.0.1:443/api/v1/endpoints\" : invalid character 'o' looking for beginning of value, retrying in 276.215364ms"
Other than that, I can't find anything that might lead to a leak.
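Those log lines suggest the Kubernetes watch connection keeps breaking and being re-established with a jittered backoff, which is normal on its own; the question is whether anything allocated per attempt is cleaned up. A rough, generic sketch of that retry pattern (not Traefik's provider code; `watchOnce` is a hypothetical stand-in for one long-lived watch request):

```go
package main

import (
	"errors"
	"log"
	"math/rand"
	"time"
)

// watchOnce stands in for a single long-lived watch request against the
// apiserver; it returns when the connection drops or decoding fails.
func watchOnce() error {
	return errors.New("failed to decode watch event: unexpected EOF")
}

func main() {
	backoff := 100 * time.Millisecond
	for {
		err := watchOnce()
		if err == nil {
			backoff = 100 * time.Millisecond // reset after a clean watch
			continue
		}
		// Sleep a random fraction of the current backoff (hence the odd
		// durations in the logs above), then grow the backoff up to a ceiling.
		delay := time.Duration(rand.Int63n(int64(backoff)) + 1)
		log.Printf("Kubernetes connection error %v, retrying in %v", err, delay)
		time.Sleep(delay)
		if backoff < 10*time.Second {
			backoff *= 2
		}
	}
}
```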
Are you running an HA master with the `leader-elect` option enabled?
I am running HA with `leader-elect=true` on the controller-manager.
There was an old issue with event spamming, but it did not lead to a memory leak...
I tried to reproduce this with Kubernetes v1.4.4 and have not run into a memory leak in our setup, despite the leader-election events spamming.
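For context on the spamming: with leader election enabled, the active controller-manager typically refreshes its lock by rewriting an annotation (`control-plane.alpha.kubernetes.io/leader`) on an Endpoints object every few seconds, so an endpoints watch sees a constant stream of updates that change nothing a proxy cares about. Below is a minimal sketch of the general mitigation idea, ignoring updates that only touch that annotation; it is illustrative only, not Traefik's actual code, and the `Endpoints` struct is a stripped-down stand-in rather than the real Kubernetes type:

```go
package main

import (
	"fmt"
	"reflect"
)

// leaderAnnotation is the annotation the Kubernetes leader election rewrites
// every few seconds on its lock object.
const leaderAnnotation = "control-plane.alpha.kubernetes.io/leader"

// Endpoints models only the fields needed for the comparison.
type Endpoints struct {
	Annotations map[string]string
	Addresses   []string
}

// onlyLeaderChurn reports whether two versions of an Endpoints object differ
// solely in the leader-election annotation, i.e. the update can be skipped.
func onlyLeaderChurn(oldEP, newEP Endpoints) bool {
	oldEP.Annotations = withoutKey(oldEP.Annotations, leaderAnnotation)
	newEP.Annotations = withoutKey(newEP.Annotations, leaderAnnotation)
	return reflect.DeepEqual(oldEP, newEP)
}

// withoutKey returns a copy of m with the given key removed.
func withoutKey(m map[string]string, key string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		if k != key {
			out[k] = v
		}
	}
	return out
}

func main() {
	oldEP := Endpoints{Annotations: map[string]string{leaderAnnotation: "holder=a"}, Addresses: []string{"10.0.0.1"}}
	newEP := Endpoints{Annotations: map[string]string{leaderAnnotation: "holder=b"}, Addresses: []string{"10.0.0.1"}}
	fmt.Println(onlyLeaderChurn(oldEP, newEP)) // true: nothing but the lock changed
}
```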
I wonder if it is a function of the number of services/pods. We currently have about 70 pods across 30 services. Either way, attached is the stack trace (with 6 million goroutines :) ).
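Six million goroutines points at a goroutine leak (each one pins its stack plus whatever it references) rather than a single big allocation. A generic way to confirm that from inside a Go process, using only the standard library and nothing Traefik-specific:

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// dumpGoroutines writes all goroutine stacks to stderr; debug=1 groups
// identical stacks, which makes a leaking call site stand out.
func dumpGoroutines() {
	pprof.Lookup("goroutine").WriteTo(os.Stderr, 1)
}

func main() {
	// A goroutine count that only ever grows is the signature of a leak;
	// log it periodically alongside memory usage.
	go func() {
		for range time.Tick(time.Minute) {
			log.Printf("goroutines: %d", runtime.NumGoroutine())
		}
	}()

	dumpGoroutines()
	select {} // placeholder for the real program's work
}
```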
@rrichardson thanks for investigating this, it will help us a lot :)
Hey @rrichardson, it would be awesome if you could test with this Docker image.
The new image has been running for 1 hour and so far the results are encouraging. Each pod is using about 20MB of RAM. I don't have any pretty charts yet, but things look good so far.
@rrichardson I love what you are saying 👍
It hasn't leveled off yet, which is a bit of a concern. However, the previous build would increase at a rate of 35MB/hour. The current build went from 20MB to 40MB in 2.5 hours, or 8MB/hour. I'll check again this evening to see if it has stopped increasing.
Indeed...
The latest fix definitely worked. All 3 instances are sitting at 14MB after almost 24 hours.
@rrichardson Awesome news :) I really would like to thank you for your help on this 👍