Traefik web entrypoint dies randomly #8071
Comments
Hello @urosgruber Thanks a lot for your interest in Traefik and for reporting the issue. We take each reported issue seriously. That's why we try to reproduce on our test environment in order to provide outcomes that help to fix a potential bug. In that issue, we can't find any relevant information that can create a reproducible case. Would you please try to review the Traefik debug logs in order to find the root cause of the issue? And then try providing the reproducible use case. Thank you, |
@jakubhajek I've enabled debug level and now I need to wait for next crash. |
I have a very similar problem at https://community.traefik.io/t/traffic-stops-accepting-connections-on-certain-entrypoints-after-running-for-about-a-week/10390 |
After throwing out my old log search tool which just quits searching after a few thousand log entries apparently, I found this in my logs:
Since that is with debug enabled, there is unfortunately no more info available. How can we collect more information on this issue? |
Thanks @robske110 for the report. I'm glad I'm not the only one seeing this. Quick question: is web the only entrypoint that dies on your end? |
Not sure if this could be related but my default |
Yes, for me, at least up to now, only one of my entry points dies at a time. It's not always the web one, sometimes it's my websecure or one of the others.
Mine is also 128 (Please note that I am on Darwin / macOS). But looking at the error messages mine says |
Agreed. I've created a script that checks whether all the sockets are up and listening and restarts the service if one is down. But this reminds me of the old days of IIS memory leak issues :) I really hope there is something else we can set up to understand the root cause. |
It happened again. So even though I raised
Is there something I can debug to find out what is happening? Can we pinpoint where in the code this error is raised: set tcp 172.16.0.17:8088->101.37.28.132:59132: setsockopt: connection reset by peer And why does it try to start back up twice without success and then back off? |
Not sure if this is related, but traffic coming from this IP on this very same day was always POST and the message right after the request was |
I think also a simple "hot fix" to force Traefik to crash after one of its endpoints went down will help here, since a restart script can then simply take over. I think continuing to run when maybe half of the proxy is dead is simply useless. |
Hello @urosgruber @robske110, Thanks for your interest in Traefik and your feedback. Unfortunately, since the issue can't be reproduced easily, we cannot really troubleshoot it. |
Can you pinpoint where in the code setsockopt would be called? I am assuming it is probably in a library. Maybe we could run modified builds that output more information? |
Only thing I found is traefik/pkg/server/server_entrypoint_tcp.go Line 531 in 31a5f35
where it catches the error. But I'm not good with golang so can't really tell from where it was called and how to add more debug info. |
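The question of how a single `setsockopt` failure could stop an entrypoint from listening can be illustrated with a small sketch. It is written in Python rather than Go and is not Traefik's code. It assumes, without confirmation from this thread, that the error is raised while a socket option is being set on a freshly accepted connection. The names `accept_with_keepalive` and `serve` are hypothetical. The point is only that an error belonging to one connection can be handled by dropping that connection, so the accept loop keeps running.

```python
import socket


def accept_with_keepalive(listener):
    """Accept one connection and try to enable TCP keep-alive on it.

    Returns the connected socket, or None when the per-connection
    socket option could not be set (for example because the peer
    already reset the connection).
    """
    conn, _addr = listener.accept()
    try:
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    except OSError as exc:
        # Assumption: the error concerns this one connection only,
        # so the listening socket itself is still usable.
        print(f"dropping connection, setsockopt failed: {exc}")
        conn.close()
        return None
    return conn


def serve(listener, handle, max_accepts):
    """Accept up to max_accepts connections; a bad one never ends the loop."""
    handled = 0
    for _ in range(max_accepts):
        conn = accept_with_keepalive(listener)
        if conn is None:
            continue  # skip the broken connection and keep listening
        handle(conn)
        handled += 1
    return handled
```

If the `except` branch returned the error to the caller instead, and the caller treated every accept error as fatal, one reset connection would be enough to end the loop. Whether that is what happens here would have to be confirmed with a modified build.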
I seem to encounter the same problem. Last message from debug log:
Still investigating. |
I've encountered this issue on an entryPoint that rarely sees traffic, so it does not seem to be related to traffic volume.
@rtribotte Sorry to reiterate: Can you pinpoint where in the code setsockopt would be called? I am assuming it is probably in a library. Maybe we could run modified builds that output more information? |
@robske110 Well, this error most probably comes from Golang standard library. traefik/pkg/server/server_entrypoint_tcp.go Lines 344 to 354 in 6ae1949
Thus, without a reproduction case, it's not easy to go further to fix this issue. You can comment out those lines and make your own build to confirm that the problem comes from that part of the code. But even if you succeed in getting rid of that error, we would not accept or make a PR unless we could reproduce the issue. Also, we are not willing to make Traefik stop whenever an entryPoint dies: it would be a breaking change, and it is not obviously the expected behavior, since you may want traffic on the other entryPoints to keep being handled. What you can do, besides detecting that Traefik died, is to health check Traefik on all its entryPoints, then restart it when you diagnose an error. Unfortunately, as already said, without a reproduction case, it will be difficult to address this issue. |
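The per-entryPoint health check suggested above could look roughly like the following. This is a minimal Python sketch, not an official Traefik tool. The function name `find_dead_entrypoints` is hypothetical, and a plain TCP connect only shows that something is accepting connections on the port, not that requests are routed correctly.

```python
import socket


def find_dead_entrypoints(entrypoints, timeout=2.0):
    """Return the names of entry points that refuse or time out on TCP connect.

    entrypoints maps a name to a (host, port) tuple.
    """
    dead = []
    for name, (host, port) in entrypoints.items():
        try:
            # Opening and immediately closing a connection is enough to
            # tell whether anything is listening on the address.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            dead.append(name)
    return dead
```

Run periodically (from cron, for instance) with a mapping such as `{"web": ("127.0.0.1", 8088)}`, a non-empty result would be the signal to restart the service with whatever command the platform uses.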
I'm seeing a similar issue with Traefik v2.4.8 in a Docker container.
|
Hi! I'm Træfiker 🤖 the bot in charge of tidying up the issues. I have to close this one because of its lack of activity 😞 Feel free to re-open it or join our Community Forum. |
I believe this is still happening. I have a workaround that checks if the ports are up and restarts the service if not. But it feels kinda lame, so anything we can add to get more debug info about what is going on would be more than welcome. |
I am experiencing this issue too
|
Do you want to request a feature or report a bug?
Bug
What did you do?
I've been using Traefik 1.x for almost a year without any downtime. But after switching to 2.x I started to see outages. In the last month alone it happened about 4 times. I could not find any pattern that explains why this is happening, and the error is not descriptive enough to tell me what to look for.
The dashboard and the websecure entrypoint keep working at that time, and I can clearly see that nothing is listening on port 8088, where the web entrypoint should be accepting requests. There is nothing suspicious in the access log, and no connection from this IP either. Every time this happens the external IP is different, so I doubt any kind of attack is taking place.
I've tested
address: 0.0.0.0:8088
with different options, such as without an IP, on a fixed IP, etc. Always the same issue.
What did you expect to see?
http requests go through without disruption
What did you see instead?
http is randomly crashing
Output of traefik version: (What version of Traefik are you using?)
I'm using version traefik-2.4.8 (traefik-2.4.7 same issue)
OS: FreeBSD 12.1
What is your environment & configuration (arguments, toml, provider, platform, ...)?
I can enable DEBUG but there is too much noise and this happens randomly so not sure if it could help here.