Traefik web entrypoint dies randomly #8071

Closed
urosgruber opened this issue Apr 15, 2021 · 21 comments

Labels: contributor/need-more-information, kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed), status/5-frozen-due-to-age

Comments

@urosgruber

Do you want to request a feature or report a bug?

Bug

What did you do?

I had been running Traefik 1.x for almost a year without any downtime, but since switching to 2.x I have started to see outages. In the last month alone it happened about 4 times. I could not find any pattern that explains why this happens, and the error message is not descriptive enough to tell me what to look for.

{"entryPointName":"web","level":"error","msg":"set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e114.55.164.60:46112: setsockopt: connection reset by peer","time":"2021-04-14T03:39:21Z"}

The dashboard and the websecure entrypoint keep working at that time, and I can clearly see that nothing is listening on port 8088, where the web entrypoint should be accepting requests. There is nothing suspicious in the access log, and there is no connection from this IP in it either. The external IP is different every time this happens, so I doubt that any kind of attack is involved.

I've tested address: 0.0.0.0:8088 with different options (without an IP, with a fixed IP, etc.), and the issue is always the same.

What did you expect to see?

http requests go through without disruption

What did you see instead?

http is randomly crashing

Output of traefik version: (What version of Traefik are you using?)

I'm using version traefik-2.4.8 (traefik-2.4.7 same issue)
OS: FreeBSD 12.1

Version:      2.4.8
Codename:     portbuild
Go version:   go1.16.2
Built:        2021-04-01_11:19:34AM
OS/Arch:      freebsd/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

pilot:
  token: "xxxxx"
global:
  checkNewVersion: false
  sendAnonymousUsage: false
entryPoints:
  web:
    address: 0.0.0.0:8088
  websecure:
    address: 0.0.0.0:8443
log:
  level: WARN
  filePath: /var/log/traefik.log
  format: json
accessLog:
  filePath: /var/log/traefik.access.log
  format: json
api:
  insecure: true
ping:
  entryPoint: "web"
certificatesResolvers:
  myresolver:
    acme:
      email: foo@bar.com
      storage: /usr/local/etc/acme.json
      httpChallenge:
        # used during the challenge
        entryPoint: web

providers:
  consulCatalog:
    prefix: traefik2
    exposedByDefault: false
    refreshInterval: 10s
    cache: false
    endpoint:
      address: 172.16.0.15:8500
  consul:
    endpoints:
      - "172.16.0.15:8500"

I can enable DEBUG, but it produces too much noise, and since this happens randomly I'm not sure it would help here.

@jakubhajek
Contributor

Hello @urosgruber

Thanks a lot for your interest in Traefik and for reporting the issue.

We take each reported issue seriously. That's why we try to reproduce it in our test environment in order to provide outcomes that help to fix a potential bug. In this issue, we can't find any relevant information that would let us create a reproducible case.

Would you please review the Traefik debug logs in order to find the root cause of the issue, and then try to provide a reproducible use case?

Thank you,

@urosgruber
Author

@jakubhajek I've enabled the debug level, and now I need to wait for the next crash.

@robske110

robske110 commented Apr 19, 2021

After throwing out my old log search tool, which apparently just quits searching after a few thousand log entries, I found this in my logs:
(I have included the previous and next unrelated message to show that there is indeed no further info)

121.196.15.242 - - [06/Apr/2021:06:31:05 +0000] "POST /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1" 404 25 "-" "-" 3679 "HouseControl@file" "http://192.168.1.19/" 2ms
ERRO[2021-04-06T08:31:05+02:00] set tcp [::c0a8:117]:80->121.196.15.242:38628: setsockopt: invalid argument  entryPointName=web
ERRO[2021-04-06T08:31:05+02:00] Error while starting server: set tcp [::c0a8:117]:80->121.196.15.242:38628: setsockopt: invalid argument  entryPointName=web
ERRO[2021-04-06T08:31:05+02:00] Error while starting server: set tcp [::c0a8:117]:80->121.196.15.242:38628: setsockopt: invalid argument  entryPointName=web
DEBU[2021-04-06T08:32:24+02:00] URL.Path is now /vwid3//carStatus.php (was //carStatus.php).  middlewareName=dev-idddatalogger-qualifier@file middlewareType=AddPrefix
DEBU[2021-04-06T08:32:24+02:00] vulcand/oxy/roundrobin/rr: begin ServeHttp on request  Request="{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/vwid3//carStatus.php\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"key=REDACTED\",\"Fragment\":\"\",\"RawFragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"*/*\"],\"Accept-Encoding\":[\"gzip, deflate, br\"],\"Accept-Language\":[\"en-gb\"],\"User-Agent\":[\"Scriptable/177 CFNetwork/1220.1 Darwin/20.3.0\"],\"X-Forwarded-Host\":[\"dev.id3.srv.ti.domain.de\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"some-iMac\"],\"X-Real-Ip\":[\"192.168.178.1\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"dev.id3.srv.ti.domain.de\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"192.168.178.1:63730\",\"RequestURI\":\"/vwid3//carStatus.php?key=REDACTED\",\"TLS\":null}"

Since that is already with debug enabled, there is unfortunately no more info available. How can we collect more information on this issue?
After that log entry (which occurred 2 days after starting traefik), no further requests were handled on the web entrypoint.

@urosgruber
Author

Thanks @robske110 for the report. I'm glad I'm not the only one seeing this. Quick question: is web the only entrypoint that dies on your end?

@urosgruber
Author

Not sure if this could be related, but my default kern.ipc.somaxconn was set to 128, which might be too low. I've raised it to 1024, since the server should be able to handle this very easily. What's yours, @robske110?

@robske110

Thanks @robske110 for the report. I'm glad I'm not the only one seeing this. Quick question: is web the only entrypoint that dies on your end?

Yes, for me, at least up to now, only one of my entry points dies at a time. It's not always the web one; sometimes it's my websecure or one of the others.

Not sure if this could be related, but my default kern.ipc.somaxconn was set to 128, which might be too low. I've raised it to 1024, since the server should be able to handle this very easily. What's yours, @robske110?

Mine is also 128 (please note that I am on Darwin / macOS). But looking at the error messages, mine says setsockopt: invalid argument while yours says setsockopt: connection reset by peer. Maybe we have different causes? The result is the same, and I still wonder why Traefik does not crash itself after detecting such a severe fault (one entrypoint down is something I'd consider quite catastrophic). That way, restart scripts could work around this issue.

@urosgruber
Author

Agreed. I've created a script that checks whether all the sockets are up and listening and restarts the service if one of them is down. But this reminds me of the old days of IIS memory leak issues :) I really hope there is something else we can set up to understand the root cause.

@urosgruber
Author

It happened again. Even though I raised kern.ipc.somaxconn, one entrypoint died without anything helpful in the logs, even with debug enabled.

{"ForwardURL":{"Scheme":"http","Opaque":"","User":null,"Host":"192.168.0.42:8080","Path":"","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Request":"{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/opgp/b5gekiyasumaouf9c51214849sd-1\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\",\"RawFragment\":\"\"},\"Proto\":\"HTTP/1.1\",\"ProtoMajor\":1,\"ProtoMinor\":1,\"Header\":{\"Accept\":[\"*/*\"],\"Accept-Encoding\":[\"gzip, deflate\"],\"Cache-Control\":[\"no-cache\"],\"Connection\":[\"Keep-Alive\"],\"From\":[\"bingbot(at)microsoft.com\"],\"Pragma\":[\"no-cache\"],\"User-Agent\":[\"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)\"],\"X-Forwarded-Host\":[\"www.host1.si\"],\"X-Forwarded-Port\":[\"80\"],\"X-Forwarded-Proto\":[\"http\"],\"X-Forwarded-Server\":[\"traefik.local\"],\"X-Real-Ip\":[\"207.46.13.129\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"www.host1.si\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"207.46.13.129:20008\",\"RequestURI\":\"/opgp/b5gekiyasumaouf9c51214849sd-1\",\"TLS\":null}","level":"debug","msg":"vulcand/oxy/roundrobin/rr: Forwarding this request to URL","time":"2021-04-30T15:18:42Z"}
{"Request":"{\"Method\":\"GET\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/opgp/b5gekiyasumaouf9c51214849sd-1\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"\",\"Fragment\":\"\",\"RawFragment\":\"\"},\"Proto\":\"HTTP/1.1\",\"ProtoMajor\":1,\"ProtoMinor\":1,\"Header\":{\"Accept\":[\"*/*\"],\"Accept-Encoding\":[\"gzip, deflate\"],\"Cache-Control\":[\"no-cache\"],\"Connection\":[\"Keep-Alive\"],\"From\":[\"bingbot(at)microsoft.com\"],\"Pragma\":[\"no-cache\"],\"User-Agent\":[\"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)\"],\"X-Forwarded-Host\":[\"www.host1.si\"],\"X-Forwarded-Port\":[\"80\"],\"X-Forwarded-Proto\":[\"http\"],\"X-Forwarded-Server\":[\"traefik.local\"],\"X-Real-Ip\":[\"207.46.13.129\"]},\"ContentLength\":0,\"TransferEncoding\":null,\"Host\":\"www.host1.si\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"207.46.13.129:20008\",\"RequestURI\":\"/opgp/b5gekiyasumaouf9c51214849sd-1\",\"TLS\":null}","level":"debug","msg":"vulcand/oxy/roundrobin/rr: completed ServeHttp on request","time":"2021-04-30T15:18:42Z"}
{"entryPointName":"web","level":"error","msg":"set tcp 172.16.0.17:8088-\u003e101.37.28.132:59132: setsockopt: connection reset by peer","time":"2021-04-30T15:18:42Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e101.37.28.132:59132: setsockopt: connection reset by peer","time":"2021-04-30T15:18:42Z"}
{"entryPointName":"web","level":"error","msg":"Error while starting server: set tcp 172.16.0.17:8088-\u003e101.37.28.132:59132: setsockopt: connection reset by peer","time":"2021-04-30T15:18:42Z"}
{"Request":"{\"Method\":\"POST\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"_task=mail\\u0026_action=refresh\",\"Fragment\":\"\",\"RawFragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"application/json, text/javascript, */*; q=0.01\"],\"Accept-Encoding\":[\"gzip, deflate, br\"],\"Accept-Language\":[\"sl-SI,sl;q=0.9,en-GB;q=0.8,en;q=0.7\"],\"Content-Length\":[\"402\"],\"Content-Type\":[\"application/x-www-form-urlencoded; charset=UTF-8\"],\"Cookie\":[\"language=sl; roundcube_sessid=72b499c55b093ef32269b75640780040; roundcube_sessauth=2gOFJID5rbQeThEPLLTh5S5Qz7IQ8jKR-1619795700\"],\"Origin\":[\"https://webmail.local\"],\"Referer\":[\"https://webmail.local/?_task=mail\\u0026_mbox=INBOX\"],\"Sec-Ch-Ua\":[\"\\\" Not A;Brand\\\";v=\\\"99\\\", \\\"Chromium\\\";v=\\\"90\\\", \\\"Google Chrome\\\";v=\\\"90\\\"\"],\"Sec-Ch-Ua-Mobile\":[\"?0\"],\"Sec-Fetch-Dest\":[\"empty\"],\"Sec-Fetch-Mode\":[\"cors\"],\"Sec-Fetch-Site\":[\"same-origin\"],\"User-Agent\":[\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\"],\"X-Forwarded-Host\":[\"webmail.local\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"traefik.local\"],\"X-Real-Ip\":[\"31.15.249.124\"],\"X-Requested-With\":[\"XMLHttpRequest\"],\"X-Roundcube-Request\":[\"byXTqCZiDguk3RHmlcXVpbyDYFk6vb3t\"]},\"ContentLength\":402,\"TransferEncoding\":null,\"Host\":\"webmail.local\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"31.15.249.124:55116\",\"RequestURI\":\"/?_task=mail\\u0026_action=refresh\",\"TLS\":null}","level":"debug","msg":"vulcand/oxy/roundrobin/rr: begin ServeHttp on request","time":"2021-04-30T15:18:43Z"}
{"ForwardURL":{"Scheme":"http","Opaque":"","User":null,"Host":"172.16.0.9:8080","Path":"","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"Request":"{\"Method\":\"POST\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"_task=mail\\u0026_action=refresh\",\"Fragment\":\"\",\"RawFragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"application/json, text/javascript, */*; q=0.01\"],\"Accept-Encoding\":[\"gzip, deflate, br\"],\"Accept-Language\":[\"sl-SI,sl;q=0.9,en-GB;q=0.8,en;q=0.7\"],\"Content-Length\":[\"402\"],\"Content-Type\":[\"application/x-www-form-urlencoded; charset=UTF-8\"],\"Cookie\":[\"language=sl; roundcube_sessid=72b499c55b093ef32269b75640780040; roundcube_sessauth=2gOFJID5rbQeThEPLLTh5S5Qz7IQ8jKR-1619795700\"],\"Origin\":[\"https://webmail.local\"],\"Referer\":[\"https://webmail.local/?_task=mail\\u0026_mbox=INBOX\"],\"Sec-Ch-Ua\":[\"\\\" Not A;Brand\\\";v=\\\"99\\\", \\\"Chromium\\\";v=\\\"90\\\", \\\"Google Chrome\\\";v=\\\"90\\\"\"],\"Sec-Ch-Ua-Mobile\":[\"?0\"],\"Sec-Fetch-Dest\":[\"empty\"],\"Sec-Fetch-Mode\":[\"cors\"],\"Sec-Fetch-Site\":[\"same-origin\"],\"User-Agent\":[\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\"],\"X-Forwarded-Host\":[\"webmail.local\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"traefik.local\"],\"X-Real-Ip\":[\"31.15.249.124\"],\"X-Requested-With\":[\"XMLHttpRequest\"],\"X-Roundcube-Request\":[\"byXTqCZiDguk3RHmlcXVpbyDYFk6vb3t\"]},\"ContentLength\":402,\"TransferEncoding\":null,\"Host\":\"webmail.local\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"31.15.249.124:55116\",\"RequestURI\":\"/?_task=mail\\u0026_action=refresh\",\"TLS\":null}","level":"debug","msg":"vulcand/oxy/roundrobin/rr: Forwarding this 
request to URL","time":"2021-04-30T15:18:43Z"}
{"Request":"{\"Method\":\"POST\",\"URL\":{\"Scheme\":\"\",\"Opaque\":\"\",\"User\":null,\"Host\":\"\",\"Path\":\"/\",\"RawPath\":\"\",\"ForceQuery\":false,\"RawQuery\":\"_task=mail\\u0026_action=refresh\",\"Fragment\":\"\",\"RawFragment\":\"\"},\"Proto\":\"HTTP/2.0\",\"ProtoMajor\":2,\"ProtoMinor\":0,\"Header\":{\"Accept\":[\"application/json, text/javascript, */*; q=0.01\"],\"Accept-Encoding\":[\"gzip, deflate, br\"],\"Accept-Language\":[\"sl-SI,sl;q=0.9,en-GB;q=0.8,en;q=0.7\"],\"Content-Length\":[\"402\"],\"Content-Type\":[\"application/x-www-form-urlencoded; charset=UTF-8\"],\"Cookie\":[\"language=sl; roundcube_sessid=72b499c55b093ef32269b75640780040; roundcube_sessauth=2gOFJID5rbQeThEPLLTh5S5Qz7IQ8jKR-1619795700\"],\"Origin\":[\"https://webmail.local\"],\"Referer\":[\"https://webmail.local/?_task=mail\\u0026_mbox=INBOX\"],\"Sec-Ch-Ua\":[\"\\\" Not A;Brand\\\";v=\\\"99\\\", \\\"Chromium\\\";v=\\\"90\\\", \\\"Google Chrome\\\";v=\\\"90\\\"\"],\"Sec-Ch-Ua-Mobile\":[\"?0\"],\"Sec-Fetch-Dest\":[\"empty\"],\"Sec-Fetch-Mode\":[\"cors\"],\"Sec-Fetch-Site\":[\"same-origin\"],\"User-Agent\":[\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36\"],\"X-Forwarded-Host\":[\"webmail.local\"],\"X-Forwarded-Port\":[\"443\"],\"X-Forwarded-Proto\":[\"https\"],\"X-Forwarded-Server\":[\"traefik.local\"],\"X-Real-Ip\":[\"31.15.249.124\"],\"X-Requested-With\":[\"XMLHttpRequest\"],\"X-Roundcube-Request\":[\"byXTqCZiDguk3RHmlcXVpbyDYFk6vb3t\"]},\"ContentLength\":402,\"TransferEncoding\":null,\"Host\":\"webmail.local\",\"Form\":null,\"PostForm\":null,\"MultipartForm\":null,\"Trailer\":null,\"RemoteAddr\":\"31.15.249.124:55116\",\"RequestURI\":\"/?_task=mail\\u0026_action=refresh\",\"TLS\":null}","level":"debug","msg":"vulcand/oxy/roundrobin/rr: completed ServeHttp on request","time":"2021-04-30T15:18:43Z"}

Is there something I can debug to find out what is happening?

Can we pinpoint where in the code this error is raised?

set tcp 172.16.0.17:8088-\u003e101.37.28.132:59132: setsockopt: connection reset by peer

And why does it try to start the server again twice without success and then back off?

@urosgruber
Author

Not sure if this is related, but the traffic coming from this IP on that same day was always POST, and the message right after each request was
{"level":"debug","msg":"'499 Client Closed Request' caused by: context canceled","time":"2021-04-30T12:47:12Z"}

@robske110

I also think a simple "hot fix" that forces Traefik to crash after one of its entry points goes down would help here, since a restart script could then simply take over. Continuing to run when maybe half of the proxy is dead is simply useless.
For me this still happens every other week. My error messages stay exactly the same as posted above, and I couldn't find any pattern around them, except that it tends to happen more during "busy" times of the day, which is neither consistent nor surprising.

@rtribotte added the kind/bug/possible label and removed the contributor/waiting-for-feedback label May 3, 2021
@rtribotte
Member

Hello @urosgruber @robske110,

Thanks for your interest in Traefik and your feedback.

Unfortunately, since the issue can't be reproduced easily, we cannot really troubleshoot it.
Please let us know if you find a consistent way to reproduce the bug.

@robske110

robske110 commented May 4, 2021

Can you pinpoint where in the code setsockopt would be called? I am assuming it is probably in a library. Maybe we could run modified builds that output more information?
It is unlikely that we will find a consistent way to reproduce it, because this is most likely either a race condition, something highly dependent on network activity, or maybe even a kernel bug.
I think the real bug/issue here is that Traefik does not handle this error correctly. In my opinion it should crash completely, allowing for example restart scripts / docker / systemd to restart it. What is your opinion on this? I think it just doesn't make sense to continue to run when an entrypoint is down and unreachable. Traefik apparently somehow detects the error and attempts to restart the server twice, but then just gives up and continues running like nothing happened, which is not a good choice in my opinion.

@urosgruber
Author

The only thing I found is

err := serverHTTP.Serve(listener)

where it catches the error. But I'm not good with Go, so I can't really tell where it was called from or how to add more debug info.

@mathiasp

mathiasp commented May 12, 2021

I seem to encounter the same problem.
FreeBSD 12 & 13 (just updated), now with traefik 2.4.8 from pkg, but this has been going on for some time. The service was almost never used in the last months, but now I need to fix this.

Last message from debug log:

  time="2021-05-12T07:54:52+02:00" level=error msg="set tcp 176.9.205.227:443->193.46.255.97:51275: setsockopt: connection reset by peer" entryPointName=https
  time="2021-05-12T07:54:52+02:00" level=error msg="Error while starting server: set tcp 176.9.205.227:443->193.46.255.97:51275: setsockopt: connection reset by peer" entryPointName=https
  time="2021-05-12T07:54:52+02:00" level=error msg="Error while starting server: set tcp 176.9.205.227:443->193.46.255.97:51275: setsockopt: connection reset by peer" entryPointName=https

Still investigating.

@robske110

I've encountered this issue on an entryPoint that rarely sees traffic, so it does not seem to be related to traffic volume.

ERRO[2021-05-10T02:41:57+02:00] set tcp 192.168.1.23:90->193.46.255.97:63925: setsockopt: invalid argument  entryPointName=hctrlAuth
ERRO[2021-05-10T02:41:57+02:00] Error while starting server: set tcp 192.168.1.23:90->193.46.255.97:63925: setsockopt: invalid argument  entryPointName=hctrlAuth
ERRO[2021-05-10T02:41:57+02:00] Error while starting server: set tcp 192.168.1.23:90->193.46.255.97:63925: setsockopt: invalid argument  entryPointName=hctrlAuth

@rtribotte Sorry to reiterate: Can you pinpoint where in the code setsockopt would be called? I am assuming it is probably in a library. Maybe we could run modified builds that output more information?
Could it be implemented that when Traefik detects that an entry point has crashed, it shuts down completely, to allow for example restart scripts / docker / systemd to restart it? Continuing to run a half-dead proxy (or even a completely dead one if there is only one entry point) does not make sense to me.

@rtribotte
Member

@robske110 Well, this error most probably comes from the Go standard library.
My best guess for the Traefik code involved in producing the error:

if err := tc.SetKeepAlive(true); err != nil {
	return nil, err
}
if err := tc.SetKeepAlivePeriod(3 * time.Minute); err != nil {
	// Some systems, such as OpenBSD, have no user-settable per-socket TCP
	// keepalive options.
	if !errors.Is(err, syscall.ENOPROTOOPT) {
		return nil, err
	}
}

Thus, without a reproduction case, it's not easy to go further to fix this issue.

You can comment out those lines and make your own build to confirm that the problem comes from that part of the code. But even if you succeed in getting rid of that error, we would not accept or make a PR unless we could reproduce the issue.

Also, we are not willing to make Traefik stop whenever an entryPoint dies: it would be a breaking change in behavior, and it is not obviously what users expect, as you may want traffic on the other entryPoints to continue to be handled.

What you can do, besides detecting that Traefik died, is to health check Traefik on all its entryPoints and restart it when you detect an error.

Unfortunately, as already said, without a reproduction case, it will be difficult to address this issue.

@cert-ghg

I'm seeing a similar issue with traefik v2.4.8 in a docker container.

time="2021-05-24T13:12:48Z" level=error msg="set tcp [::]:443->80.249.131.237:10044: setsockopt: invalid argument" entryPointName=websecure
time="2021-05-24T13:12:48Z" level=error msg="Error while starting server: set tcp [::]:443->80.249.131.237:10044: setsockopt: invalid argument" entryPointName=websecure
time="2021-05-24T13:12:48Z" level=error msg="Error while starting server: set tcp [::]:443->80.249.131.237:10044: setsockopt: invalid argument" entryPointName=websecure

@traefiker
Contributor

Hi! I'm Træfiker 🤖 the bot in charge of tidying up the issues.

I have to close this one because of its lack of activity 😞

Feel free to re-open it or join our Community Forum.

v2 automation moved this from issues to Done Jul 24, 2021
@urosgruber
Author

I believe this is still happening. I have a workaround that checks whether the ports are up and restarts the service if they are not. But it feels kinda lame, so anything we can add to get more debug info about what is going on would be more than welcome.

@michbsd

michbsd commented Sep 10, 2021

I am experiencing this issue too
Traefik 2.3.7 on FreeBSD

time="2021-09-10T14:52:09+02:00" level=error msg="set tcp xx.xx.xx.xx:8888->xx.xx.xx.xx:55434: setsockopt: connection reset by peer" entryPointName=http
time="2021-09-10T14:52:09+02:00" level=error msg="Error while starting server: set tcp xx.xx.xx.xx:8888->xx.xx.xx.xx:55434: setsockopt: connection reset by peer" entryPointName=http
time="2021-09-10T14:52:09+02:00" level=error msg="Error while starting server: set tcp xx.xx.xx.xx:8888->xx.xx.xx.xx:55434: setsockopt: connection reset by peer" entryPointName=http

@traefik traefik locked and limited conversation to collaborators Oct 11, 2021
9 participants