
Traefik return 500 internal error - no 500 on backend #4481

Closed
itielshwartz opened this issue Feb 11, 2019 · 14 comments
Labels
area/server, contributor/waiting-for-feedback, kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed), priority/P2 (needs to be fixed in the future), status/5-frozen-due-to-age

Comments

@itielshwartz

Do you want to request a feature or report a bug?

Bug

What did you do?

I use the Traefik Helm chart inside GKE:

Google LB -> Traefik -> uwsgi -> Flask

Under high load, I see this in the Traefik logs:
500 Internal Server Error

At the same time my backend shows this:
[uwsgi-body-read] Error reading 162 bytes. Content-Length: 162 consumed: 0 left: 162 message: Client closed connection

Might be related to #615 or #3163.

What did you expect to see?

A 200, or a server timeout, but not a 500.

What did you see instead?

A 500 Internal Server Error, even though the downstream backend didn't log any error.

Output of `traefik version`: (What version of Traefik are you using?)

```
Version:      v1.7.8
Codename:     maroilles
Go version:   go1.11.5
Built:        2019-01-29_04:33:36PM
OS/Arch:      linux/amd64
```

What is your environment & configuration (arguments, toml, provider, platform, ...)?

```toml
# traefik.toml
logLevel = "INFO"
defaultEntryPoints = ["http","https"]
[entryPoints]
  [entryPoints.http]
  address = ":80"
  compress = true
    [entryPoints.http.forwardedHeaders]
    trustedIPs = ["0.0.0.0/0"]
    [entryPoints.http.redirect]
      regex = "^http://(.*)"
      replacement = "https://$1"
      permanent = true
  [entryPoints.https]
  address = ":443"
  compress = true
    [entryPoints.https.forwardedHeaders]
    trustedIPs = ["0.0.0.0/0"]
    [entryPoints.https.tls]
      minVersion = "VersionTLS12"
      cipherSuites = [
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
        "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA"
      ]
      [[entryPoints.https.tls.certificates]]
      CertFile = "/ssl/tls.crt"
      KeyFile = "/ssl/tls.key"
[kubernetes]
ingressClass = "dogfood-traefik"
  [kubernetes.ingressEndpoint]
  publishedService = "dogfood/dogfood-traefik"
[traefikLog]
  format = "json"
[accessLog]
  format = "json"
[accessLog.fields]
  defaultMode = "keep"
[accessLog.fields.names]
[accessLog.fields.headers]
  defaultMode = "keep"
[accessLog.fields.headers.names]
  [accessLog.filters]
    statusCodes = ["400-602"]
[metrics]
  [metrics.datadog]
    address = "datadog-agent-datadog.datadog.svc.cluster.local:8125"
    pushinterval = "10s"
```

I'm also running with these flags:

      --retry.attempts=10
      --forwardingTimeouts.dialTimeout=60s
      --forwardingTimeouts.responseHeaderTimeout=60s

My uwsgi config:

[uwsgi]
;https://uwsgi-docs.readthedocs.io/en/latest/HTTP.html
http = :8080
plugin= http
wsgi-file = main.py
callable = wsgi_application
processes = 10
enable-threads = true
master = true
die-on-term = true
reload-mercy = 30
worker-reload-mercy = 30
log-5xx = true
log-4xx = true
disable-logging = true
stats = 127.0.0.1:1717
stats-http = true
single-interpreter= true
;https://github.com/containous/traefik/issues/615
;traefik is 90 seconds, non-configurable
http-keepalive=100
add-header = Connection: Keep-Alive
so-keepalive= true
http-auto-chunked = true
;https://github.com/containous/traefik/issues/3163
http-timeout = 120
socket-timeout = 120
vacuum = true

If applicable, please paste the log output in DEBUG level (--logLevel=DEBUG switch)

I'm reluctant to enable DEBUG-level logging, as it would spam my logging system. Is there any way to increase logging only for errors?

@juliens juliens added kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. priority/P2 need to be fixed in the future area/server and removed status/0-needs-triage labels Feb 11, 2019
@itielshwartz
Author

Hi, any update on this?

@om0nster

om0nster commented Aug 20, 2019

Adding some debug info. It's likely a Go HTTP keep-alive usage problem or a server configuration problem.

2019/08/20 05:49:41.849 [E] [handler.go:48]  '500 Internal Server Error' caused by: http: server closed idle connection
2019/08/20 05:56:22.369 [E] [handler.go:48]  '502 Bad Gateway' caused by: EOF
2019/08/20 05:57:09.771 [E] [handler.go:48]  '500 Internal Server Error' caused by: http: server closed idle connection
2019/08/20 05:57:35.649 [E] [handler.go:48]  '500 Internal Server Error' caused by: http: server closed idle connection
2019/08/20 05:57:45.500 [E] [handler.go:48]  '502 Bad Gateway' caused by: read tcp 10.0.2.136:60842->10.0.2.3:8097: read: connection reset by peer
2019/08/20 05:58:09.236 [E] [handler.go:48]  '502 Bad Gateway' caused by: EOF
goroutine 1219 [running]:
runtime/debug.Stack(0xc0005449a0, 0xc0005449a8, 0x2101)
	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/containous/traefik/vendor/d.StatusBadGateway(0x2ed0a20)
	/go/src/github.com/containous/traefik/vendor/d/httpstatus.go:9 +0x20
github.com/containous/traefik/vendor/github.com/vulcand/oxy/utils.(*StdHandler).ServeHTTP(0x4fa5400, 0x2efc5c0, 0xc000bb21c0, 0xc0008d9900, 0x2ed0a20, 0xc00007c060)
	/go/src/github.com/containous/traefik/vendor/github.com/vulcand/oxy/utils/handler.go:41 +0x579
github.com/containous/traefik/vendor/github.com/vulcand/oxy/forward.ErrorHandlingRoundTripper.RoundTrip(0x2ed4ae0, 0xc000309c20, 0x2ed3640, 0x4fa5400, 0xc0008d9900, 0xf, 0xc00066a640, 0xa)
	/go/src/github.com/containous/traefik/vendor/github.com/vulcand/oxy/forward/fwd.go:168 +0x133
net/http/httputil.(*ReverseProxy).ServeHTTP(0xc000993ef0, 0x7ffb2e227a20, 0xc0002066d8, 0xc000544f98)
	/usr/local/go/src/net/http/httputil/reverseproxy.go:232 +0x2ff
github.com/containous/traefik/vendor/github.com/vulcand/oxy/forward.(*httpForwarder).serveHTTP(0xc00000e3f0, 0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9800, 0xc000b39a30)
	/go/src/github.com/containous/traefik/vendor/github.com/vulcand/oxy/forward/fwd.go:554 +0xb40
github.com/containous/traefik/vendor/github.com/vulcand/oxy/forward.(*Forwarder).ServeHTTP(0xc000b3d240, 0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9800)
	/go/src/github.com/containous/traefik/vendor/github.com/vulcand/oxy/forward/fwd.go:283 +0x121
github.com/containous/traefik/middlewares/pipelining.(*Pipelining).ServeHTTP(0xc000b39a40, 0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9800)
	/go/src/github.com/containous/traefik/middlewares/pipelining/pipelining.go:24 +0x6f
github.com/containous/traefik/vendor/github.com/vulcand/oxy/roundrobin.(*RoundRobin).ServeHTTP(0xc00000e460, 0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9700)
	/go/src/github.com/containous/traefik/vendor/github.com/vulcand/oxy/roundrobin/rr.go:147 +0x124
github.com/containous/traefik/middlewares.(*EmptyBackendHandler).ServeHTTP(0xc000b39a80, 0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9700)
	/go/src/github.com/containous/traefik/middlewares/empty_backend_handler.go:28 +0x118
github.com/containous/traefik/vendor/github.com/urfave/negroni.Wrap.func1(0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9700, 0xc000b35bc0)
	/go/src/github.com/containous/traefik/vendor/github.com/urfave/negroni/negroni.go:41 +0x4d
github.com/containous/traefik/vendor/github.com/urfave/negroni.HandlerFunc.ServeHTTP(0xc000b3d300, 0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9700, 0xc000b35bc0)
	/go/src/github.com/containous/traefik/vendor/github.com/urfave/negroni/negroni.go:24 +0x4e
github.com/containous/traefik/vendor/github.com/urfave/negroni.middleware.ServeHTTP(0x2ed6ba0, 0xc000b3d300, 0xc000b3d320, 0x7ffb2e227a20, 0xc0002066d8, 0xc0008d9700)
	/go/src/github.com/containous/traefik/vendor/github.com/urfave/negroni/negroni.go:33 +0x9c

@om0nster

om0nster commented Aug 20, 2019

The reason is that the Python API server was not configured to support HTTP keep-alive, while Traefik uses keep-alive by default.

After configuring the Python API server to support keep-alive, the problem was fixed.

The patch below force-disables keep-alive, for debugging and testing:

diff --git a/vendor/github.com/vulcand/oxy/forward/fwd.go b/vendor/github.com/vulcand/oxy/forward/fwd.go
index b8500489..34c90c87 100644
--- a/vendor/github.com/vulcand/oxy/forward/fwd.go
+++ b/vendor/github.com/vulcand/oxy/forward/fwd.go
@@ -534,6 +534,12 @@ func (f *httpForwarder) serveHTTP(w http.ResponseWriter, inReq *http.Request, ct
                ModifyResponse: f.modifyResponse,
                BufferPool:     f.bufferPool,
        }
+       if ert, ok := f.roundTripper.(ErrorHandlingRoundTripper); ok {
+               if rt, ok := ert.RoundTripper.(*http.Transport); ok {
+                       rt.DisableKeepAlives = true
+                       inReq.Close = true
+               }
+       }
 
        if f.log.GetLevel() >= log.DebugLevel {
                pw := utils.NewProxyWriter(w)

@joy17joy

joy17joy commented Jun 9, 2020

With the uwsgi options --http=:80 and --http-keepalive=1, some of the random 500s disappear. The random 500 responses show up when the uwsgi option is --http-socket=:80.
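For clarity, the two uwsgi setups being compared might look like this. This is a sketch based on the comment above; the port is illustrative:

```ini
; Variant that showed random 500s behind Traefik: raw HTTP-socket mode,
; which does not do HTTP keep-alive.
[uwsgi]
http-socket = :80

; Variant where the random 500s disappeared: the embedded HTTP router
; with keep-alive enabled, so Traefik can safely reuse connections.
[uwsgi]
http = :80
http-keepalive = 1
```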

@languitar

We stumbled across this as well: https://community.traefik.io/t/observing-5xx-requests-with-content-size-header-0-but-requestcontentsize-0/10584 It would be nice if Traefik did not blindly assume keep-alive support.

@jakubhajek
Contributor

Here is a link to a scenario by @languitar that should help reproduce the issue: https://github.com/languitar/traefik-uwsgi-issue

@languitar

CC @Jeinhaus

@tomMoulard
Member

Hello,

Thanks all for your interest in Traefik!

I tried to reproduce the issue and I came to some conclusions.

First, the keep-alive status of the web server is not related: I tried with another web server, with and without keep-alive enabled, and it worked fine both ways.

Tests with custom server
---
Testing: no keepalive in the custom server
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:265614  
Error Set:

---
Testing: keepalive in the custom server
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:1019637  
Error Set:

---
Testing: no keepalive in the custom server with traefik
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:130116  
Error Set:

---
Testing: keepalive in the custom server with traefik
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:129455  
Error Set:

Second, it seems to be related to having a body in the request (why is there a body in the GET request in the reproduction case?).

Tests with custom server & uwsgi
---
Testing: no keepalive in the custom server with body
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:146016  
Error Set:

---
Testing: keepalive in the custom server with body
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:996105  
Error Set:

---
Testing: no keepalive in the custom server with traefik with body
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:126991  
Error Set:

---
Testing: keepalive in the custom server with traefik with body
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:128236  
Error Set:

---
Testing: no keepalive in uwsgi
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:195316  
Error Set:

---
Testing: no keepalive in uwsgi with body
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:196950  
Error Set:

---
Testing: keepalive in uwsgi
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:300668  
Error Set:

---
Testing: keepalive in uwsgi with body
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:296473  
Error Set:

---
Testing: no keepalive in uwsgi with traefik
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:112821  
Error Set:

---
Testing: no keepalive in uwsgi with traefik with body
Success       [ratio]                    68.74%
Status Codes  [code:count]               200:84642  500:19721  502:18764  
Error Set:
502 Bad Gateway
500 Internal Server Error

---
Testing: keepalive in uwsgi with traefik
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:143937  
Error Set:

---
Testing: keepalive in uwsgi with traefik with body
Success       [ratio]                    97.23%
Status Codes  [code:count]               200:75036  500:724  502:1405  
Error Set:
502 Bad Gateway
500 Internal Server Error

Third, you can mitigate the 500s by setting maxidleconnsperhost to -1, but you will still get 502 errors.

Tests with `maxidleconnsperhost=-1`
---
Testing: no keepalive in uwsgi with traefik set to maxidleconnsperhost=-1
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:73964  
Error Set:

---
Testing: no keepalive in uwsgi with traefik set to maxidleconnsperhost=-1 with body
Success       [ratio]                    91.46%
Status Codes  [code:count]               200:27458  502:2563  
Error Set:
502 Bad Gateway

---
Testing: keepalive in uwsgi with traefik set to maxidleconnsperhost=-1
Success       [ratio]                    78.29%
Status Codes  [code:count]               200:28215  502:7825  
Error Set:
502 Bad Gateway


---
Testing: keepalive in uwsgi with traefik set to maxidleconnsperhost=-1 with body
Success       [ratio]                    93.00%
Status Codes  [code:count]               200:30521  502:2299  
Error Set:
502 Bad Gateway

Note that I used vegeta for all test cases (`vegeta attack -rate=0 -max-workers=12 -duration=20s -http2`) and that I removed unnecessary data from the output (duration, bytes transmitted, ...). You can find my custom server here.

When digging through Traefik's logs, we can find a few errors that explain the 502:

  • '502 Bad Gateway' caused by: readfrom tcp [...]->[...]: write tcp [...]->[...]: use of closed network connection
  • '502 Bad Gateway' caused by: EOF
  • '502 Bad Gateway' caused by: read tcp [...]->[...]: read: connection reset by peer
  • '502 Bad Gateway' caused by: readfrom tcp [...]->[...]: write tcp [...]->[...]: write: broken pipe

As for the 500 errors, I get those:

  • '500 Internal Server Error' caused by: http: server closed idle connection
  • '500 Internal Server Error' caused by: readLoopPeekFailLocked: read tcp [...]->[...]: read: connection reset by peer
  • '500 Internal Server Error' caused by: write tcp [...]->[...]: write: broken pipe
  • '500 Internal Server Error' caused by: write tcp [...]->[...]: write: connection reset by peer

These 500s are mitigated by setting maxidleconnsperhost to -1, which means Traefik will not keep connections alive between calls.
That suggests the connection is somehow being closed between uwsgi and Traefik.
And since I tried with my own server and showed that the issue does not come from Traefik, I suspect the issue comes from uwsgi (like #2067 or #1623), but I am not an expert in uwsgi.
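For context, Traefik's maxidleconnsperhost maps onto Go's `http.Transport.MaxIdleConnsPerHost`, where a negative value stops the transport from returning connections to its idle pool. A small self-contained sketch of that behavior (the counting helper and test server are illustrative, not part of Traefik):

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countConns makes n sequential requests with the given
// MaxIdleConnsPerHost setting and reports how many distinct TCP
// connections the test server accepted.
func countConns(maxIdlePerHost, n int) int64 {
	var conns int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	// Count every new connection the server sees.
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&conns, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{MaxIdleConnsPerHost: maxIdlePerHost}}
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		// Drain the body so the connection would be eligible for reuse.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&conns)
}

func main() {
	fmt.Println("pooled:  ", countConns(2, 5))  // typically 1: the connection is reused
	fmt.Println("disabled:", countConns(-1, 5)) // 5: a fresh connection per request
}
```

With a negative value the client dials a new connection per request, which matches the observation that the 500s ("server closed idle connection") disappear while connection-setup-related 502s can remain.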

WDYT?

@languitar

Hm, maybe the reproduction case I created also triggers further errors. But in our production setup, the sporadic 500 codes have definitely disappeared since enabling keep-alive in uwsgi. My suspicion was that Traefik doesn't honor the keep-alive header and always assumes support.


@traefiker
Contributor

Hi! I'm Træfiker 🤖 the bot in charge of tidying up the issues.

I have to close this one because of its lack of activity 😞

Feel free to re-open it or join our Community Forum.

@traefik traefik locked and limited conversation to collaborators Sep 3, 2021