
Buffering/corruption of gRPC stream response #2264

Closed
tcolgate opened this issue Oct 14, 2017 · 7 comments
Labels: area/oxy · kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed) · priority/P1 (need to be fixed in next release) · status/5-frozen-due-to-age
Milestone: 1.4

Comments

@tcolgate
Contributor

What did you do?

We use traefik as a gateway for some of our gRPC services.
Single calls and streaming requests appear to work fine, but streaming
responses are being buffered somewhere.
I've put together a small test case at:
https://github.com/tcolgate/grpcbuffer
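
(For context, a minimal Go sketch of the kind of server-streaming handler the test case exercises: one small message sent per second. The service and message names here are illustrative, not necessarily the ones used in the grpcbuffer repo.)

// Sketch of a server-streaming handler: one ~512-byte message per second.
// pb.* stands in for the generated protobuf package of the test service,
// and "time" is assumed to be imported.
func (s *server) Stream(req *pb.Request, stream pb.Streamer_StreamServer) error {
	payload := make([]byte, 512) // ~512-byte messages, as described below
	for {
		if err := stream.Send(&pb.Reply{Payload: payload}); err != nil {
			return err
		}
		time.Sleep(time.Second) // one message per second
	}
}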

What did you expect to see?

traefik should forward the individual response messages as they are sent

What did you see instead?

If I create messages of around 512 bytes and send them once a second, the messages
are passed on in batches of 8 (roughly 4 KiB at a time).

Output of traefik version: (What version of Traefik are you using?)

Version:      914f3d1fa324bba96c1cd2084bc928235b432e64
Codename:     cheddar
Go version:   go1.9.1
Built:        2017-10-13_05:17:17PM
OS/Arch:      linux/amd64

What is your environment & configuration (arguments, toml, provider, platform, ...)?

rootcas = [ "cert.pem" ]

[file]

[entryPoints]
  [entryPoints.https]
  address = ":8888"
    [entryPoints.https.tls]
      [[entryPoints.https.tls.certificates]]
      certFile = "cert.pem"
      keyFile = "key.pem"

[frontends]
  [frontends.frontend1]
  entrypoints = ["https"]
  backend = "backend1"

[backends]
  [backends.backend1]
    [backends.backend1.servers.server1]
    url = "https://localhost:443"
    weight = 10

If applicable, please paste the log output in debug mode (--debug switch)

There is no relevant output

@tcolgate
Contributor Author

Wireshark suggests that the server in the example is sending the packets to traefik once per second; the traefik process ACKs them, but the packets are not forwarded on.

@tcolgate
Contributor Author

tcolgate commented Oct 14, 2017

This appears to be due to oxy buffering the responses. Adding forward.StreamResponse() to the forwarder "fixes" the issue.
I still get an odd error in the client if I ctrl-c the server ("2017/10/14 09:57:08 err: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (1853121906 vs. 4194304)").
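
(For anyone wanting to reproduce the workaround: the change amounts to passing the streaming option when constructing the oxy forwarder. A rough sketch against the oxy version vendored at the time; treat the exact option name and wiring as indicative rather than the final patch.)

// Build the oxy forwarder with response streaming enabled, so each gRPC
// message is flushed to the client as it arrives instead of being buffered.
fwd, err := forward.New(
	forward.StreamResponse(), // the option referred to above
)
if err != nil {
	log.Fatal(err)
}
// fwd is then used as the http.Handler that proxies to the backend.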

@tcolgate
Contributor Author

tcolgate commented Oct 14, 2017

FWIW, swapping oxy/forward out for httputil.ReverseProxy shows the same buffering behaviour unless a FlushInterval is set. However, httputil.ReverseProxy + FlushInterval appears to give the most correct behaviour (the error mentioned above does not appear).
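
(For comparison, a minimal sketch of the stdlib approach; the address, cert paths and flush interval are illustrative, not Traefik's actual wiring.)

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	backend, err := url.Parse("https://localhost:443")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)
	// Without FlushInterval the proxy buffers the response body; setting a
	// small interval flushes periodically, so streamed gRPC messages are
	// forwarded promptly instead of in ~4 KiB batches.
	proxy.FlushInterval = 100 * time.Millisecond
	// ListenAndServeTLS enables HTTP/2 by default, which gRPC requires
	// end-to-end.
	log.Fatal(http.ListenAndServeTLS(":8888", "cert.pem", "key.pem", proxy))
}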

@emilevauge emilevauge added kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. area/oxy and removed status/0-needs-triage labels Oct 16, 2017
@tcolgate
Contributor Author

I'm thinking of making streaming configurable via an annotation. I'm experiencing another problem in production where, after an undetermined (but longish) time, my gRPC streaming endpoints just start getting 502s. The forwarder complains about "stream error: stream ID 238203; PROTOCOL_ERROR". I'm not sure what is causing that, but am vaguely hopeful that streaming may solve both.

@emilevauge emilevauge added the priority/P1 need to be fixed in next release label Oct 20, 2017
@tcolgate
Contributor Author

oxy/forward seems to have quite a bit of overhead: it makes copies of requests and headers, and it's generally fairly complicated. ReverseProxy shows improved latency, throughput and stddev of latency, and no real difference in CPU usage (slightly lower insns/cycle, probably due to more time waiting on IO vs shuffling data).
I would need to reimplement websocket support, but it might be worth considering. (We'll probably run with the patch at Qubit, I think.)

@traefiker
Contributor

Closed by #2309.

@traefiker traefiker added this to the 1.4 milestone Oct 24, 2017
@tcolgate
Contributor Author

I'm not convinced that streaming for all HTTP/2 is really the best approach. Additionally, when I did enable streaming I saw another error (the one mentioned above, "grpc: received message larger than max ...").
