Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwanted HTTP2 GOAWAY when receiving late WINDOW_UPDATE #639

Closed
lucasdicioccio opened this issue Aug 10, 2017 · 9 comments
Closed

Unwanted HTTP2 GOAWAY when receiving late WINDOW_UPDATE #639

lucasdicioccio opened this issue Aug 10, 2017 · 9 comments
Assignees
Labels

Comments

@lucasdicioccio
Copy link
Contributor

lucasdicioccio commented Aug 10, 2017

It's me again. Found some time continuing my http2 client.
While implementing flow-control I've realized there's a race-condition leading to unwanted GOWAYs.

The race-condition is triggered as follows:

All frames are on a same StreamID (except the GOAWAY which is on the control StreamID = 0).

  o client                        o server
   |                                   |
   |  ------------------------------>  |  HEADERS (set end stream)
   |  <------------------------------  |  HEADERS
   |  <------------------------------  |  DATA
   |  -------------_                   |  WINDOW_UPDATE (sent from client)
   |  <------------------------------  |  DATA (set end stream)
   |                  `------------->  |  WINDOW_UPDATE (received at server)
   |  <------------------------------  |  GOAWAY
   |                                   |

The GOAWAY message is: (FrameHeader {payloadLength = 51, flags = 0, streamId = 0},GoAwayFrame 127 ProtocolError "this frame is not allowed in an idel stream") .

The problem occurs in warp/Network/Wai/Handler/Warp/HTTP2/Receiver.hs when receiving a frame and looking up the StreamTable: the StreamId key has been removed when closing the stream after the last DATA chunk is sent and hence the receiver enters the "new stream logic", which checks for frame types, increasing stream IDs, etc.

The relevant RFC (from draft-ietf-httpbis-http2-17) section is:

   WINDOW_UPDATE can be sent by a peer that has sent a frame bearing the
   END_STREAM flag.  This means that a receiver could receive a
   WINDOW_UPDATE frame on a "half closed (remote)" or "closed" stream.
   A receiver MUST NOT treat this as an error, see Section 5.1.

and Section 5.1 says

      WINDOW_UPDATE or RST_STREAM frames can be received in this state
      for a short period after a DATA or HEADERS frame containing an
      END_STREAM flag is sent.  Until the remote peer receives and
      processes RST_STREAM or the frame bearing the END_STREAM flag, it
      might send frames of these types.  Endpoints MUST ignore
      WINDOW_UPDATE or RST_STREAM frames received in this state, though
      endpoints MAY choose to treat frames that arrive a significant
      time after sending END_STREAM as a connection error
      (Section 5.4.1) of type PROTOCOL_ERROR.

In short, the "closed state" should be maintained for some duration.
I'm currently using a workaround patch (https://gist.github.com/lucasdicioccio/852f6c7d069d272e64d89b9a51d73c04) which computes the delta between received frame ID and the max stream ID and ignores WINDOW_UPDATE frames when the delta is small (i.e., the stream existed recently). This workaround is acceptable in my setup because streams are short-lived and the delta I compute is well correlated with the age of the stream.

If you have guidance on what a proper fix would be, let me know and I can try implementing it.

@kazu-yamamoto kazu-yamamoto self-assigned this Sep 6, 2017
@kazu-yamamoto
Copy link
Contributor

Sorry for keeping you waiting. I'm back.
Thank you for your good catch.
I support your approach.
Please consider following two items and send us a PR:

  • Please add comments about lag, e.g. its purpose.
  • Please also ignore RST_STREAM if possible.

@lucasdicioccio
Copy link
Contributor Author

Thanks for the reply, don't worry for the delay :-). I'll probably have some time next Week-End to work on a PR.
Tha said, I don't feel like my proposed workaround is a proper fix and I'm afraid it would make finding this issue much harder => are you OK with keeping the issue open until we design and implement a proper fix?

lucasdicioccio added a commit to lucasdicioccio/wai that referenced this issue Jan 1, 2018
@kazu-yamamoto
Copy link
Contributor

Now I resume my HTTP/2 activity.
Please send me your PR!

@lucasdicioccio
Copy link
Contributor Author

Hi Kazu we need to catch each other working on similar topics in the same week!
I also resumed working on HTTP2/gRPC recently! :-)

I'd like to get a proper fix rather than a workaround like in my above diff. This diff measures "recently-closed" in terms of stream-IDS. However, I now think we should measure "recently closed" in time rather than stream-IDs. Otherwise we face a risk that a long "streamed" RPC (e.g., a subscription started at connection time), triggers this unwanted GOAWAY past a few hundreds of RPC calls. But if you're happy with just a workaround I can make the patch anyway.

How would you go about adding some extra state for "recently-closed" streams? could we maybe re-use warp's timers support for timeouts to clean up streamIDs (I'm afraid of the performance hit as we'll likely want to support thousands RPCs/sec on a single stream).

lucasdicioccio added a commit to lucasdicioccio/wai that referenced this issue Aug 25, 2018
This commit addresses yesodweb#639 by ignoring RST and WINDOW_UPDATE
frames on a closed (or-implictly-closed) stream.
@kazu-yamamoto
Copy link
Contributor

@lucasdicioccio Yes. You can add "recently-closed".

Should I merge #711 anyway? Or is it unnecessary if we implement "recently-closed"?

@lucasdicioccio
Copy link
Contributor Author

After some research I think we should go ahead with #711 and weigh the pros&cons before doing anything more. In particular, comparing to how the Go implementation handle this case. The Go handler for WindowUpdate ignores on closed streams (cf. https://github.com/golang/net/blob/master/http2/server.go#L1405-L1411) and the ResetHandler on an unreferenced stream (cf. https://github.com/golang/net/blob/master/http2/server.go#L1437). That is, the Go implementation does not carry the perf+code-maintenance overhead of keeping an up-to-date list of "recently-closed".

@lucasdicioccio
Copy link
Contributor Author

Aslo, FYI my warp-grpc implementation works like a charm with #711. I'd rather just merge this simple patch and move on :-) to help on other bugs.

kazu-yamamoto pushed a commit to kazu-yamamoto/wai that referenced this issue Aug 30, 2018
This commit addresses yesodweb#639 by ignoring RST and WINDOW_UPDATE
frames on a closed (or-implictly-closed) stream.
@lucasdicioccio
Copy link
Contributor Author

Thanks Kazu for the merge! do you think we can close this Issue or keep it around?

@lucasdicioccio
Copy link
Contributor Author

Let's say it's closed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants