New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receive buffers are incorrectly aliased, causing corruption of request body and framing data. #406

Closed
awpr opened this Issue Jul 28, 2015 · 10 comments

Comments

Projects
None yet
3 participants
@awpr
Member

awpr commented Jul 28, 2015

TL;DR I'm almost certain that BufferPool, or some misuse of it, is breaking referential transparency of ByteStrings.

The setup:

  • Modify Warp HTTP/2 to accept 10,000 concurrent streams instead of only 100. (This isn't strictly necessary, but you'll be re-running the client program a lot of times otherwise.) This entails replacing 'recommendedConcurrency' with 10000 in both HTTP2/Sender.hs and HTTP2/Receiver.hs.
  • Make a client that attempts to make 10,000 concurrent POSTS with the same contents on a single HTTP/2 connection.
  • Make a server that simply checks the request body for the expected value and prints something (e.g. "ERROR: "<>requestBody) to stdout if it's wrong.
  • Run the client against the server.

What happens:

  • A lot of requests will be fine at the beginning. After a while (well, some fractions of a second), you start seeing a lot of "ERROR: etc.". Sometimes the connection is killed.
  • [0] The observed incorrect contents of the request body are overwhelmingly the right length. Very rarely they are the wrong length, and the connection dies shortly afterwards.
  • [1] The observed incorrect contents of the request body are consistently fragments of the HTTP/2 protocol. Usually some part of the expected data is present, but with frame headers and other frames' contents instead of parts of it.

Spooky debugging notes:

  • [2] Adding print traces to recvN in Receiver.hs shows (among many correct reads) some incorrect ones, but not as many as there are incorrect request bodies.
  • [3] Adding print traces inside 'receiveloop' in Recv.hs always shows the expected HTTP/2 data.
  • [4] With both of the above traces active, you can see that the errors happen when a new 'receiveloop' happens; and the 'receiveloop' traces are occasionally inconsistent with the 'recvN' traces, which themselves are occasionally inconsistent with the application's 'ERROR' prints.

Notes on why the above are consistent with buffer aliasing issues:

  • HTTP/2 frame headers are interpreted immediately after reading them. Request bodies are interpreted significantly later in a different thread. [0] suggests that corruption occurs far more often in request bodies themselves than in request headers. This is consistent with buffers being overwritten, since this is less likely to affect earlier reads.
  • The HTTP/2 protocol fragments noted in [1] suggest that the data being presented to the Application is still making it to the framing sequence! Otherwise the connection would almost certainly terminate after the first error, since the framing would be out of sync. This is consistent with ByteStrings being incorrectly aliased.
  • The inconsistencies noted in [2], [3], and [4] suggest that the buffers are changing out from under the code, rather than being mishandled in some referentially transparent way.
  • That the errors happen right after new 'receiveloop' chunks are read in [4] is also consistent with buffer misuse: 'receiveloop' somehow pulls an aliased buffer and overwrites it, and suddenly a bunch of ByteStrings that used to be request bodies are now something else.

The crucial evidence (a.k.a why did I even write all of the above?):

  • Stubbing out 'usefulBuffer' in Buffer.hs with 'const False' makes the problem disappear entirely.

Some other info:

  • % ghc --version

The Glorious Glasgow Haskell Compilation System, version 7.10.1

  • Wai/Warp at Git commit f370437 (current master, "Version bump").
  • Builds with (-O2), (-O2 -threaded), and [no flags] are all affected.
  • Cabal sandbox contents:

aeson-0.9.0.1-7365c51d3d907371be889d743b3c50e9.conf
aeson-pretty-0.7.2-1a0bc1cf180193a64c8c88ed50922857.conf
appar-0.1.4-4bc5b0597a63038176cc0329e74ad375.conf
attoparsec-0.13.0.1-320991fb573b51aa07d37e52c2b02288.conf
auto-update-0.1.2.2-911de7ca12c6b21b9fc101236242cccb.conf
blaze-builder-0.4.0.1-bab26a30fba957c623746d00f18aec3a.conf
byteorder-1.0.4-34904fc7a33a54eaddd22171b75ba4e7.conf
bytestring-builder-0.10.6.0.0-0ef0c82e02fc5f1fa77c665b35287eb2.conf
case-insensitive-1.2.0.4-cc72d0c8dbe680cb1e3f8c723bfda680.conf
cmdargs-0.10.13-6c17a061a453bc0c80fae1f74f5f59fb.conf
hashable-1.2.3.3-0e969725fa1779f5aa9d26791de72722.conf
hex-0.1.2-6d1fddb523ee827e9437565bf2920098.conf
http2-1.0.2-8ce47b4e257a42504add8c9d6e52236d.conf
http-date-0.0.6.1-299eab5dd5b562a3b4186176ecc5f7ca.conf
http-types-0.8.6-6ca2c04caa27775edc6905f83c33e802.conf
iproute-1.5.0-32483e4535afa06bdeda29477a76e706.conf
mwc-random-0.13.3.2-c65cecb61437157d65a70ae606396cc5.conf
network-2.6.2.1-d8eece37fed05c5fe286461dc0f78883.conf
package.cache
primitive-0.6-cd5e7f2c82e67169d584438d071a9b1a.conf
scientific-0.3.3.8-029229f9aa3d3261c3742d2cb26f18ab.conf
simple-sendfile-0.2.21-c0ccfb085a4ebfa91f8566f5a7304318.conf
stm-2.4.4-1ae235706b0ce6e6ef8d19cd83227592.conf
streaming-commons-0.1.12.1-543275aec8c588f40a24e535f071d3db.conf
syb-0.5.1-9c22b124c53290dc8251e48995b9698d.conf
text-1.2.1.1-e2477224a964de7f076f81877e94e68f.conf
unix-compat-0.4.1.4-8f5d7fb77e066846ca7d83671322a94b.conf
unordered-containers-0.2.5.1-726036df88909866f487919c94b0cede.conf
vault-0.3.0.4-918d462079e7a68fe0703c0bdb393e32.conf
vector-0.11.0.0-d8765e5e59af4c21a76fe8ca2f3eb8fd.conf
wai-3.0.3.0-2bf485482fe3dae077cd82b44162b4de.conf
warp-3.1.0-dc7b30d0b43f25df86f09f97daac96cd.conf
word8-0.1.2-7612cd5c904a9961bc4fa8ca13d1986b.conf
zlib-0.6.1.1-555bd3cef0ce1b18a28e4089cd29dd30.conf

@awpr

This comment has been minimized.

Member

awpr commented Jul 28, 2015

Oh.

@awpr

This comment has been minimized.

Member

awpr commented Jul 28, 2015

This isn't limited to just HTTP/2, and I've got a fairly simple Application that triggers it for HTTP1. The short summary is: stream the whole request body a chunk at a time, storing both the original ByteString and a WHNF'd copy. Compare the two "copies" of the request body and respond "error" if they mismatch. This consistently says "error" for request bodies 16K bytes or larger, and consistently "good" for <= 15K request bodies.

I think this is something we can add as a test; I'll work on this and the BufferPool test this evening.

@awpr

This comment has been minimized.

Member

awpr commented Jul 28, 2015

Also recording here that this affects warp-3.1.0 but not 3.0.13.1.

@creichert

This comment has been minimized.

Member

creichert commented Jul 28, 2015

Great, the tests make it much easier to understand the context and help
contribute (esp. for some of the in-depth http2 issues). It's much
appreciated.

I'm trying to get an exact sha1 to tag warp-3.1.

On Tue, Jul 28 2015, awpr notifications@github.com wrote:

Also recording here that this affects warp-3.1.0 but not 3.0.13.1.


Reply to this email directly or view it on GitHub:
#406 (comment)

@awpr

This comment has been minimized.

Member

awpr commented Jul 28, 2015

It appears to be 8eb8a0b -- the source tarball on Hackage contains the Changelog changes of that commit but not the 'hasBody' changes of the following one.

@creichert creichert added the http2 label Jul 28, 2015

@codedmart

This comment has been minimized.

codedmart commented Jul 29, 2015

@awpr I believe we are seeing a related issue in Servant which uses warp. Once we run our tests with warp 3.1 we have a failure on a route that has a large request body where we return [(String, [Rational])]. I would have to confirm the exact size she it fails, but this sounds related.

@awpr

This comment has been minimized.

Member

awpr commented Jul 29, 2015

That does sound like this bug. I've updated my pull request with a test that would provoke the issue.

@codedmart

This comment has been minimized.

codedmart commented Jul 29, 2015

@awpr your pull request solves this issue for you?

@awpr

This comment has been minimized.

Member

awpr commented Jul 29, 2015

Yes -- the problem was that a BufferPool would get stuck handing out the same buffer in certain cases.

@awpr awpr changed the title from Applications occasionally get incorrect request body data in HTTP/2. to Receive buffers are incorrectly aliased, causing corruption of request body and framing data. Jul 29, 2015

@codedmart

This comment has been minimized.

codedmart commented Jul 29, 2015

@awpr thanks I will check that against my code in the morning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment