Thread killed by timeout reaper when sending large HTTP/2 response #593

ip1981 · 2016-11-24T20:50:57Z

With Warp 3.2.8 I'm getting error Thread killed by Warp's timeout reaper when sending large (GiB) bodies from backend servers. This looks similar to #351, but happens only on HTTP/2 (HTTP/1.1 to the backend).

The docs say

Every time data is successfully sent to the client, the timeout is tickled.

The data is definitely being sent to the client. Also this commit suggests that timeout should be disabled when request body is consumed, that I believe is the case.

Source code:

forward :: BE.Manager -> W.Application
forward mgr req resp = do
  let beReq = BE.defaultRequest
        { BE.method = W.requestMethod req
        , BE.path = W.rawPathInfo req
        , BE.queryString = W.rawQueryString req
        , BE.requestHeaders = modifyRequestHeaders $ W.requestHeaders req
        , BE.redirectCount = 0
        , BE.decompress = const False
        , BE.requestBody = case W.requestBodyLength req of
            W.ChunkedBody   -> requestBodySourceChunkedIO (sourceRequestBody req)
            W.KnownLength l -> requestBodySourceIO (fromIntegral l) (sourceRequestBody req)
        }
      msg = unpack (BE.method beReq <> " " <> BE.path beReq <> BE.queryString beReq)
  Log.debug $ "BACKEND <<< " ++ msg ++ " " ++ show (BE.requestHeaders beReq)
  BE.withResponse beReq mgr $ \res -> do
        let status = BE.responseStatus res
            headers = BE.responseHeaders res
            body = mapOutput (Chunk . fromByteString) . bodyReaderSource $ BE.responseBody res
            logging = if statusCode status `elem` [ 400, 500 ] then
                      Log.warn else Log.debug
        logging $ "BACKEND >>> " ++ show (statusCode status) ++ " on " ++ msg ++ " " ++ show headers ++ "\n"
        resp $ responseSource status (modifyResponseHeaders headers) body

The text was updated successfully, but these errors were encountered:

kazu-yamamoto · 2016-12-06T06:51:02Z

I need a small test case.
Consider this application:

{-# LANGUAGE OverloadedStrings #-}
import Network.Wai
import Network.HTTP.Types (status200)
import Network.Wai.Handler.Warp
import Control.Concurrent

application req respond =
    respond $ responseStream status200 [("Content-Type", "text/plain")] strm
  where
    strm write flush = do
        write "Hello\n"
        threadDelay 3000000
        write "World\n"

main = runSettings settings application
  where
    settings = setTimeout 1 $ setPort 3000 $ setOnException (\r e -> print e) defaultSettings

And test it with nghttp:

% nghttp http://localhost:3000
Hello
Some requests were not processed. total=1, processed=0

I think this is not your problem. No error is displayed in the we app side.

Could you modify this application to reproduce your problem easily?

lucasdicioccio · 2017-05-26T20:37:18Z

I think I can reproduce a similar problem with my nano test case and my experimental client library.

Client: my executable at https://github.com/lucasdicioccio/http2-client

Server:

hello = (ByteString.replicate 1024 80)

server :: IO ()
server =
    runTLS defaultTlsSettings (Warp.setOnException (\a b -> print (a,b)) $
                               Warp.setTimeout 3600 $
                               Warp.defaultSettings) app
  where
    app req f = f $ responseBuilder status200 [ ("Content-Type", "text/plain")
                                              , ("Content-Length", "5")
                                              ] $ copyByteString hello

If I don't set the timeout with Warp.setTimeout 3600 I noticed my client failing around 1min in (the default timeout being 30s. If I set the timeout to 5s then my client fails after 10s seconds. No idea what's going on, I've not tried whether 3600 gives me a 2h run for my client. I can have a look in a short while.

lucasdicioccio · 2017-05-30T05:56:44Z

I think I found a cause for at least my version of the bug.

In Run.hs when it is decided to switch to HTTP2 over normal HTTP, ( https://github.com/yesodweb/wai/blob/master/warp/Network/Wai/Handler/Warp/Run.hs#L334-L346 ), the HTTP side wraps the handle with a wrappedRecv. This function is responsible for tickling the timer. The HTTP2 side wraps the handle with makeReceiveN only.

@kazu-yamamoto what do you think?

kazu-yamamoto · 2017-05-30T07:50:33Z

@lucasdicioccio I think that your observation is right.
Could you create a wrapped version of makeReceiveN or something?

lucasdicioccio · 2017-05-30T19:53:46Z

Ack. I'll give it a shot this week-end.

This patch addresses a bug preventing long HTTP2 connections. See yesodweb#593 for an initial report, as well as yesodweb#593 (comment) for a reproducible example -- which this patch addresses (the actual root cause for yesodweb#593 might differ but seem closely related).

lucasdicioccio · 2017-05-30T20:40:14Z

Hi, @ip1981 can you give a try to my patch? It definitely fixed my problem but the patch might need more work (e.g., see my TODO in the current diff 51934a3 + I've cargo-culted an IORef which is only written-to -- and never read -- in HTTP2 but I don't know what it's used for in HTTP).

kazu-yamamoto · 2017-06-08T02:01:50Z

Ping @ip1981

ip1981 · 2017-06-08T05:05:29Z

It will take some time :)

ip1981 · 2017-06-10T12:19:11Z

Negative. I'm still getting timeout.

To be 100% sure I used cabal sandbox, installed warp from the http2-timer-tickle branch of https://github.co/lucasdicioccio/wai, then installed sproxy.

I used 1G file created with dd if=/dev/zero of=/var/tmp/zero bs=1M count=1024 and python -m SimpleHTTPServer 1234 as a backend.

Sproxy log:

DEBUG: BACKEND <<< GET /zero [("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),("Accept-Encoding","gzip, deflate"),("Accept-Language","ru,eo;q=0.7,en;q=0.3"),("dnt","1"),("From","pashev.igor@gmail.com"),("Host","example.ru:8443"),("Referer","https://example.ru:8443/"),("User-Agent","Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"),("X-Family-Name","\208\159\208\176\209\136\208\181\208\178"),("X-Forwarded-For","127.0.0.1"),("X-Forwarded-Proto","https"),("X-Given-Name","\208\152\208\179\208\190\209\128\209\140"),("X-Groups","foo")]
DEBUG: BACKEND >>> 200 on GET /zero [("Server","SimpleHTTP/0.6 Python/2.7.13"),("Date","Sat, 10 Jun 2017 12:13:53 GMT"),("Content-type","application/octet-stream"),("Content-Length","1073741824"),("Last-Modified","Sat, 10 Jun 2017 10:19:16 GMT")]

ERROR: 500 Internal Server Error: Thread killed by Warp's timeout reaper on GET example.ru:8443/zero HTTP/2.0 "https://example.ru:8443/" "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" from 127.0.0.1:46098

Did I miss something?

kazu-yamamoto · 2017-07-03T03:41:51Z

I finally reproduce @ip1981's problem (with @lucasdicioccio's patch).
I will try to fix it.

kazu-yamamoto · 2017-07-03T04:42:50Z

Here is what I did:

I created 1073741824-bytes file and configurd mighty to serve it
nghttp can download it
Chrome can download it
But Firefox stalls when a pop-up window ("save to a file?" stuff) is displayed

What I found is probably a bug of Firefox, not @ip1981's one.

This patch addresses a bug preventing long HTTP2 connections. See yesodweb#593 for an initial report, as well as yesodweb#593 (comment) for a reproducible example -- which this patch addresses (the actual root cause for yesodweb#593 might differ but seem closely related).

kazu-yamamoto · 2017-07-07T07:57:16Z

This happens Firefox on macOS only.
I guess that this is because of bugs of poll() on Sierra.
sendAll in the network package does not work on Sierra in a specific situation.

kazu-yamamoto · 2018-04-25T07:31:01Z

I'm 100% sure that Warp version 3.2.20 fixes this issue.

kazu-yamamoto added the http2 label Dec 6, 2016

kazu-yamamoto self-assigned this Dec 6, 2016

lucasdicioccio mentioned this issue May 30, 2017

Wrap recvN in HTTP2 to tickle the timer. #624

Closed

kazu-yamamoto added a commit to kazu-yamamoto/wai that referenced this issue Apr 25, 2018

specifying the correct timeout handler for a worker (yesodweb#593)

efc0fd2

kazu-yamamoto closed this as completed Apr 25, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread killed by timeout reaper when sending large HTTP/2 response #593

Thread killed by timeout reaper when sending large HTTP/2 response #593

ip1981 commented Nov 24, 2016

kazu-yamamoto commented Dec 6, 2016

lucasdicioccio commented May 26, 2017

lucasdicioccio commented May 30, 2017

kazu-yamamoto commented May 30, 2017

lucasdicioccio commented May 30, 2017

lucasdicioccio commented May 30, 2017 •

edited

kazu-yamamoto commented Jun 8, 2017

ip1981 commented Jun 8, 2017

ip1981 commented Jun 10, 2017 •

edited

kazu-yamamoto commented Jul 3, 2017

kazu-yamamoto commented Jul 3, 2017

kazu-yamamoto commented Jul 7, 2017

kazu-yamamoto commented Apr 25, 2018

Thread killed by timeout reaper when sending large HTTP/2 response #593

Thread killed by timeout reaper when sending large HTTP/2 response #593

Comments

ip1981 commented Nov 24, 2016

kazu-yamamoto commented Dec 6, 2016

lucasdicioccio commented May 26, 2017

lucasdicioccio commented May 30, 2017

kazu-yamamoto commented May 30, 2017

lucasdicioccio commented May 30, 2017

lucasdicioccio commented May 30, 2017 • edited

kazu-yamamoto commented Jun 8, 2017

ip1981 commented Jun 8, 2017

ip1981 commented Jun 10, 2017 • edited

kazu-yamamoto commented Jul 3, 2017

kazu-yamamoto commented Jul 3, 2017

kazu-yamamoto commented Jul 7, 2017

kazu-yamamoto commented Apr 25, 2018

lucasdicioccio commented May 30, 2017 •

edited

ip1981 commented Jun 10, 2017 •

edited