New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread killed by timeout reaper when sending large HTTP/2 response #593

Closed
ip1981 opened this Issue Nov 24, 2016 · 13 comments

Comments

Projects
None yet
3 participants
@ip1981

ip1981 commented Nov 24, 2016

With Warp 3.2.8 I'm getting error Thread killed by Warp's timeout reaper when sending large (GiB) bodies from backend servers. This looks similar to #351, but happens only on HTTP/2 (HTTP/1.1 to the backend).

The docs say

Every time data is successfully sent to the client, the timeout is tickled.

The data is definitely being sent to the client. Also this commit suggests that timeout should be disabled when request body is consumed, that I believe is the case.

Source code:

forward :: BE.Manager -> W.Application
forward mgr req resp = do
  let beReq = BE.defaultRequest
        { BE.method = W.requestMethod req
        , BE.path = W.rawPathInfo req
        , BE.queryString = W.rawQueryString req
        , BE.requestHeaders = modifyRequestHeaders $ W.requestHeaders req
        , BE.redirectCount = 0
        , BE.decompress = const False
        , BE.requestBody = case W.requestBodyLength req of
            W.ChunkedBody   -> requestBodySourceChunkedIO (sourceRequestBody req)
            W.KnownLength l -> requestBodySourceIO (fromIntegral l) (sourceRequestBody req)
        }
      msg = unpack (BE.method beReq <> " " <> BE.path beReq <> BE.queryString beReq)
  Log.debug $ "BACKEND <<< " ++ msg ++ " " ++ show (BE.requestHeaders beReq)
  BE.withResponse beReq mgr $ \res -> do
        let status = BE.responseStatus res
            headers = BE.responseHeaders res
            body = mapOutput (Chunk . fromByteString) . bodyReaderSource $ BE.responseBody res
            logging = if statusCode status `elem` [ 400, 500 ] then
                      Log.warn else Log.debug
        logging $ "BACKEND >>> " ++ show (statusCode status) ++ " on " ++ msg ++ " " ++ show headers ++ "\n"
        resp $ responseSource status (modifyResponseHeaders headers) body

@kazu-yamamoto kazu-yamamoto added the http2 label Dec 6, 2016

@kazu-yamamoto kazu-yamamoto self-assigned this Dec 6, 2016

@kazu-yamamoto

This comment has been minimized.

Contributor

kazu-yamamoto commented Dec 6, 2016

I need a small test case.
Consider this application:

{-# LANGUAGE OverloadedStrings #-}
import Network.Wai
import Network.HTTP.Types (status200)
import Network.Wai.Handler.Warp
import Control.Concurrent

application req respond =
    respond $ responseStream status200 [("Content-Type", "text/plain")] strm
  where
    strm write flush = do
        write "Hello\n"
        threadDelay 3000000
        write "World\n"

main = runSettings settings application
  where
    settings = setTimeout 1 $ setPort 3000 $ setOnException (\r e -> print e) defaultSettings

And test it with nghttp:

% nghttp http://localhost:3000
Hello
Some requests were not processed. total=1, processed=0

I think this is not your problem. No error is displayed in the we app side.

Could you modify this application to reproduce your problem easily?

@lucasdicioccio

This comment has been minimized.

Contributor

lucasdicioccio commented May 26, 2017

I think I can reproduce a similar problem with my nano test case and my experimental client library.

Client: my executable at https://github.com/lucasdicioccio/http2-client

Server:

hello = (ByteString.replicate 1024 80)

server :: IO ()
server =
    runTLS defaultTlsSettings (Warp.setOnException (\a b -> print (a,b)) $
                               Warp.setTimeout 3600 $
                               Warp.defaultSettings) app
  where
    app req f = f $ responseBuilder status200 [ ("Content-Type", "text/plain")
                                              , ("Content-Length", "5")
                                              ] $ copyByteString hello

If I don't set the timeout with Warp.setTimeout 3600 I noticed my client failing around 1min in (the default timeout being 30s. If I set the timeout to 5s then my client fails after 10s seconds. No idea what's going on, I've not tried whether 3600 gives me a 2h run for my client. I can have a look in a short while.

@lucasdicioccio

This comment has been minimized.

Contributor

lucasdicioccio commented May 30, 2017

I think I found a cause for at least my version of the bug.

In Run.hs when it is decided to switch to HTTP2 over normal HTTP, ( https://github.com/yesodweb/wai/blob/master/warp/Network/Wai/Handler/Warp/Run.hs#L334-L346 ), the HTTP side wraps the handle with a wrappedRecv. This function is responsible for tickling the timer. The HTTP2 side wraps the handle with makeReceiveN only.

@kazu-yamamoto what do you think?

@kazu-yamamoto

This comment has been minimized.

Contributor

kazu-yamamoto commented May 30, 2017

@lucasdicioccio I think that your observation is right.
Could you create a wrapped version of makeReceiveN or something?

@lucasdicioccio

This comment has been minimized.

Contributor

lucasdicioccio commented May 30, 2017

Ack. I'll give it a shot this week-end.

lucasdicioccio added a commit to lucasdicioccio/wai that referenced this issue May 30, 2017

Wrap recvN in HTTP2 to tickle the timer.
This patch addresses a bug preventing long HTTP2 connections.  See yesodweb#593
for an initial report, as well as
yesodweb#593 (comment) for a
reproducible example -- which this patch addresses (the actual root
cause for yesodweb#593 might differ but seem closely related).
@lucasdicioccio

This comment has been minimized.

Contributor

lucasdicioccio commented May 30, 2017

Hi, @ip1981 can you give a try to my patch? It definitely fixed my problem but the patch might need more work (e.g., see my TODO in the current diff 51934a3 + I've cargo-culted an IORef which is only written-to -- and never read -- in HTTP2 but I don't know what it's used for in HTTP).

@kazu-yamamoto

This comment has been minimized.

Contributor

kazu-yamamoto commented Jun 8, 2017

Ping @ip1981

@ip1981

This comment has been minimized.

ip1981 commented Jun 8, 2017

It will take some time :)

@ip1981

This comment has been minimized.

ip1981 commented Jun 10, 2017

Negative. I'm still getting timeout.

To be 100% sure I used cabal sandbox, installed warp from the http2-timer-tickle branch of https://github.co/lucasdicioccio/wai, then installed sproxy.

I used 1G file created with dd if=/dev/zero of=/var/tmp/zero bs=1M count=1024 and python -m SimpleHTTPServer 1234 as a backend.

Sproxy log:

DEBUG: BACKEND <<< GET /zero [("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),("Accept-Encoding","gzip, deflate"),("Accept-Language","ru,eo;q=0.7,en;q=0.3"),("dnt","1"),("From","pashev.igor@gmail.com"),("Host","example.ru:8443"),("Referer","https://example.ru:8443/"),("User-Agent","Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"),("X-Family-Name","\208\159\208\176\209\136\208\181\208\178"),("X-Forwarded-For","127.0.0.1"),("X-Forwarded-Proto","https"),("X-Given-Name","\208\152\208\179\208\190\209\128\209\140"),("X-Groups","foo")]
DEBUG: BACKEND >>> 200 on GET /zero [("Server","SimpleHTTP/0.6 Python/2.7.13"),("Date","Sat, 10 Jun 2017 12:13:53 GMT"),("Content-type","application/octet-stream"),("Content-Length","1073741824"),("Last-Modified","Sat, 10 Jun 2017 10:19:16 GMT")]

ERROR: 500 Internal Server Error: Thread killed by Warp's timeout reaper on GET example.ru:8443/zero HTTP/2.0 "https://example.ru:8443/" "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" from 127.0.0.1:46098

Did I miss something?

@kazu-yamamoto

This comment has been minimized.

Contributor

kazu-yamamoto commented Jul 3, 2017

I finally reproduce @ip1981's problem (with @lucasdicioccio's patch).
I will try to fix it.

@kazu-yamamoto

This comment has been minimized.

Contributor

kazu-yamamoto commented Jul 3, 2017

Here is what I did:

  • I created 1073741824-bytes file and configurd mighty to serve it
  • nghttp can download it
  • Chrome can download it
  • But Firefox stalls when a pop-up window ("save to a file?" stuff) is displayed

What I found is probably a bug of Firefox, not @ip1981's one.

kazu-yamamoto added a commit to kazu-yamamoto/wai that referenced this issue Jul 3, 2017

Wrap recvN in HTTP2 to tickle the timer.
This patch addresses a bug preventing long HTTP2 connections.  See yesodweb#593
for an initial report, as well as
yesodweb#593 (comment) for a
reproducible example -- which this patch addresses (the actual root
cause for yesodweb#593 might differ but seem closely related).
@kazu-yamamoto

This comment has been minimized.

Contributor

kazu-yamamoto commented Jul 7, 2017

This happens Firefox on macOS only.
I guess that this is because of bugs of poll() on Sierra.
sendAll in the network package does not work on Sierra in a specific situation.

@kazu-yamamoto

This comment has been minimized.

Contributor

kazu-yamamoto commented Apr 25, 2018

I'm 100% sure that Warp version 3.2.20 fixes this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment