Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread killed by timeout reaper when sending large HTTP/2 response #593

Closed
ip1981 opened this issue Nov 24, 2016 · 13 comments
Closed

Thread killed by timeout reaper when sending large HTTP/2 response #593

ip1981 opened this issue Nov 24, 2016 · 13 comments
Assignees
Labels

Comments

@ip1981
Copy link

ip1981 commented Nov 24, 2016

With Warp 3.2.8 I'm getting error Thread killed by Warp's timeout reaper when sending large (GiB) bodies from backend servers. This looks similar to #351, but happens only on HTTP/2 (HTTP/1.1 to the backend).

The docs say

Every time data is successfully sent to the client, the timeout is tickled.

The data is definitely being sent to the client. Also this commit suggests that timeout should be disabled when request body is consumed, that I believe is the case.

Source code:

forward :: BE.Manager -> W.Application
forward mgr req resp = do
  let beReq = BE.defaultRequest
        { BE.method = W.requestMethod req
        , BE.path = W.rawPathInfo req
        , BE.queryString = W.rawQueryString req
        , BE.requestHeaders = modifyRequestHeaders $ W.requestHeaders req
        , BE.redirectCount = 0
        , BE.decompress = const False
        , BE.requestBody = case W.requestBodyLength req of
            W.ChunkedBody   -> requestBodySourceChunkedIO (sourceRequestBody req)
            W.KnownLength l -> requestBodySourceIO (fromIntegral l) (sourceRequestBody req)
        }
      msg = unpack (BE.method beReq <> " " <> BE.path beReq <> BE.queryString beReq)
  Log.debug $ "BACKEND <<< " ++ msg ++ " " ++ show (BE.requestHeaders beReq)
  BE.withResponse beReq mgr $ \res -> do
        let status = BE.responseStatus res
            headers = BE.responseHeaders res
            body = mapOutput (Chunk . fromByteString) . bodyReaderSource $ BE.responseBody res
            logging = if statusCode status `elem` [ 400, 500 ] then
                      Log.warn else Log.debug
        logging $ "BACKEND >>> " ++ show (statusCode status) ++ " on " ++ msg ++ " " ++ show headers ++ "\n"
        resp $ responseSource status (modifyResponseHeaders headers) body
@kazu-yamamoto kazu-yamamoto self-assigned this Dec 6, 2016
@kazu-yamamoto
Copy link
Contributor

I need a small test case.
Consider this application:

{-# LANGUAGE OverloadedStrings #-}
import Network.Wai
import Network.HTTP.Types (status200)
import Network.Wai.Handler.Warp
import Control.Concurrent

application req respond =
    respond $ responseStream status200 [("Content-Type", "text/plain")] strm
  where
    strm write flush = do
        write "Hello\n"
        threadDelay 3000000
        write "World\n"

main = runSettings settings application
  where
    settings = setTimeout 1 $ setPort 3000 $ setOnException (\r e -> print e) defaultSettings

And test it with nghttp:

% nghttp http://localhost:3000
Hello
Some requests were not processed. total=1, processed=0

I think this is not your problem. No error is displayed in the we app side.

Could you modify this application to reproduce your problem easily?

@lucasdicioccio
Copy link
Contributor

I think I can reproduce a similar problem with my nano test case and my experimental client library.

Client: my executable at https://github.com/lucasdicioccio/http2-client

Server:

hello = (ByteString.replicate 1024 80)

server :: IO ()
server =
    runTLS defaultTlsSettings (Warp.setOnException (\a b -> print (a,b)) $
                               Warp.setTimeout 3600 $
                               Warp.defaultSettings) app
  where
    app req f = f $ responseBuilder status200 [ ("Content-Type", "text/plain")
                                              , ("Content-Length", "5")
                                              ] $ copyByteString hello

If I don't set the timeout with Warp.setTimeout 3600 I noticed my client failing around 1min in (the default timeout being 30s. If I set the timeout to 5s then my client fails after 10s seconds. No idea what's going on, I've not tried whether 3600 gives me a 2h run for my client. I can have a look in a short while.

@lucasdicioccio
Copy link
Contributor

I think I found a cause for at least my version of the bug.

In Run.hs when it is decided to switch to HTTP2 over normal HTTP, ( https://github.com/yesodweb/wai/blob/master/warp/Network/Wai/Handler/Warp/Run.hs#L334-L346 ), the HTTP side wraps the handle with a wrappedRecv. This function is responsible for tickling the timer. The HTTP2 side wraps the handle with makeReceiveN only.

@kazu-yamamoto what do you think?

@kazu-yamamoto
Copy link
Contributor

@lucasdicioccio I think that your observation is right.
Could you create a wrapped version of makeReceiveN or something?

@lucasdicioccio
Copy link
Contributor

Ack. I'll give it a shot this week-end.

lucasdicioccio added a commit to lucasdicioccio/wai that referenced this issue May 30, 2017
This patch addresses a bug preventing long HTTP2 connections.  See yesodweb#593
for an initial report, as well as
yesodweb#593 (comment) for a
reproducible example -- which this patch addresses (the actual root
cause for yesodweb#593 might differ but seem closely related).
@lucasdicioccio
Copy link
Contributor

lucasdicioccio commented May 30, 2017

Hi, @ip1981 can you give a try to my patch? It definitely fixed my problem but the patch might need more work (e.g., see my TODO in the current diff 51934a3 + I've cargo-culted an IORef which is only written-to -- and never read -- in HTTP2 but I don't know what it's used for in HTTP).

@kazu-yamamoto
Copy link
Contributor

Ping @ip1981

@ip1981
Copy link
Author

ip1981 commented Jun 8, 2017

It will take some time :)

@ip1981
Copy link
Author

ip1981 commented Jun 10, 2017

Negative. I'm still getting timeout.

To be 100% sure I used cabal sandbox, installed warp from the http2-timer-tickle branch of https://github.co/lucasdicioccio/wai, then installed sproxy.

I used 1G file created with dd if=/dev/zero of=/var/tmp/zero bs=1M count=1024 and python -m SimpleHTTPServer 1234 as a backend.

Sproxy log:

DEBUG: BACKEND <<< GET /zero [("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),("Accept-Encoding","gzip, deflate"),("Accept-Language","ru,eo;q=0.7,en;q=0.3"),("dnt","1"),("From","pashev.igor@gmail.com"),("Host","example.ru:8443"),("Referer","https://example.ru:8443/"),("User-Agent","Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"),("X-Family-Name","\208\159\208\176\209\136\208\181\208\178"),("X-Forwarded-For","127.0.0.1"),("X-Forwarded-Proto","https"),("X-Given-Name","\208\152\208\179\208\190\209\128\209\140"),("X-Groups","foo")]
DEBUG: BACKEND >>> 200 on GET /zero [("Server","SimpleHTTP/0.6 Python/2.7.13"),("Date","Sat, 10 Jun 2017 12:13:53 GMT"),("Content-type","application/octet-stream"),("Content-Length","1073741824"),("Last-Modified","Sat, 10 Jun 2017 10:19:16 GMT")]

ERROR: 500 Internal Server Error: Thread killed by Warp's timeout reaper on GET example.ru:8443/zero HTTP/2.0 "https://example.ru:8443/" "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0" from 127.0.0.1:46098

Did I miss something?

@kazu-yamamoto
Copy link
Contributor

I finally reproduce @ip1981's problem (with @lucasdicioccio's patch).
I will try to fix it.

@kazu-yamamoto
Copy link
Contributor

Here is what I did:

  • I created 1073741824-bytes file and configurd mighty to serve it
  • nghttp can download it
  • Chrome can download it
  • But Firefox stalls when a pop-up window ("save to a file?" stuff) is displayed

What I found is probably a bug of Firefox, not @ip1981's one.

kazu-yamamoto pushed a commit to kazu-yamamoto/wai that referenced this issue Jul 3, 2017
This patch addresses a bug preventing long HTTP2 connections.  See yesodweb#593
for an initial report, as well as
yesodweb#593 (comment) for a
reproducible example -- which this patch addresses (the actual root
cause for yesodweb#593 might differ but seem closely related).
@kazu-yamamoto
Copy link
Contributor

This happens Firefox on macOS only.
I guess that this is because of bugs of poll() on Sierra.
sendAll in the network package does not work on Sierra in a specific situation.

@kazu-yamamoto
Copy link
Contributor

I'm 100% sure that Warp version 3.2.20 fixes this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants