Timeout left running after use of sourceRequestBody #351
Comments
Haven't tested this yet, but just looking at …
I tested, and had to increase the …
OK, I've figured out what's going on here... and it's tricky. There's a subtle difference between the conduit and non-conduit versions. The former feeds data into the parser, and once the parser is done, the conduit stops. Termination therefore happens as soon as the JSON message is fully consumed, which means the final, empty chunk that signals end-of-body is never read, and the timeout protection is never disabled.

There's an easy workaround for this; after getting your JSON value, you can run:

```haskell
sourceRequestBody rq $$ sinkNull
```

This will force the rest of the body (in this case, nothing) to be consumed, and the timeout protection will be disabled. To be clear: this is a workaround, because Warp is not handling this situation correctly.

I need to think a bit more carefully about the slowloris protection to be certain, but there may not be a good resolution to this problem. If we simply pause the timeout handler each time we get a chunk, we could be attacked by having a slow stream of 1-byte chunks. So this may just need to be a "known deficiency."

To start off with though: can you try out my workaround and see if it solves your actual problem? Thanks for the very clear bug report and reproducing case.
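The workaround in context might look like the following sketch. It assumes the 2015-era conduit API (`$$`), `sourceRequestBody` from wai-conduit, `sinkParser` from conduit-extra, and aeson's `json` parser; the handler shape itself is illustrative, not the reporter's actual code:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, json)
import Data.Conduit (($$))
import Data.Conduit.Attoparsec (sinkParser)
import qualified Data.Conduit.List as CL
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseLBS)
import Network.Wai.Conduit (sourceRequestBody)

app :: Application
app rq respond = do
  -- The attoparsec sink stops as soon as the JSON value is complete,
  -- so the final zero-length chunk is never read and the slowloris
  -- timeout stays armed.
  v <- sourceRequestBody rq $$ sinkParser json
  print (v :: Value)
  -- Workaround: drain the leftover body (normally nothing) so Warp
  -- sees end-of-body and disables the timeout for this request.
  sourceRequestBody rq $$ CL.sinkNull
  respond $ responseLBS status200 [] "OK"
```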
Once again, your response time is amazing, thanks! I wondered if there might be something like this going on, and it does seem tricky to resolve cleanly. In the meantime, I can confirm that your workaround solves the original problem.
This addresses the common case of a non-chunked request body. Previously, we would wait until a zero-length `ByteString` was returned, but that is suboptimal for some cases. For more information, see issue #351.
OK, I came up with a 95% solution to the problem. For the case of non-chunked request bodies (which is the usual case), we can determine that the request is done without an extra read, which I've now implemented. I added a note about the general problem of partially-consumed request bodies to the documentation. One final thing I could do is add a function to the vault to pause the timeout, to give users more explicit control. Since that's likely not too hard, I'll give it a shot.
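The non-chunked shortcut described above can be pictured with a small base-only sketch (all names here are hypothetical; this is not Warp's actual code): when the body has a declared Content-Length, the reader knows the body is complete the moment the byte count reaches zero, without issuing the extra read that the chunked case needs:

```haskell
import Data.IORef

-- A toy body reader: 'mkBodyReader len chunks' yields the chunks of a
-- request body whose Content-Length is 'len', plus a flag that becomes
-- True the moment the declared length has been consumed -- no extra
-- read required, unlike the chunked case.
mkBodyReader :: Int -> [String] -> IO (IO String, IO Bool)
mkBodyReader len chunks = do
  ref <- newIORef (len, chunks)
  let next = do
        (remaining, cs) <- readIORef ref
        case cs of
          []       -> return ""
          (c:rest) -> do writeIORef ref (remaining - length c, rest)
                         return c
      done = do
        (remaining, _) <- readIORef ref
        return (remaining <= 0)
  return (next, done)

main :: IO ()
main = do
  (next, done) <- mkBodyReader 2 ["{}"]
  d0 <- done          -- body not read yet
  c  <- next          -- read the whole 2-byte body
  d1 <- done          -- known complete without another read
  print (d0, c, d1)   -- prints (False,"{}",True)
```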
OK. Again, thanks for the solid bug report and for testing.
I just upgraded to Warp 3.0.10, which includes this change, and I'm already using it. I'm not sure if my problem is related to this issue; I just encountered it and haven't looked into it deeply. Just wanted to check in with you to see what you thought.
Were you getting that behavior with the older Warp as well, or is this new with 3.0.10?
@snoyberg TBH, I'm not sure. Several things have changed simultaneously, one of those changes being the addition of the exception handler. I'm building the same app with 3.0.9.3 right now to test it. It should be fairly easy to determine whether the issue was introduced in 3.0.10, because the reaper is now coming for regular visits with my current test setup.
Sadly (for me), it's not a problem introduced by this change. The response is not taking longer than 1 second in most cases, and the exception comes much later. There's a …
Okay, I may be missing something obvious, too. I'm able to reproduce my problem with a very simple program, adapted from @adamgundry's code:

```haskell
module Test (main) where

import Data.ByteString.Lazy.Char8 ( pack )
import Data.Time.Clock ( getCurrentTime )
import Network.HTTP.Types.Status ( ok200 )
import Network.Wai ( responseLBS, strictRequestBody )
import Network.Wai.Handler.Warp

main :: IO ()
main = runSettings stgs app
  where
    stgs = setTimeout 2 $ setPort 3000 $ setOnException err defaultSettings
    err _ e = do t <- getCurrentTime
                 putStrLn $ show t ++ "\tError: " ++ show e
    app rq k = do v <- strictRequestBody rq
                  t <- getCurrentTime
                  putStrLn $ show t ++ "\tRequest: " ++ show v
                  k $ responseLBS ok200 [] (pack "OK")
```

I POSTed a simple input:

This is using warp-3.0.10.
I'm unable to reproduce this; I just get a result immediately. I don't see anything that would cause the server to block long enough for the 2-second timeout to ever be triggered. How are you sending the POST? I'm using the following script:
Interesting. I was using the Postman plugin for Chrome. I just tried it with a GET request from both Chrome and Firefox, and the same reap happens. But with curl, the thread does not get reaped. The issue came up with an AWS EC2 worker receiving messages from SQS via some local server on the worker instance (nginx, I'd guess).
Also, the request is always printed immediately no matter what I do. The reaper appears later, presumably after the timeout. |
I modified the code to print the full request.
It's not the HTTP headers. I used the Chrome headers with curl, but no reaping. |
In your real application, do you actually have such a low timeout value? Are you able to reproduce the error with the normal timeout? I have a feeling that the failures you're experiencing right now just have to do with Chrome keeping a persistent connection.
No, we use the default. I only used the low timeout here to reduce the wait before the reaper.
Yes. This is the result of using Chrome and removing the …
This reflects approximately what I saw in the real application as well: about a minute of delay before the reaper arrived.
Okay... I have reproduced the same problem with netcat by forcing a persistent connection, sleeping for 70 seconds before closing:

```sh
$ (printf "GET / HTTP/1.1\n\n"; sleep 70) | netcat localhost 3000
```

The result:
I don't understand the issue well enough to know what the problem is. But I have now reproduced it with four different HTTP clients (nginx, Chrome, Firefox, and netcat), so I do think there is a problem somewhere. Should the timeout handler be independent of connection persistence? Why is it still hanging around when the request has been fully processed and the response has been sent?
Are you actually experiencing any negative effects from this, or is it just an error message? This sounds to me like Warp is just letting you know that a persistent connection was killed by the timeout reaper, and that no actual problem exists. I also just realized that you ignored the first parameter to the exception handler, which would let you know whether the exception happened during a request or just on a persistent connection.
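The reaper behavior described above can be sketched with plain base (all names hypothetical; a simplification of Warp's actual timeout manager): each connection holds a handle that the server "tickles" on activity, and a background thread kills any handle that has not been tickled within the timeout interval — which is exactly what eventually happens to an idle keep-alive connection after the response has been sent:

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Data.IORef

-- One connection's timeout state: the reaper thread clears the flag on
-- each pass and fires the timeout action if it was already clear,
-- i.e. if nothing tickled the handle during the whole interval.
newHandle :: Int -> IO () -> IO (IO (), IO ())
newHandle interval timeoutAction = do
  ref <- newIORef True
  tid <- forkIO (loop ref)
  return (writeIORef ref True, killThread tid)
  where
    loop ref = do
      threadDelay interval
      wasActive <- atomicModifyIORef' ref (\a -> (False, a))
      if wasActive then loop ref else timeoutAction

main :: IO ()
main = do
  fired <- newIORef False
  (tickle, cancel) <- newHandle 100000 (writeIORef fired True)
  -- keep the "connection" busy for a few intervals
  mapM_ (\_ -> threadDelay 30000 >> tickle) [1 .. 5 :: Int]
  busy <- readIORef fired      -- still alive while being tickled
  -- now go idle, like a keep-alive connection after the response
  threadDelay 350000
  idle <- readIORef fired      -- reaped after a full idle interval
  print (busy, idle)           -- prints (False,True)
  cancel
```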
I added printing the first parameter.
I don't see any obvious direct negative effects due to the exception.
First, have you reproduced the issue? Or is it only me?

If there is no real error, I think no exception should be thrown. There is a problem with having an exception that isn't actually an error: I can choose to "ignore" it by catching it and doing nothing, but what if, later, there is an actual error, and I miss it?

Is it possible to change the timeout handler? I don't claim to understand much about protecting against the slowloris attack. But, again, I wonder (and these may be meaningless questions for lack of understanding): Should the timeout handler be independent of connection persistence? Does the timeout exception mean that the connection was closed? Why is the timeout still around when the request has been fully processed and the response sent?

If it is not possible to change the timeout handler, then I would like to disable it. In my case, I know exactly who's making the request, and the server is firewalled against everything else.
If you look at the default exception handler, you'll see how these exceptions are treated. I don't really see what should be done here, beyond possibly improving the documentation.
Fair enough. So, I currently see the following options:
Or does this not even do what I would expect?

```haskell
-- ...
import qualified Network.Wai.Handler.Warp.Timeout as Timeout

main :: IO ()
main = do
  mgr <- Timeout.initialize 30000000
  Timeout.stopManager mgr
  runSettings (stgs mgr) app
  where
    stgs mgr = setManager mgr $ setTimeout 2 $ setPort 3000 $ setOnException err defaultSettings
-- ...
```

I still get a visit from the reaper.
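For explicit per-request control, later Warp versions export a `pauseTimeout :: Request -> IO ()` helper (the vault-backed function alluded to earlier in the thread); check the `Network.Wai.Handler.Warp` docs for your version before relying on it. A hedged sketch of how it would be used:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseLBS, strictRequestBody)
import Network.Wai.Handler.Warp (pauseTimeout, run)

-- Read the whole body, then explicitly disarm the slowloris timeout
-- for this connection before doing slow application work.
app :: Application
app rq respond = do
  body <- strictRequestBody rq
  pauseTimeout rq  -- expected to be a no-op if the vault key is absent
  -- ... slow work with `body` ...
  respond $ responseLBS status200 [] body

main :: IO ()
main = run 3000 app
```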
I've never tried to implement (3), so I'm not surprised it doesn't work.
I don't think …
I can still confirm that …
I'm debugging a Warp/conduit server that is failing with

> Thread killed by Warp's timeout reaper

when a request causes a slow operation in the application. It seems that after a use of `sourceRequestBody`, the timeout is left running when control returns to the application, even if the client sends a complete request. Here's a minimal example:

If you POST a JSON object (e.g. `{}`) to this server, it will be printed, but the thread will be killed after a second. Replacing the conduit code with `strictRequestBody` makes the problem go away.

I'm a little suspicious of `timeoutBody` because it seems to read but never write an `IORef`, but I don't know if that is the direct culprit.

Thanks for taking a look at this, and apologies if I've missed something obvious!
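A sketch of a minimal program matching this description (a hypothetical reconstruction, not the reporter's original code; it assumes wai-conduit's `sourceRequestBody`, conduit-extra's `sinkParser`, aeson's `json` parser, and a 1-second `setTimeout` to match "killed after a second"):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, json)
import Data.Conduit (($$))
import Data.Conduit.Attoparsec (sinkParser)
import Network.HTTP.Types (status200)
import Network.Wai (responseLBS)
import Network.Wai.Conduit (sourceRequestBody)
import Network.Wai.Handler.Warp (defaultSettings, runSettings, setPort, setTimeout)

main :: IO ()
main = runSettings (setTimeout 1 $ setPort 3000 defaultSettings) $ \rq respond -> do
  -- Parsing stops as soon as the JSON value is complete, so the
  -- request body is never fully drained and the timeout stays armed.
  v <- sourceRequestBody rq $$ sinkParser json
  print (v :: Value)
  respond $ responseLBS status200 [] "OK"
```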