slowloris protection can be triggered by legitimate use of chromium #146
I've added a lot of debug logging to the timeout manager code to try to gather more data around when this happens. Using this patch http://downloads.kitenet.net/misc/warp-instrumented.patch I'm seeing logs like this:
Note that the timeout checker ran twice for thread 99: first setting it Inactive, then 30 seconds later seeing it was still Inactive and calling its onTimeout. Since I have the killThread commented out, we can see this thread go on to successfully process another request!
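For readers unfamiliar with the timeout manager, the behaviour described above (mark Inactive on one sweep, fire onTimeout on the next) is the usual two-sweep reaper scheme. The sketch below is a simplified illustration of that idea, not Warp's actual implementation; the names State, tickle, sweep, and reaper are invented for the example.

    import Control.Concurrent (threadDelay)
    import Control.Monad (forever)
    import Data.IORef (IORef, readIORef, writeIORef)

    data State = Active | Inactive

    -- The connection handler calls this whenever it makes progress.
    tickle :: IORef State -> IO ()
    tickle ref = writeIORef ref Active

    -- One reaper sweep: an Active handle is merely demoted to Inactive; a
    -- handle still Inactive on the *next* sweep has been idle for a full
    -- period, so its timeout action (in Warp, killing the handler thread) fires.
    sweep :: IORef State -> IO () -> IO ()
    sweep ref onTimeout = do
        st <- readIORef ref
        case st of
            Active   -> writeIORef ref Inactive
            Inactive -> onTimeout

    -- The reaper runs one sweep per timeout period (30 seconds by default),
    -- which is why a handle can be marked Inactive and only killed a full
    -- 30 seconds later, as in the log described above.
    reaper :: IORef State -> IO () -> IO ()
    reaper ref onTimeout = forever $ do
        threadDelay (30 * 1000000)
        sweep ref onTimeout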
So, chromium is opening a connection, using it to request two files within seconds, and then a full minute later, reusing the connection to load another page. Clear evidence that the 30 second slowloris timeout is triggered by this legitimate reuse of an idle connection.

Question: Why does the connection not get closed when the slowloris timeout kills the thread? If I'm reading runSettingsConnectionMaker correctly, it only catches IO exceptions and closes the connection. The async ThreadKilled exception is not handled there; see the FIXME in the code. Perhaps that is the real bug.

I suppose a workaround is to set the timeout as high as it will go. Since it's an Int holding microseconds, the highest timeout that can be set is around 35 minutes. Probably high enough for chromium to drop the connection.
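The arithmetic behind the ~35-minute figure, and what the workaround might look like, is sketched below. It assumes a 32-bit Int holding the microsecond delay, and the Warp 1.3-era Settings record where settingsTimeout is an exported field measured in seconds; newer Warp versions expose setter functions instead, so treat this as illustrative only.

    import Data.Int (Int32)
    import Network.Wai.Handler.Warp (Settings, defaultSettings, settingsTimeout)

    -- 2^31 - 1 microseconds is the most a 32-bit Int can hold:
    --   2147483647 us  ~  2147 s  ~  35.8 minutes
    maxTimeoutMinutes :: Double
    maxTimeoutMinutes = fromIntegral (maxBound :: Int32) / 1000000 / 60

    -- The workaround: crank the (seconds-valued) timeout close to that ceiling.
    -- 2000 s is about 33 minutes, safely below the ~35.8-minute overflow point.
    longTimeoutSettings :: Settings
    longTimeoutSettings = defaultSettings { settingsTimeout = 2000 }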
Thank you for filing this report. I've been hunting a bug in yesod devel which I think may be related, though until now I thought it was from a completely different library. I think the Slowloris approach is still correct, even if it kills some Chromium connections. At a certain point, there's no way to distinguish between legitimate client connections and attacks. 30 seconds is a somewhat arbitrary cutoff, but the cutoff can be customized by the user. And if everything worked as it should, this wouldn't be a problem. I don't think the FIXME is relevant to this issue. The code:
should in fact be triggering the connection to be closed on any exception.

Do you have a minimal example showing the problem? And can you provide more information on OS, GHC version, compile flags, and runtime flags?
I've implemented a possible fix, though I'm not sure if it actually solves the problem (it seems to make yesod devel better behaved in my testing). Would you be able to test the newest GitHub code?
I would think that with both the old and new code this would be true; I can't see any reason why it wouldn't be. Nonetheless, @joeyh's log definitely implies that it isn't.
I will investigate this tomorrow.
BTW, would you explain why mask & restore are used around here?
A thread created by forkIO inherits the mask state. So, reading the code, it seems to me that any child threads cannot receive asynchronous exceptions...
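A small, self-contained illustration of that point; this is not Warp code, just the general forkIO / forkIOWithUnmask behaviour from Control.Concurrent, with invented names spawnMasked and spawnUnmaskable.

    import Control.Concurrent (forkIO, forkIOWithUnmask)
    import Control.Exception (mask_)
    import Control.Monad (void)

    -- Inside mask_, a plain forkIO child starts masked as well: asynchronous
    -- exceptions such as the timeout manager's ThreadKilled are deferred
    -- (except at interruptible operations).
    spawnMasked :: IO () -> IO ()
    spawnMasked body = mask_ $
        void $ forkIO body          -- child inherits the masked state

    -- forkIOWithUnmask hands the child a function that restores delivery of
    -- asynchronous exceptions, so the child can opt back in explicitly.
    spawnUnmaskable :: IO () -> IO ()
    spawnUnmaskable body = mask_ $
        void $ forkIOWithUnmask $ \unmask -> unmask body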
As the documentation for Control.Exception says, we should not use mask & restore ourselves. We should use bracket instead if necessary.
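The recommended shape, in generic form: bracket packages up the acquire/use/release pattern that would otherwise be written by hand with mask and restore. The names withResourceManual and withResourceBracket are invented for illustration.

    import Control.Exception (bracket, mask, onException)

    -- Hand-rolled acquire/use/release: acquire while masked, run the body with
    -- exceptions restored, and make sure release runs either way.
    withResourceManual :: IO r -> (r -> IO ()) -> (r -> IO a) -> IO a
    withResourceManual acquire release use =
        mask $ \restore -> do
            r <- acquire
            a <- restore (use r) `onException` release r
            release r
            return a

    -- The same shape expressed with bracket, which encapsulates exactly that
    -- mask/restore dance; this is what the Control.Exception docs point to.
    withResourceBracket :: IO r -> (r -> IO ()) -> (r -> IO a) -> IO a
    withResourceBracket = bracket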
The reason for the usage of …
I think we may want to restore …
If the whole …
Good point; perhaps we should pass …
What about this logic?

    forever . mask_ $ do
        allowInterrupt
        (mkConn, addr) <- getConnLoop
        void . forkIOWithUnmask $ \unmask -> do
            th <- T.registerKillThread tm
            conn <- mkConn
    #if SENDFILEFD
            let cleaner = Cleaner th fc
    #else
            let cleaner = Cleaner th
    #endif
            let serve = onOpen >> serveConnection set cleaner port app conn addr
                cleanup = connClose conn >> T.cancel th >> onClose
            unmask serve `finally` cleanup `catch` onE
I confirmed that the code above calls … Note that … I also confirmed that …
I'm not sure that this actually changes anything. You're performing the masking inside the forever loop, so on each loop iteration we break out of the mask and therefore have an opportunity to receive an async exception. However, allowInterrupt already achieves that goal. And I'm not sure why forkIOWithUnmask would have different behavior than getting a restore function from mask.
My code is based on the explanation in the Control.Concurrent documentation. This code is clearer than the current one, from my point of view. Anyway, I would want @joeyh to test both versions of the code and would like to see his results.
@kazu-yamamoto Shouldn't it be …? I find @kazu-yamamoto's code easier to read, but I'd add parentheses around the …
I don't know which is better; I don't know the expected behavior of … I have no objection to enclosing it in parentheses.
I'm not sure if there's really a good motivation for not using the new bracket code. The only tweak I'd consider making is:

    mask $ \restore -> forever $ do
        allowInterrupt
        (mkConn, addr) <- getConnLoop
        void . forkIO $ do
            th <- T.registerKillThread tm
            bracket mkConn connClose $ \conn -> restore $ do
    #if SENDFILEFD
                let cleaner = Cleaner th fc
    #else
                let cleaner = Cleaner th
    #endif
                let serve = do
                        onOpen
                        serveConnection set cleaner port app conn addr
                        cleanup
                    cleanup = T.cancel th >> onClose
                handle onE (serve `onException` cleanup)

What this says is:
1. Mask asynchronous exceptions for the whole accept loop.
2. Call allowInterrupt, then wait for a new connection via getConnLoop.
3. Fork a handler thread, which starts with async exceptions still masked.
4. Register the kill-thread timeout handler.
5. Acquire the connection with bracket, so connClose is guaranteed to run.
6. Restore async exceptions and serve the request, with cleanup attached via onException.

Since no async exceptions can be thrown in steps 3-5, it should be impossible to leak the connection there. And once the bracket function is called, any async exceptions in step 6 will properly trigger cleanup.
So why not:

    mask_ . forever $ do
        allowInterrupt
        (mkConn, addr) <- getConnLoop
        void . forkIOWithUnmask $ \unmask ->
            bracket (T.registerKillThread tm) T.cancel $ \th ->
            bracket mkConn connClose $ \conn ->
    #if SENDFILEFD
                let cleaner = Cleaner th fc
    #else
                let cleaner = Cleaner th
    #endif
                in unmask .
                   handle onE .
                   bracket_ onOpen onClose $
                   serveConnection set cleaner port app conn addr

I've put …

EDIT: Removed unnecessary …
Yes, that looks even better. @kazu-yamamoto, any objections to that formulation?
Michael Snoyman wrote:
I'm sorry, I didn't see this thread until today. Yes, I can try testing. I have a fairly good recipe to reproduce the problem.

see shy jo
I'm using ghc 7.4.1-4 from Debian unstable. Linux 3.2.0. Using the threaded RTS, no other special RTS flags or build flags. Sadly I do not have a minimal test case.
No objection. Please leave a comment referring to this issue near this code.
I've just pushed a commit to make the code change. The only change I made versus @meteficha's code was to swap the two bracket calls: there's no point registering a timeout handler before we've acquired the connection, since async exceptions will be blocked during connection making anyway. @meteficha and @kazu-yamamoto: please review this. If you think it's good, we should ask @joeyh to test the new code.
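Roughly, the swap described amounts to the following reordering of @meteficha's snippet above; this is a sketch of the described change, reusing the same names from that snippet, not necessarily the exact committed code.

    mask_ . forever $ do
        allowInterrupt
        (mkConn, addr) <- getConnLoop
        void . forkIOWithUnmask $ \unmask ->
            -- connection first: no timeout handler is registered for a
            -- connection that was never successfully opened
            bracket mkConn connClose $ \conn ->
            bracket (T.registerKillThread tm) T.cancel $ \th ->
    #if SENDFILEFD
                let cleaner = Cleaner th fc
    #else
                let cleaner = Cleaner th
    #endif
                in unmask .
                   handle onE .
                   bracket_ onOpen onClose $
                   serveConnection set cleaner port app conn addr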
OK @joeyh, I think the code is ready to be tested. Can you give it a shot?
Michael Snoyman wrote:
I set the slowloris timeout down to just 10 seconds, and instrumented … I've yet to see chromium freeze. This isn't quite conclusive, but it's looking likely to be fixed. I would still appreciate a setting to entirely disable the slowloris protection.

see shy jo
I've just released a new version of Warp with this fix; thanks, everyone, for working on it. I'm hesitant to include an option to disable timeouts, simply because it would likely introduce a bunch of conditionals into some tight loops. I'll leave the issue open for now.
Does "I've yet to see chromium freeze" mean "chromium still freezes sometime"? |
It means it did not freeze.
I have a yesod web app that I can fairly reliably, through a particular pattern of activity, cause to seemingly freeze. I.e., I click on a link in the web browser and it spins forever. Once this has happened, any link I click on will exhibit the behavior, and even pages that are a simple static hamlet template with no other IO freeze. However, opening any link in a new tab works fine. It's only the http connection used by one particular chromium tab that has gotten stuck somehow.
(There is a mildly embarrassing video of this happening to me in the middle of a demo, here: http://mirror.linux.org.au/linux.conf.au/2013/mp4/gitannex.mp4 around 20 minutes in.)
I got a tcpdump of this happening and verified that chromium was sending a complete http request, and never getting an answer back, no matter how long I wait. Here is a tcpdump that shows a full session from app startup time to hang: http://downloads.kitenet.net/misc/git-annex-webapp-hang.tcpdump.bz2
I instrumented my use of warp, making settingsOnException, settingsOnOpen, and settingsOnClose do debug logging. From this I could see that ThreadKilled exceptions were being thrown. Only one place in warp calls killThread: registerKillThread does, when the timeout is exceeded. I added a print there and verified that it was indeed responsible for killing threads. Here is a server log file showing the http requests and debug output: http://downloads.kitenet.net/misc/daemon.log
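Instrumentation of that kind might look like the sketch below. It assumes the Warp 1.3-era Settings record, where the three fields named above are an exception handler and two plain IO actions; the exact field types differ across Warp versions, so this is illustrative only.

    import Network.Wai.Handler.Warp
        (Settings, defaultSettings, settingsOnException, settingsOnOpen, settingsOnClose)
    import System.IO (hPutStrLn, stderr)

    -- Settings that log every exception, connection open, and connection close.
    debugSettings :: Settings
    debugSettings = defaultSettings
        { settingsOnException = \e -> hPutStrLn stderr ("warp exception: " ++ show e)
        , settingsOnOpen      = hPutStrLn stderr "warp: connection opened"
        , settingsOnClose     = hPutStrLn stderr "warp: connection closed"
        }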
I commented out the killThread call, and my app seems fixed; no more hangs.
The timeout is apparently only supposed to be triggered by a slowloris attack. Ironically, since my webapp only listens for connections from localhost, it doesn't need DoS protection at all. But I cannot find a way to disable the slowloris protection in warp's API.
Adding a knob to disable killing threads would be sufficient for my needs, although I suspect you might want to take further action.
warp version: 1.3.7.4