Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accept(2) error handling #553

Merged
merged 4 commits into from May 24, 2016
Merged

accept(2) error handling #553

merged 4 commits into from May 24, 2016

Conversation

reiddraper
Copy link
Contributor

This PR contains three commits which deal with an issue we've been having with our production Warp services. Currently, Warp treats all exceptions that can be raised by accept as unrecoverable (other than its special handling of ResourceExhausted, which I've removed for reasons described in 3011cad). However, some errors from the accept syscall are recoverable, and do not imply the entire listening socket is closed or in a bad state. Specifically, accept(2) and accept4(2) will return ECONNABORTED if a connection request is aborted between entering the socket's listen queue, and actually being accepted. In this case, a subsequent call to accept will simply return the next socket on the queue.

Since this was treated as a fatal error before, when this would happen to us in production, the webserver would begin its graceful shutdown routine, but since we use long-lived connections, graceful shutdown would wait indefinitely for HTTP requests that will never finish (we leave an HTTP connection open and stream data). This manifested itself as the entire application hanging indefinitely. In aebd8d4, I've added an option to the Warp Settings for an optional graceful shutdown timeout, so that if there is a fatal error raised by accept, the webserver won't simply hang forever, waiting to shut down.

I've kept this in three commits, so that it's easier to review, and if you only want some of this functionality, it can be cherry-picked.

There is no guarantee that running out of file-descriptors is a
temporary condition, and simply sleeping for a second is a naive
solution. In the general case, it is better to scream loudly and alert
an administrator to this case.
ECONNABORTED is a recoverable error, and only applies to a particular
socket in the accept(2) queue. It is safe to simply try accept'ing the
next socket on the queue.
For servers with long-lived (or infinite lived) connections, waiting for
graceful shutdown can block the server indefinitely. A new setting is
added, which allows for a time (in seconds) to be applied when
commencing a graceful shutdown.
@reiddraper
Copy link
Contributor Author

The failure is a timeout on the BUILD=cabal GHCVER=7.6.3 CABALVER=1.16 HAPPYVER=1.19.5 ALEXVER=3.1.7 build. The other builds all pass.

@@ -347,6 +353,13 @@ setLogger :: (Request -> H.Status -> Maybe Integer -> IO ())
-> Settings -> Settings
setLogger lgr y = y { settingsLogger = lgr }

-- | Set the graceful shutdown timeout. A timeout of `Nothing' will
-- wait indefinitely, and a number, if provided, will be treated as seconds
-- to wait for requests to finish, before shutting down the server entirely.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to add -- @since x.x.x or -- Since x.x.x like the other APIs have. I think the next version would be 3.2.8 (Current is 3.2.7, and this PR adds new APIs. Not sure if the ResourceExhausted thing would be considered a breaking change, though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 504fa37.

@reiddraper
Copy link
Contributor Author

I've force-pushed fce3103 to address the formatting issue in the documentation.

@snoyberg
Copy link
Member

Looks fine to me. @kazu-yamamoto do you want to review before merge?

@kazu-yamamoto
Copy link
Contributor

I support this changes.

@snoyberg snoyberg merged commit fce3103 into yesodweb:master May 24, 2016
@snoyberg
Copy link
Member

Thanks!

@reiddraper reiddraper deleted the bugfix/accept-error-handling branch May 24, 2016 14:54
@reiddraper reiddraper restored the bugfix/accept-error-handling branch May 25, 2016 23:42
@kazu-yamamoto
Copy link
Contributor

Warp 3.2.7 has been released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants