New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
accept(2) error handling #553
accept(2) error handling #553
Conversation
There is no guarantee that running out of file-descriptors is a temporary condition, and simply sleeping for a second is a naive solution. In the general case, it is better to scream loudly and alert an administrator to this case.
ECONNABORTED is a recoverable error, and only applies to a particular socket in the accept(2) queue. It is safe to simply try accept'ing the next socket on the queue.
For servers with long-lived (or infinite lived) connections, waiting for graceful shutdown can block the server indefinitely. A new setting is added, which allows for a time (in seconds) to be applied when commencing a graceful shutdown.
The failure is a timeout on the |
@@ -347,6 +353,13 @@ setLogger :: (Request -> H.Status -> Maybe Integer -> IO ()) | |||
-> Settings -> Settings | |||
setLogger lgr y = y { settingsLogger = lgr } | |||
|
|||
-- | Set the graceful shutdown timeout. A timeout of `Nothing' will | |||
-- wait indefinitely, and a number, if provided, will be treated as seconds | |||
-- to wait for requests to finish, before shutting down the server entirely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to add -- @since x.x.x
or -- Since x.x.x
like the other APIs have. I think the next version would be 3.2.8 (Current is 3.2.7, and this PR adds new APIs. Not sure if the ResourceExhausted
thing would be considered a breaking change, though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 504fa37.
I've force-pushed fce3103 to address the formatting issue in the documentation. |
Looks fine to me. @kazu-yamamoto do you want to review before merge? |
I support this changes. |
Thanks! |
Warp 3.2.7 has been released. |
This PR contains three commits which deal with an issue we've been having with our production Warp services. Currently, Warp treats all exceptions that can be raised by
accept
as unrecoverable (other than its special handling ofResourceExhausted
, which I've removed for reasons described in 3011cad). However, some errors from theaccept
syscall are recoverable, and do not imply the entire listening socket is closed or in a bad state. Specifically, accept(2) and accept4(2) will return ECONNABORTED if a connection request is aborted between entering the socket's listen queue, and actually being accepted. In this case, a subsequent call toaccept
will simply return the next socket on the queue.Since this was treated as a fatal error before, when this would happen to us in production, the webserver would begin its graceful shutdown routine, but since we use long-lived connections, graceful shutdown would wait indefinitely for HTTP requests that will never finish (we leave an HTTP connection open and stream data). This manifested itself as the entire application hanging indefinitely. In aebd8d4, I've added an option to the Warp
Settings
for an optional graceful shutdown timeout, so that if there is a fatal error raised byaccept
, the webserver won't simply hang forever, waiting to shut down.I've kept this in three commits, so that it's easier to review, and if you only want some of this functionality, it can be cherry-picked.