Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Documentation of Graceful Shutdown #979

Open
andrewthad opened this issue Mar 8, 2024 · 1 comment
Open

Improve Documentation of Graceful Shutdown #979

andrewthad opened this issue Mar 8, 2024 · 1 comment

Comments

@andrewthad
Copy link
Contributor

I spent a few hours recently figuring out what the how graceful shutdown behaves. I believe that I now mostly understand what is happening, and I am writing this for these purposes:

  • to help other users who are trying to figure out how graceful shutdown works
  • to suggest improvements to the documentation of graceful shutdown
  • to suggest improvements to the behavior of graceful shutdown

Closing Sockets

The documentation of warp suggests closing (i.e. Network.Socket.close) the socket to initiate graceful shutdown. On UNIX systems, a cross-thread close() on a socket like this is dangerous because the kernel could hand out the same file descriptor as a result to an open() or socket() call. But with the network library, this race condition cannot happen because the implementation atomically swaps out the file descriptor with negative one (-1) before the close() syscall. Which means that subsequent use of the socket is not merely likely to return EBADF, it is certain to return EBADF (on Linux at least, -1 is never a valid file descriptor). To someone who is accustomed to UNIX, this improved guarantee is not obvious. It would be nice to mention in the warp docs that network's implementation of close() doesn't have the dangerous race condition that UNIX sockets have. Even the documentation of Network.Socket.close itself is deficient here:

Close the socket. This function does not throw exceptions even if the underlying system call returns errors.

If multiple threads use the same socket and one uses unsafeFdSocket and the other use close, unexpected behavior may happen. For more information, please refer to the documentation of unsafeFdSocket.

This could be improved by mentioning that any calls to send, receive, accept, etc. will certainly throw exceptions after the socket is closed.

A question I have: Does the user have to close the listening socket, or it is sufficient to call shutdown on it instead? Warp's source code leaves me with the impression that shutdown should work, but I'm not certain. If calling shutdown is supported, it would be good to document this.

Ending HTTP Sessions

My Specific Scenario

In the application that I am working on, a load balancer (HAProxy) sits in front of the application and runs in HTTP mode. For anyone unfamiliar with HTTP-mode load balancing, this means that the proxy opens a small number (possibly just 1) of TCP sessions to the backend and leaves these TCP sessions open forever. Any request that a client (on the frontend) makes to the proxy is handled by forwarding (basically copying) this request to one of these existing backend TCP sessions. Running a load balancer like this (instead of in TCP mode) is considered good practice and is pretty common for HTTP traffic.

Graceful Shutdown Behavior

When an HTTP server attempts to gracefully shut downs, I expect two things to happen:

  1. Stop accepting more connections (i.e. stop calling accept())
  2. Terminate existing sessions. This means that, regardless of whether the client asked for connection reuse, the server should include the Connection: close header in its responses and then should closing the socket (or maybe sending a TLS bye first and then closing the socket).

Warp only does (1), not (2). I had incorrectly assumed that it did both. Since warp does not do both of these, then any client using a persistent connection (not just an HTTP load balancer) prevents graceful shutdown from completing. We have to use setGracefulShutdownTimeout and then wait until the timeout happens. This works, but it's a little unsatisfactory because

  • Some HTTP requests may get a "connection closed" instead a response
  • Some HTTP requests might get a partial (i.e. malformed) response
  • The server has to wait for the full length of the timeout, and it accepts more HTTP requests during this time (but does not accept more connections)

If warp included behavior (2) that I suggested above, then in situations where HTTP requests were coming in regularly (probably at least 1 request per second), the graceful shutdown would be even more graceful. If HTTP requests arrived very infrequently, then the server could still exhibit all three of the bad behaviors that I listed. Even so, I feel like this change this would be an improvement.

Possible improvements:

  1. Document how warp's graceful shutdown works, explaining that persistent HTTP sessions thwart it.
  2. Change graceful shutdown to (gracefully) close persistent sessions the next time an HTTP response is sent on the session. If this change is made, also document this behavior.
@Vlix
Copy link
Contributor

Vlix commented May 21, 2024

Thank you for the extensive issue description.

You make some compelling observations and we'd welcome any PR adding documentation, if not addressing these issues. I've also had to dig a lot to find how to gracefully shutdown a WAI server, so I know the pain.

I don't think any of the current maintainers has headroom to work on this, but I do feel this is one of the more important issues to fix because of how core to a web server the shutdown part is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants