
Add graceful shutdown #944

Closed
wants to merge 2 commits

Conversation

@craigpastro commented May 24, 2023

Hello there 👋 This is an attempt to resolve #395. Please let me know what you think. Thanks!

Sorry about the diff. I removed the early return and indented the TLS part so that the shutdown could be done at the end, which led to the rather big diff.

At the moment, the timeout is 5 seconds and not configurable. Please let me know if you would like me to add that.

Closes #395.

[Screenshot: 2023-05-24, 11:14 AM]
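
For reference, a minimal sketch of this kind of fixed-timeout graceful shutdown in Go. The address, handler, and the hard-coded 5 s grace period are placeholders to illustrate the idea, not tusd's actual code:

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	server := &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}

	// Serve in the background; ListenAndServe returns ErrServerClosed
	// as soon as Shutdown is called, so that error is expected.
	go func() {
		if err := server.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Block until SIGINT or SIGTERM arrives.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, os.Interrupt, syscall.SIGTERM)
	<-stop

	// Give in-flight requests up to 5 seconds to finish, then give up.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := server.Shutdown(ctx); err != nil {
		log.Printf("shutdown did not complete cleanly: %v", err)
	}
}
```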

@Acconut (Member) commented May 25, 2023

Thank you for this PR, but I think we should still discuss what kind of graceful shutdown we actually want. Citing from Go's docs about net/http.Server.Shutdown (https://pkg.go.dev/net/http#Server.Shutdown, emphasis mine):

> Shutdown gracefully shuts down the server without interrupting any active connections. Shutdown works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down. If the provided context expires before the shutdown is complete, Shutdown returns the context's error, otherwise it returns any error returned from closing the Server's underlying Listener(s).
>
> When Shutdown is called, Serve, ListenAndServe, and ListenAndServeTLS immediately return ErrServerClosed. Make sure the program doesn't exit and waits instead for Shutdown to return.
>
> Shutdown does not attempt to close nor wait for hijacked connections such as WebSockets. The caller of Shutdown should separately notify such long-lived connections of shutdown and wait for them to close, if desired. See RegisterOnShutdown for a way to register shutdown notification functions.

This means that with your PR, tusd will wait 5s for all HTTP requests to complete. If there are still open requests after this timeout, it will stop the process without closing them properly. Given that tusd handles uploads, a fair share of requests will run longer than 5s. For some data stores (e.g. the s3 store), interrupting the request handling could lead to the loss of data that is currently in a buffer. Of course, it will be retransmitted once tusd is back up, but it would be great if we could gracefully shut down PATCH requests as well and allow the buffers to be flushed to S3.

Does that make sense?
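
To illustrate the failure mode described here, a small sketch (the function and parameter names are placeholders): per the Go docs quoted above, if the context expires while a long-running upload is still in flight, Shutdown returns the context's error and the remaining requests are simply cut off.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"time"
)

// shutdownWithGrace gives in-flight requests a fixed grace period. If a
// long-running PATCH request outlives it, Shutdown returns the context's
// error and that request is interrupted instead of finishing cleanly.
func shutdownWithGrace(server *http.Server, grace time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := server.Shutdown(ctx); errors.Is(err, context.DeadlineExceeded) {
		log.Printf("grace period of %s elapsed; in-flight requests were interrupted", grace)
	}
}
```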

@craigpastro (Author) commented:

👋 @Acconut. Yes, that makes perfect sense 👍

I'll take a look at how RegisterOnShutdown works and see if this can be used to flush buffers to S3. Thanks!

@Acconut (Member) commented May 25, 2023

Great, let me know if you need any help! To be honest, I am not sure if RegisterOnShutdown is necessary or not. Maybe this graceful shutdown is possible without this function.

> flush buffers to S3

One additional note: it is not necessary to flush the buffers directly. It's enough to call net/http.Request.Body.Close(); all storages should then flush any buffers and wrap up the current upload request. So it should be enough to ensure that the request body of every PATCH request is closed.
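
As an illustration only (this is not tusd's implementation), one way to do that is to track the bodies of in-flight PATCH requests and close them all when a shutdown begins. The hypothetical bodyTracker below sketches that idea:

```go
package main

import (
	"io"
	"net/http"
	"sync"
)

// bodyTracker remembers the request bodies of in-flight PATCH requests so
// they can all be closed when a shutdown starts.
type bodyTracker struct {
	mu     sync.Mutex
	bodies map[io.Closer]struct{}
}

func newBodyTracker() *bodyTracker {
	return &bodyTracker{bodies: make(map[io.Closer]struct{})}
}

// Wrap registers the body of each PATCH request for the duration of the
// handler call and removes it again when the handler returns.
func (t *bodyTracker) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodPatch {
			t.mu.Lock()
			t.bodies[r.Body] = struct{}{}
			t.mu.Unlock()
			defer func() {
				t.mu.Lock()
				delete(t.bodies, r.Body)
				t.mu.Unlock()
			}()
		}
		next.ServeHTTP(w, r)
	})
}

// CloseAll closes every tracked body. The handlers then see an error on the
// next read and can wrap up the upload and flush their buffers.
func (t *bodyTracker) CloseAll() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for body := range t.bodies {
		body.Close()
	}
}
```

In this sketch the server would wrap its handler with tracker.Wrap(...) and call tracker.CloseAll() when shutdown begins, before or alongside server.Shutdown.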

@craigpastro (Author) commented May 25, 2023

I tried something out just for fun in the latest commit, but I am not sure it is a good way to go about it.

What might be a little more straightforward is to just do a "best effort" shutdown with no context cancellation. That is, just call server.Shutdown and wait for any PATCH requests to complete, then exit. If you really need to exit sooner, the process can be killed, which I suppose is what happens now. Or perhaps make the cancel timeout configurable, and if it is set to 0, wait for all PATCH requests to complete. WDYT?
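
A short sketch of that option (shutdownTimeout is a hypothetical setting for illustration, not an existing tusd flag): a configurable timeout where 0 means wait for all requests to complete.

```go
package main

import (
	"context"
	"time"
)

// shutdownContext returns the context to pass to server.Shutdown. With a
// zero timeout there is no deadline, so Shutdown waits until every request
// has completed; otherwise the configured grace period applies.
func shutdownContext(shutdownTimeout time.Duration) (context.Context, context.CancelFunc) {
	if shutdownTimeout == 0 {
		return context.WithCancel(context.Background())
	}
	return context.WithTimeout(context.Background(), shutdownTimeout)
}
```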

@Acconut (Member) commented Jun 12, 2023

@craigpastro Thank you very much for your initial work on this! That was really helpful and I used it to implement the upload interruption in a more elegant way. Please have a look at my PR #963, which borrows some ideas from your PR here.

A shutdown is initiated once a signal is received. tusd then immediately instructs all uploads to stop. Once either all connections are closed, a grace period has elapsed, or a second signal is received, it shuts down immediately.

The upload interruption is similar to your approach, but reuses existing code around closing the request body.

Please let me know what you think!
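
For illustration, a hedged sketch of that signal-handling pattern (this is not the code from #963): the first SIGINT/SIGTERM starts a graceful shutdown with a grace period, and a second signal forces an immediate exit.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func handleSignals(server *http.Server, gracePeriod time.Duration) {
	sigs := make(chan os.Signal, 2)
	signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)

	// First signal: begin the graceful shutdown.
	<-sigs
	log.Println("shutting down gracefully, send the signal again to force exit")

	// Second signal: exit immediately without waiting any longer.
	go func() {
		<-sigs
		log.Println("second signal received, exiting immediately")
		os.Exit(1)
	}()

	ctx, cancel := context.WithTimeout(context.Background(), gracePeriod)
	defer cancel()
	if err := server.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown did not complete: %v", err)
	}
}
```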

@craigpastro (Author) commented:

👋 @Acconut. Yes, that certainly looks a lot cleaner. I like the force shutdown on a second signal. I've used that pattern many times but never thought to add it to my own services. Nice idea! I'll close this in favour of #963. Thank you!

@craigpastro deleted the graceful-shutdown branch on June 12, 2023 at 17:35.
Linked issue: Gracefully terminate on SIGTERM (#395)