
graceful shutdown - preliminary attempt #415

Closed
wants to merge 4 commits into from

Conversation

@ghost commented Aug 20, 2020

As requested, here is my PR to resolve #395

I tested this last night and the behavior is as follows:

  • during an upload, the server waits for the active handler to complete (so a chunk can be written) before shutting down
  • if tusd is then started back up (and I resume the upload), it picks back up where it left off

In my testing, I killed tusd with an active upload at 34%, started tusd back up, and it picked right back up at 35% like nothing had happened. Then it completed successfully. This was just a small test, but it showed exactly the desired/expected behavior, I think?

Main questions: what do you think about the TODO markers, the shutdown timeout, etc.? I'm guessing these should become CLI flags.
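For illustration, a minimal sketch of how such a flag could be wired up with Go's standard flag package; the package name, flag name, and default value are placeholders, not part of this PR or of tusd's actual CLI:

```go
package cli

import (
	"flag"
	"time"
)

// ShutdownTimeout would bound how long tusd waits for in-flight requests
// to finish during a graceful shutdown. Name and default are illustrative.
var ShutdownTimeout time.Duration

func init() {
	flag.DurationVar(&ShutdownTimeout, "shutdown-timeout", 10*time.Second,
		"Maximum time to wait for in-flight requests before giving up on a graceful shutdown")
}
```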

@ghost mentioned this pull request Aug 20, 2020

@Acconut (Member) left a comment

Thank you very much for this interesting PR! I have left a few comments for further discussion.

Inline review comments on cmd/tusd/cli/serve.go (4 threads, 2 marked outdated).
@Acconut (Member) commented Aug 26, 2020

I left two comments on your changes. Please check them out :)

On another topic: How does this behave with long-running requests (e.g. a PATCH request transferring multiple GBs for hours)? Is this request interrupted when the shutdown begins, or only after the timeout has been reached? Is the request interrupted "nicely" by closing the request body, or violently by killing the TCP connection? Did you try that out, and can you shed some light on it?

@ghost (Author) commented Aug 28, 2020

> I left two comments on your changes. Please check them out :)
>
> On another topic: How does this behave with long-running requests (e.g. a PATCH request transferring multiple GBs for hours)? Is this request interrupted when the shutdown begins, or only after the timeout has been reached? Is the request interrupted "nicely" by closing the request body, or violently by killing the TCP connection? Did you try that out, and can you shed some light on it?

The only test I performed was the one originally specified in the PR description. The data set was sub-gigabyte (edit: forgot to mention, ~500 MB) and the test was run locally.

The request is never interrupted. Shutdown fails if in-flight requests are not completed prior to the shutdown's context timeout (see my other comments.)

@Acconut (Member) commented Sep 7, 2020

> The request is never interrupted. Shutdown fails if in-flight requests are not completed prior to the shutdown's context timeout

Does that mean that the server cannot shut down while requests are still open to it? I think that this would be a good decision. I would expect tusd, once asked to shut down, to wait a few seconds for all requests to end while not accepting new requests. If the shutdown timeout is reached, it should end all still-active requests by force. Is this behavior implemented by your PR, or am I missing something?

@ghost (Author) commented Sep 8, 2020

> The request is never interrupted. Shutdown fails if in-flight requests are not completed prior to the shutdown's context timeout
>
> Does that mean that the server cannot shut down while requests are still open to it? I think that this would be a good decision. I would expect tusd, once asked to shut down, to wait a few seconds for all requests to end while not accepting new requests. If the shutdown timeout is reached, it should end all still-active requests by force. Is this behavior implemented by your PR, or am I missing something?

This is not how Go's built-in Shutdown API works. Please see the documentation for the best clarification on this matter. Active requests will not be ended by force if the timeout is reached - that would defeat the purpose of graceful shutdown.
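For reference, a minimal sketch of the stock net/http shutdown behavior being described here (this is not the code from this PR; the address, timeout, and logging are illustrative):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	go func() {
		// ListenAndServe returns http.ErrServerClosed once Shutdown has been called.
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Block until SIGINT or SIGTERM arrives.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	<-sig

	// Shutdown stops accepting new connections and waits for in-flight
	// requests to finish. It does NOT force-close active requests: if the
	// context expires first, it simply returns the context's error.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		// One could call srv.Close() here to drop the remaining connections,
		// but that is exactly the forceful behavior this PR avoids.
		log.Printf("graceful shutdown did not complete in time: %v", err)
	}
}
```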

@Acconut (Member) commented Sep 10, 2020

> Active requests will not be ended by force if the timeout is reached - that would defeat the purpose of graceful shutdown.

Ok, that's good to know. However, it's not optimal if an active request (which might be open for hours) prevents tusd from shutting down. Does your PR already handle that in some way? The usage of run.Group makes those questions very hard for me to answer by looking at the code.

@ghost (Author) commented Sep 14, 2020

> Active requests will not be ended by force if the timeout is reached - that would defeat the purpose of graceful shutdown.
>
> Ok, that's good to know. However, it's not optimal if an active request (which might be open for hours) prevents tusd from shutting down. Does your PR already handle that in some way? The usage of run.Group makes those questions very hard for me to answer by looking at the code.

Remember that part where I was setting ReadTimeout and WriteTimeout and you mentioned you already have timeout handlers in place? What do those timeout handlers do and/or can you point out where they are in the code?

Thanks,
Ryan

@ghost closed this Sep 14, 2020

@ghost (Author) commented Sep 14, 2020

Closed by accident. No coffee yet this morning, hahaha.

@ghost reopened this Sep 14, 2020

@ghost (Author) commented Sep 17, 2020

Hi again,

Just following up. I see ReadTimeout and WriteTimeout in the net.Listener code that tusd uses. This answers your last question, right? There's nothing to do on my end here?

@Acconut (Member) commented Sep 18, 2020

> Remember that part where I was setting ReadTimeout and WriteTimeout and you mentioned you already have timeout handlers in place? What do those timeout handlers do and/or can you point out where they are in the code?

Our timeout handling is located in https://github.com/tus/tusd/blob/master/cmd/tusd/cli/listener.go and its purpose is to reset the read/write timeouts whenever a read/write operation occurs. Go does not support that kind of timeout on its own, so we had to build it ourselves. Let me know if you need a more in-depth explanation.
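For anyone reading along, a simplified sketch of that general approach (this is not the actual listener.go code; the package, type names, and structure are illustrative):

```go
package timeoutconn

import (
	"net"
	"time"
)

// deadlineConn pushes the read/write deadlines forward on every operation,
// so the timeout applies to idle time rather than to the total request duration.
type deadlineConn struct {
	net.Conn
	readTimeout  time.Duration
	writeTimeout time.Duration
}

func (c *deadlineConn) Read(b []byte) (int, error) {
	if c.readTimeout > 0 {
		c.Conn.SetReadDeadline(time.Now().Add(c.readTimeout))
	}
	return c.Conn.Read(b)
}

func (c *deadlineConn) Write(b []byte) (int, error) {
	if c.writeTimeout > 0 {
		c.Conn.SetWriteDeadline(time.Now().Add(c.writeTimeout))
	}
	return c.Conn.Write(b)
}

// Listener wraps Accept so every accepted connection gets the rolling deadlines.
type Listener struct {
	net.Listener
	ReadTimeout  time.Duration
	WriteTimeout time.Duration
}

func (l *Listener) Accept() (net.Conn, error) {
	conn, err := l.Listener.Accept()
	if err != nil {
		return nil, err
	}
	return &deadlineConn{Conn: conn, readTimeout: l.ReadTimeout, writeTimeout: l.WriteTimeout}, nil
}
```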

> This answers your last question, right? There's nothing to do on my end here?

To be honest, I am still very much confused about how this patch behaves when requests are still open. Even with your explanations I am not able to understand it from just looking at the code. I would have to try it on my own and see how it works.

Also, there is still the usage of run.Group. You said that you are not interested in removing it (#415 (comment)), which is absolutely within your rights. It will take some time until I can tackle this on my own.

All in all, this PR is not in a state where I can confidently merge it into master since it has a lot of potential for breakage. I'm sorry that all your efforts end in such an unsatisfying situation.

@ghost (Author) commented Sep 22, 2020

That's fair. Let me see if I can come up with something else on my end to try to address some of the issues remaining. If I can come up with a solid explanation for how the run.Group functionality works, would you be comfortable with merging based on that? Or are there other blockers?

I think there's a reason why rewriting it away from oklog/run may be less than beneficial, but my research will hopefully figure that out. I don't want to be attached to it unnecessarily either if it's introducing additional complexity. Time is tight on my end, though, which is why I'm hoping to avoid rewriting the patch if I can.
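For context, the typical oklog/run pattern for pairing an HTTP server with signal handling looks roughly like this; it's a sketch of the general library usage, not the code in this PR, and the address and timeout are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/oklog/run"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	var g run.Group

	// Actor 1: the HTTP server. Its interrupt function performs the
	// graceful Shutdown once any other actor returns.
	g.Add(func() error {
		return srv.ListenAndServe()
	}, func(error) {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		srv.Shutdown(ctx)
	})

	// Actor 2: signal handling. When SIGINT/SIGTERM arrives, its execute
	// function returns, which causes run.Group to interrupt all actors.
	cancelCh := make(chan struct{})
	g.Add(func() error {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
		select {
		case s := <-sig:
			return fmt.Errorf("received signal %v", s)
		case <-cancelCh:
			return nil
		}
	}, func(error) {
		close(cancelCh)
	})

	// Run blocks until the first actor returns, then interrupts the rest
	// and waits for them to finish.
	if err := g.Run(); err != nil {
		fmt.Println("exiting:", err)
	}
}
```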

@Acconut (Member) commented Sep 29, 2020

> That's fair. Let me see if I can come up with something else on my end to try to address some of the issues remaining.

That's great, thank you for understanding our point of view as well :)

> If I can come up with a solid explanation for how the run.Group functionality works, would you be comfortable with merging based on that?

The chances are low, I would say. The best code is still the one which does not need explanation for a reasonably experienced Go developer. If it's obvious that the code without run.Group is more complex, I am happy to accept it as a dependency. But unless that has been shown, I would want to avoid it.

> I think there's a reason why rewriting it away from oklog/run may be less than beneficial, but my research will hopefully figure that out.

If there is such a reason, I am open to accept run.Group :)

@ssikyou commented Apr 27, 2021

Any update on this feature?

@Acconut (Member) commented Apr 28, 2021

> Any update on this feature?

Sadly no. This PR is still missing a feature where currently open upload requests get gracefully closed down (in addition to the already implemented stop of accepting new upload requests). Also, I disagree with the author about the use of a dependency, and they clearly indicated that they are not interested in updating the patch (which they have every right to do).

However, maybe in the near future I will attempt to implement this on my own, as gracefully closing open requests is a somewhat hairy thing to implement without more knowledge about the internals of tusd.

@ghost closed this by deleting the head repository Mar 16, 2023
This pull request was closed.
Successfully merging this pull request may close these issues: Gracefully terminate on SIGTERM.