graceful shutdown - preliminary attempt #415
Conversation
Thank you very much for this interesting PR! I have left a few comments for further discussion.
I left two comments on your changes. Please check them out :) On another topic: how does this behave with long-running requests (e.g. a PATCH request transferring multiple GBs for hours)? Is this request interrupted when the shutdown begins, or only after the timeout has been reached? Is the request interrupted "nicely" by closing the request body, or violently by killing the TCP connection? Did you try that out, and can you shed light on it?
The only test I performed was the one originally specified in the PR description. The data set is sub-gigabyte and the test was performed locally (edit: forgot to mention, ~500 MB). The request is never interrupted. Shutdown fails if in-flight requests are not completed before the shutdown context's timeout (see my other comments).
Does that mean that the server cannot shut down while requests are still open to it? I think that this would be a good decision. I would expect tusd, once asked to shut down, to wait a few seconds for all requests to end while not accepting new requests. If the shutdown timeout is reached, it should end all still-active requests by force. Is this behavior implemented by your PR, or am I missing something?
This is not how Go's built-in Shutdown API works. Please see the documentation for the best clarification on this matter. Active requests will not be ended by force if the timeout is reached - that would defeat the purpose of graceful shutdown.
Ok, that's good to know. However, it's not optimal if an active request (which might be open for hours) prevents tusd from shutting down. Does your PR already handle that in some way? The usage of run.Group makes those questions very hard for me to answer by looking at the code.
Remember that part where I was setting ReadTimeout and WriteTimeout and you mentioned you already have timeout handlers in place? What do those timeout handlers do, and/or can you point out where they are in the code? Thanks,
Closed by accident. No coffee yet this morning, hahaha.
Hi again, just following up. I see ReadTimeout and WriteTimeout in the net.Listener code that tusd uses. This answers your last question, right? There's nothing to do on my end here?
Our timeout handling is located here: https://github.com/tus/tusd/blob/master/cmd/tusd/cli/listener.go Its purpose is to reset the read/write timeouts whenever a read/write operation occurs. Go does not support that kind of timeout on its own, so we had to build it ourselves. Let me know if you need a more in-depth explanation.
To be honest, I am still very much confused about how this patch behaves when requests are still open. Even with your explanations, I am not able to understand it from just looking at the code. I would have to try it on my own and see how it works. Also, there is still the usage of run.Group, which makes the code hard for me to follow. All in all, this PR is not in a state where I can confidently merge it into master, since it has a lot of potential for breakage. I'm sorry that all your efforts end in such an unsatisfying situation.
That's fair. Let me see if I can come up with something else on my end to address some of the remaining issues. If I can come up with a solid explanation for how the run.Group functionality works, would you be comfortable with merging based on that? Or are there other blockers? I think there's a reason why rewriting it away from oklog/run may be less than beneficial, but my research will hopefully figure that out. I don't want to be attached to it unnecessarily either if it's introducing additional complexity. Time is tight on my end though, which is why I'm hoping to avoid rewriting the patch if I can.
That's great, thank you for understanding our point of view as well :)
The chances are low, I would say. The best code is still the one which does not need explanation for a reasonably experienced Go developer. If it's obvious that the code without run.Group is more complex, I am happy to accept it as a dependency. But unless that has been shown, I would want to avoid it.
If there is such a reason, I am open to accepting run.Group :)
Any update on this feature?
Sadly no. This PR is still missing a feature where the currently open upload requests get gracefully closed down (in addition to the already implemented stop of accepting new upload requests). Also, I disagree with the author about the use of a dependency, and they clearly indicated that they are not interested in updating the patch (which they completely have the right to). However, maybe in the near future I will attempt to implement this on my own, as gracefully closing open requests is a somewhat hairy thing to implement without more knowledge about the internals of tusd.
As requested, here is my PR to resolve #395
I tested this last night and the behavior is as follows:
In my testing, I killed tusd with an active upload at 34%, started tusd back up, and it picked right back up at 35% like nothing had happened. Then it completed successfully. This was just a small test but exactly the behavior desired/expected, I think?
Main questions: what do you think about the TODO markers, such as the shutdown timeout? I'm guessing these should become CLI flags.