New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
graceful shutdown for containers #15180
base: master
Are you sure you want to change the base?
Conversation
core/docker/bin/run-trino
Outdated
function graceful_shutdown() | ||
{ | ||
echo "Shutting down this worker gracefully" | ||
curl -v -X PUT -d '"SHUTTING_DOWN"' -H "Content-type: application/json" http://localhost:8080/v1/info/state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be timeouts defined here. What would be the typical expected value for a healthy cluster? Does it depend on the configuration, like some query timeouts?
How to handle errors, when this command would fail? I don't think wait
has a timeout so we'd need to send a signal to the launcher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added retry and timeout to curl in d8aa476.
This is a very simplistic solution which would work in limited number of deployments - most notably wouldn't work with any authentication/system access control configured or even just plain TLS - both of which are relatively common configurations. We wouldn't want a situation where some users will not be able to use this at all and maybe even be misled into assuming that the feature works for their configuration. I strongly feel #14876 is a better way to tackle this problem (and allows a lot of flexibility on what tools it can be used with - doesn't need to be limited to k8s only). In the meantime since the patch is so small I think anyone can continue using it as part of a custom docker image in meantime. |
I admit that we need a more holistic solution as I mentioned in my PR description but it occurred to me that the community can benefit from it until we have a better one. I don't think it will hurt anyone but can benefit some since the official docker image and the helm chart provides zero support for worker scaling in. This would help with a typical deployment where internode communication is not encrypted. It's your call if you like to close this PR. |
Is it possible to use graceful shutdown with shared-secret? is here is any special auth header for API? thanks! |
Yes, you need to implement token generation logic in InternalAuthenticationManager to add into the request header |
👋 @aksakalli - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
Description
There is no mechanism to handle
SIGTERM
if your pods are being terminated in Kubernetes.This PR implements a simple graceful shutdown procedure which is documented in here.
Additional context and related issues
I know there is better proposals like #521 #9976 which try to solve this problem more elegantly and holistically but this is just a simpler change for benefiting from existing
/v1/info/state
API endpoint.This implementation doesn't implement additional security aspects in case of having TLS/HTTPS enabled and shared-secret.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text: