Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graceful shutdown for containers #15180

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

aksakalli
Copy link
Member

@aksakalli aksakalli commented Nov 24, 2022

Description

There is no mechanism to handle SIGTERM if your pods are being terminated in Kubernetes.
This PR implements a simple graceful shutdown procedure which is documented in here.

Additional context and related issues

I know there is better proposals like #521 #9976 which try to solve this problem more elegantly and holistically but this is just a simpler change for benefiting from existing /v1/info/state API endpoint.

This implementation doesn't implement additional security aspects in case of having TLS/HTTPS enabled and shared-secret.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# Section
* Adding graceful shutdown procedure to docker image ({issue}`issuenumber`)

function graceful_shutdown()
{
echo "Shutting down this worker gracefully"
curl -v -X PUT -d '"SHUTTING_DOWN"' -H "Content-type: application/json" http://localhost:8080/v1/info/state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be timeouts defined here. What would be the typical expected value for a healthy cluster? Does it depend on the configuration, like some query timeouts?

How to handle errors, when this command would fail? I don't think wait has a timeout so we'd need to send a signal to the launcher.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added retry and timeout to curl in d8aa476.

@hashhar
Copy link
Member

hashhar commented Nov 29, 2022

This is a very simplistic solution which would work in limited number of deployments - most notably wouldn't work with any authentication/system access control configured or even just plain TLS - both of which are relatively common configurations. We wouldn't want a situation where some users will not be able to use this at all and maybe even be misled into assuming that the feature works for their configuration.

I strongly feel #14876 is a better way to tackle this problem (and allows a lot of flexibility on what tools it can be used with - doesn't need to be limited to k8s only).

In the meantime since the patch is so small I think anyone can continue using it as part of a custom docker image in meantime.

@aksakalli
Copy link
Member Author

This is a very simplistic solution which would work in limited number of deployments - most notably wouldn't work with any authentication/system access control configured or even just plain TLS - both of which are relatively common configurations. We wouldn't want a situation where some users will not be able to use this at all and maybe even be misled into assuming that the feature works for their configuration.

I strongly feel #14876 is a better way to tackle this problem (and allows a lot of flexibility on what tools it can be used with - doesn't need to be limited to k8s only).

In the meantime since the patch is so small I think anyone can continue using it as part of a custom docker image in meantime.

I admit that we need a more holistic solution as I mentioned in my PR description but it occurred to me that the community can benefit from it until we have a better one. I don't think it will hurt anyone but can benefit some since the official docker image and the helm chart provides zero support for worker scaling in. This would help with a typical deployment where internode communication is not encrypted. It's your call if you like to close this PR.

@sweetpythoncode
Copy link

Is it possible to use graceful shutdown with shared-secret? is here is any special auth header for API? thanks!

@aksakalli
Copy link
Member Author

Is it possible to use graceful shutdown with shared-secret? is here is any special auth header for API? thanks!

Yes, you need to implement token generation logic in InternalAuthenticationManager to add into the request header X-Trino-Internal-Bearer.

@mosabua
Copy link
Member

mosabua commented Jan 11, 2024

👋 @aksakalli - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants