graceful shutdown for containers #15180

aksakalli · 2022-11-24T13:01:00Z

Description

There is currently no mechanism to handle SIGTERM when pods are being terminated in Kubernetes.
This PR implements a simple graceful shutdown procedure, which is documented here.
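
For readers unfamiliar with the pattern, here is a minimal sketch of how an entrypoint script might turn SIGTERM into a shutdown request. It is not the script from this PR: `run_with_graceful_shutdown` and `request_graceful_shutdown` are hypothetical names, and the hook only records that it ran instead of calling the state endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real state-change request (for example a
# PUT to /v1/info/state). Here it only records that it was called.
request_graceful_shutdown() {
    echo "Shutting down this worker gracefully"
    shutdown_requested=1
}

# Run the given command in the background, call the shutdown hook when
# SIGTERM arrives, and keep waiting until the command has exited.
run_with_graceful_shutdown() {
    shutdown_requested=0
    trap request_graceful_shutdown TERM
    "$@" &
    child_pid=$!
    # `wait` returns early when a trapped signal arrives, so loop until
    # the child is really gone. The wait status itself is ignored.
    while kill -0 "$child_pid" 2>/dev/null; do
        wait "$child_pid" || true
    done
    trap - TERM
}
```

An entrypoint would call `run_with_graceful_shutdown` with the server launch command as its arguments. The child keeps running after the hook fires, so the server decides when to exit.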

Additional context and related issues

I know there are better proposals, such as #521 and #9976, that try to solve this problem more elegantly and holistically, but this is a simpler change that takes advantage of the existing /v1/info/state API endpoint.

This implementation doesn't cover the additional security handling needed when TLS/HTTPS is enabled or a shared secret is configured.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# Section
* Add a graceful shutdown procedure to the Docker image. ({issue}`issuenumber`)

nineinchnick · 2022-11-29T10:41:57Z

core/docker/bin/run-trino

+function graceful_shutdown()
+{
+    echo "Shutting down this worker gracefully"
+    curl -v -X PUT -d '"SHUTTING_DOWN"' -H "Content-type: application/json" http://localhost:8080/v1/info/state


There should be timeouts defined here. What would be a typical value for a healthy cluster? Does it depend on the configuration, such as query timeouts?

How should errors be handled when this command fails? I don't think `wait` has a timeout, so we'd need to send a signal to the launcher.

I added retry and timeout to curl in d8aa476.
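
As an illustration of the two concerns raised above, the sketch below bounds the request with curl's `--connect-timeout`, `--max-time`, `--retry`, and `--retry-delay` options, and polls the launcher with a deadline because `wait` has no timeout. The numeric values and the function names `request_shutdown` and `wait_or_terminate` are assumptions for illustration. They are not taken from d8aa476.

```shell
#!/usr/bin/env bash
# Ask the local worker to start shutting down. The timeout and retry
# values are illustrative placeholders, not recommendations.
request_shutdown() {
    curl --fail --silent --show-error \
        --connect-timeout 5 --max-time 10 \
        --retry 3 --retry-delay 2 \
        -X PUT -H 'Content-Type: application/json' \
        -d '"SHUTTING_DOWN"' \
        "${1:-http://localhost:8080/v1/info/state}"
}

# Wait up to $2 seconds for process $1 to exit. If it is still running
# after the deadline, send it SIGTERM and return 1.
wait_or_terminate() {
    pid=$1
    deadline=$2
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            kill -TERM "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A shutdown hook could call `request_shutdown || true` followed by `wait_or_terminate "$launcher_pid" 120`, where `launcher_pid` and the 120-second deadline are again hypothetical. A sensible deadline probably depends on how long running queries are allowed to take.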

hashhar · 2022-11-29T14:16:08Z

This is a very simplistic solution that would work in only a limited number of deployments. Most notably, it wouldn't work with any authentication or system access control configured, or even with just plain TLS, both of which are relatively common configurations. We wouldn't want a situation where some users cannot use this at all and may even be misled into assuming that the feature works for their configuration.

I strongly feel #14876 is a better way to tackle this problem (it allows a lot of flexibility in which tools it can be used with and doesn't need to be limited to k8s).

In the meantime, since the patch is so small, I think anyone can continue using it as part of a custom Docker image.

aksakalli · 2022-11-29T23:02:25Z

I agree that we need a more holistic solution, as I mentioned in my PR description, but it occurred to me that the community could benefit from this until we have a better one. I don't think it will hurt anyone, and it can benefit some, since the official Docker image and the Helm chart provide no support for scaling workers in. This would help with a typical deployment where internode communication is not encrypted. It's your call if you'd like to close this PR.

sweetpythoncode · 2022-12-16T23:12:00Z

Is it possible to use graceful shutdown with shared-secret? Is there any special auth header for the API? Thanks!

aksakalli · 2022-12-19T15:50:17Z

Yes, you need to implement the token generation logic found in InternalAuthenticationManager and add the resulting token to the request in the X-Trino-Internal-Bearer header.
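
A rough sketch of the request side only, assuming a valid internal token has already been produced by some other means. Token generation is deliberately left out, and `internal_auth_header` and `request_shutdown_with_token` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Build the internal authentication header from a pre-generated token.
# Generating a valid token is out of scope for this sketch.
internal_auth_header() {
    if [ -z "${1:-}" ]; then
        echo "missing internal token" >&2
        return 1
    fi
    printf 'X-Trino-Internal-Bearer: %s' "$1"
}

# Send the state change with the token attached. $1 is the token and
# $2 is an optional URL, which defaults to the endpoint used in this PR.
request_shutdown_with_token() {
    header=$(internal_auth_header "${1:-}") || return 1
    curl --fail --silent --show-error --max-time 10 \
        -X PUT -H 'Content-Type: application/json' -H "$header" \
        -d '"SHUTTING_DOWN"' \
        "${2:-http://localhost:8080/v1/info/state}"
}
```

With TLS enabled, the URL argument would need to point at the HTTPS endpoint instead of the default.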

mosabua · 2024-01-11T23:26:10Z

👋 @aksakalli - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

graceful shutdown for containers

1e406de

cla-bot bot added the cla-signed label Nov 24, 2022

aksakalli requested review from nineinchnick and electrum November 25, 2022 19:38

nineinchnick reviewed Nov 29, 2022

add timeout and retry

d8aa476

kokosing force-pushed the master branch from 3f05134 to 58d6356 Compare March 14, 2023 11:34
