Description
Motivation
Respond to spot instance terminations more gracefully, i.e. avoid failed requests while traffic migrates from the terminating instance to another one that is healthy.
Questions
- What is the current behavior, and what would this achieve that's better? Does the cluster autoscaler help with this at all?
Description
- https://github.com/aws/aws-node-termination-handler
- https://itnext.io/the-definitive-guide-to-running-ec2-spot-instances-as-kubernetes-worker-nodes-68ef2095e767
Edit (Research)
Some relevant articles here:
- https://aws.amazon.com/blogs/compute/best-practices-for-handling-ec2-spot-instance-interruptions/
- https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html#spot-instance-termination-notices
- https://docs.aws.amazon.com/autoscaling/ec2/userguide/healthcheck.html
- https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#use-kubectl-drain-to-remove-a-node-from-service
If we add aws-node-termination-handler and have it kubectl-drain the node upon the termination notice, then I think the serving container will react by rejecting the requests currently in its queue and letting those that are still being processed finish. For testing, killing/terminating the instance is probably not the right way to exercise this; instead, we need a way to reproduce the termination notice that AWS emits. A minimal sketch of the mechanism is below.
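For illustration only, here is a rough sketch of what the handler flow does, assuming IMDSv1 access to the instance metadata endpoint and kubectl credentials on the node; the node name and polling interval are hypothetical, and aws-node-termination-handler would automate all of this for us:

```python
import subprocess
import time
import urllib.error
import urllib.request

# The EC2 instance metadata endpoint publishes a spot interruption notice
# roughly two minutes before the instance is reclaimed.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"
NODE_NAME = "ip-10-0-1-23.ec2.internal"  # hypothetical node name


def termination_notice_received() -> bool:
    """Return True once AWS has published a spot interruption notice."""
    try:
        with urllib.request.urlopen(SPOT_ACTION_URL, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.HTTPError, urllib.error.URLError):
        # 404 (or no response) means no interruption notice yet.
        return False


def drain_node(node: str) -> None:
    """Cordon and drain the node so pods are evicted before termination."""
    subprocess.run(
        ["kubectl", "drain", node,
         "--ignore-daemonsets", "--delete-emptydir-data",
         "--grace-period=120"],
        check=True,
    )


if __name__ == "__main__":
    while not termination_notice_received():
        time.sleep(5)  # poll well within the ~120 s notice window
    drain_node(NODE_NAME)
```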
With https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html and the kubectl drain procedure we might be able to transition traffic gracefully to a healthy instance. It looks like the back-end connection-draining timeout defaults to 300 seconds before the ELB kills the requests headed to the de-registering instance; we'd probably want to lower that to 120 seconds to match the termination notice period.
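If we go that route, a sketch of setting the classic ELB connection-draining timeout via boto3 might look like the following; the load balancer name and region are placeholders:

```python
import boto3

# Classic ELBs use the 'elb' client; connection draining is a load
# balancer attribute rather than a listener setting.
elb = boto3.client("elb", region_name="us-east-1")  # placeholder region

elb.modify_load_balancer_attributes(
    LoadBalancerName="my-serving-elb",  # hypothetical ELB name
    LoadBalancerAttributes={
        "ConnectionDraining": {
            "Enabled": True,
            # Give in-flight requests up to 120 s, matching the two-minute
            # spot termination notice, instead of the 300 s default.
            "Timeout": 120,
        }
    },
)
```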