
Handle AWS termination notice for spot instances #927

Open
@deliahu


Motivation

Respond to spot instance terminations more gracefully, so that requests do not fail while traffic is migrating from the terminating instance to a healthy one.

Questions

  • What is the current behavior, and what would this achieve that's better? Does the cluster autoscaler help with this at all?

Description

Edit (Research)

Some relevant articles here:

If we add aws-node-termination-handler and have it run kubectl drain on the node when a notice arrives, then I think the serving container will react by rejecting the requests still waiting in its queue and letting the ones already being processed finish. For testing, killing or terminating the instance is probably not the best approach; instead, we need to find a way to reproduce the termination notice that AWS emits.
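To experiment with the notice itself without killing a real instance, here is a minimal sketch of roughly what a handler like aws-node-termination-handler does in its instance-metadata mode. It assumes the node can reach the EC2 instance metadata service with IMDSv1 (IMDSv2 additionally requires a session token header) and polls the documented `spot/instance-action` path, which returns 404 until an interruption is scheduled and then a small JSON document such as `{"action": "terminate", "time": "2017-09-18T08:22:00Z"}`. The function names are hypothetical, and `fetch` is injectable so a fake notice can be fed in during tests.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional

# Documented EC2 metadata path for spot interruption notices.
# Assumption: IMDSv1 access is allowed (IMDSv2 would need a token header).
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw instance-action JSON, or None if no notice is pending."""
    try:
        with urllib.request.urlopen(INSTANCE_ACTION_URL, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no interruption scheduled yet
        raise  # other HTTP errors (and URLError off EC2) propagate


def seconds_until_interruption(raw: str, now: datetime) -> float:
    """Convert a notice like {"action": ..., "time": ...} into seconds remaining."""
    notice = json.loads(raw)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


def wait_for_notice(
    fetch: Callable[[], Optional[str]] = fetch_instance_action,
    on_notice: Callable[[float], None] = lambda remaining: None,
    poll_interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> bool:
    """Poll until a notice appears; call on_notice with the seconds left.

    Returns True if a notice was seen, False if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        raw = fetch()
        if raw is not None:
            remaining = seconds_until_interruption(raw, datetime.now(timezone.utc))
            on_notice(remaining)
            return True
        polls += 1
        time.sleep(poll_interval)
    return False
```

On a real notice, `on_notice` is where the node would be cordoned and drained (for example by running `kubectl drain <node-name> --ignore-daemonsets`), giving the serving containers the remaining time to finish in-flight requests. Swapping `fetch` for a function that returns a canned notice is one way to reproduce the AWS notice in tests.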

By combining Classic ELB connection draining (https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html) with the kubectl drain procedure, we might be able to transition gracefully to a healthy instance. It looks like the connection draining timeout defaults to 300 seconds, after which the ELB forcibly closes any connections still open to the deregistering instance. We'd probably want to set it to 120 seconds to match the two-minute termination notice period.
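The draining setting from the linked Classic ELB documentation can also be applied programmatically. The sketch below assumes a boto3 `elb` (Classic Load Balancer) client, whose `modify_load_balancer_attributes` call accepts a `ConnectionDraining` attribute with `Enabled` and `Timeout`; the helper names are hypothetical. The client is passed in rather than created, so the payload can be checked without AWS credentials.

```python
from typing import Any, Dict

# Spot interruption notices arrive two minutes before the instance is reclaimed.
SPOT_NOTICE_SECONDS = 120


def connection_draining_attributes(timeout_seconds: int = SPOT_NOTICE_SECONDS) -> Dict[str, Any]:
    """Build the Classic ELB attribute payload that enables connection draining."""
    # Classic ELB accepts a draining timeout of 1 to 3600 seconds (default 300).
    if not 1 <= timeout_seconds <= 3600:
        raise ValueError("timeout must be between 1 and 3600 seconds")
    return {"ConnectionDraining": {"Enabled": True, "Timeout": timeout_seconds}}


def apply_connection_draining(
    elb_client: Any, load_balancer_name: str, timeout_seconds: int = SPOT_NOTICE_SECONDS
) -> None:
    """Apply the draining setting through a boto3 "elb" (Classic) client."""
    elb_client.modify_load_balancer_attributes(
        LoadBalancerName=load_balancer_name,
        LoadBalancerAttributes=connection_draining_attributes(timeout_seconds),
    )
```

Against a real cluster this would be called as `apply_connection_draining(boto3.client("elb"), "<load-balancer-name>")`. Matching the 120-second notice is the proposal above; a slightly shorter timeout might leave margin for the time it takes to detect the notice and deregister the instance, but that is a judgment call.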

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)
