Feature request: option to configure kubelet to reserve resources for system daemons #795

Closed
arielvinas opened this issue May 10, 2019 · 9 comments · Fixed by #886

@arielvinas

Why do you want this feature?
In EKS, nodes that are starved of resources needed by system daemons go into the NotReady state and stay there until someone manually deletes the node in the EC2 UI. The feature where the master times out a node that stays NotReady for too long because it can't communicate will only come with Kubernetes 1.15 (in 2020). Reserving resources is very important: https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

What feature/behavior/change do you want?
There should be a field in the cluster config YAML to configure these settings and apply them to the userdata in the CloudFormation template, something like extraKubeletFlags or appendToKubeletConfig.

it would look like:

appendToKubeletConfig:
  kubeReserved:
    cpu: "300m"
    memory: "300Mi"
    ephemeral-storage: "1Gi"
  kubeReservedCgroup: "/kube-reserved"
  systemReserved:
    cpu: "300m"
    memory: "300Mi"
    ephemeral-storage: "1Gi"
  evictionHard:
    memory.available: "200Mi"
    nodefs.available: "5%"
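
For comparison, these fields correspond closely to what the kubelet itself accepts via its --config file, as described in the reserve-compute-resources doc linked above. A minimal sketch of that KubeletConfiguration (values copied from the example above, not recommendations):

kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
# Resources set aside for Kubernetes system daemons (kubelet, container runtime)
kubeReserved:
  cpu: "300m"
  memory: "300Mi"
  ephemeral-storage: "1Gi"
kubeReservedCgroup: "/kube-reserved"
# Resources set aside for OS system daemons (sshd, udev, ...)
systemReserved:
  cpu: "300m"
  memory: "300Mi"
  ephemeral-storage: "1Gi"
# Hard eviction thresholds: the kubelet starts evicting pods once a node drops below these
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "5%"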
@whereisaaron

whereisaaron commented May 11, 2019

uh oh, you mean EKS/eksctl is not reserving system resources already? I thought this was default/built-in behavior of current k8s, but sounds like it is optional 😢

The reserved resource for kubelets is extremely important where you use overcommitted workloads (collections of spikey workloads) i.e. any time where Limits >= Requests. Under node resource exhaustion you want some workloads to be rescheduled, not entire nodes to go down.

If you deploy any workloads without resource Requests, or with Requests but without identical Limits, then you need to ‘thumbs up’ this now 😄
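
For anyone new to those terms, a minimal illustrative container spec for the overcommit case described above, where Limits exceed Requests (all names and values are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: spiky-app                     # placeholder name
spec:
  containers:
    - name: app
      image: example.com/app:latest   # placeholder image
      resources:
        requests:
          cpu: "100m"                 # what the scheduler counts against node capacity
          memory: "128Mi"
        limits:
          cpu: "500m"                 # allowed burst; many such pods can overcommit the node
          memory: "512Mi"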

@whereisaaron

Here is a thread of victims who didn’t know EKS doesn’t handle this by default.
awslabs/amazon-eks-ami#79

As well as being able to specify these settings, it would be great to have some sensible defaults in eksctl, to give the kubelet on overloaded nodes a fighting chance to handle the overload gracefully. The same users who don't know about this probably also don't appreciate the importance of pod resource requests/limits (yet 😜).

@arielvinas
Author

> Here is a thread of victims who didn't know EKS doesn't handle this by default.
> awslabs/amazon-eks-ami#79
>
> As well as being able to specify these settings, it would be great to have some sensible defaults in eksctl, to give the kubelet on overloaded nodes a fighting chance to handle the overload gracefully. The same users who don't know about this probably also don't appreciate the importance of pod resource requests/limits (yet 😜).

Exactly... I'm one of those who struggled for a couple of months (coming from Swarm) until realizing what was wrong with my nodes.

@errordeveloper errordeveloper self-assigned this Jun 5, 2019
@errordeveloper errordeveloper modified the milestone: 0.1.35 Jun 5, 2019
@Jeffwan
Contributor

Jeffwan commented Jun 6, 2019

I don't quite understand the case here. What's the expectation for the AMI? Do you think it's better to reserve memory by default? The AMI doesn't know the size of the instance or the user's workloads, so it's hard to reserve the right amount of resources, right?

@whereisaaron

@Jeffwan this is just to reserve resources for the kubelet process itself, so that it keeps running if the node gets overloaded by its workload. If the kubelet can keep running it can evict the excess workload. If it gets slammed, then the whole node goes down.

@errordeveloper errordeveloper modified the milestones: 0.1.35, 0.1.36 Jun 6, 2019
@Jeffwan
Contributor

Jeffwan commented Jun 6, 2019

@whereisaaron I agree. The challenge would be figuring out the proper size to reserve.

  1. Use different configs for different instance types?
  2. Use a fixed amount of memory. (Since so many instance types are supported, we have to make sure every node still has enough resources left after the reservation. This would be hard for small instances.)
     For example, t3.nano only has about 500 MiB; if 100 MiB is reserved for the kubelet, it is left with only 80% of its resources.

Do you have other ideas how to support this by default?

@errordeveloper
Contributor

To begin with, let's just expose the option to the user. The default will have to remain what it is now. In the future, we may devise an algorithm that takes a best guess based on the instance type(s), but that will need to be discussed separately.
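
For the sake of discussion, a sketch of how such an exposed option might look in a ClusterConfig, modeled on the proposal at the top of this issue (the field name and its placement under the nodegroup are hypothetical here, to be settled in the PR):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster       # placeholder
  region: us-west-2           # placeholder
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    # Hypothetical field name; would be passed through to the kubelet config in userdata.
    kubeletExtraConfig:
      kubeReserved:
        cpu: "300m"
        memory: "300Mi"
        ephemeral-storage: "1Gi"
      evictionHard:
        memory.available: "200Mi"
        nodefs.available: "5%"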

@whereisaaron

@Jeffwan I'm not sure the instance type makes much difference, since the kubelet has the same requirements regardless of instance size. Remember, this is to reserve resources for the kubelet process itself. Perhaps larger instances have an indirect impact, since there tend to be more containers per instance.

But as @errordeveloper proposed, if we expose the options to the user, we can't go far wrong.

@Jeffwan
Contributor

Jeffwan commented Jun 11, 2019

Agreed on the current solution. We will discuss the best numbers to use in a separate thread.
