
kubelet.yaml set cgroupDriver: systemd instead of cgroupDriver: cgroupfs in AL2-GPU instances #3005

Closed
DanielAmmar opened this issue Dec 30, 2020 · 2 comments · Fixed by #3007
Labels: kind/bug, priority/critical (should be investigated as soon as possible)


DanielAmmar commented Dec 30, 2020

What happened?
Launched an unmanaged node group with a p3.2xlarge GPU instance (ami-0f23f1b20f58cc97f); however, the kubelet failed to start:

systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-eksclt.al2.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2020-12-30 14:16:36 UTC; 4s ago
     Docs: https://github.com/kubernetes/kubernetes
  Process: 22376 ExecStart=/usr/bin/kubelet --node-ip=${NODE_IP} --node-labels=${NODE_LABELS},alpha.eksctl.io/instance-id=${INSTANCE_ID} --max-pods=${MAX_PODS} --register-node=true --register-with-taints=${NODE_TAINTS} --cloud-provider=aws --container-runtime=docker --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=${AWS_EKS_ECR_ACCOUNT}.dkr.ecr.${AWS_DEFAULT_REGION}.${AWS_SERVICES_DOMAIN}/eks/pause:3.3-eksbuild.1 --kubeconfig=/etc/eksctl/kubeconfig.yaml --config=/etc/eksctl/kubelet.yaml (code=exited, status=255)
  Process: 22365 ExecStartPre=/sbin/iptables -P FORWARD ACCEPT -w 5 (code=exited, status=0/SUCCESS)
 Main PID: 22376 (code=exited, status=255)

Error message:
failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"

cat /etc/eksctl/kubelet.yaml shows cgroupDriver: systemd; however, I suspect it should be cgroupDriver: cgroupfs.

The Docker cgroup driver in Amazon Linux 2 (GPU) is set to "cgroupfs" (vs. "systemd" in non-GPU versions).
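
A quick way to confirm the mismatch on an affected node (a sketch, assuming SSH access to the instance):

  # reports "cgroupfs" on the GPU AMI
  docker info --format '{{.CgroupDriver}}'

  # reports "cgroupDriver: systemd", written by eksctl
  grep cgroupDriver /etc/eksctl/kubelet.yaml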

How to reproduce it?
Launch a GPU node group via eksctl v0.35.0.


Versions

$ eksctl version
0.35.0

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Additional info
I also tried an older GPU AMI version ("ami-0969f51a73874a795") and even leaving the AMI unset, with the same disappointing result.
When I manually changed /etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf to include --cgroup-driver=cgroupfs and restarted the service, the node registered successfully with my cluster.
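
A sketch of that manual change, assuming the ExecStart line lives in the drop-in shown above (run as root on the node):

  # append --cgroup-driver=cgroupfs to the kubelet ExecStart in the drop-in
  sed -i 's|^ExecStart=/usr/bin/kubelet |&--cgroup-driver=cgroupfs |' \
      /etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf
  systemctl daemon-reload
  systemctl restart kubelet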

@Callisto13 Callisto13 added the priority/critical Should be investigated as soon as possible label Dec 31, 2020
@Callisto13 Callisto13 self-assigned this Dec 31, 2020

DanielAmmar commented Dec 31, 2020

A temporary solution is to add the following lines to the ClusterConfig YAML (only in GPU node groups):

preBootstrapCommands: 
- "sed -i 's/cgroupDriver:.*/cgroupDriver: cgroupfs/' /etc/eksctl/kubelet.yaml"
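
A minimal sketch of where that sits in a full ClusterConfig (cluster and nodegroup names below are illustrative):

  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: my-cluster        # hypothetical name
    region: us-west-2       # hypothetical region
  nodeGroups:
    - name: gpu-workers     # hypothetical name
      instanceType: p3.2xlarge
      preBootstrapCommands:
        - "sed -i 's/cgroupDriver:.*/cgroupDriver: cgroupfs/' /etc/eksctl/kubelet.yaml"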


Callisto13 commented Dec 31, 2020

After some digging, here is what's going on:

  • The reasons for setting the cgroupDriver to systemd given in "Have kubelet use systemd cgroup driver on al2 and ubuntu" (#2962) are valid, so we do want to do that for all instance types.
  • That PR added config-writing for /etc/eksctl/kubelet.yaml and /etc/docker/daemon.json, which ensured that the kubelet and the Docker daemon both start with the same driver.
  • However, when it comes to GPU instances, after we write that /etc/docker/daemon.json file it gets removed by /etc/eks/accelerated-docker-custom.sh, which means the daemon does not start with it, we get the mismatch, and the kubelet fails to start.
  • This is because "Amazon EKS optimized accelerated Amazon Linux AMIs" (GPU ones) include the NVIDIA drivers and the nvidia-container-runtime and start docker/containerd with a bunch of flags (the base of which comes from /etc/systemd/system/docker.service.d/nvidia-docker-dropin.conf with some vars from /etc/sysconfig/docker and /run/docker/runtimes.env).
  • When I SSH onto the node, edit /etc/sysconfig/docker to include --exec-opt native.cgroupdriver=systemd in OPTIONS, and restart (sudo systemctl daemon-reload && sudo systemctl restart docker), the kubelet starts and the node joins (sketched after this list).
    • Creating a /etc/docker/daemon.json containing just {"exec-opts":["native.cgroupdriver=systemd"]} and restarting the docker process with sudo pkill -SIGHUP dockerd so docker reloads that config file (not systemctl restart, which removes the file) also works, but is of course not robust (also sketched after this list).
  • I am still playing with finding a "nice" (less bad) way to configure the flags that the EKS-configured service starts with as part of cluster creation, but no joy yet. awslabs/amazon-eks-ami does not seem to be the only thing used to build the GPU images; I think whatever else is used is not open yet.
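
A sketch of those two node-level experiments, assuming SSH access and root on the instance:

  # (a) add the exec-opt to the OPTIONS line that the nvidia drop-in passes to dockerd
  sed -i 's/^OPTIONS="/&--exec-opt native.cgroupdriver=systemd /' /etc/sysconfig/docker
  systemctl daemon-reload && systemctl restart docker

  # (b) write a minimal daemon.json and SIGHUP dockerd so the file is not removed first
  echo '{"exec-opts":["native.cgroupdriver=systemd"]}' > /etc/docker/daemon.json
  pkill -SIGHUP dockerd    # fragile: a full service restart removes the file again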

So, what can be done?

  • Workaround for 0.35.0:
    • a) the solution given by the OP above should get users past this
    • b) "sed -i 's/^OPTIONS=\"/&--exec-opt native.cgroupdriver=systemd /' /etc/sysconfig/docker" will also work in preBootstrapCommands if users want the systemd driver (sketched after this list)
  • Short term options:
  • Long term:
    • a) Find out if there is a nice way to configure the cgroup driver (and other things) that are set in these AMIs (talk to Amazon)
    • b) Ask AWS to set the cgroup driver to systemd in those AMIs (if systemd is available)
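
For completeness, a sketch of workaround (b) expressed as preBootstrapCommands (whether the docker restart is needed depends on whether dockerd has already started at that point, so treat it as an assumption to verify):

  preBootstrapCommands:
    - "sed -i 's/^OPTIONS=\"/&--exec-opt native.cgroupdriver=systemd /' /etc/sysconfig/docker"
    - "systemctl daemon-reload && systemctl restart docker"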

(note: Docker 20.10 has the cgroup driver set to systemd by default, so this problem may be solved in future k8s versions)
