
kubelite process takes huge amount of CPU #2186

Open
muxi opened this issue Apr 20, 2021 · 51 comments

Comments

@muxi

muxi commented Apr 20, 2021

After getting the 1.21 package from the Debian 10 repo, the kubelite process is eating up a lot of CPU. The process keeps using >100% CPU in top, and the average load is ~2 instead of <1 as with previous versions.

inspection-report-20210420_092101.tar.gz

@navodissa

I'm facing the same issue and I'm using Ubuntu 18.04.

@balchua
Collaborator

balchua commented Apr 20, 2021

@muxi from the tarball you provided, I can see that kubelite is not healthy, which makes it crashloop. That's why it is using up that much CPU. But I couldn't determine what is causing it to crashloop.

@muxi
Author

muxi commented Apr 20, 2021

@balchua thanks for looking into it. For now I am just reverting to v1.20.5, which works for me. If there is anything else I can provide to help debug this problem, let me know.

@balchua
Collaborator

balchua commented Apr 21, 2021

When you installed 1.20.5, did you purge the snap?
For example: sudo snap remove microk8s --purge
Just curious.

@muxi
Author

muxi commented Apr 21, 2021

When you installed 1.20.5, did you purge the snap?

Did you mean 1.20.5 or 1.21? The install of 1.21 was (surprisingly) done automatically by snap. The rollback to 1.20.5 was done with snap revert. No purge was ever run.
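
For reference, this is roughly what that rollback looks like (a minimal sketch; snap revert switches back to the previously installed revision that snapd still has on disk):

# roll back microk8s to the previously installed revision
sudo snap revert microk8s
# confirm which version/revision is now active
snap list microk8s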

@balchua
Collaborator

balchua commented Apr 21, 2021

Hi @muxi, I can see that you are using the latest/stable channel. It seems you have a long-running cluster. It is highly recommended to stick to a specific channel, for example 1.20/stable or 1.21/stable, to avoid unexpected incompatibilities between different versions of Kubernetes.
If you get the chance, could you try a clean install of MicroK8s, i.e. remove with purge and then install from the 1.21/stable channel?
Thanks.
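
A minimal sketch of that clean reinstall (note that a purge deletes all cluster state, so back up anything you need first):

# remove MicroK8s together with all of its data
sudo snap remove microk8s --purge
# reinstall, tracking a specific channel instead of latest/stable
sudo snap install microk8s --classic --channel=1.21/stable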

@Aaron-Ritter

I face the same issue after upgrading to 1.21/edge. The CPU usage in particular is concerning, averaging around 10% in a test cluster with far fewer pods than our production system (1.19/stable), where kube-apiserver averages about 2% CPU load. Memory consumption is around 20% higher too.

k8s-test-n2   Ready    <none>   112d   v1.21.0-3+dc123ff2da727a

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2462 root      20   0 3304.4m 939.7m  99.1m S   9.3   9.6  53:25.36 kubelite
k8s-test-n1   Ready    <none>   217d   v1.21.0-3+dc123ff2da727a

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
21136 root      20   0 3311.2m 908.5m  99.1m S   8.3   9.3  51:36.99 kubelite
k8s-test-m    Ready    <none>   214d   v1.21.0-3+dc123ff2da727a

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
31545 root      20   0 3444.9m   1.0g 101.5m S  10.6  10.5  94:01.05 kubelite

@balchua
Collaborator

balchua commented May 5, 2021

It's hard to compare kubelite with the kube-apiserver alone. Kubelite packs all the Kubernetes components into one binary:
kube-controller-manager, scheduler, proxy, apiserver, kubelet, plus dqlite.
Running the Kubernetes services as goroutines instead of standalone processes does reduce the overall CPU and memory usage.

@ktsakalozos thoughts?

@ktsakalozos
Member

@balchua is right: in terms of CPU load it is the sum of all k8s services and the datastore (dqlite). In terms of memory usage it should be about 200MB less than the total memory used by a setup where each service runs on its own.

@Aaron-Ritter

@balchua @ktsakalozos looking at the average and max CPU consumption of our test setup over the last 7 days, it actually looks like overall CPU consumption dropped slightly. Without the explanation from @balchua it is at first surprising to see kubelite pop up in top, but overall there seems to be no real visible increase, and maybe even a slight drop. Once 1.21.1 is released we will upgrade our prod environment.

@krichter722
Contributor

krichter722 commented May 14, 2021

I'm also seeing the high CPU usage, which I understand is caused by a crash loop. It prevents me from starting microk8s after rebooting and stopping it.

The issue is probably that in snap, stable/1.21 has revision 2128 while stable/1.20 has 2143, so the higher version number points to an older revision. Maybe that causes a (partial/incomplete) incompatible downgrade.

inspection-report-20210514_123833.tar.gz

@balchua
Collaborator

balchua commented May 14, 2021

@krichter722 yes, you are right, kubelite is crashlooping with this error:

Mai 14 12:38:23 mereet.com microk8s.daemon-kubelite[181888]: Error: start node: raft_start(): io: load closed segment 0000000021384298-0000000021384473: entries batch 176 starting at byte 5799224: entries count in preamble is zero

Is this a new setup or an upgrade from a previous version?

Maybe @ktsakalozos or @MathieuBordere can shed more light on this one.

@krichter722
Contributor

krichter722 commented May 14, 2021

The issue occurred after upgrading to 1.21 on a long-running instance that was on 1.20 and might have been upgraded before. A fresh install of 1.21, as well as of 1.20, works (smoke test: microk8s.status and kubectl get pods --all-namespaces, which wasn't possible with the crashlooping kubelite before), as does an upgrade from a fresh 1.20 install to 1.21 (same smoke test).

So, the issue is "resolved" for me by purging the microk8s snap installation and reinstalling. However, light should be shed on the revision numbers, as well as on upgrade issues from 1.20 to 1.21 for installations that are not freshly installed.

@balchua
Collaborator

balchua commented May 14, 2021

Thank you @krichter722 for the information. The revision number is normal; I think v1.20.6 came out after 1.21/stable was cut.
There is an upgrade test in the CI, ranging from 1.17 (I think) up to the latest.

@tsipo

tsipo commented Jun 3, 2021

I am facing a similar - and worse - issue. Not only is the CPU consumption of kubelite between 70% and 130%, it takes microk8s minutes to start (after which it stabilizes at ~70% CPU), AND kubelite consumes 22-23GB of memory, which is >70% of my machine.
inspection-report-20210603_173003.tar.gz

@balchua
Collaborator

balchua commented Jun 6, 2021

@tsipo thank you for providing the inspect tarball.
IMHO, the high CPU is caused by kubelite crashing. I saw these logs.

Jun 03 16:49:48 rreshef-linux microk8s.daemon-kubelite[9911]: E0603 16:49:48.005191    9911 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: Get "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": context deadline exceeded
Jun 03 16:49:48 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:48.005245    9911 leaderelection.go:278] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
Jun 03 16:49:48 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:48.905442    9911 event.go:291] "Event occurred" object="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="rreshef-linux_653179cd-8cc7-40dc-8b98-db77e0586259 stopped leading"
Jun 03 16:49:49 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:49.266364    9911 garbagecollector.go:160] Shutting down garbage collector controller
Jun 03 16:49:49 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:49.366886    9911 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
Jun 03 16:49:49 rreshef-linux microk8s.daemon-kubelite[9911]: F0603 16:49:49.366929    9911 server.go:205] leaderelection lost

Can you try adding the following

--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s

to these files /var/snap/microk8s/current/args/kube-controller-manager and /var/snap/microk8s/current/args/kube-scheduler and then restart MicroK8s.
I am not sure if this will resolve the issue.
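
A minimal sketch of that change, assuming the default snap layout (the flag values are the ones suggested above):

# append the longer leader-election timeouts to both services
echo "--leader-elect-lease-duration=60s" | sudo tee -a /var/snap/microk8s/current/args/kube-controller-manager /var/snap/microk8s/current/args/kube-scheduler
echo "--leader-elect-renew-deadline=40s" | sudo tee -a /var/snap/microk8s/current/args/kube-controller-manager /var/snap/microk8s/current/args/kube-scheduler
# restart MicroK8s so the new arguments are picked up
microk8s stop && microk8s start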

@tsipo

tsipo commented Jun 6, 2021

@balchua Thanks for your prompt reply; unfortunately it's too late. I have already purged the previous installation of microk8s and installed the same version (I'm on the latest/edge channel) anew. Now there are no problems - CPU consumption is reasonable, as is memory consumption (down to a few hundred MBs). So the issue was surely with the upgrade path from the previous version to the current one on that channel.
BTW my biggest problem was not the CPU consumption, though 70%-130% is a bit high (but I have 8 cores). It's the memory consumption that killed me - it got up to 22-23GB, and this is my dev machine, which runs more apps (and has 32GB in total).

@balchua
Collaborator

balchua commented Jun 6, 2021

Hi @tsipo, thanks for giving us an update. If you ever find something strange, feel free to create an issue.

@vdavy

vdavy commented Jun 19, 2021

Hi, I have the same problem with microk8s 1.21 running on Debian 11 testing: kubelite eats all my CPU.
Here is the report: inspection-report-20210618_210554.tar.gz

When it auto-upgraded to 1.21, I had to upgrade to Debian 11, otherwise it wouldn't start. The kernel version is now 5.10.0-7-amd64 #1 SMP Debian 5.10.40-1 (2021-05-28) x86_64 GNU/Linux.
The node is very slow to start and pretty much unusable, so I'm going to revert to the 1.20 branch while waiting for a fix.

I'm running only one node and have already tried to purge the cluster as mentioned above (a pain to reinstall everything and, what a pity, it didn't solve the problem).

Please note I'm open to trying and testing fixes.

@thenets

thenets commented Jul 5, 2021

Same problem here. I'm using a fresh Ubuntu 20.04 install.

This is my version (snap):

Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.1-3+ba118484dd39df", GitCommit:"ba118484dd39df570e55e47f082e523cda7583e5", GitTreeState:"clean", BuildDate:"2021-06-11T05:09:28Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.1-3+ba118484dd39df", GitCommit:"ba118484dd39df570e55e47f082e523cda7583e5", GitTreeState:"clean", BuildDate:"2021-06-11T05:06:35Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

@krichter722
Contributor

I also experienced rather high CPU consumption due to a DNS forwarding loop on 1.21, after the crash loop mentioned above was no longer an issue for me. I have set up Calico and an IPv4+IPv6 dual stack, but I don't see why this couldn't happen with other setups as well, e.g. if your ISP provides strange DNS settings.

Therefore I needed to add --resolv-conf=/run/systemd/resolve/resolv.conf to /var/snap/microk8s/current/args/kubelet and set up the coredns configmap accordingly; see https://coredns.io/plugins/loop/ and https://microk8s.io/docs/addon-dns for a detailed explanation.

For me the trigger was a huge number of probe failures, which also contributed to the high CPU usage, which then probably led to even more probe failures. This might be worth taking a look at when investigating high CPU usage.
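
A minimal sketch of that kubelet change, assuming systemd-resolved is in use and the default snap paths (check the linked docs before applying it):

# point kubelet at the real upstream resolvers instead of the systemd stub resolver
echo "--resolv-conf=/run/systemd/resolve/resolv.conf" | sudo tee -a /var/snap/microk8s/current/args/kubelet
# adjust the CoreDNS forwarders in its configmap if needed, then restart MicroK8s
microk8s kubectl -n kube-system edit configmap coredns
microk8s stop && microk8s start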

@AmilaDevops

AmilaDevops commented Jul 28, 2021

Hi, does anyone know whether this kubelite issue after upgrading my k8s cluster to v1.21 has to do with the kernel version of my Ubuntu OS? I'm getting the 100% CPU issue on only one node of my microk8s cluster (the one with a newer kernel version than the other nodes). All other nodes are fine.

Thanks

@YpeKingma

Just installed microk8s 1.22 stable on Ubuntu via snap, and top reports 10-30% CPU for kubelite on an i3, 4 cores, 2GHz.
It's tolerable, but that much should not be needed to keep a single node available.

Would it make sense to try the above suggestion:
--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s
?

@YpeKingma

For the record, both files referred to above, kube-controller-manager and kube-scheduler in the directory /var/snap/microk8s/current/args, have these lines:
--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=30s
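
A quick way to check which values are currently in effect (a simple check, assuming the default snap paths):

grep -H "leader-elect" /var/snap/microk8s/current/args/kube-controller-manager /var/snap/microk8s/current/args/kube-scheduler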

@bttger

bttger commented Mar 31, 2022

I also came across this issue because I was wondering about relatively high CPU usage from the Kubelite process. I have about 20-30% load on idle after a fresh install. (dns, storage, ingress enabled)

@AlexsJones
Contributor

@YpeKingma the resource usage will be related to the intensity of the workloads running in Kubernetes.
Are you saying nothing is running, or are there workloads scheduled?

@YpeKingma

@AlexsJones There were no workloads scheduled.

@ktsakalozos
Member

@YpeKingma it would also help us to know the hardware specs and the version of Kubernetes you run. Would you be able to attach a microk8s inspect tarball? Thank you.
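
For anyone unfamiliar with it, the inspection tarball mentioned in this thread is produced on the node itself; the command prints where the report was written:

# collects logs, service status and configuration into a tarball you can attach to the issue
sudo microk8s inspect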

@castellanoj

duplicate #3026

@barrettj12
Contributor

I also came across this issue because I was wondering about relatively high CPU usage from the Kubelite process. I have about 20-30% load on idle after a fresh install. (dns, storage, ingress enabled)

I'm also experiencing this. I have microk8s running but with literally nothing to do (idle), so its CPU/memory usage should be extremely minimal. Nonetheless, I find it consistently using 20% of my CPU.

As a temporary fix, you can stop microk8s and restart it when you actually need it. It goes without saying that this should not be necessary for a user to do.
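
For anyone looking for the exact commands, stopping and starting all MicroK8s services (including kubelite) can be done with:

microk8s stop    # shut down all MicroK8s services, including kubelite
microk8s start   # bring the cluster back up when it is needed again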

@barrettj12
Contributor

Looking at microk8s kubectl top pod -A, it seems that calico-node is accounting for most of the CPU/memory usage.

@bttger

bttger commented Apr 29, 2022

Looking at microk8s kubectl top pod -A, it seems that calico-node is accounting for most of the CPU/memory usage.

Have you run top on the host machine? I also see that calico-node uses the most among the running pods, but overall the usage of the kubelite process is much higher (between 3x and 10x higher than calico).

@barrettj12
Contributor

Looking at microk8s kubectl top pod -A, it seems that calico-node is accounting for most of the CPU/memory usage.

Have you run top on the host machine? I also see that calico-node uses the most among the running pods, but overall the usage of the kubelite process is much higher (between 3x and 10x higher than calico).

So kubelite is excluded from the processes listed in microk8s kubectl top pod -A? I wasn't aware of that; you are probably right.

@alexmarshallces

I also came across this issue because I was wondering about relatively high CPU usage from the Kubelite process. I have about 20-30% load on idle after a fresh install. (dns, storage, ingress enabled)

I'm also experiencing this. I have microk8s running but with literally nothing to do (idle), so its CPU/memory usage should be extremely minimal. Nonetheless, I find it consistently using 20% of my CPU.

As a temporary fix, you can stop microk8s and restart it when you actually need it. It goes without saying that this should not be necessary for a user to do.

Adding to this, I'm also seeing similar metrics: 20-40% CPU usage with nothing actually running on the cluster, with calico taking up the majority of the processing, and with similar add-ons enabled: dns, storage, ingress. Is there any progress on this? Are there any other discussion threads where this performance issue is discussed and, ideally, resolved?

@Atem18

Atem18 commented Nov 11, 2022

Same here with Ubuntu 22.04 on Raspberrypi 4 model B 4GB with three nodes HA enabled.

@jglick

jglick commented Nov 11, 2022

For purposes of local development and testing I have switched from Microk8s to Kind, for this reason among others.

@Atem18

Atem18 commented Nov 11, 2022

Yes, but what about people wanting to use it in production?

@AlexsJones
Contributor

AlexsJones commented Nov 12, 2022

Same here with Ubuntu 22.04 on Raspberrypi 4 model B 4GB with three nodes HA enabled.

By HA enabled, do you mean a control plane on two nodes and the third as a worker, or all three nodes running as the control plane?

@Atem18

Atem18 commented Nov 12, 2022

Same here with Ubuntu 22.04 on Raspberrypi 4 model B 4GB with three nodes HA enabled.

HA enabled, you mean a control-plane on two nodes and third as worker or all three running as the control-plane?

Hi, I used the following tutorial: https://ubuntu.com/tutorials/getting-started-with-kubernetes-ha?&_ga=2.187560111.665053589.1668255625-380658086.1668255625#1-overview

So I think the control plane is running on all three nodes.

@mikezerosix

I have the same problem: kubelite is running near 100% CPU. OK, I was cheap and ran the master on a Raspberry Pi 3. I set it up running version 1.23, as 1.25 did not work on it, claiming cgroups were not enabled. There are no containers deployed and it can barely cope with one node joining.

kubelite is at 75% and the next ones are containerd at 15% and 10%; sometimes dqlite jumps up.
BTW, can I kill/disable containerd on the master? Isn't that unnecessary on the master?

@neoaggelos
Contributor

neoaggelos commented Dec 17, 2022

Hi @mikezerosix

BTW can I kill disable contianerd on master ? Isn't that unnecessary on master ?

Control plane nodes are also registered as workers by default. A workaround to prevent workloads from being scheduled there is to drain and taint the control plane nodes with:

microk8s kubectl drain $node
microk8s kubectl taint node $node key1=value1:NoSchedule
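
If you later want to schedule workloads on that node again, the two operations can be reversed (a small sketch; key1=value1 is just the placeholder key used above):

microk8s kubectl uncordon $node                              # undo the cordon applied by drain
microk8s kubectl taint node $node key1=value1:NoSchedule-    # the trailing '-' removes the taint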

Also, can you share an inspection report so that we can have a look at it?

@augusto

augusto commented Jun 14, 2023

I have the same issue. Installed recently on Ubuntu 22.04. Running on a VM with 6 cores and 32gb of ram.
I'm running microk8s from snap 1.27/stable. I have disabled ha and enabled dns; no pods running.

The two main processes eating CPU are kubelite and etcd, and they use roughly 20% of it.

Any idea how to resolve this?

@ktsakalozos
Member

Hi @augusto. Kubernetes services (API server, proxy, kubelet, scheduler, controller manager) will always produce some load even when idle. For example, the K8s services (all under kubelite) constantly query the state of the cluster to figure out whether there is work for them to do. Depending on the hardware you are running MicroK8s on, 20% may be expected. Could you share a microk8s inspect tarball so that we can check whether this is the case or whether there is a problem with the cluster?

@fybmain

fybmain commented Jul 21, 2023

@ktsakalozos How frequent is the polling? Can the user adjust the polling frequency?
I assigned all six P-cores of a Core i7-12650H to my VM running microk8s. kubelite consumes about 10% of CPU on average.

@masterkain

masterkain commented Jul 22, 2023

I see the same thing on microk8s 1.27 on Ubuntu 23.

  • /snap/microk8s/5372/kubelite --scheduler-args-file=/var/snap/microk8s/5372/args/kube-scheduler --controller-manage... => 10% cpu average
  • /snap/microk8s/5372/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/5372/var/kubernetes/backe... => 7% cpu average

this is on a i5-12600KF

I must say that using ~20% CPU 24/7 is not really ideal.

@ktsakalozos
Member

@fybmain @masterkain I am not aware of any way to tune the frequency at which the (many) k8s services check for state changes, but it seems a reasonable ask. Maybe upstream Kubernetes has an answer. For sure, if there is a way to tune this upstream, you can do it in MicroK8s.

The percentages we are talking about here are on one core, right? What tool do you use to measure them?

@masterkain

@ktsakalozos thanks for the reply -- nothing fancy, just a quick glance at top, so the 20% is across all cores. This is on my bare-metal Ubuntu server homelab with just microk8s on it. In an effort to reduce wattage/billing costs I started to review some things and happened to stumble upon this kind of usage.

@augusto

augusto commented Jul 30, 2023

Sorry for the late reply @ktsakalozos! I've torn down the microk8s VM I had. Not sure if this is of any use, but I installed K3s, and out of the box the CPU utilisation was ~10% (compared to ~20% in microk8s after removing HA). Somehow K3s manages to use 50% less CPU than microk8s.

@luispabon

luispabon commented Dec 20, 2023

Here's a comparison of k3s vs microk8s (HA off) on identical VMs on the same host with a similar number of pods running:

[screenshot: CPU usage comparison of the k3s and microk8s VMs]

Microk8s is a new install on a pristine system and k3s has an uptime of over 2 months

@BxL221

BxL221 commented Apr 28, 2024

When you installed 1.20.5, did you purge the snap? For example: sudo snap remove microk8s --purge. Just curious.

Great, thanks. I had the same problem with Ubuntu Server 22.04 LTS, and a purge solved it.

I saw that microk8s used more than 500% CPU in htop.

@Azbesciak

Hi, any update on this topic? It is quite sad that the problem is more than 3 years old and still not solved.
