MicroK8s in permanent failed state on reboot (Raspberry Pi 4, affects single- and multi-node clusters) #2204
Hi @horvatic, could you share an inspection tarball?
I reinstalled MicroK8s, so let me attempt to trigger a failed state.
Here are three tarballs from two different nodes:
Node in cluster and rebooted: inspection-report-20210426_174255.tar.gz
Node in cluster: inspection-report-20210426_173935.tar.gz
NOTE: Node one did come back online after a few minutes.
Another tarball from another node: I uninstalled my current setup, deleted my snap/microk8s folder, then reinstalled. Once reinstalled, I made sure it was working and rebooted. After the reboot it was in a failed state. NOTE: HA was set up, but the issue also happens when HA is not set up.
Looking at the logs of the containerd (
Could you confirm that
What information will that tell us? Is it just an issue with ceder, given that I also attached redwood and another node?
When containerd starts, it tries to recover any running containers. It does so by looking at the container state kept in
Ah, I see. Do you want me to run the command on all my nodes: redwood, cesder, and willow?
I am fairly certain you will get […] Maybe this line [1] needs to be updated with a better way to detect the boot time. Is there anything special about your setup (kernel/hardware/distribution)? I need to reproduce this.
[1] https://github.com/ubuntu/microk8s/blob/master/microk8s-resources/actions/common/utils.sh#L683
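The comment above suggests the boot-time check in utils.sh could be made more robust. Below is a minimal Python sketch, not the code in utils.sh and not the fix in the PR, of two common Linux ways to ask "has this machine rebooted since the state was written?": reading the kernel's boot timestamp from the btime line of /proc/stat, and comparing the random per-boot ID in /proc/sys/kernel/random/boot_id. All function names are hypothetical. The boot_id comparison does not depend on the wall clock, which may matter on boards without a battery-backed real-time clock, such as the Raspberry Pi 4.

```python
"""Hypothetical helpers for deciding whether saved state predates the current boot.

Assumption: a Linux host exposing /proc/stat and /proc/sys/kernel/random/boot_id.
These are illustrative only, not the helpers used by MicroK8s.
"""
from pathlib import Path
from typing import Optional

BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


def parse_btime(stat_text: str) -> int:
    """Return the boot time (seconds since the epoch) from /proc/stat contents."""
    for line in stat_text.splitlines():
        if line.startswith("btime "):
            return int(line.split()[1])
    raise ValueError("no btime line found")


def state_predates_boot(state_mtime: float, boot_time: int) -> bool:
    """True if a state file was last modified before the system booted.

    Caveat: both values derive from the wall clock, so this can misfire if the
    clock was wrong or jumped around boot time.
    """
    return state_mtime < boot_time


def rebooted_since(saved_boot_id: Optional[str], current_boot_id: str) -> bool:
    """True if no boot ID was saved or it differs from the current one.

    The kernel generates a new random boot_id on every boot, so this check
    does not rely on the wall clock at all.
    """
    if saved_boot_id is None:
        return True
    return saved_boot_id.strip() != current_boot_id.strip()


def read_current_boot_id() -> str:
    """Read the current boot ID (Linux only)."""
    return BOOT_ID_PATH.read_text().strip()
```

On a node, a service would save read_current_boot_id() next to its state when it writes it, and on the next start treat the state as stale when rebooted_since(saved, read_current_boot_id()) is true.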
Here are the results
I also did about five reboots to confirm these numbers are correct, and checked the status of MicroK8s each time. If it was not running, I ran microk8s start and then checked the status again. It looks like if you keep rebooting and running start, it comes back online about 2 out of 5 times. NOTE: This is also not a waiting issue, as I had a node in this state for six days.
@ktsakalozos My setup is 3 Raspberry Pis (4 GB of RAM, 64 GB SD card), Ubuntu 20.04 LTS, using a wired connection. Nothing else is installed except MicroK8s.
@horvatic, referenced in this issue you will find a PR with a fix. As soon as the PR is merged it should be on […] As a mitigation for now you can either:
[1] https://github.com/ubuntu/microk8s/pull/2207/checks?check_run_id=2442807158
@ktsakalozos I'll install the microk8s.snap package tonight and test it out. Will post results once testing is done!
I am having this error: […] Can you link me the ARM build? Install using: […]
Unfortunately, you need an ARM64 build, and the GitHub Action produces an AMD64 snap.
@ktsakalozos I will be unable to test until an ARM build is produced. Should I just wait until the edge release?
@horvatic, just merged the fix. We should have an arm64 build within the day.
@ktsakalozos OK, I'll test it tomorrow at 5 pm CST and report the results.
@ktsakalozos All tested. I rebooted 3 times, and each time MicroK8s started successfully after the reboot! NOTE: There is a delay of about 10 seconds after a reboot, so if you run microk8s status immediately after rebooting, it will show that MicroK8s isn't running. If this issue gets reported again, the user may just need to wait 10 seconds or so. I do not count this as a bug, as it takes time to start up the services.
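The roughly 10-second startup delay described above means a single status check right after boot can report a false failure, whereas the original report (a node stuck for six days) is a genuine one. Below is a minimal Python sketch of telling the two apart by polling with a timeout instead of checking once. It assumes that microk8s status prints "microk8s is running" once the services are up, as opposed to the "microk8s is not running" message quoted in this issue; the helper names are hypothetical. MicroK8s itself also offers microk8s status --wait-ready for a similar purpose.

```python
"""Hypothetical sketch: poll MicroK8s status after a reboot instead of checking once.

Assumption: `microk8s status` prints "microk8s is running" when the services are up.
"""
import subprocess
import time
from typing import Callable


def microk8s_is_running() -> bool:
    """Run `microk8s status` once and look for the running message."""
    try:
        result = subprocess.run(
            ["microk8s", "status"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # "microk8s is not running" does not contain this substring.
    return "microk8s is running" in result.stdout


def wait_until(
    check: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` every `interval` seconds until it returns True or `timeout` elapses.

    Returns True if the check succeeded, False if it timed out.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On a node, wait_until(microk8s_is_running, timeout=120) returning True after a few seconds would match the startup delay reported here, while a False after the full timeout would point to the permanent failed state instead.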
Running on a Raspberry Pi 4 with a fresh install of Ubuntu 20.04 LTS. When rebooting, there is a chance of ending up in a permanent failed state. When this happens, this message appears: microk8s is not running. Use microk8s inspect for a deeper inspection.
Running microk8s inspect shows no errors.
Running microk8s start, or microk8s stop; microk8s start, does not solve the issue.
Running microk8s reset does not solve the issue either.
Running microk8s stop before the reboot seems to help, but the failure still occurs on some snap updates.
Rebooting again does not solve the issue either.
This issue seems to happen at random. I can reboot once or twice with no failed state. Once I get a failed state, there is no way to recover unless MicroK8s is reinstalled.