zsysctl service gc doesn't clean up bpool #155

Open · malachid opened this issue Jul 7, 2020 · 34 comments

@malachid

malachid commented Jul 7, 2020

Describe the bug
During my daily apt update, I got this message:

ERROR couldn't save system state: Minimum free space to take a snapshot and preserve ZFS performance is 20%.
Free space on pool "bpool" is 17%.
Please remove some states manually to free up space. 

Checking bpool, I can see that sudo du -h /boot shows 330M while sudo zfs list -t all -r bpool shows the dataset taking up 1.56G of the available 1.88G. Some of the auto snapshots are from 3 months ago.
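
For reference, the exact commands to compare the two views:

sudo du -h /boot | tail -n 1      # live files in /boot: 330M total
sudo zfs list -t all -r bpool     # datasets plus snapshots: 1.56G of 1.88G
zpool list bpool                  # overall pool capacity, usage and free %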

I noticed there was no gc timer, so I tried running it manually instead.

I ran sudo zsysctl service gc --all -vv which cleaned up rpool a bit, but didn't touch bpool.

I thought about manually deleting them, but saw that others reported having update-grub problems after doing so.

If it is intentional that gc shouldn't clean up the bpool as well, then some documentation on the correct way to clean it up would be ideal.
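
In the meantime, I may just wire up a timer myself to run gc periodically; an untested sketch (assuming zsysctl service gc --all is safe to run on a schedule, and adjusting the zsysctl path for your install):

# /etc/systemd/system/zsys-gc.service
[Unit]
Description=Run zsys garbage collection

[Service]
Type=oneshot
ExecStart=/usr/sbin/zsysctl service gc --all

# /etc/systemd/system/zsys-gc.timer
[Unit]
Description=Daily zsys garbage collection

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

followed by sudo systemctl daemon-reload && sudo systemctl enable --now zsys-gc.timer.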

To Reproduce
Steps to reproduce the behavior:

  1. apt update regularly
  2. eventually, bpool is too full

Expected behavior
Either via timer or manually, able to gc the bpool

For Ubuntu users, please run and copy the following:

  1. ubuntu-bug zsys --save=/tmp/report
  2. Copy paste below /tmp/report content:
Note: /tmp/report is too long; attaching it instead.
[report.txt](https://github.com/ubuntu/zsys/files/4886252/report.txt)


Installed versions:

  • OS: (/etc/os-release)
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Zsysd running version: (zsysctl version output)
zsysctl	0.4.6
zsysd	0.4.6


@didrocks
Member

Thanks for reporting this bug and for the logs.
From what I saw, bpool isn't ignored by the GC: you have a reasonable number of snapshots, but /boot holds 8 kernels (count roughly 90M per kernel).
However, that still leaves about 1.56G - 8*90M ≈ 800M unaccounted for (the equivalent of another 8 kernels or so), which is surprising. Can you look in your /boot and report du -h here? (Ubuntu should only keep 2 kernels.)
There are always ways to remove snapshots manually if you need to, via zsysctl state remove.
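
For example, list the states with zsysctl show and remove one by its id:

zsysctl show
zsysctl state remove <state-id> --system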

As a side note, I see you are using docker. We are in discussion with the docker team to change the default layout so that we are not using snapshots. I strongly suggest you follow the instructions on https://github.com/ubuntu/zsys/wiki/Performance-issue-with-docker-on-ubuntu-20.04-LTS with the docker daemon stopped. You can then also remove some stopped containers, as docker is filling up many datasets.
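
For example, something like:

docker container prune    # removes all stopped containers; with the zfs driver this also frees their datasets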

@malachid
Author

I'll take a look at the docker change.

Here's the /boot usage:

root@TN00687:/boot# du -h
3.4M	./grub/x86_64-efi
2.3M	./grub/fonts
8.0M	./grub
3.4M	./efi/grub/x86_64-efi
2.3M	./efi/grub/fonts
8.0M	./efi/grub
21M	./efi/EFI/ubuntu/fw
25M	./efi/EFI/ubuntu
3.7M	./efi/EFI/BOOT
20M	./efi/EFI/Dell/Bios/Recovery
20M	./efi/EFI/Dell/Bios
20M	./efi/EFI/Dell
49M	./efi/EFI
57M	./efi
330M	.

@malachid
Author

Thank you for the docker script. It looks like I can't run it out of the box (the ifs/greps/etc. don't match my system), but I can scavenge some of the purging logic from it.

@muiga

muiga commented Jul 28, 2020

I am also experiencing the same issue.
(screenshot attached: Screenshot_20200728_112125)

@JKDingwall

JKDingwall commented Sep 23, 2020

I have experienced this too. In my case, during system setup an ansible playbook makes several calls to apt. As the initrd gets rebuilt multiple times, with each version getting a new snapshot, I quickly ran out of space on my 1GB bpool even with only two kernels. (I can't remember if I manually set that size or if it was the automatic partitioning from the 20.04 installer.)

@ddnexus

ddnexus commented Oct 13, 2020

Yes, the following is the output of update-grub for just 4 kernels (one of which has already been purged) after only a couple of weeks of use.

It looks really huge to me:

$ sudo update-grub                                     
[sudo] password for dd: 
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_1gi7xq
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_1gi7xq
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_1gi7xq
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_1gi7xq
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_1gi7xq
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_1gi7xq
Found linux image: vmlinuz-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q
Found initrd image: initrd.img-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_wptoxo
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_wptoxo
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_wptoxo
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_wptoxo
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_wptoxo
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_wptoxo
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_r5es0v
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_r5es0v
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_r5es0v
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_r5es0v
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_r5es0v
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_r5es0v
dpkg: warning: version 'grub/vmlinuz-5.6.0-1028-oem' has bad syntax: version number does not start with digit
dpkg: warning: version 'grub/vmlinuz-5.4.0-48-generic' has bad syntax: version number does not start with digit
dpkg: warning: version 'grub/vmlinuz-5.4.0-42-generic' has bad syntax: version number does not start with digit
dpkg: warning: version 'grub/vmlinuz-5.6.0-1028-oem' has bad syntax: version number does not start with digit
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@mirror-grub
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@mirror-grub
dpkg: warning: version 'grub/vmlinuz-5.4.0-42-generic' has bad syntax: version number does not start with digit
dpkg: warning: version 'grub/vmlinuz-5.4.0-48-generic' has bad syntax: version number does not start with digit
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@mirror-grub
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@mirror-grub
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@mirror-grub
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@mirror-grub
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_1rdx59
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_1rdx59
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_1rdx59
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_1rdx59
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_1rdx59
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_1rdx59
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_h73coe
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_h73coe
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_h73coe
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_h73coe
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_h73coe
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_h73coe
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_thw000
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_thw000
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_thw000
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_thw000
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_thw000
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_thw000
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_0w2ewl
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_0w2ewl
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0w2ewl
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0w2ewl
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0w2ewl
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0w2ewl
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_kilrbs
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_kilrbs
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_kilrbs
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_kilrbs
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_kilrbs
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_kilrbs
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_0qvhtz
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_0qvhtz
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0qvhtz
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0qvhtz
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0qvhtz
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_0qvhtz
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_txu9yj
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_txu9yj
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_txu9yj
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_txu9yj
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_txu9yj
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_txu9yj
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_61mfcq
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_61mfcq
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_61mfcq
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_61mfcq
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_61mfcq
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_61mfcq
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_zqe85r
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_zqe85r
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_zqe85r
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_zqe85r
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_zqe85r
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_zqe85r
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_8fejv3
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_8fejv3
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8fejv3
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8fejv3
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8fejv3
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8fejv3
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_8io7lt
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_8io7lt
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8io7lt
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8io7lt
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8io7lt
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_8io7lt
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_dbqn85
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_dbqn85
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_dbqn85
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_dbqn85
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_dbqn85
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_dbqn85
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_lrwugm
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_lrwugm
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_lrwugm
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_lrwugm
Found linux image: vmlinuz-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_lrwugm
Found initrd image: initrd.img-5.4.0-42-generic in rpool/ROOT/ubuntu_y3331q@autozsys_lrwugm
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_igxah5
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_igxah5
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_igxah5
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_igxah5
Found linux image: vmlinuz-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q@autozsys_yz0gmj
Found initrd image: initrd.img-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q@autozsys_yz0gmj
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_yz0gmj
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_yz0gmj
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_yz0gmj
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_yz0gmj
Found linux image: vmlinuz-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q@autozsys_063x8p
Found initrd image: initrd.img-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q@autozsys_063x8p
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_063x8p
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_063x8p
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_063x8p
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_063x8p
Found linux image: vmlinuz-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q@autozsys_z5svpo
Found initrd image: initrd.img-5.6.0-1030-oem in rpool/ROOT/ubuntu_y3331q@autozsys_z5svpo
Found linux image: vmlinuz-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_z5svpo
Found initrd image: initrd.img-5.6.0-1028-oem in rpool/ROOT/ubuntu_y3331q@autozsys_z5svpo
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_z5svpo
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_y3331q@autozsys_z5svpo
Adding boot menu entry for UEFI Firmware Settings
done

@naisanzaa

I'm having this same problem.

justin@timberlake:~/antsable$ zfs list bpool
NAME    USED  AVAIL     REFER  MOUNTPOINT
bpool  1.74G  6.00M       96K  /boot

@xavier83ar

I'm having the same issue.

javier@javier-note:/boot$ du -h /boot/
3,4M    /boot/grub/x86_64-efi
2,3M    /boot/grub/fonts
8,0M    /boot/grub
4,2M    /boot/efi/EFI/ubuntu
3,7M    /boot/efi/EFI/BOOT
12M     /boot/efi/EFI/Dell/Bios/Recovery
12M     /boot/efi/EFI/Dell/Bios
12K     /boot/efi/EFI/Dell/logs
12M     /boot/efi/EFI/Dell
20M     /boot/efi/EFI
3,4M    /boot/efi/grub/x86_64-efi
2,3M    /boot/efi/grub/fonts
8,0M    /boot/efi/grub
28M     /boot/efi
371M    /boot/
javier@javier-note:/boot$ zpool list bpool
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
bpool  1,88G  1,63G   246M        -         -    45%    87%  1.00x    ONLINE  -

When I run apt autoremove there are a lot of entries like this:

...
Found linux image: vmlinuz-5.4.0-54-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found initrd image: initrd.img-5.4.0-54-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found linux image: vmlinuz-5.4.0-53-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found initrd image: initrd.img-5.4.0-53-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found linux image: vmlinuz-5.4.0-51-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found initrd image: initrd.img-5.4.0-51-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found linux image: vmlinuz-5.4.0-48-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found initrd image: initrd.img-5.4.0-48-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found linux image: vmlinuz-5.4.0-47-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found initrd image: initrd.img-5.4.0-47-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found linux image: vmlinuz-5.4.0-45-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found initrd image: initrd.img-5.4.0-45-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_9kqbmg
Found linux image: vmlinuz-5.4.0-58-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_6qenhr
Found initrd image: initrd.img-5.4.0-58-generic in rpool/ROOT/ubuntu_lj59zg@autozsys_6qenhr
...

but all those kernels have already been removed.
As I understand it, there are datasets/snapshots holding already-removed kernel images. Is there a way to remove them manually, or any other workaround for this?

@sdelrio

sdelrio commented Jan 7, 2021

Try removing the state with zsysctl; it worked for me.

zsysctl state remove lj59zg --system

@xavier83ar

It didn't work for me.

ERROR couldn't remove system state lj59zg: Removing current system state isn't allowed 

@sdelrio

sdelrio commented Jan 7, 2021

Sorry, use just the part after the autozsys_ prefix (it may also need sudo):

zsysctl state remove 9kqbmg --system

You can list the boot snapshots at any time with

zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT   

@xavier83ar

xavier83ar commented Jan 7, 2021

@sdelrio Thanks! Now it worked. I had to remove more than 10 states, but now I have plenty of free space with only a few kernels installed.

@stephen-mw

Here are the log lines that will probably bring people here to fix this issue:

ERROR couldn't save system state: Minimum free space to take a snapshot and preserve ZFS performance is 20%.
Free space on pool "bpool" is 19%.
Please remove some states manually to free up space.

If you update your zfs-on-root ubuntu system often (which it looks like all of us do), you'll probably run into this issue. The solution is what @sdelrio recommends:

  1. Gather a list of old snapshots:
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT
  2. Manually delete those snapshots:
# Replace 9kqbmg with the ID of your snapshot.
zsysctl state remove 9kqbmg --system

@sdelrio

sdelrio commented Jan 12, 2021

If it helps: I modified zsys.conf, because in my case the computer is a workstation, not a server, and the default values keep too many snapshots for me; I also run apt update frequently because of some extra apt repos. Below are my file and the blog post I read to understand this whole zsys thing.

  • /etc/zsys.conf
history:
  # https://didrocks.fr/2020/06/04/zfs-focus-on-ubuntu-20.04-lts-zsys-state-collection/
  # Keep at least n history entries per unit of time if enough of them are present
  # The order determines the bucket start and end dates (from most recent to oldest)
  gcstartafter: 1
  keeplast: 5 # Minimum number of recent states to keep.
  #    - name:             Arbitrary name of the bucket
  #      buckets:          Number of buckets over the interval
  #      bucketlength:     Length of each bucket in days
  #      samplesperbucket: Number of datasets to keep in each bucket
  gcrules:
    - name: PreviousDay
      buckets: 1
      bucketlength: 1
      samplesperbucket: 2
    - name: PreviousWeek
      buckets: 5
      bucketlength: 1
      samplesperbucket: 1
    - name: PreviousMonth
      buckets: 2
      bucketlength: 14
      samplesperbucket: 1
general:
  # Minimal free space required before taking a snapshot
  minfreepoolspace: 18
  # Daemon timeout in seconds
  timeout: 60
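
After editing the file, reload the daemon so the new policy takes effect (per the blog post above, a daemon restart works too):

sudo zsysctl service reload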

@mcarifio

mcarifio commented Feb 1, 2021

I'm still somewhat confused. I followed the @sdelrio + @stephen-mw recipe to remove unnecessary intermediate "states" and ended up with:

# du -sh /boot
241M	/boot
# zfs list -t all -r bpool -o space
NAME                                       AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
bpool                                       955M   837M        0B     96K             0B       837M
bpool@mksnapshot-focal                         -     0B         -       -              -          -
bpool/BOOT                                  955M   836M        0B     96K             0B       836M
bpool/BOOT@focal                               -     0B         -       -              -          -
bpool/BOOT@mksnapshot-focal                    -     0B         -       -              -          -
bpool/BOOT/ubuntu_4wysfp                    955M   836M      618M    217M             0B         0B
bpool/BOOT/ubuntu_4wysfp@focal                 -    64K         -       -              -          -
bpool/BOOT/ubuntu_4wysfp@mksnapshot-focal      -    64K         -       -              -          -
bpool/BOOT/ubuntu_4wysfp@autozsys_ewktga       -    72K         -       -              -          -
bpool/BOOT/ubuntu_4wysfp@autozsys_sp4q3t       -    72K         -      

These numbers seem close to each other (zfs has some kind of bookkeeping overhead).

Did this "work"? How do I know? Will I have to do this exercise again in a few months?

Is /etc/zsys.conf "incremental"? Meaning, if my only change is history.keeplast: 5, can I state just that, i.e.:

history:
  keeplast: 5

I understand you're not responsible for my lack of zfs + zsys knowledge, but I get a little squeamish when I get "boot"-related error messages, because I don't understand grub + initrd + kernels all that much better.

@sdelrio

sdelrio commented Feb 1, 2021

Exactly, zfs will only keep the changes. If you only want to change one value, you should take the default config and modify just that value, rather than stating the one key on its own. Take a look at https://didrocks.fr/2020/06/04/zfs-focus-on-ubuntu-20.04-lts-zsys-state-collection/ for the default policy.

So if your /boot sees only small changes, you can keep a large history.keeplast. But if you install kernels or drivers (like some beta nightly builds), each snapshot takes more space, and if the /boot partition is small it can fill up.
I guess you should tune your history depending on the usage. The same goes for general.minfreepoolspace.
With minfreepoolspace: 20:

  • 128GB SSD with minfreepoolspace: 20 -> 25GB kept free
  • 1TB SSD with minfreepoolspace: 20 -> 200GB kept free

Depending on the case you may need to adjust these values to keep more or fewer snapshots.

@mkohler

mkohler commented Feb 5, 2021

In case it is useful, here are a few more details about how it occurred in my case. I am using a new install of Ubuntu 20.04. This is my first experience with zfs and I hadn't made any changes to the default zfs configuration, or issued any zfs commands at all, for that matter.

From what I saw, bpool isn't ignored by the GC: you have a reasonable number of snapshots, but /boot holds 8 kernels (count roughly 90M per kernel).

When I noticed the "ERROR couldn't save system state: Minimum free space to take a snapshot..." errors, my system had 3 kernels installed in /boot: 5.8.0-38-generic, 5.8.0-40-generic, and 5.8.0-41-generic. I had let the installer do its own partitioning, which resulted in a /boot partition of 2147MB. Here's what df -h reported.

$ df -h
Filesystem                                        Size  Used Avail Use% Mounted on
udev                                              3.8G     0  3.8G   0% /dev
tmpfs                                             785M  1.9M  783M   1% /run
rpool/ROOT/ubuntu_j8fkpf                          578G  4.8G  574G   1% /
tmpfs                                             3.9G     0  3.9G   0% /dev/shm
tmpfs                                             5.0M  4.0K  5.0M   1% /run/lock
tmpfs                                             3.9G     0  3.9G   0% /sys/fs/cgroup
bpool/BOOT/ubuntu_j8fkpf                          364M  293M   71M  81% /boot
rpool/ROOT/ubuntu_j8fkpf/srv                      574G  128K  574G   1% /srv
rpool/ROOT/ubuntu_j8fkpf/var/games                574G  128K  574G   1% /var/games
rpool/ROOT/ubuntu_j8fkpf/var/snap                 574G  1.2M  574G   1% /var/snap
rpool/USERDATA/root_z76wa2                        574G  128K  574G   1% /root
rpool/ROOT/ubuntu_j8fkpf/var/log                  574G  220M  574G   1% /var/log
/dev/sda1                                         511M  7.1M  504M   2% /boot/efi
rpool/USERDATA/mk_z76wa2                          867G  294G  574G  34% /home/mk
rpool/ROOT/ubuntu_j8fkpf/var/mail                 574G  128K  574G   1% /var/mail
rpool/ROOT/ubuntu_j8fkpf/usr/local                574G  256K  574G   1% /usr/local
rpool/ROOT/ubuntu_j8fkpf/var/lib                  576G  2.2G  574G   1% /var/lib
rpool/ROOT/ubuntu_j8fkpf/var/spool                574G  256K  574G   1% /var/spool
rpool/ROOT/ubuntu_j8fkpf/var/www                  574G  128K  574G   1% /var/www
/dev/loop0                                        136M  136M     0 100% /snap/chromium/1461
/dev/loop3                                        136M  136M     0 100% /snap/chromium/1466
/dev/loop2                                        163M  163M     0 100% /snap/gnome-3-28-1804/145
rpool/ROOT/ubuntu_j8fkpf/var/lib/NetworkManager   574G  256K  574G   1% /var/lib/NetworkManager
rpool/ROOT/ubuntu_j8fkpf/var/lib/AccountsService  574G  128K  574G   1% /var/lib/AccountsService
rpool/ROOT/ubuntu_j8fkpf/var/lib/dpkg             574G   57M  574G   1% /var/lib/dpkg
/dev/loop4                                         56M   56M     0 100% /snap/core18/1932
/dev/loop1                                         56M   56M     0 100% /snap/core18/1944
/dev/loop5                                        218M  218M     0 100% /snap/gnome-3-34-1804/60
rpool/ROOT/ubuntu_j8fkpf/var/lib/apt              574G   75M  574G   1% /var/lib/apt
/dev/loop6                                        162M  162M     0 100% /snap/gnome-3-28-1804/128
/dev/loop7                                         32M   32M     0 100% /snap/snapd/10492
/dev/loop8                                        219M  219M     0 100% /snap/gnome-3-34-1804/66
/dev/loop9                                         32M   32M     0 100% /snap/snapd/10707
/dev/loop10                                        52M   52M     0 100% /snap/snap-store/498
/dev/loop11                                        52M   52M     0 100% /snap/snap-store/518
/dev/loop12                                        63M   63M     0 100% /snap/gtk-common-themes/1506
/dev/loop13                                        65M   65M     0 100% /snap/gtk-common-themes/1514
tmpfs                                             785M   24K  785M   1% /run/user/1000
$

I've attached the output from ubuntu-bug zsys.
report.txt

I was able to recover disk space using the commands that @sdelrio posted. I just kept deleting the oldest snapshot. After deleting 18 snapshots, the size of the filesystem as reported by df -h had grown from 364M to 1.4G, and I was able to use apt again without getting the "couldn't save system state" errors.

@sdelrio

sdelrio commented Feb 5, 2021

The default minfreepoolspace is 20% and you have 81% usage on /boot; perhaps you need to remove old kernels, or the 364MB size of that partition is just really low.

I have 3 kernels plus manually compiled VGA drivers, and 353MB used.

> zfs list -t all -r bpool -o space
NAME                                      AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
bpool                                     1,40G   353M        0B     96K             0B       353M
bpool/BOOT                                1,40G   352M        0B     96K             0B       352M
bpool/BOOT/ubuntu_awq93l                  1,40G   352M      392K    351M             0B         0B
bpool/BOOT/ubuntu_awq93l@autozsys_v0xf8i      -    80K         -       -              -          -
bpool/BOOT/ubuntu_awq93l@autozsys_8elb4o      -    80K         -       -              -          -
bpool/BOOT/ubuntu_awq93l@autozsys_vanixu      -    80K         -       -              -          -
bpool/BOOT/ubuntu_awq93l@autozsys_kgszdx      -    80K         -       -              -          -
bpool/BOOT/ubuntu_awq93l@autozsys_fvavlx      -    72K         -       -              -          -

@mkohler

mkohler commented Feb 5, 2021

I may be just re-stating what @sdelrio already said (thanks!), but putting all of this together, it looks to me like there is an incompatibility between the default size of the /boot partition, 2GB, and the default keeplast value of 20.

Let's consider the worst case for bpool, in terms of state storage. I think the worst case would be that every time zsys stores a new state, there is a new kernel in /boot that must be stored. Even though zfs is only storing the differences, each state will require ~100MB of new storage. Since the default policy of zsys is to keep at least 20 states, that would require 20 * 100MB = 2GB of storage. That means we're using our entire partition for snapshots. From what I saw, zsys stopped creating snapshots before the partition was actually full, which is good, but I think this explains why people are seeing problems with not having enough space to store new snapshots.
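
A quick way to check how much the snapshots alone are pinning on the pool (note this sums each snapshot's USED value, which undercounts space shared between adjacent snapshots):

zfs list -Hp -r -t snapshot -o used bpool | awk '{s+=$1} END {printf "%.0f MiB\n", s/1048576}'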

@Venomtek

Just came across this problem on 20.10.
It's also disconcerting that apt continues to upgrade/install as if nothing is wrong, instead of exiting with an error and telling you to use a force switch if you want to continue without zsys snapshots.

@thomasesr

I am also getting this message. I installed using the default ZFS settings in the Ubuntu installer on the entire disk, and it created a small partition to store the bpool. It should either make a bigger partition or limit the maximum number of snapshots to a lower number, like 5 or 10. Where can I configure the maximum number of autosnapshots (saved states) to keep?

@Venomtek

Where can I configure the max autosnapshots (saved states) to keep?

Can we tweak that manually?

We described the default policy embedded in the binary. However, as an experiment (and because all those rules are not set in stone yet), you can define your own policy manually by copying the configuration policy file zsys.conf as /etc/zsys.conf and tweaking it here. This change will be effective after calling zsysctl service reload or on daemon restart.

Source: https://didrocks.fr/2020/06/04/zfs-focus-on-ubuntu-20.04-lts-zsys-state-collection/

@Venomtek

Venomtek commented Mar 4, 2021

I've made a script to do manual garbage collection.

Right now it's interactive; if I'm feeling less lazy I might make it so you can pass CLI arguments to it.

Unless anyone else wants to tackle that.

Cheers!

DISCLAIMER: Use at your own risk; I accept no responsibility or liability whatsoever. You have been warned.

@berenddeboer

Here's a simple script to remove the first 5 entries:

zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT | head -n 5 | cut -c 35-40 | xargs -n 1 zsysctl state remove --system
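
And a variant that doesn't depend on fixed character columns (the cut -c 35-40 above breaks if the dataset name length differs), removing the 5 oldest states instead; untested sketch, assuming snapshot names end in autozsys_<id>:

zfs list -H -r -t snapshot -o name -s creation bpool/BOOT \
  | grep -o 'autozsys_[a-z0-9]*$' \
  | sed 's/autozsys_//' \
  | head -n 5 \
  | xargs -n 1 sudo zsysctl state remove --system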

@mcarifio

mcarifio commented Jun 6, 2021

I'm about to upgrade to hirsute using do-release-upgrade. If something goes south, will the automatic snapshotting save me? If so, how? Please advise. Thanks.

@didrocks
Member

didrocks commented Jun 7, 2021

Yes (if your system didn't hit the threshold as explained in this bug); please see https://didrocks.fr/2020/05/28/zfs-focus-on-ubuntu-20.04-lts-zsys-general-principle-on-state-management/ on the rollback principle.

@mcarifio

mcarifio commented Jun 7, 2021

@didrocks, slick! I upgraded two "zfs rooted" machines without a lot of backup ceremony.
In previous cycles I had always done a full disk image backup in case the upgrade failed, which it has a few times over the last few years. I've also replaced all my nvidia graphics cards with amd ati ones, which might confound these results as well. Nvidia was always one of the blockers. No more.

Here are my commands for posterity. I have a few questions below.

zsysctl state save --system before-21.04 # manual state save in case upgrade gets confused
do-release-upgrade --allow-third-party
# answer questions, install stuff, reboot

# after reboot
lsb_release -sc
hirsute

zsysctl show

Name:           rpool/ROOT/ubuntu_jwoh13
ZSys:           true
Last Used:      current
History:        
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_ugazw0
    Created on: 2021-06-07 11:20:12
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_zr26da
    Created on: 2021-06-07 11:20:11
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_8jcolo
    Created on: 2021-06-07 11:20:11
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_c8tzhz
    Created on: 2021-06-07 10:55:04
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_4t2hsk
    Created on: 2021-06-07 10:55:04
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_tj0b24
    Created on: 2021-06-07 10:55:03
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_lwy3w8
    Created on: 2021-06-07 10:53:32
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_ovtmvf
    Created on: 2021-06-07 10:53:31
  - Name:       rpool/ROOT/ubuntu_jwoh13@autozsys_e4w0vr
    Created on: 2021-06-07 10:53:31
  - Name:       rpool/ROOT/ubuntu_jwoh13@before-21.04
    Created on: 2021-06-07 10:37:01
Users:
  - Name:    mcarifio
    History: 
     - rpool/USERDATA/mcarifio_f8h849@autozsys_tb5ooj (2021-06-07 11:25:50)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_zr26da (2021-06-07 11:20:12)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_ugazw0 (2021-06-07 11:20:12)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_8jcolo (2021-06-07 11:20:11)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_1m5lby (2021-06-07 11:19:28)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_c8tzhz (2021-06-07 10:55:04)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_4t2hsk (2021-06-07 10:55:04)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_tj0b24 (2021-06-07 10:55:03)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_lwy3w8 (2021-06-07 10:53:32)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_ovtmvf (2021-06-07 10:53:31)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_e4w0vr (2021-06-07 10:53:31)
     - rpool/USERDATA/mcarifio_f8h849@before-21.04 (2021-06-07 10:37:01)
     - rpool/USERDATA/mcarifio_f8h849@autozsys_7v2ng6 (2021-06-07 10:18:28)

... many lines removed ...

- Name:    root
    History: 
     - rpool/USERDATA/root_f8h849@autozsys_zr26da (2021-06-07 11:20:12)
     - rpool/USERDATA/root_f8h849@autozsys_ugazw0 (2021-06-07 11:20:12)
     - rpool/USERDATA/root_f8h849@autozsys_8jcolo (2021-06-07 11:20:11)
     - rpool/USERDATA/root_f8h849@autozsys_c8tzhz (2021-06-07 10:55:04)
     - rpool/USERDATA/root_f8h849@autozsys_4t2hsk (2021-06-07 10:55:04)
     - rpool/USERDATA/root_f8h849@autozsys_tj0b24 (2021-06-07 10:55:03)
     - rpool/USERDATA/root_f8h849@autozsys_lwy3w8 (2021-06-07 10:53:32)
     - rpool/USERDATA/root_f8h849@autozsys_ovtmvf (2021-06-07 10:53:31)
     - rpool/USERDATA/root_f8h849@autozsys_e4w0vr (2021-06-07 10:53:31)
     - rpool/USERDATA/root_f8h849@before-21.04 (2021-06-07 10:37:01)
     - rpool/USERDATA/root_f8h849@autozsys_5ddk02 (2021-06-06 16:07:37)
     - rpool/USERDATA/root_f8h849@autozsys_z63w7x (2021-06-06 16:00:07)
     - rpool/USERDATA/root_f8h849@autozsys_qr1gda (2021-06-05 06:09:13)
     - rpool/USERDATA/root_f8h849@autozsys_mtgsww (2021-06-04 06:26:02)
     - rpool/USERDATA/root_f8h849@autozsys_jcv2s0 (2021-06-03 06:07:08)
     - rpool/USERDATA/root_f8h849@autozsys_4t9it7 (2021-06-02 06:03:46)
     - rpool/USERDATA/root_f8h849@autozsys_dil5hr (2021-05-31 18:40:02)
     - rpool/USERDATA/root_f8h849@autozsys_7hl69w (2021-05-26 06:30:29)

As you can see, I "reaped" all the older system snapshots and then cut one of my own: rpool/ROOT/ubuntu_jwoh13@before-21.04. Where did _jwoh13 come from? I assume it was generated on the first zfs-rooted install? What is rpool/ROOT/ubuntu_jwoh13 actually? zfs list labels it a name. What's a zfs name? Is the triple significant, or just a naming convention?

If I look at the system snapshots of rpool/ROOT/ubuntu_jwoh13, inferring from the timestamps, it appears that do-release-upgrade took six snapshots during the upgrade process. Why six? Is it always six, or just the luck of this draw?

I assume I can take both snapshots and clones of any zfs file system on my machine. It looks like I should steer clear of any name with an autozsys_ prefix, but otherwise I have free rein (yes?).

Finally, looking at the /boot/grub/grub.cfg boot entry, I see a menu item like:

	menuentry 'Ubuntu 21.04, with Linux 5.8.0-55-generic' --class ubuntu --class gnu-linux --class gnu --class os ${menuentry_id_option} 'gnulinux-rpool/ROOT/ubuntu_jwoh13-5.8.0-55-generic' {
		recordfail
		load_video
		gfxmode ${linux_gfx_mode}
		insmod gzio
		if [ "${grub_platform}" = xen ]; then insmod xzio; insmod lzopio; fi
		insmod part_gpt
		insmod zfs
		set root='hd0,gpt3'
		if [ x$feature_platform_search_hint = xy ]; then
		  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,gpt3 --hint-efi=hd0,gpt3 --hint-baremetal=ahci0,gpt3  147d6c672377d269
		else
		  search --no-floppy --fs-uuid --set=root 147d6c672377d269
		fi
		echo Loading Linux 5.8.0-55-generic ...
		linux	"/BOOT/ubuntu_jwoh13@/vmlinuz-5.8.0-55-generic" root=ZFS="rpool/ROOT/ubuntu_jwoh13" ro 
		echo 'Loading initial ramdisk ...'
		initrd	"/BOOT/ubuntu_jwoh13@/initrd.img-5.8.0-55-generic"
	}

My understanding of this logic is that you're using the uuid of rpool/ROOT/ubuntu_jwoh13 to set grub2's root.
But the value seems wrong:

zfs list rpool/ROOT/ubuntu_jwoh13 -o guid
 GUID
15224702881960400867

which is not 147d6c672377d269. I can't find this uuid anywhere:

zfs list -o guid| grep 147d6c672377d269
blkid | grep 147d6c672377d269

Where is it?

@didrocks
Member

didrocks commented Jun 8, 2021

I suggest reading the whole blog post series at https://didrocks.fr/tags/zfs/, which will answer most of your questions (it is referenced in the repo README). But issue reports are not really the place to ask for user support outside of bugs; let's try to keep the bugs focused, please.

@mcarifio

mcarifio commented Jun 8, 2021

Yes, of course. Got carried away in my enthusiasm.

@Venomtek, ty for that "reaping script". Saved me some time.

@lots0logs

When I try to remove older snapshots using sudo zsysctl state remove c5rjin --system, it just hangs. I get no output, though I can see the process is using CPU. How long should it take? It's already been 5 minutes with no response or output from the command.

@lots0logs

Here is the debug output. I can't tell if it's actually making progress or stuck in a loop. In my previous attempt I let it run for 5 minutes; this output is just the first 20 seconds of another attempt.

zsysctl.zip

@1MachineElf

1MachineElf commented Jun 16, 2021

I have attempted to remedy this problem on an Ubuntu 21.04 system by using the /etc/zsys.conf from @sdelrio (#155 (comment)) and a few invocations of the bash one-liner by @berenddeboer (#155 (comment)). Performing these and applying an update has resulted in my system being unable to boot without manually entering the linux and initrd commands in grub. What is the best venue for me to seek help with this?

EDIT: This seems to have fixed my problem: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1867007/comments/6

@Venomtek

My script works great.

@almereyda

almereyda commented Dec 6, 2022

Looking at this from the distance of a running 22.04 installation whose rpool and bpool arrived via an Ubuntu LTS release upgrade (which means I am still affected by this on the current LTS, even though it doesn't install zsys by default anymore), it becomes apparent that the ZSYS bpool integration is initially very invisible in zsysctl show. Promoting its appearance out of -v and -vv could, for a start, already give the maintainer visibility of the ZSYS-managed items in the bpool. With a little querying of the .zfs filesystem, we can even correlate (system) snapshots to available kernel versions and create a revision matrix.

$ ls -l /boot/.zfs/*/*/config* | cut -d"_" -f2 | sed -e 's/config-//' -e 's/-generic//' -e 's|/| |' | awk '{ print $2" "$1 }'
5.4.0-21 pw55db
5.4.0-21 q0ofwa
5.4.0-21 xthef3
5.15.0-43 q0ofwa
5.15.0-43 xthef3
5.15.0-46 q0ofwa
5.15.0-46 xthef3
5.15.0-53 pw55db
5.15.0-56 pw55db

This output was generated after applying the procedure below. Note how the newest snapshot also has the oldest kernel, but not the intermediate ones. That was due to some bpool mount issues: bpool wasn't present in the ZFS cache while the intermediate versions were installed. That could become interesting to clean up at some point, as apt has no recollection of 5.4.0-21 and will thus be unable to autoremove it.

Here, the chosen way to clean up system and associated user states, leaving only the two most recent ones, is:

zsysctl show | grep -P '(?<=rpool/ROOT/ubuntu_......@autozsys_)(.*)' -o | tail -n +3 | tac | xargs -L 1 zsysctl state remove -s --dry-run
Caution: destructive once --dry-run is removed.

The invisibility of the bpool in the user-facing parts of ZSYS (outside of -v and -vv) is astonishing here. After such a sweep, a dry run to clean the remaining states will not even report its implicit plan to also remove bpool snapshots, although they are used for booting the system kernel, which is why they may be of significance to a system administrator. And -v and -vv are too noisy to be human-readable, which only gets worse with many snapshots, users and Docker.

$ zsysctl show | grep -P '(?<=rpool/ROOT/ubuntu_......@autozsys_)(.*)' -o | tac | xargs -L 1 zsysctl state remove -v -s --dry-run                  
INFO Requesting removal of system state "q0ofwa"
Delete state rpool/USERDATA/yala_......@autozsys_q0ofwa
Delete state rpool/USERDATA/root_......@autozsys_q0ofwa
Delete state rpool/ROOT/ubuntu_......@autozsys_q0ofwa
INFO Requesting removal of system state "xthef3"
Delete state rpool/USERDATA/root_......@autozsys_xthef3
Delete state rpool/USERDATA/yala_......@autozsys_xthef3
Delete state rpool/ROOT/ubuntu_......@autozsys_xthef3

The following demonstrates how we go from a cluttered bpool:

$ zfs list -r -t all bpool     
NAME                                       USED  AVAIL     REFER  MOUNTPOINT
bpool                                     1.48G   278M       96K  /boot
bpool/BOOT                                1.47G   278M       96K  none
bpool/BOOT/ubuntu_......                  1.47G   278M      311M  /boot
bpool/BOOT/ubuntu_......@autozsys_bn5c0r    72K      -      307M  -
bpool/BOOT/ubuntu_......@autozsys_yxnexc     0B      -      307M  -
bpool/BOOT/ubuntu_......@autozsys_ftf950     0B      -      307M  -
bpool/BOOT/ubuntu_......@autozsys_tmihy8     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_fq1qaz     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_w9gf5c     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_8k4fif     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_ranw95     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_5f9i8r    72K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_jkwu4v   113M      -      433M  -
bpool/BOOT/ubuntu_......@autozsys_thqqx6    72K      -      433M  -
bpool/BOOT/ubuntu_......@autozsys_uj7p3w    64K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_nyom7g    72K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_u9qosw    64K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_ujtu0w     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_u3tku7     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_6b6vae    72K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_uz70ms     0B      -      435M  -
bpool/BOOT/ubuntu_......@autozsys_05yezg     0B      -      435M  -
bpool/BOOT/ubuntu_......@autozsys_hw6cvn     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_v8iv4c     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_aakwy2     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_y4ntwf     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_q0ofwa    64K      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_xthef3    72K      -      310M  -

Via an already more balanced, but also more recent, representation of the captured and bootable kernel and system history, as seen midway through the execution of the sweep command:

$ zfs list -r -t all bpool
NAME                                       USED  AVAIL     REFER  MOUNTPOINT
bpool                                     1006M   785M       96K  /boot
bpool/BOOT                                1003M   785M       96K  none
bpool/BOOT/ubuntu_......                  1003M   785M      311M  /boot
bpool/BOOT/ubuntu_......@autozsys_uj7p3w    72K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_nyom7g    72K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_u9qosw    64K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_ujtu0w     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_u3tku7     0B      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_6b6vae    72K      -      308M  -
bpool/BOOT/ubuntu_......@autozsys_uz70ms     0B      -      435M  -
bpool/BOOT/ubuntu_......@autozsys_05yezg     0B      -      435M  -
bpool/BOOT/ubuntu_......@autozsys_hw6cvn     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_v8iv4c     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_aakwy2     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_y4ntwf     0B      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_q0ofwa    64K      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_xthef3    72K      -      310M  -

And finally into a state that keeps only the most recent history, yet is again prepared for snapshotting of system states (bpool free space > 20%) and for installing newer kernels (they are small, so this was still possible before, but sometimes this issue can block that, too):

$ zfs list -r -t all bpool
NAME                                       USED  AVAIL     REFER  MOUNTPOINT
bpool                                      611M  1.15G       96K  /boot
bpool/BOOT                                 608M  1.15G       96K  none
bpool/BOOT/ubuntu_......                   608M  1.15G      311M  /boot
bpool/BOOT/ubuntu_......@autozsys_q0ofwa    72K      -      310M  -
bpool/BOOT/ubuntu_......@autozsys_xthef3    72K      -      310M  -

It is now possible to create system states with zsysctl state save -s again, which is in turn useful for the apt integration. Given the version compatibility matrix from above, apt's shared responsibility for (1) creating system states and (2) managing kernel versions at the same time could render it an optimal candidate for implementing the sweeping strategy outlined here.

This adds another level of indirection that could solve the regressions described earlier, which are potentially an upstream issue (in another package than ZSYS).

  • What is a sensible number of system snapshots to keep around in the tiny bpool that comes with Ubuntu?
    • Would we want to upstream this to the installer, and request larger bpools in general, say 4 GiB?
  • Which graph would we see if we correlated (a) the number of snapshots (incl. kernels) with (b) the overall bpool USED:AVAIL:REFER ratio? As this looks like an optimisation problem, we can also ask for the sweet spot between the number of available kernels and keeping free space > 20% in the bpool.

Ultimately we could declare a minimum (general.minfreepoolspace) and introduce a maximum free space (similar to the 50% in the script above) to stay within. Then a garbage-collection policy could be formulated and implemented. Unfortunately, the maximum free space in the bpool would then also cap the number of ZSYS-maintainable system snapshots, and could end up well below the user's intended retention periods (history.keeplast in /etc/zsys.conf).

Eventually a sparse deletion strategy could keep older point-in-time recoveries, while newer ones are selected for deletion on the basis of a "thinning" factor. This "thinning" factor appears to be directly proportional to the free space of the bpool, which is why it might be another useful knob for tuning Ubuntu's ZFS system ¹.

Or we only keep system snapshots that have at least one of the last three kernel versions available, and destroy all others.
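
A rough sketch of that last idea (untested; assumes bpool snapshots are browsable under /boot/.zfs/snapshot, as in the matrix command above, and keeps --dry-run until the plan looks right):

# the three newest kernel versions present in any bpool snapshot
newest=$(ls /boot/.zfs/snapshot/*/vmlinuz-* | sed 's|.*/vmlinuz-||' | sort -Vu | tail -n 3)
for snap in /boot/.zfs/snapshot/autozsys_*; do
  id=${snap##*autozsys_}
  for v in $newest; do
    [ -e "$snap/vmlinuz-$v" ] && continue 2   # recent kernel present: keep this state
  done
  sudo zsysctl state remove "$id" --system --dry-run
done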

How do you cope with this regression in the long run? Did you find other procedures that are well established?
