
Snapshotting dom0 (root LVM) under QubesOS? #53

Closed
tlaurion opened this issue Feb 9, 2020 · 11 comments

tlaurion commented Feb 9, 2020

Is that even possible?

Following wyng logic, I understand that it would be as simple as asking it to add root, monitor it, and send it to backup?


tlaurion commented Feb 9, 2020

[user@dom0 ~]$ sudo wyng-backup-master/wyng add root
Volume root added to archive config.
[user@dom0 ~]$ sudo wyng-backup-master/wyng monitor root
Preparing snapshots...
No new data.
[user@dom0 ~]$ sudo wyng-backup-master/wyng send root
Preparing snapshots...
  Initial snapshot created for root

Sending backup session 20200209-134447 to qubes://backup

Volume : root
  0.1%   41MB 
  0.3%   164MB 
  52.3%   505MB 
  100%   1439.2MB


tasket commented Feb 9, 2020

This falls under regular Linux admin practices... There is an integration between LVM and the fs layer that tells the fs to finish transactions and pause just before LVM creates its snapshot. I take this to be safe as far as not corrupting the fs goes, but it could conceivably cause problems at the app level.

Personally, I added root vol to backups long ago and the few times I had to pull something out of the snapshot there were no problems.

I have also added this script to '/lib/systemd/system-shutdown' to generate a root snapshot on each shutdown:

#!/bin/sh

# Remove the previous snapshot (if any), then take a fresh read-only snapshot of dom0's root LV.
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

I could back up this 'root-autosnap' volume instead of 'root', and that would remove essentially all the risk of snapshotting app data part way through a routine. But dom0 is so light on stateful apps/utils that I don't bother. However, there is a good amount of log-related churn, so backing up 'root-autosnap' instead of the live 'root' volume would reduce the amount of churn that gets backed up.
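
For example, switching over would just be a matter of pointing wyng at the snapshot LV, using the same subcommands shown above (a sketch; the volume name must match the LV created by the shutdown script):

sudo wyng add root-autosnap
sudo wyng send root-autosnap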


Issue #25 mentions a possible enhancement where backup volumes can be configured in pairs (a live volume and a periodic snapshot that the user or system updates), along with a preference stating which one to prefer. This aligns with how Qubes manages volumes: the 'real' volume actually becomes a static snap when its VM starts, and a user might prefer to have wyng grab the live volume instead if it exists (VM is running).


tlaurion commented Feb 10, 2020 via email


tasket commented Feb 10, 2020

IIRC this was discussed in a Qubes issue where a couple of us were advocating keeping a root snapshot. I got the impression Marek didn't like the idea because he wants to move the root fs back to a non-thin LV for stability (remember a while back when a lot of people had their thin pools melt down because the default metadata space was too small). I'm pretty sure the 4.0.3 installer doubles the pool metadata size, but I think Marek considers that a stopgap. His priority there is to keep the system bootable, which leaves you in a much better position to rectify pool problems if they occur.

So we hit on the one thing that is really "wrong" with LVM thin pools: they require vigilance to avoid having the pool go offline. My angle in these related Qubes issues is that we could have the disk space widget or daemon take action before either metadata or data ran out.

BTW, if you haven't reinstalled with 4.0.3 or otherwise haven't touched your pool tmeta volume, you should consider doubling or tripling it. I tripled mine and it's a small cost for a much more solid pool.
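
For example (a sketch, assuming the default Qubes 4.0 pool name 'qubes_dom0/pool00' and an illustrative size delta):

# Check the current tmeta size and usage first:
sudo lvs -a -o lv_name,lv_size,metadata_percent qubes_dom0
# Grow the pool's metadata LV; adjust +128M to double or triple your current size:
sudo lvextend --poolmetadatasize +128M qubes_dom0/pool00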


tlaurion commented Feb 10, 2020

> IIRC this was discussed in a Qubes issue where a couple of us were advocating keeping a root snapshot. I got the impression Marek didn't like the idea because he wants to move the root fs back to a non-thin LV for stability (remember a while back when a lot of people had their thin pools melt down because the default metadata space was too small).

It would be awesome if you pointed to the issue, since what I've read over there states that they are not going to remove it from LVM?

> I'm pretty sure the 4.0.3 installer doubles the pool metadata size, but I think Marek considers that a stopgap. His priority there is to keep the system bootable, which leaves you in a much better position to rectify pool problems if they occur.

Linked other tickets here.

> So we hit on the one thing that is really "wrong" with LVM thin pools: they require vigilance to avoid having the pool go offline. My angle in these related Qubes issues is that we could have the disk space widget or daemon take action before either metadata or data ran out.

> BTW, if you haven't reinstalled with 4.0.3 or otherwise haven't touched your pool tmeta volume, you should consider doubling or tripling it. I tripled mine and it's a small cost for a much more solid pool.

Instructions on that matter would be awesome. I asked for the widget to actually do that task, since users are upgrading from 4.0+ without knowing that reinstalling is currently the only fix, short of doing this manually (how?).
Edit: Fix it like this?


tasket commented Feb 17, 2020

Yes, that's how I remember doing it.

BTW, there was an earlier Qubes issue (probably dealing directly with thin pool errors) where Marek says he'd like to move dom0 root to a non-thin LV, so it would still be in LVM. I understand his reasoning even if I prefer a more unusual approach (guard and adjust pool metadata automatically). Wyng may even be able to handle non-thin LVs in the future (an issue for this exists).


tasket commented Aug 31, 2020

I'm closing this since it's currently practicable with root.

For /boot, I've thought about it and the best approach for the time being is to make a backup dir in root, such as /boot-bak, then use cp --update or rsync from /boot to /boot-bak as needed. As one extra step, using dd to copy the boot block into /boot-bak is also recommended.
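
A minimal sketch of that routine ('/dev/sda' as the boot disk is an assumption; substitute your own):

#!/bin/sh
mkdir -p /boot-bak
# Mirror /boot into the root filesystem so it rides along with root backups:
rsync -a --delete /boot/ /boot-bak/
# Save the boot block (MBR including the partition table) as the extra step:
dd if=/dev/sda of=/boot-bak/bootblock.img bs=512 count=1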

tasket closed this as completed Aug 31, 2020

tlaurion commented Jun 30, 2021

@tasket It seems like the behavior of lvcreate changed recently?

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

puts the snapshot in inactive mode, so no operation is possible on that volume except from wyng, which happily sends the volume. But no receive/verify/diff is possible, resulting in FileNotFoundError: [Errno 2] No such file or directory: '/dev/qubes_dom0/root-autosnap'

So I changed it to something similar to what wyng creates when calling lvcreate:

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
# -ay activates the snapshot; -kn avoids marking it skip-activation:
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -ay -pr -kn -s qubes_dom0/root -n root-autosnap

Then verify/diff works.

But when doing a simple receive (0.3.0rc2 20210622), I get an error at:

line 3014, in <module>
  save_path=options.saveto if options.saveto else ""
line 2365, in receive_volume
  if not sparse_write: do_exec([[CP.blkdiscard, save_path]])
line 1149, in do_exec
  raise subprocess.CalledProcessError(err.returncode, err.args)
subprocess.CalledProcessError: Command '['/sbin/blkdiscard', '/dev/qubes_dom0/root-autosnap']' returned non-zero exit status 1

EDIT:

When doing receive with --sparse-write, I get the same error at line 3014, then:

line 2496, in receive_volume
  volf.seek(addr)  ; volf.write(buf)  ; diff_count += len(buf)
PermissionError: [Errno 1] Operation not permitted
write stdin: Broken pipe

I was playing around with the intention of receiving the backup into root-autosnap, hoping I could then do an lvconvert --merge call to restore dom0's root to the state of the received autosnap backup upon reboot.

But maybe I'm dreaming here and that would be the wrong approach. Insights?


tlaurion commented Jun 30, 2021

@tasket: My bad, the snapshot was created with -pr, so it was read-only.

Receive now works with the following:

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-revert || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -ay -prw -kn -s qubes_dom0/root -n root-revert

Since backing up works with:

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

I've worked around this by creating a script that receives into another read-write LVM snapshot (root-revert) and then does the lvconvert --merge on it.
Pretty effective, and it permits validating detached signed digest integrity prior to restoring OEM wyng backups, including receiving the root-autosnap backup into root-revert and lvconvert --merge-ing it on next boot.
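
For reference, a sketch of that revert flow using the names from this thread (wyng option placement may differ by version):

# Recreate the RW snapshot to receive into:
sudo lvremove --noudevsync --force -An qubes_dom0/root-revert || true
sudo lvcreate --noudevsync --ignoremonitoring -ay -prw -kn -s qubes_dom0/root -n root-revert
# Receive the archived root-autosnap session into it:
sudo wyng --save-to=/dev/qubes_dom0/root-revert receive root-autosnap
# Queue the merge; with the origin in use, LVM performs it on the next activation (i.e. reboot):
sudo lvconvert --merge qubes_dom0/root-revert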

Working on pushing a compressed dd image of /boot along with root-autosnap so that /boot can be synced to the state of root... and there will live the first version of the OEM revert PoC, imperfect as it is. (It would be far better to be able to do this from a recovery shell, but we are not there yet, with libssh missing under Heads and other missing pieces...)

tlaurion commented:

@tasket

> I also have added this to '/lib/systemd/system-shutdown' to generate a root snapshot each shutdown:
>
> #!/bin/sh
>
> /usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
> /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

On Q4.2, systemd has changed since we discussed this. On Q4.1, pools were separated between dom0 and vm.

What is your suggestion for creating an autosnap volume that is accessible from dom0, including a boot file/fs backup that could easily be updated and restored?


tasket commented Sep 23, 2023

I don't have an easy suggestion for comprehensive OS backup/restoration.

Take the most ideal example, a macOS system with a very limited matrix of boot configurations: even that is fraught with complex problems. It used to be that third-party cloning utilities could handle such a task, which they proudly advertised, but that is no longer the case; those cloning utils now say that Time Machine is your best/only option.

Qubes install options are much like Fedora or other Linux-based systems: very open-ended. If you, as an integrator, want to make some assumptions about a proper config (the Qubes default, for instance) and make that an explicit requirement, then the problem becomes more tractable. For example, start with the sfdisk --dump output as the backup/restore basis of your partitioning scheme and then work upward from there. If you want all the components independently accessible in a Wyng archive, you can now back up the sfdisk output file directly by treating it like a volume using --import-other-from, and you can do the same as needed with the EFI and boot volumes directly. You would also have to back up the LVM layout and restore that after the partition table was restored, but before the root volume is restored.
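
A sketch of that bottom-up capture (device names are assumptions, and the exact --import-other-from usage may differ by wyng version):

# Partition layout as the restore basis:
sudo sfdisk --dump /dev/nvme0n1 > /root/parttable.sfdisk
# LVM layout, restorable with vgcfgrestore after the partition table is back:
sudo vgcfgbackup -f /root/qubes_dom0.vgcfg qubes_dom0
# Back up the dump file by treating it like a volume:
sudo wyng --import-other-from=/root/parttable.sfdisk add parttable
# Restore order: sfdisk first, then vgcfgrestore, then receive the root volume.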
