
Snapshotting dom0 (root LVM) under QubesOS? #53

Closed
tlaurion opened this issue Feb 9, 2020 · 11 comments

tlaurion commented Feb 9, 2020

Is that even possible?

Following wyng logic, I understand that it would be as simple as asking it to add root, monitor it, and send it to backup?


tlaurion commented Feb 9, 2020

[user@dom0 ~]$ sudo wyng-backup-master/wyng add root
Volume root added to archive config.
[user@dom0 ~]$ sudo wyng-backup-master/wyng monitor root
Preparing snapshots...
No new data.
[user@dom0 ~]$ sudo wyng-backup-master/wyng send root
Preparing snapshots...
  Initial snapshot created for root

Sending backup session 20200209-134447 to qubes://backup

Volume : root
  0.1%   41MB 
  0.3%   164MB 
  52.3%   505MB 
  100%   1439.2MB


tasket commented Feb 9, 2020

This falls under regular Linux admin practices... There is an integration between LVM and the fs layer that tells the fs to finish transactions and pause just before LVM creates its snapshot. I take this to be safe as far as not corrupting the fs goes, but it could conceivably cause problems at the app level.

Personally, I added root vol to backups long ago and the few times I had to pull something out of the snapshot there were no problems.

I have also added this script to '/lib/systemd/system-shutdown' to generate a root snapshot on each shutdown:

#!/bin/sh

# Remove the previous snapshot (if any), then take a fresh read-only snapshot of dom0's root LV.
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

I could back up this 'root-autosnap' volume instead of 'root', and that would remove essentially all the risk of snapshotting app data part way through a routine. But dom0 is so light on stateful apps/utils that I don't bother. However, there is a good amount of log-related churn, so backing up 'root-autosnap' instead of the live 'root' volume would reduce the amount of churn that gets backed up.
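
For example, switching over would just be a matter of pointing wyng at the snapshot LV, using the same subcommands shown above (a sketch; the volume name must match the LV created by the shutdown script):

sudo wyng add root-autosnap
sudo wyng send root-autosnap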


Issue #25 mentions a possible enhancement where backup volumes can be configured in pairs (a live volume and a periodic snapshot that the user or system updates), along with a preference stating which one to prefer. This aligns with how Qubes manages volumes: the 'real' volume actually becomes a static snap when its VM starts, and a user might prefer to have wyng grab the live volume instead if it exists (VM is running).


tlaurion commented Feb 10, 2020 via email


tasket commented Feb 10, 2020

IIRC this was discussed in a Qubes issue where a couple of us were advocating keeping a root snapshot. I got the impression Marek didn't like the idea because he wants to move the root fs back to a non-thin LV for stability (remember a while back when a lot of people had their thin pools melt down because the default metadata space was too small). I'm pretty sure the 4.0.3 installer doubles the pool metadata size, but I think Marek considers that a stopgap. His priority there is to keep the system bootable, which leaves you in a much better position to rectify pool problems if they occur.

So we hit on the one thing that is really "wrong" with LVM thin pools: they require vigilance to avoid having the pool go offline. My angle in these related Qubes issues is that we could have the disk space widget or daemon take action before either metadata or data ran out.

BTW, if you haven't reinstalled with 4.0.3 or otherwise haven't touched your pool tmeta volume, you should consider doubling or tripling it. I tripled mine and it's a small cost for a much more solid pool.
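
For example (a sketch, assuming the default Qubes 4.0 pool name 'qubes_dom0/pool00' and an illustrative size delta):

# Check the current tmeta size and usage first:
sudo lvs -a -o lv_name,lv_size,metadata_percent qubes_dom0
# Grow the pool's metadata LV; adjust +128M to double or triple your current size:
sudo lvextend --poolmetadatasize +128M qubes_dom0/pool00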


tlaurion commented Feb 10, 2020

> IIRC this was discussed in a Qubes issue where a couple of us were advocating keeping a root snapshot. I got the impression Marek didn't like the idea because he wants to move the root fs back to a non-thin LV for stability (remember a while back when a lot of people had their thin pools melt down because the default metadata space was too small).

It would be awesome if you pointed to the issue, since what I've read over there states that they are not going to remove it from LVM?

> I'm pretty sure the 4.0.3 installer doubles the pool metadata size, but I think Marek considers that a stopgap. His priority there is to keep the system bootable, which leaves you in a much better position to rectify pool problems if they occur.

Linked other tickets here.

> So we hit on the one thing that is really "wrong" with LVM thin pools: they require vigilance to avoid having the pool go offline. My angle in these related Qubes issues is that we could have the disk space widget or daemon take action before either metadata or data ran out.

> BTW, if you haven't reinstalled with 4.0.3 or otherwise haven't touched your pool tmeta volume, you should consider doubling or tripling it. I tripled mine and it's a small cost for a much more solid pool.

Instructions on that matter would be awesome. I asked for the widget to actually do that task, since users are upgrading from 4.0+ without knowing that reinstalling is currently the only fix, short of doing this manually (how?).
Edit: Fix it like this?


tasket commented Feb 17, 2020

Yes, that's how I remember doing it.

BTW, there was an earlier Qubes issue (probably dealing directly with thin pool errors) where Marek says he'd like to move dom0 root to a non-thin LV, so it would still be in LVM. I understand his reasoning even if I prefer a more unusual approach (guard and adjust pool metadata automatically). Wyng may even be able to handle non-thin LVs in the future (an issue for this exists).


tasket commented Aug 31, 2020

I'm closing this since it's currently practicable with root.

For /boot, I've thought about it and the best approach for the time being is to make a backup dir in root, such as /boot-bak, then use cp --update or rsync from /boot to /boot-bak as needed. As one extra step, using dd to copy the boot block into /boot-bak is also recommended.
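
A minimal sketch of that routine ('/dev/sda' as the boot disk is an assumption; substitute your own):

#!/bin/sh
mkdir -p /boot-bak
# Mirror /boot into the root filesystem so it rides along with root backups:
rsync -a --delete /boot/ /boot-bak/
# Save the boot block (MBR including the partition table) as the extra step:
dd if=/dev/sda of=/boot-bak/bootblock.img bs=512 count=1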

tasket closed this as completed Aug 31, 2020

tlaurion commented Jun 30, 2021

@tasket It seems like the behavior of lvcreate changed recently?

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

puts the snapshot in inactive mode, so no operation is possible on that volume except from wyng, which happily sends the volume. But no receive/verify/diff is possible, resulting in FileNotFoundError: [Errno 2] No such file or directory: '/dev/qubes_dom0/root-autosnap'

So I changed it to something similar to what wyng creates when calling lvcreate:

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
# -ay activates the snapshot; -kn avoids marking it skip-activation:
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -ay -pr -kn -s qubes_dom0/root -n root-autosnap

Then verify/diff works.

But when doing a simple receive (0.3.0rc2 20210622), I get an error at:

line 3014, in <module>
  save_path=options.saveto if options.saveto else ""
line 2365, in receive_volume
  if not sparse_write: do_exec([[CP.blkdiscard, save_path]])
line 1149, in do_exec
  raise subprocess.CalledProcessError(err.returncode, err.args)
subprocess.CalledProcessError: Command '['/sbin/blkdiscard', '/dev/qubes_dom0/root-autosnap']' returned non-zero exit status 1

EDIT:

When doing receive with --sparse-write, I get the same error at line 3014, then:

line 2496, in receive_volume
  volf.seek(addr)  ; volf.write(buf)  ; diff_count += len(buf)
PermissionError: [Errno 1] Operation not permitted
write stdin: Broken pipe

I was playing around with the intention of receiving the backup into root-autosnap, hoping I could then do an lvconvert --merge call to restore dom0's root to the state of the received autosnap backup upon reboot.

But maybe I'm dreaming here and that would be the wrong approach. Insights?


tlaurion commented Jun 30, 2021

@tasket: My bad, the snapshot was created with -pr, so it was read-only.

Receive now works with the following:

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-revert || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -ay -prw -kn -s qubes_dom0/root -n root-revert

Since backing up works with:

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

I've worked around this by creating a script that receives into another read-write LVM snapshot (root-revert) and then does the lvconvert --merge on it.
Pretty effective, and it permits validating detached signed digest integrity prior to restoring OEM wyng backups, including receiving the root-autosnap backup into root-revert and lvconvert --merge-ing it on next boot.
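
For reference, a sketch of that revert flow using the names from this thread (wyng option placement may differ by version):

# Recreate the RW snapshot to receive into:
sudo lvremove --noudevsync --force -An qubes_dom0/root-revert || true
sudo lvcreate --noudevsync --ignoremonitoring -ay -prw -kn -s qubes_dom0/root -n root-revert
# Receive the archived root-autosnap session into it:
sudo wyng --save-to=/dev/qubes_dom0/root-revert receive root-autosnap
# Queue the merge; with the origin in use, LVM performs it on the next activation (i.e. reboot):
sudo lvconvert --merge qubes_dom0/root-revert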

Working on pushing a compressed dd image of /boot along with root-autosnap so that /boot can be synced to the state of root... and there will live the first version of the OEM revert PoC, imperfect as it is. (It would be far better to be able to do this from a recovery shell, but we are not there yet, with libssh missing under Heads and other missing pieces...)

tlaurion commented:

@tasket

> I also have added this to '/lib/systemd/system-shutdown' to generate a root snapshot each shutdown:
>
> #!/bin/sh
>
> /usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
> /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

On Q4.2, systemd has changed since we discussed this. On Q4.1, pools were separated between dom0 and vm.

What is your suggestion for creating an autosnap volume that is accessible from dom0, including a boot file/fs backup that could easily be updated and restored?


tasket commented Sep 23, 2023

I don't have an easy suggestion for comprehensive OS backup/restoration.

Take the most ideal example, a macOS system with a very limited matrix of boot configurations: even that is fraught with complex problems. It used to be that third-party cloning utilities could handle such a task, which they proudly advertised, but that is no longer the case; those cloning utils now say that Time Machine is your best/only option.

Qubes install options are much like Fedora or other Linux-based systems: very open-ended. If you, as an integrator, want to make some assumptions about a proper config (the Qubes default, for instance) and make that an explicit requirement, then the problem becomes more tractable. For example, start with the sfdisk --dump output as the backup/restore basis of your partitioning scheme and then work upward from there. If you want all the components independently accessible in a Wyng archive, you can now back up the sfdisk output file directly by treating it like a volume using --import-other-from, and you can do the same as needed with the EFI and boot volumes directly. You would also have to back up the LVM layout and restore that after the partition table was restored, but before the root volume is restored.
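
A sketch of that bottom-up capture (device names are assumptions, and the exact --import-other-from usage may differ by wyng version):

# Partition layout as the restore basis:
sudo sfdisk --dump /dev/nvme0n1 > /root/parttable.sfdisk
# LVM layout, restorable with vgcfgrestore after the partition table is back:
sudo vgcfgbackup -f /root/qubes_dom0.vgcfg qubes_dom0
# Back up the dump file by treating it like a volume:
sudo wyng --import-other-from=/root/parttable.sfdisk add parttable
# Restore order: sfdisk first, then vgcfgrestore, then receive the root volume.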
