
Feature request: Support BTRFS and XFS Reflink source volumes #75

Closed
Tracked by #117
tlaurion opened this issue Apr 11, 2021 · 20 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Milestone: v0.4

Comments

@tlaurion (Contributor) commented Apr 11, 2021

Thoughts?
QubesOS/qubes-issues#6476

https://btrfs.wiki.kernel.org/index.php/Incremental_Backup

@tasket (Owner) commented Apr 12, 2021

The two interesting parts of this are the suggestion that Thin LVM is less reliable than Btrfs (which might be accurate), and the point about providing authentication (which might not be).

I could make a point about perceived efficiency and speed for Thin LVM vs Btrfs, the main one being that no one ever seems to actually compare them with benchmarks, not even @michaellarabel. My experience says that Btrfs would lag behind Thin LVM in overall use, but that is just my impression. I also saw a tendency for Btrfs to "blow up", where metadata use would suddenly skyrocket when reflinking large image files in combination with snapshotting the parent (sub)volume; that was with the late 3.x kernels, so YMMV.

It's worth noting, WRT the future of Linux storage, that Red Hat appears to actively dislike both Thin LVM and Btrfs, and is reported to be building a flexible successor storage system called Stratis.

Since interest in backups on Qubes (at least incremental backups) is not high, a change to using Btrfs as the Qubes default would not impact Wyng greatly. But also, adding Btrfs support to Wyng should not be a huge undertaking if people want it.

@tasket (Owner) commented Apr 12, 2021

A quick note about Stratis...

It appears to be a configuration management system for "storage pools", where a pool is an XFS filesystem spanning one or more block devices. XFS is used in reflink mode to manage disk image files and "snapshots" containing online shrink-capable filesystems. Red Hat claims to be doing this because the Btrfs code tree was supposedly not maintainable for enterprise environments. The only tangible benefit I'd expect is a performance advantage over Btrfs (it would be interesting to compare XFS and Btrfs for hosting large reflinked disk image files).

@DemiMarie

@tasket @tlaurion Would you be willing to comment on QubesOS/qubes-issues#6476? That is a mere proposal, not a final decision, and commentary (including by those who are not QubesOS users!) would be greatly appreciated. I am no expert whatsoever on the Linux storage stack.

@tasket changed the title from "QubesOS to switch to BRTFS-Reflink" to "Feature request: Support BTRFS-Reflink source volumes" May 20, 2021
@tasket (Owner) commented May 20, 2021

I am still going to wait for detailed benchmark comparisons before supporting this. As it stands now, the general wisdom and experience is that Btrfs can be slow, and large disk image files with snapshots are exactly its worst performance case.

Even ZFS created a special mode (ZVOLs) to handle disk images efficiently.

I would wager that the best way to wring performance from Btrfs with disk image snapshots is to flag them nodatacow and add them to separate subvolumes, instead of using reflinks. If that's the case, it would mean a) Qubes getting a refactored Btrfs driver, b) quite different coding details when adding Btrfs to Wyng.

@DemiMarie

> I would wager that the best way to wring performance from Btrfs with disk image snapshots is to flag them nodatacow and add them to separate subvolumes, instead of using reflinks. If that's the case, it would mean a) Qubes getting a refactored Btrfs driver, b) quite different coding details when adding Btrfs to Wyng.

Snapshots automatically turn CoW back on, so nodatacow will not help.

@tasket (Owner) commented May 20, 2021

IIRC nodatacow can be set for individual disk image files that are sitting in a subvolume. So the files only experience a data CoW-like event after a subvol snapshot, not on a second-by-second basis whenever any data is written.
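
A minimal sketch of that per-file approach (paths hypothetical; chattr +C is the standard way to set nodatacow on a single file, and it only takes effect while the file is still empty):

truncate -s 0 /pool/appvms/untrusted/private.img     # create (or re-create) the file empty
chattr +C /pool/appvms/untrusted/private.img         # flag NOCOW before any data is written
lsattr /pool/appvms/untrusted/private.img            # the 'C' attribute should be listed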

@DemiMarie

> IIRC nodatacow can be set for individual disk image files that are sitting in a subvolume. So the files only experience a data CoW-like event after a subvol snapshot, not on a second-by-second basis whenever any data is written.

In Qubes OS, all persistent volumes have at least one snapshot, by default. So the only difference would be second and further writes to the same extent after qube startup.

@DemiMarie

> A quick note about Stratis... It appears to be a configuration management system for "storage pools", where a pool is an XFS filesystem spanning one or more block devices. […]

Stratis uses device-mapper thin volumes (without LVM) to store its XFS filesystems.

@tasket (Owner) commented May 21, 2021

> In Qubes OS, all persistent volumes have at least one snapshot, by default. So the only difference would be second and further writes to the same extent after qube startup.

Yes, so the difference in performance should be somewhere between the cases shown in these benchmarks. We still need benchmarks that are performed in a Qubes environment.


In relation to Wyng, Stratis mapping should be very similar since the current thin-pool method is to ask LVM what the dm-thin device ID is, then use the dm-thin tools on that device.
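
A rough sketch of that lookup (volume and pool names hypothetical; this is not Wyng's exact invocation):

lvs --noheadings -o thin_id vg0/my-volume               # ask LVM for the dm-thin device ID
sudo thin_dump --dev-id 5 /dev/mapper/vg0-pool_tmeta    # dump the mappings for just that device

thin_dump comes from thin-provisioning-tools; in practice the pool metadata must be inactive or read via a metadata snapshot, but the principle is the same: the device ID keys into the pool's extent mappings.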

@tasket added the enhancement and help wanted labels Dec 31, 2021
@tasket added this to the v0.4 milestone Dec 31, 2021
@tasket mentioned this issue Jan 29, 2023
@tasket changed the title from "Feature request: Support BTRFS-Reflink source volumes" to "Feature request: Support BTRFS and XFS Reflink source volumes" Feb 14, 2023
tasket added a commit that referenced this issue Feb 15, 2023
get_reflink_deltas() and update_delta_digest_reflink()
@tasket (Owner) commented Feb 16, 2023

Work has begun on Btrfs reflink volume support. The algorithms needed to obtain metadata and find differences between two snapshots have been added; however, the code that recognizes and snapshots reflink vols still needs to be written before this is usable.

A side-effect of the approach I took (using simple FIEMAP tables obtained via filefrag) is that other filesystems that report this data, such as XFS, will also be supported.
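
For illustration, the raw FIEMAP table for any file can be viewed with filefrag (path hypothetical):

sudo filefrag -v /path/to/disk.img

Each output row describes one extent: its logical offset within the file, its "physical" (filesystem-reported) start address, and its length, which is all that delta-scanning between two snapshots requires.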

tasket added a commit that referenced this issue Feb 16, 2023
Remove or mark unconverted lvm code

issue #75
@tasket (Owner) commented Feb 16, 2023

To continue a line of thought from code comments:

It's worth noting that file extent maps have 4KB blocks, which is an order of magnitude more detail than the most detailed thin LVM map with 64KB chunks. So 'do it in Python' is a big maybe here, as even Python libs tend to fall down on either speed or memory requirements. Using Linux commands to pre-process the maps gives me delta lists (to use in Python) that are much smaller than the input maps, and they're fast and work on data streams instead of in memory. Python's difflib does look interesting, though; I would love to see an alternate implementation using that or something similar, to see how it performs.
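
A rough sketch of that pre-processing idea (not Wyng's actual script; file names are hypothetical, and the awk field-splitting assumes filefrag's long-standing column layout):

filefrag -v snap1/disk.img | awk -F'[:. ]+' '/^ *[0-9]+:/ {print $3, $5, $7}' > map1.txt
filefrag -v snap2/disk.img | awk -F'[:. ]+' '/^ *[0-9]+:/ {print $3, $5, $7}' > map2.txt
# each line is "logical-start physical-start length"; differing lines mark
# extents that changed between the two snapshots
diff map1.txt map2.txt | grep '^[<>]' > delta.txt

The whole pipeline works on streams, so memory use stays flat no matter how large the extent maps are.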

Right now the Wyng alpha work in progress is balancing different qualities like low dependency count, CPU portability (as in: use cp and it's ported!), efficiency and overall speed. Some of the choices I'm making (for now, at least) to move forward and retain those qualities mean code that is less aesthetically pleasing or, in the case of sed, just plain harder to read. (I do respond to requests to add comments to segments of code.)

I'd also like to note that our systems are based on the same Linux commands that I'm invoking from Wyng, and I'm being pretty conservative in my choices. I would consider custom re-implementation of those commands' functions, or replacement with 3rd-party libs, to be as much or more of a security risk.

@tasket (Owner) commented Feb 17, 2023

Major problem:

The Linux FIEMAP ioctl output doesn't carry block device numbers, which are needed when a Btrfs volume spans more than one device. With a multi-device fs, the returned data looks OK but won't be correct. This does not affect XFS because that fs doesn't have multi-device maps.

Edit: On further inspection, Btrfs may be synthesizing its own singular address space to account for multiple devices. So we are seeing the numbers from Btrfs' internal raid. If this is true, then the resulting FIEMAP data may be good enough to reliably show where reflinked files have the same blocks.

Edit 2: The issue/solution is explained in a Linux bugzilla record.

tasket added a commit that referenced this issue Feb 20, 2023
@tasket (Owner) commented Feb 20, 2023

I've added close checking of the column layout to the sed script; any significant change should raise an error.

Also checked the filefrag source code. The basic format hasn't changed in well over a decade, and the last change, roughly 11 years ago, was minor (adding dots and colons after numbers).


The next hurdle will be getting Wyng to recognize & access regular files as logical volumes. At that point, this feature will be ready to test.

@tasket (Owner) commented Feb 22, 2023

OK, so over in filefrag land, a prominent Linux dev doesn't want me to use filefrag with Btrfs because:

> the FIEMAP ioctl wasn't intended for this use

Egads. FIEMAP describes the data composition of a file. But he is implying the ioctl strips something important from the FIEMAP data (it doesn't, because Btrfs virtual addresses encompass multiple devices).

Add to that meaningless hand-waving about Btrfs subvolumes (as if this were the debate about Btrfs inodes) and a total lack of concern about filefrag being used on other raid-like storage, and I get the impression Btrfs is not exactly TT's area. IOW, this looks like get-off-my-lawn BS. Unless a Btrfs dev says an extent address is not unique within a Btrfs filesystem, I consider the question settled.

@tasket (Owner) commented Feb 24, 2023

Update: Since I've been lured into combing Btrfs dev notes and source code to address spurious claims about the supposed deep, dark, messy pit that is Btrfs internals, I keep seeing details that are actually reassuring. Btrfs does indeed use logical extent addresses (claiming it doesn't is weird), they are a crucial part of the disk format itself, and – the really good part – they are one of the higher-level abstractions in the format.

What the Btrfs design is telling me so far is that they wanted to insulate extent address organization and mundane file I/O from the vicissitudes of low-level RAID maintenance. (Edit – addresses can change due to internal maintenance functions, but not without incrementing the fs or subvol generation id.) The chart at the bottom of this page gives a general overview.

I think a more abstract extent concept makes reading them from a source like FIEMAP even less worrisome than usual, if all you want are extent addresses and sizes. We should just accept that what comes out of the "physical" fields in that ioctl is virtual in most cases, regardless of the filesystem used.

TL;DR: all we care about is that two files pointing to the same extent are pointing to the same data; whether it's mdraid/LVM etc. or Btrfs providing the ultimate translation and access to physical data blocks is of no concern.

All this is making me eager to start testing Wyng on multi-device Btrfs setups. And if big issues do arise, there is still XFS as a way to do reflink snapshots.

@tasket (Owner) commented Mar 7, 2023

Local storage abstraction classes including ReflinkVolume have been added. Most required functions are now there, including the ability to make read-only Btrfs subvolume snapshots and monitor fs maintenance incursions via the snapshot's transaction generation property.
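
A minimal sketch of that snapshot-and-monitor pattern (paths hypothetical; these are the stock btrfs commands, not Wyng's exact calls):

sudo btrfs subvolume snapshot -r /mnt/pool/vols /mnt/pool/.wyng-snap1   # read-only snapshot
sudo btrfs subvolume show /mnt/pool/.wyng-snap1 | grep -i generation
# if the generation reported later differs from the recorded one, fs
# maintenance has touched the snapshot and its extent maps must be re-read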

This changes Wyng's model of local storage from collections of Lvm_VolGroups containing tables of Lvm_Volumes and pools to a single LocalStorage class pointed at the archive's local storage location. The resulting 'storage' object's lvols dict is populated with objects based on relevant volume and snapshot names (which may or may not exist).

The next steps will be:

  • Make the snapshot + generation handling transparent, so as not to affect non-Btrfs reflink systems
  • Accommodate subdirs in volume names, making Wyng volume names like tar paths
  • Add these functions to get_reflink_deltas()
  • Convert receive/verify/diff functions to allow data verification tests
  • Convert the monitor_send() chain of functions to use the abstract storage objects
  • Test wyng monitor and wyng send
  • Convert other Wyng commands to use abstract storage
  • Test the rest

Also to do:

  • Check whether local storage is Btrfs/subvolume or XFS (see the sketch after this list)
  • Option to convert the Btrfs dir (referenced by --local) to a subvol
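
A hypothetical sketch of that check (the path is an example only):

FSTYPE=$(stat -f -c %T /var/lib/qubes)    # prints e.g. 'btrfs' or 'xfs'
if [ "$FSTYPE" = "btrfs" ]; then
    # btrfs subvolume show exits non-zero if the dir is not itself a subvolume
    btrfs subvolume show /var/lib/qubes >/dev/null 2>&1 || echo "dir is not a subvolume"
fi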

@DemiMarie

@tasket: what advantages will Wyng have over e.g. btrfs send?

@tasket (Owner) commented Mar 7, 2023

@DemiMarie

  1. A Wyng archive requires only a traditional fs (or semantics that encompass a Unix fs, like sftp and s3) on the backup destination, instead of specifically needing Btrfs on the destination. The only way to get around this with btrfs receive is to stack up the send streams like cordwood, which leaves you with a very inefficient/tedious restore process and no archive maintenance functions (see the sketch at the end of this comment).
  2. The 'cordwood' scenario is probable if encryption functions must remain in the local admin env.
  3. Wyng can work with other snapshot-capable local storage, and the user isn't tied to restoring to the same type of local fs as what they originally had... they can restore directly to non-COW storage if desired.

Edit: One could tongue-in-cheek say that the reasons for using Wyng are the reasons why qvm-backup doesn't use btrfs send. :)

Edit:
4. Wyng's monitor function lowers disk-space consumption for snapshots because snapshots (both reflinked img files and subvol snaps) are deleted after a delta map is made from them. So Wyng enables continuous rotation of snapshots, even when backups aren't being sent. btrfs-send requires that local snapshots stay in place, where the disk space they consume keeps growing until the next backup.
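
To illustrate the 'cordwood' pattern from point 1 (paths hypothetical): without Btrfs on the destination, incremental send streams can only be stored as opaque files, one per backup:

sudo btrfs send -p /pool/.snap1 /pool/.snap2 > /backups/vol.incr2.stream

Restoring then means replaying the entire chain, in order, through btrfs receive into a Btrfs filesystem; no individual stream can be pruned without breaking the chain.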

tasket added a commit that referenced this issue Mar 11, 2023
tasket added a commit that referenced this issue Mar 12, 2023
tasket added a commit that referenced this issue Mar 17, 2023
@tasket (Owner) commented Mar 17, 2023

@tlaurion @DemiMarie Wyng now has basically a full implementation of reflink support and is ready to try out on Btrfs for anyone curious enough at this stage (note: it has not yet returned to alpha status).

The prerequisite for using Wyng with Btrfs is to make the --local directory a subvolume, for example sudo btrfs subvolume create /var/lib/qubes, or use whichever dir_path your Qubes Btrfs pool uses:

$ qvm-pool info btrpool
name                btrpool
dir_path            /mnt/btrpool/libqubes
driver              file-reflink
ephemeral_volatile  False
revisions_to_keep   1

Since we are now accessing local filesystem objects, you must be mindful of directory structure. In fact, the current implementation treats subdirectories as part of the Archive volume's name. To demonstrate, send-ing a Qubes VM's disk image file to the archive looks like this:

sudo wyng --local=/mnt/btrpool/libqubes send appvms/untrusted/private.img

You don't have to specify --local if the archive already has that local setting. But showing it this way demonstrates:

  1. --local can now be specified at any time (not just with arch-init)
  2. reflink mode is automatically detected
  3. reflink mode accesses disk images simply by using --local as a base path and the volume name as the rest. Your system configuration determines how messy or neat the volume naming will be (but, yes, wyng-util-qubes will cope with this automatically).

It also raises the question of whether users might want to set aside a special dir where they create symlinks to the image files they want to back up, and then point Wyng at that special dir. This would be interesting to try.
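
An untested sketch of that symlink idea (all paths hypothetical):

mkdir -p /mnt/btrpool/wyng-vols
ln -s /mnt/btrpool/libqubes/appvms/untrusted/private.img /mnt/btrpool/wyng-vols/untrusted-private.img
sudo wyng --local=/mnt/btrpool/wyng-vols send untrusted-private.img

Whether reflink snapshotting follows symlinks cleanly is exactly the kind of thing that would need testing first.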

tasket added a commit that referenced this issue Mar 18, 2023
Allow multi-vol receive with reflink --local

Update Readme
tasket added a commit that referenced this issue Mar 20, 2023
Optimize: do not init_dedup_index if no vol changes
@tasket (Owner) commented Mar 20, 2023

Btrfs reflink and LVM have now been tested and are working.

@tasket closed this as completed Mar 20, 2023
tasket added a commit that referenced this issue Mar 21, 2023
Convert remove_local_metadata() issue #75