
blkid reports disk as zfs_member if it has a zfs_member partition #918

Closed
sschmitz opened this issue Dec 29, 2019 · 9 comments

sschmitz commented Dec 29, 2019

This bug was previously reported to the mailing list, but the author later concluded it was user error. However, I can reproduce the erroneous behavior consistently, starting from an empty virtual disk. It may also be the cause of the behavior described in this report.

Reproduction

I tested this in a virtual machine running the Xubuntu 19.10 Live ISO with util-linux 2.34 (but I have also tested it with the current master). The VM was configured to have an empty virtual 10737514496-byte disk. This is a little over 10 GiB; a nice round power-of-2 size will not reproduce the issue. The virtual disk was hosted on an ext4 filesystem; I have read that ZFS does not like to be self-hosted, at least not inside a ZVOL.

# fdisk -l /dev/sda
Disk /dev/sda: 10 GiB, 10737514496 bytes, 20971708 sectors
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Verify that the disk does not contain any old signatures:

# blkid /dev/sda
# wipefs /dev/sda
# hd /dev/sda # In fact, the disk is completely empty (will take a few seconds)
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
280017800

Then, initialize the disk and create a partition spanning the whole disk (in fact, it only needs to end close to the end of the disk).

# fdisk -l /dev/sda
Disk /dev/sda: 10 GiB, 10737514496 bytes, 20971708 sectors
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 90CB2A9F-2026-4622-83EA-318F2BE6C275

Device     Start      End  Sectors Size Type
/dev/sda1   2048 20971674 20969627  10G Linux filesystem

blkid will now correctly identify the disk as GPT-formatted.

# blkid /dev/sda
/dev/sda: PTUUID="90cb2a9f-2026-4622-83ea-318f2be6c275" PTTYPE="gpt"

Now, create a zpool on the partition.

# zpool create tank /dev/sda1

Voila, blkid is now confused:

# blkid /dev/sda
/dev/sda: LABEL="tank" UUID="11388703897118653008" UUID_SUB="2779192954700749898" TYPE="zfs_member" PTUUID="90cb2a9f-2026-4622-83ea-318f2be6c275" PTTYPE="gpt"

Cause

ZFS stores four copies of its uberblock on every backing device, two at the front and two at the back. The ones at the back are aligned to a 256 KiB boundary. Therefore, if the end of the last uberblock in the partition is less than 256 KiB from the end of the disk, it is stored in the same position as the last uberblock of a whole-disk ZFS would be (and likewise for the second-to-last copy).
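
For illustration, here is a small sketch using the numbers from the reproduction above; the 256 KiB label size and the end-of-device alignment come from the description of the format above, while the helper name and the exact calculation are mine:

#include <stdio.h>
#include <stdint.h>

#define VDEV_LABEL_SIZE (256 * 1024ULL)   /* one ZFS vdev label, 256 KiB */

/* Offset of the last trailing label: the device size is aligned down to a
 * 256 KiB boundary and the label occupies the last 256 KiB below it. */
static uint64_t last_label_offset(uint64_t dev_size)
{
    uint64_t aligned_end = dev_size & ~(VDEV_LABEL_SIZE - 1);
    return aligned_end - VDEV_LABEL_SIZE;
}

int main(void)
{
    uint64_t disk_size  = 10737514496ULL;     /* whole /dev/sda  */
    uint64_t part_start = 2048ULL * 512;      /* /dev/sda1 start */
    uint64_t part_size  = 20969627ULL * 512;  /* /dev/sda1 size  */

    /* Both calculations land on the same disk-relative offset, so the
     * partition's last label sits exactly where a whole-disk pool would
     * keep its last label. */
    printf("whole-disk last label at %llu\n",
           (unsigned long long)last_label_offset(disk_size));
    printf("partition  last label at %llu\n",
           (unsigned long long)(part_start + last_label_offset(part_size)));
    return 0;
}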

libblkid will find these two uberblocks at the end of the disk and treat them as evidence of the whole disk being ZFS-formatted. Looking at the code for probe_zfs in superblocks/zfs.c, I'm not entirely clear on the extent to which this is intentional. While a comment does say "Look for at least 4 uberblocks to ensure a positive match", the function description also says "Check only some of [the 4 uberblock copies]". Anyway, what does happen is that the function checks the four locations for uberblocks; because there are several slots at each location that can all contain the magic number, find_uberblocks returns the number of such slots at the given location, and the results are summed.

https://github.com/karelzak/util-linux/blob/9418ba6d05feed6061f5343741b1bc56e7bde663/libblkid/src/superblocks/zfs.c#L288

As soon as this sum reaches 4 (ZFS_WANT) or more, the loop is exited early and it is determined that a ZFS filesystem is present. This number may already be reached from a single uberblock location, as can be the case for an end-of-disk-aligned partition as described above.

As I said above, I'm not sure how the probe_zfs function is supposed to work, but when I edited the line above to found++ instead, blkid no longer reported the whole disk as a zfs_member but still did so for the partition. It will then require the uberblock magic number to be present at all four expected locations, and not four copies of the magic number at only one of them. However, I have only looked at the code for half an hour, so I have no idea if that is actually the solution or if that would break everything ;-)
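
To make the counting behavior concrete, here is a small, self-contained toy model (the names and data are mine, not the zfs.c source): the first function mirrors the sum-of-hits logic described above, the second the one-hit-per-location variant of the found++ experiment.

#include <stdio.h>

#define ZFS_WANT 4   /* threshold used by probe_zfs */

/* Current behavior (as described above): sum the uberblock hits over all
 * four label locations and stop as soon as the sum reaches ZFS_WANT, so a
 * single location with four magic-number slots is already enough. */
static int match_sum_of_hits(const int hits[4])
{
    int found = 0;
    for (int i = 0; i < 4; i++) {
        found += hits[i];
        if (found >= ZFS_WANT)
            return 1;
    }
    return 0;
}

/* "found++" variant: count only locations that contain at least one hit,
 * so all four expected label locations must show evidence. */
static int match_per_location(const int hits[4])
{
    int found = 0;
    for (int i = 0; i < 4; i++)
        if (hits[i] > 0)
            found++;
    return found >= ZFS_WANT;
}

int main(void)
{
    /* Whole disk where only the partition's two trailing labels are
     * visible near the end of the device, each with several slots. */
    int trailing_only[4] = { 0, 0, 4, 4 };
    /* Genuine whole-disk pool: hits at all four locations. */
    int whole_disk[4]    = { 3, 3, 3, 3 };

    printf("trailing only: sum=%d per-location=%d\n",
           match_sum_of_hits(trailing_only), match_per_location(trailing_only));
    printf("whole disk:    sum=%d per-location=%d\n",
           match_sum_of_hits(whole_disk), match_per_location(whole_disk));
    return 0;
}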

Robustness of find_uberblocks

Also, I noticed that the code seems to assume that at every location there are 128 uberblock slots spaced at 1 KiB. I believe that, while this was true several years ago, it is no longer necessarily the case. I could not quickly find an up-to-date specification of the on-disk format, but when I first noticed this bug on my actual system (with a ~500 GB ZFS partition with ashift=12), the blkid debug output included 32 uberblock slots per location spaced at 4 KiB (offsets 128, 132, 136 ...). In the test system described above, the debug output only contains 3 slots, spaced at 1 KiB (offsets 132, 133, 134). When the test zpool is created with ashift=12, I again see 3 slots, this time spaced at 4 KiB (offsets 144, 148, 152). So I guess the format has become more flexible than it used to be. The "fixed" code worked correctly in all three cases, but without an authoritative source for the format it is hard to say whether it will work for every possible ZFS pool.

In any case, the comments also assign some special significance to the uberblock slot "#4 (@ 132kB)". I cannot see that the code actually does anything special with this slot (unless this is where the value of ZFS_WANT comes from?), but seeing as there can be pools that do not have four slots per location, and that the offset might no longer be 132 KiB, at least the comments should be updated.
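
For what it's worth, the two observed layouts are consistent with a fixed 128 KiB uberblock ring per label whose slot size is the larger of 1 KiB and the pool's sector size (2^ashift). A small sketch of that arithmetic (the constants are inferred from the two cases above, not taken from an authoritative specification):

#include <stdio.h>

#define UBERBLOCK_RING   (128 * 1024)   /* uberblock area per label */
#define MIN_SLOT_SHIFT   10             /* 1 KiB minimum slot size  */

static void slots_for_ashift(int ashift)
{
    int slot_shift = ashift > MIN_SLOT_SHIFT ? ashift : MIN_SLOT_SHIFT;
    int slot_size  = 1 << slot_shift;
    printf("ashift=%d: %d slots of %d bytes\n",
           ashift, UBERBLOCK_RING / slot_size, slot_size);
}

int main(void)
{
    slots_for_ashift(9);   /* 512-byte sectors: 128 slots at 1 KiB */
    slots_for_ashift(12);  /* 4 KiB sectors:     32 slots at 4 KiB */
    return 0;
}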

@sschmitz (Author)

Note that I partitioned the virtual disk using GPT, which writes a backup partition table to the end of the disk. That is why I needed to carefully craft the size of the disk. When you use the MBR scheme instead, a partition can go right to the last sector of the disk, causing the bug to surface with disks of any size.
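
A quick back-of-the-envelope check with the numbers from the fdisk output above (the 33-sector figure for the backup GPT, one header sector plus 32 sectors of partition entries, is the usual default; the code below is only my own arithmetic):

#include <stdio.h>

int main(void)
{
    long long total_sectors    = 20971708;   /* whole /dev/sda     */
    long long last_part_sector = 20971674;   /* end of /dev/sda1   */
    long long gap = (total_sectors - (last_part_sector + 1)) * 512;

    printf("gap behind partition: %lld bytes (%lld sectors)\n",
           gap, gap / 512);                  /* 16896 bytes, 33 sectors */
    printf("within 256 KiB of the disk end: %s\n",
           gap < 256 * 1024 ? "yes" : "no");
    return 0;
}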

karelzak commented Jan 2, 2020

I have doubts that any find_uberblocks()-related optimization is the right way to fix it. This all sounds like the issue we had with Linux RAID, where it is/was possible to create a superblock at the end of the whole device. The core of the problem is that probe_zfs() reads within an area covered by a valid partition. That sounds like a mistake.

Maybe we need to add a blkid_probe_is_covered_by_pt() check to skip uberblocks within partitions, something like:

probe_zfs() {
        ...
        for (label_no = 0; label_no < 4; label_no++) {
                ...
                if ((S_ISREG(pr->mode) || blkid_probe_is_wholedisk(pr)) &&
                    blkid_probe_is_covered_by_pt(pr, offset, VDEV_LABEL_SIZE))
                        /* ignore this area; it's inside a partition and
                         * we are working with the whole disk now */
                        continue;

                label = blkid_probe_get_buffer(pr, offset, VDEV_LABEL_SIZE);
        }
        ...
}

karelzak commented Jan 2, 2020

Can you try it?

sschmitz commented Jan 3, 2020

I will try it next week.

karelzak commented Jan 9, 2020

ping ;-) I'd like to get a usable bugfix into the final v2.35.

sschmitz commented Jan 9, 2020

Oh, it's Thursday already? 😅 I finally got around to testing it today, and it seems to be working. The whole find_uberblocks() thing was more of an afterthought anyway, and while it does still smell fishy to me, I'm happy with this fix. (I guess in a perfect world, blkid would use the same logic as zpool import does, but I haven't looked at that.)

karelzak added a commit that referenced this issue Jan 13, 2020
Addresses: #918
Signed-off-by: Karel Zak <kzak@redhat.com>
@karelzak (Collaborator)

Fixed. Thanks for the test and report!

yan12125 commented Feb 9, 2020

Hi, blkid is still confused if I do the following steps:

  1. Create several partitions on a disk. For example:
  • 0-100 GB: /dev/sda1
  • 100-200 GB: /dev/sda2
  • 200-300 GB: /dev/sda3
  • 300-400 GB: /dev/sda4
  2. Create a zpool on the last partition (e.g. /dev/sda4)
  3. Delete the last partition

Now blkid is confused again lol

$ sudo /usr/bin/blkid /dev/sda
/dev/sda: LABEL="zfs_data" UUID="3969014677530013818" UUID_SUB="5766627027699155424" BLOCK_SIZE="4096" TYPE="zfs_member" PTUUID="8325ffb7-6b52-4114-af04-2c55312e79b6" PTTYPE="gpt"

If I understand it correctly, the blkid_probe_is_covered_by_pt() condition from f6e1820 is no longer true after the partition is deleted, so blkid reports an incorrect result.

UPDATE: I think I've got everything OK now. After deleting sda4 (formerly a zfs vdev), I enlarged sda3 to cover the remaining disk space and then ran wipefs:

sudo wipefs --types=zfs_member --all --backup /dev/sda3
sudo wipefs --types=zfs_member --all --backup /dev/sda

Now blkid is working again.

UPDATE 2: wipefs on /dev/sda is also needed.

@karelzak (Collaborator)

@yan12125 this is a generic problem for all on-disk metadata. It's always necessary to really delete the relevant bytes from the device to avoid misinterpretation. And yes, we have wipefs for this purpose, and some mkfs-like tools also delete old signatures before they create new superblocks.
