blkid reports disk as zfs_member if it has a zfs_member partition #918
Comments
Note that I partitioned the virtual disk using GPT, which writes a backup partition table to the end of the disk. That is why I needed to carefully craft the size of the disk. When you use the MBR scheme instead, a partition can go right to the last sector of the disk, causing the bug to surface with disks of any size.
I have doubts that any find_uberblocks() related optimization is the right way to fix it. It all sounds like an issue we had with Linux RAID, where it is/was possible to create a superblock at the end of the whole device. The core of the problem is that probe_zfs() reads within an area covered by a valid partition. That sounds like a mistake. Maybe we need to add blkid_probe_is_covered_by_pt() to skip uberblocks within partitions, something like:
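What follows is only a rough, self-contained sketch of that idea, not the actual change: the disk size is taken from the report below, the partition geometry is assumed, and is_covered_by_pt() is a stand-in for the internal blkid_probe_is_covered_by_pt() helper.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define VDEV_LABEL_SIZE      (256 * 1024ULL)
#define VDEV_LABEL_UBERBLOCK (128 * 1024ULL)

struct part { uint64_t start, size; };

/* stand-in for libblkid's internal blkid_probe_is_covered_by_pt():
 * does the byte range [offset, offset + size) lie inside a partition? */
static int is_covered_by_pt(const struct part *parts, size_t nparts,
                            uint64_t offset, uint64_t size)
{
	for (size_t i = 0; i < nparts; i++)
		if (offset >= parts[i].start &&
		    offset + size <= parts[i].start + parts[i].size)
			return 1;
	return 0;
}

int main(void)
{
	uint64_t disk_size = 10737514496ULL;      /* size used in the report   */
	struct part parts[] = {                   /* assumed GPT layout:       */
		{ 1048576ULL, 10736449024ULL },   /* one partition, 1 MiB in   */
	};
	size_t nparts = sizeof(parts) / sizeof(parts[0]);

	/* the four label areas a whole-disk ZFS probe looks at */
	uint64_t label[4] = {
		0,
		VDEV_LABEL_SIZE,
		disk_size - 2 * VDEV_LABEL_SIZE,
		disk_size - VDEV_LABEL_SIZE,
	};

	for (int i = 0; i < 4; i++) {
		uint64_t ub = label[i] + VDEV_LABEL_UBERBLOCK;
		printf("label %d, uberblock area at %llu: %s\n", i,
		       (unsigned long long) ub,
		       is_covered_by_pt(parts, nparts, ub, 1024) ?
		       "inside a partition -> skip" : "probe");
	}
	return 0;
}
```

With a layout like this, the two trailing label areas fall inside the partition and would be skipped, while the two at the front (before the 1 MiB partition start) are still probed.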
Can you try it?
I will try it next week.
ping ;-) I'd like to add a usable bugfix to the final v2.35.
Oh, it's Thursday already? 😅 I finally got around to testing it today, and it seems to be working. The whole find_uberblocks() thing was more of an afterthought anyway, and while it does still smell fishy to me, I'm happy with this fix. (I guess in a perfect world, blkid would use the same logic as zpool import does, but I haven't looked at that.)
Addresses: #918
Signed-off-by: Karel Zak <kzak@redhat.com>
Fixed. Thanks for the test and report!
Hi, blkid is still confused if I do the following steps:
Now blkid is confused again lol
If I understand it correctly, the condition ...

UPDATE: I think I got everything OK. After deleting sda4 (formerly a ZFS vdev), I enlarged sda3 to cover the remaining disk space, and then ran wipefs:
Now blkid is working again. UPDATE 2: wipefs on /dev/sda is also needed.
@yan12125 this is a generic problem for all on-disk stuff. It's always necessary to really delete the relevant bytes from the device to avoid misinterpretations. And yes, we have wipefs for this purpose, and some mkfs-like tools also delete old stuff before they create new superblocks.
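For reference, wipefs itself sits on top of libblkid; a minimal sketch of the equivalent of wipefs -a through the public API looks roughly like this (the /dev/sdX path is a placeholder, the device has to be opened read-write, and this is of course destructive):

```c
#include <blkid/blkid.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *devname = "/dev/sdX";            /* placeholder device */
	int fd = open(devname, O_RDWR | O_CLOEXEC);
	if (fd < 0) {
		perror(devname);
		return 1;
	}

	blkid_probe pr = blkid_new_probe();
	if (!pr)
		return 1;
	blkid_probe_set_device(pr, fd, 0, 0);        /* probe the whole device */
	blkid_probe_enable_superblocks(pr, 1);
	blkid_probe_set_superblocks_flags(pr, BLKID_SUBLKS_MAGIC | BLKID_SUBLKS_TYPE);

	/* each successful probe finds one signature; wipe it and probe again */
	while (blkid_do_probe(pr) == 0) {
		const char *type = NULL;
		blkid_probe_lookup_value(pr, "TYPE", &type, NULL);
		printf("wiping %s signature\n", type ? type : "unknown");
		blkid_do_wipe(pr, 0);
	}

	blkid_free_probe(pr);
	close(fd);
	return 0;
}
```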
This bug has previously been reported to the mailing list; however, the author later concluded it was user error. Nevertheless, I can reproduce the erroneous behavior consistently starting from an empty virtual disk. This may also result in the behavior described in this report.
Reproduction
I tested this in a virtual machine running the Xubuntu 19.10 Live ISO with util-linux 2.34 (but I have also tested it with the current master). The VM was configured to have an empty virtual 10737514496-byte disk. This is a little over 10 GiB; a nice round power-of-2 size will not reproduce the issue. The virtual disk was hosted on an ext4 filesystem; I have read that ZFS does not like to be self-hosted, at least not inside a ZVOL.
Verify the disk does not contain any old signatures
Then, initialize the disk and create a partition spanning the whole disk (in fact, it only needs to be aligned to the end of the disk).
blkid will now correctly identify the disk as GPT-formatted. Now, create a zpool on the partition.
Voila, blkid is now confused:
Cause
ZFS stores four copies of its uberblock on every backing device, two at the front and two at the back. The ones at the back are aligned to a 256 KiB boundary. Therefore, if the end of the last uberblock in the partition is less than 256 KiB from the end of the disk, it is stored in the same position as the last uberblock of a whole-disk ZFS would be (and likewise for the second-to-last copy).
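To make the alignment argument concrete, here is a small sketch that computes where the two trailing labels (which hold the rear uberblock copies) end up for the whole disk versus the partition. The disk size is the one from the reproduction below; the partition start and size are assumed values for a typical GPT layout, not the exact numbers from my test.

```c
#include <stdio.h>
#include <stdint.h>

#define LABEL_SIZE (256 * 1024ULL)

/* absolute offset of trailing label l (2 or 3) for a device starting at
 * dev_start with dev_size usable bytes: ZFS rounds the size down to a
 * 256 KiB multiple and puts the last two labels in the final 512 KiB */
static uint64_t trailing_label(uint64_t dev_start, uint64_t dev_size, int l)
{
	uint64_t aligned = dev_size - dev_size % LABEL_SIZE;
	return dev_start + aligned - (4 - l) * LABEL_SIZE;
}

int main(void)
{
	uint64_t disk_size  = 10737514496ULL; /* disk size from the reproduction   */
	uint64_t part_start = 1048576ULL;     /* assumed: partition starts at 1 MiB */
	uint64_t part_size  = 10736449024ULL; /* assumed: ends just before the
	                                         backup GPT at the end of the disk */

	for (int l = 2; l <= 3; l++)
		printf("label %d: whole disk %llu, partition %llu\n", l,
		       (unsigned long long) trailing_label(0, disk_size, l),
		       (unsigned long long) trailing_label(part_start, part_size, l));
	return 0;
}
```

With these numbers both columns come out identical (10736893952 and 10737156096), because the partition start is 256 KiB-aligned and the partition end falls in the same 256 KiB-aligned block as the end of the disk, so both sizes round down to the same point.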
libblkid will find these two uberblocks at the end of the disk and treat them as evidence of the whole disk being ZFS-formatted. Looking at the code for probe_zfs in superblocks/zfs.c, I'm not entirely clear on the extent to which this is intentional. While a comment does say "Look for at least 4 uberblocks to ensure a positive match", the function description also says "Check only some of [the 4 uberblock copies]". Anyway, what does happen is that the function checks the four locations for uberblocks; however, because there are several slots at each location that all contain the magic number, find_uberblocks returns the number of such slots at the given location, and the results are summed.
https://github.com/karelzak/util-linux/blob/9418ba6d05feed6061f5343741b1bc56e7bde663/libblkid/src/superblocks/zfs.c#L288
As soon as this sum reaches 4 (ZFS_WANT) or more, the loop is exited early and it is determined that a ZFS filesystem is present. This number may already be reached when a single uberblock location is found, as may be the case for an end-of-disk-aligned partition as described above.

As I said above, I'm not sure how the probe_zfs function is supposed to work, but when I edited the line above to found++ instead, blkid no longer reported the whole disk as a zfs_member but still did so for the partition. It then requires the uberblock magic number to be present at all four expected locations, rather than four copies of the magic number at only one of them. However, I have only looked at the code for half an hour, so I have no idea whether that is actually the solution or whether it would break everything ;-)
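To spell out the difference between the two counting strategies, here is a toy illustration; the per-location slot counts are made up to match the scenario above (everything found at one end-of-disk location), and the real find_uberblocks() code is deliberately not reproduced.

```c
#include <stdio.h>

int main(void)
{
	/* magic-bearing uberblock slots found at each of the four label
	 * locations in the scenario above: all of them at one location */
	int slots_at_location[4] = { 0, 0, 0, 4 };
	int zfs_want = 4;

	int sum_of_slots = 0, locations_hit = 0;
	for (int l = 0; l < 4; l++) {
		sum_of_slots += slots_at_location[l];   /* current: found += ... */
		if (slots_at_location[l] > 0)
			locations_hit++;                /* experiment: found++   */
	}

	printf("sum of slots  = %d (>= %d -> whole disk reported as zfs_member)\n",
	       sum_of_slots, zfs_want);
	printf("locations hit = %d (<  %d -> no false positive)\n",
	       locations_hit, zfs_want);
	return 0;
}
```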
Robustness of find_uberblocks
Also, I noticed that the code seems to assume that at every location there are 128 uberblock slots spaced at 1 KiB. I believe that, while this was true several years ago, it is no longer necessarily the case. I could not quickly find an up-to-date specification of the on-disk format, but when I first noticed this bug on my actual system (a ~500 GB ZFS partition with ashift=12), the blkid debug output included 32 uberblock slots per location, spaced at 4 KiB (offsets 128, 132, 136, ...). In the test system described above, the debug output only contains 3 slots, spaced at 1 KiB (offsets 132, 133, 134). When the test zpool is created with ashift=12, I again see 3 slots, this time spaced at 4 KiB (offsets 144, 148, 152). So I guess the format has become more flexible than it used to be. The "fixed" code worked correctly in all three cases, but without an authoritative source for the format it is hard to say whether it will work for every possible ZFS.

In any case, the comments also assign some special significance to the uberblock slot "#4 (@ 132kB)". I cannot really see that the code actually does anything special with this slot (unless this is where the value of ZFS_WANT comes from?), but seeing as there can be pools which do not have four slots per location, and the offset might no longer be 132 KiB, at least the comments should be updated.
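For what it's worth, the varying spacing matches my reading of the OpenZFS vdev_impl.h header (which is not an authoritative spec, so treat the constants below as an assumption): each 256 KiB label ends with a 128 KiB uberblock ring starting 128 KiB into the label, and the slot size is 1 << ashift, clamped between 1 KiB and 128 KiB.

```c
#include <stdio.h>

int main(void)
{
	const unsigned ring_offset_kib = 128;  /* ring starts 128 KiB into the label  */
	const unsigned ring_size_kib   = 128;  /* VDEV_UBERBLOCK_RING, per my reading */
	int ashifts[] = { 9, 12 };

	for (int i = 0; i < 2; i++) {
		int shift = ashifts[i];
		if (shift < 10) shift = 10;        /* UBERBLOCK_SHIFT     */
		if (shift > 17) shift = 17;        /* MAX_UBERBLOCK_SHIFT */
		unsigned slot_kib = (1u << shift) / 1024;
		unsigned slots    = ring_size_kib / slot_kib;
		printf("ashift=%d: %u slots of %u KiB, first slots at %u, %u, %u KiB\n",
		       ashifts[i], slots, slot_kib,
		       ring_offset_kib, ring_offset_kib + slot_kib,
		       ring_offset_kib + 2 * slot_kib);
	}
	return 0;
}
```

That gives 128 slots of 1 KiB for ashift=9 and 32 slots of 4 KiB for ashift=12, which lines up with the spacings observed above; that only three slots show up in the debug output presumably just means only three slots held a valid magic at probe time.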