
critical bug: zpool messes up partition table: zfs_member occupies the whole disk instead of being constrained to a partition #9105

Open
olivier-klein opened this issue Jul 31, 2019 · 14 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments


olivier-klein commented Jul 31, 2019

System information

Type Version/Name
Distribution Name MANJARO
Distribution Version 18.0.4
Linux Kernel 4.19.60-1
Architecture amd64
ZFS Version 0.8.1-1
SPL Version 0.8.1-1

Describe the problem you're observing

Critical bug: zpool has modified the signature table and now occupies the whole disk /dev/nvme0m1 instead of being constrained to one partition: /dev/nvme0m1p5

#lsblk -f
NAME FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
nvme0m1 zfs_member tank 959334055019102200
├─nvme0m1p1 vfat ESP F8E8-2918 738,1M 5% /boot/efi
├─nvme0m1p2 vfat OS 5224-C2FA
├─nvme0m1p3 ext4 UBUNTU bed2f845-754b-477b-8bdb-3cba7d56fae3
├─nvme0m1p4 ext4 MANJARO 3134ceb0-795e-4f51-a6fb-ba172fac0312 75,5G 16% /
└─nvme0m1p5 zfs_member tank 9593340550191022900

#lsblk -a
nvme0m1 259:0 0 953,9G 0 disk
├─nvme0m1p1 259:1 0 780M 0 part /boot/efi
├─nvme0m1p2 259:2 0 5G 0 part
├─nvme0m1p3 259:3 0 97,7G 0 part
├─nvme0m1p4 259:4 0 97,7G 0 part /
└─nvme0n1p5 259:5 0 752,8G 0 part

#blkid /dev/nvme0m1
/dev/nvme0m1: LABEL="tank" UUID="9593340550191022900" UUID_SUB="541976190045946664" TYPE="zfs_member" PTUUID="e7762bd0-453e-4900-b428-26f1b11c22b5" PTTYPE="gpt"
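
(For reference, a read-only sanity check of the GPT itself on the same device can be done with standard tools; the commands below are only a suggestion and do not write anything, they just read and verify the partition table.)

sudo sgdisk --verify /dev/nvme0m1
sudo parted /dev/nvme0m1 unit s print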

Describe how to reproduce the problem

Followed the instructions on https://wiki.archlinux.org/index.php/ZFS
The zpool was created with the id from ls -lh /dev/disk/by-id/:

sudo zpool create -f -o ashift=13 -m /mnt/tank tank nvmePC401_NVMe_SK_hynix_1TB_MI93T003810403E62-part5

NOTE that the pool was created with defaults, occupying the whole partition as a single vdev (i.e. without redundancy or RAID).

Interestingly, gparted (and thus the disk signatures) showed the partition table correctly after installation. Everything got messed up after enabling zfs.target, zfs-mount, zfs-import.target and zfs-import-cache and rebooting.

Include any warning/errors/backtraces from the system logs

This is a critical issue.
The boot log is now messed up:

juil. 31 07:07:37 XPS13 systemd[1]: systemd-firstboot.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start First Boot Wizard.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-sysusers.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Create System Users.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-fsck-root.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start File System Check on Root Device.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-binfmt.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Set Up Additional Binary Formats.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-guest-user.service: Failed with result 'start-limit-hit'.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start systemd-guest-user.service.
juil. 31 07:07:37 XPS13 systemd[1]: systemd-hwdb-update.service: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to start Rebuild Hardware Database.
juil. 31 07:07:37 XPS13 systemd[1]: sys-fs-fuse-connections.mount: Start request repeated too quickly.
juil. 31 07:07:37 XPS13 systemd[1]: Failed to mount FUSE Control File System.
juil. 31 07:07:37 XPS13 systemd-udevd[300]: Process '/usr/bin/alsactl restore 0' failed with exit code 99.


olivier-klein commented Jul 31, 2019

There is a serious bug affecting zfs 0.8.1-1 (tested on the latest Manjaro running Linux kernel 4.19). This bug has been reported in different forums under different contexts.

https://gitlab.gnome.org/GNOME/gparted/issues/14

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114

https://bbs.archlinux.org/viewtopic.php?id=206121

https://bbs.archlinux.org/viewtopic.php?id=202587

Any help on how to wipe the signature block information from /dev/nvme0m1 would be welcome. I have not tried zpool labelclear yet since, as far as I understand, it would wipe the entire disk.
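
(For inspection only, not as a fix: something like wipefs in no-act mode should list the signatures libblkid detects on the whole disk, together with their offsets, without writing anything:

wipefs --no-act /dev/nvme0m1

Note that actually letting wipefs erase that "signature" would write into the tail of nvme0m1p5, where two of the four ZFS label copies live, so it is not a safe cleanup either.)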

@GregorKopka
Contributor

The problem is that blkid, when looking at the whole disk, sees the ZFS uberblocks at the end of the nvme0m1p5 partition (which is also the end of the disk) and concludes that the whole disk must be a zfs_member. That conclusion is wrong.

It's also a problem for zpool import, which could fall into the same trap: it sees the uberblocks at the end when looking at /dev/nvme0m1, then fails to import because they point to garbage when sectors are counted from the beginning of the drive instead of the beginning of the partition.
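
(A read-only way to see this for yourself, assuming the layout is as described above: zdb -l reads the four vdev label locations of whatever device it is pointed at, so on the partition it should find all four labels, while on the whole disk only the two trailing labels line up and the leading ones come back as unreadable.)

zdb -l /dev/nvme0m1p5   # partition: all four labels expected
zdb -l /dev/nvme0m1     # whole disk: only the trailing labels line up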

The solution to this is a small, empty partition (some 10 MiB) at the end of the drive (`zpool create`, when given whole drives, does this by creating a small 'partition 9' at the end) so blkid and zpool import won't see the uberblocks at the end of the actual zfs partition when looking at the whole disk (as they'll instead see the empty space of partition 9).

Do not run zpool labelclear on the whole drive: it will not solve the problem (the uberblocks are rewritten, round-robin, on every txg), but it has a fair chance of destroying your pool and even the whole partition table (including the backup GPT at the end of the drive).

The best option is to back up the contents of the pool, destroy it, shrink the last partition by some 10-20 MiB, create a partition at the end that protects this freed space (and dd if=/dev/zero that one, to get rid of the uberblocks in that area), then recreate the pool and restore the backup.
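
A rough sketch of that procedure; the snapshot name, backup path and the 20 MiB figure are placeholders, device and partition numbers are taken from this thread, and the destructive steps should only run after the backup has been verified:

zfs snapshot -r tank@migrate
zfs send -R tank@migrate > /path/to/backup/tank.zstream   # full backup of the pool
zpool destroy tank
sgdisk --delete=5 /dev/nvme0m1                            # drop the old ZFS partition
sgdisk --new=5:0:-20M --typecode=5:bf01 /dev/nvme0m1      # recreate it ~20 MiB shorter
sgdisk --new=6:0:0 /dev/nvme0m1                           # protection partition in the freed space
partprobe /dev/nvme0m1
dd if=/dev/zero of=/dev/nvme0m1p6 bs=1M                   # zero the area that held the old uberblocks
zpool create -o ashift=13 -m /mnt/tank tank nvmePC401_NVMe_SK_hynix_1TB_MI93T003810403E62-part5
zfs receive -F tank < /path/to/backup/tank.zstream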

@olivier-klein
Author

Do you think that
dd if=/dev/zero of=/dev/nvme0n1p6 bs=512 count=50
will be enough to wipe out the uberblocks of the last empty partition?

@GregorKopka
Contributor

In case nvme0n1p6 is the new (protection) partition you just created, the dd doesn't need the count option; it will stop when it reaches the end of the partition (= when it has filled it completely with zeros, getting rid of whatever might have been there before). Just make sure to specify the right of= ;)
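
For example, assuming nvme0n1p6 really is the freshly created protection partition and nothing else:

dd if=/dev/zero of=/dev/nvme0n1p6 bs=1M status=progress

dd will terminate with a "No space left on device" message once the partition is completely zeroed; that is expected and means it is done.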

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 2, 2020
@dankamongmen
Contributor

I also ran into this problem in my growlight project: dankamongmen/growlight#4

I filed a bug against upstream, but have heard nothing (filed 2019-08): https://www.spinics.net/lists/util-linux-ng/msg15811.html

I detail how I worked around it here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888114 (last comment) and in the growlight issue linked above


stale bot commented Jan 2, 2021

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jan 2, 2021
@GregorKopka
Contributor

Has this been fixed?

@stale stale bot removed the Status: Stale No recent activity for issue label Jan 4, 2021

stale bot commented Jan 5, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jan 5, 2022
@GregorKopka
Contributor

Stale bot should not close defects.

@stale stale bot removed the Status: Stale No recent activity for issue label Jan 5, 2022

stale bot commented Jan 7, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jan 7, 2023
@GregorKopka
Contributor

@behlendorf defect or not?

@stale stale bot removed the Status: Stale No recent activity for issue label Jan 8, 2023

ZLima12 commented Feb 29, 2024

Bumping this because it seems like a terrible data corruption bug that needs to be fixed.


mfleetwo commented Mar 1, 2024

Upstream root cause and fix:


ZLima12 commented Mar 1, 2024

I see, so zfs never actually touched the partition table at all. Either way, glad it's fixed, and this issue should be closed.
