Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.Sign up
RAIDZ1: unable to replace a drive with itself #2076
Trying to simulate failure scenarios with a 3+1 RAIDZ1 array in order to prepare for eventualities.
manually pull out /dev/sdc without shutting anything down. As expected, zpool status shows the drive in a bad state:
This status doesn't change when I re-insert the drive. So I want to simulate re-introducing a drive that's extremely incoherent relative to the state of the ZFS pool. So, making sure that the drive is "offline", I introduce a raft of changes:
102 MB of changes, to be exact. Now, I want to re-introduce the drive to the pool and get ZFS to work it out. At this point, the status of the drive is:
I try to replace the drive with itself:
I was able to "fix" this with:
During the scrub, the status of the drive changes:
There doesn't seem to be a way to "replace" a known incoherent drive with itself.
You didn't corrupt the disk enough. The dd left the 3rd and 4th copies of the labels intact so it's still being recognized as a part of the pool. All you need to do in this case is to
This may be better mailing list fodder but I'm noticing similar behavior as @mcrbids and I believe this is on topic. I hope you don't mind.
Here is the zpool configuration:
I have attempted to "borrow" a disk from one of the N+2 vdevs (raidz2-1) to the vdev at N (raidz2-2) by offline'ing A1 and zero'ing the the first few hundred megs.
I then edited my
I then removed A1 and C2 and placed A1 in C2's drive tray. I reconnected the new C2. udev triggers and
When I attempt to replace the offline'd C2 with the new C2 however, I get a message that C2 is busy and the disk is automatically partitioned. By zfs I assume.
Note, the "corrupt primary EFI label" message is always present even with brand new disks that have never touched the system. Not sure what that is about. I always have to use
I think there might actually be a bug here as of
As seen in the post above, the system automatically partitions the drive without my intervention. There must be some signaling beyond the zfs label on the drive that informs zfs that this disk is/was a member of this pool.
After I zero'd the drive fully with dd, I was able to use it as a replacement disk.
zpool labelclear scsi-SATA_ST3000DM001-1CH_XXXXXXX-part1 complains about the disk being part of an active pool too. Tried that after a zpool offline /dev/disk/by/id/scsi-SATA_ST3000DM001-1CH_XXXXXXX
To workaround this I moved the disk to another system and did the zpool labelclear there.
After that 'zpool replace -f tank scsi-SATA_ST3000DM001-1CH_XXXXXXX /dev/disk/by-id/scsi-SATA_ST3000DM001-1CH_XXXXXXX got me to resilvering.
i'm running in to this as well. i don't understand, how is this not considered a bug any more?
labelclear is clearly broken: it's impossible to clear a partition that was created as part of a whole-disk pool.
also, 'labelclear -f'ing the drive doesn't do enough to prevent the error 'does not contain an EFI label but it may contain information in the MBR'
why is it even necessary for the user to reason about partitions that they didn't create?
I believe I'm running into this problem.
Cache is a logical volume on a LUKs drive. I must have done something wrong with the setup and it is not properly recognized on reboot.
Any insights greatly appreciated.
EDIT: I should clarify that the cache seems to be in use, which explains why the device is busy. So it's maybe just a minor annoyance that the old cache is unable to be removed?
EDIT: sorry I must have just been being dumb about the paths.. I was able to remove the degraded device with
Gives a bunch of offsets:
Clout these offsets with
Here's your free firearm. You'll find what remains of your foot somewhere near the end of your leg.