EBUSY on /dev/zfs for replace #440

Closed
bill-mcgonigle opened this issue Nov 7, 2011 · 24 comments

@bill-mcgonigle

zpool is getting EBUSY on /dev/zfs when I try to replace a disk. The new disk isn't actually in use and I've rebooted a few times and zeroed out the disk.


        NAME                      STATE     READ WRITE CKSUM
        storage                   DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            10094699726010659595  UNAVAIL      0     0     0  was /dev/sdd1
            sdg                   ONLINE       0     0     0
        logs
          mirror-1                ONLINE       0     0     0
            sdi1                  ONLINE       0     0     0
            sdf1                  ONLINE       0     0     0
        cache
          sdh1                    ONLINE       0     0     0
[root@librescu ~]# strace zpool replace storage 10094699726010659595 sde 2>&1| egrep '(sde|sdg|= 4|ioctl)'
....
open("/dev/zfs", O_RDWR)                = 4
ioctl(4, 0x5a04, 0x7fffe5a0b430)        = 0
ioctl(4, 0x5a12, 0x7fffe5a0b430)        = 0
ioctl(4, 0x5a05, 0x7fffe5a06e10)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a06e00)        = -1 ESRCH (No such process)
ioctl(4, 0x5a2a, 0x7fffe5a03c70)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = -1 ESRCH (No such process)
ioctl(4, 0x5a12, 0x7fffe5a0b430)        = 0
ioctl(4, 0x5a05, 0x7fffe5a06e10)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = -1 ENOMEM (Cannot allocate memory)
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a06e00)        = -1 ESRCH (No such process)
ioctl(4, 0x5a2a, 0x7fffe5a03c70)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = -1 ENOMEM (Cannot allocate memory)
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = 0
ioctl(4, 0x5a14, 0x7fffe5a0b400)        = -1 ESRCH (No such process)
open("/proc/filesystems", O_RDONLY)     = 4
open("/usr/lib/locale/locale-archive", O_RDONLY) = 4
open("/proc/modules", O_RDONLY)         = 4
open("/dev/zfs", O_RDWR)                = 4
ioctl(4, 0x5a05, 0x7fffe5a07340)        = 0
access("/dev/sde", F_OK)                = 0
open("/dev/sde", O_RDWR|O_EXCL|O_DIRECT) = 7
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
readlink("/proc/self", "8083", 4095)    = 4
readlink("/proc/8083/fd/7", "/dev/sde", 4095) = 8
lstat("/dev/sde", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 64), ...}) = 0
open("/dev/sdes0", O_RDONLY|O_EXCL)     = -1 ENOENT (No such file or directory)
open("/dev/sde", O_RDWR|O_EXCL|O_DIRECT) = 7
readlink("/proc/self", "8083", 4095)    = 4
readlink("/proc/8083/fd/7", "/dev/sde", 4095) = 8
lstat("/dev/sde", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 64), ...}) = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
stat("/dev/sde1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 65), ...}) = 0
open("/dev/sde1", O_RDONLY)             = 9
stat("/dev/sde9", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 73), ...}) = 0
open("/dev/sde9", O_RDONLY)             = 9
lstat("/dev/sde", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 64), ...}) = 0
ioctl(4, 0x5a2a, 0x7fffe5a01b10)        = 0
readlink("/proc/self", "8083", 4095)    = 4
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 3000591450112)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 3000591450112)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 3000591450112)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 3000591450112)   = 0
open("/dev/sdg1", O_RDWR|O_DIRECT)      = 7
readlink("/proc/self", "8083", 4095)    = 4
readlink("/proc/8083/fd/7", "/dev/sdg1", 4095) = 9
lstat("/dev/sdg1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 97), ...}) = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000389479936)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000389479936)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000389479936)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000389479936)   = 0
open("/dev/sde", O_RDWR|O_DIRECT)       = 7
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
readlink("/proc/self", "8083", 4095)    = 4
readlink("/proc/8083/fd/7", "/dev/sde", 4095) = 8
lstat("/dev/sde", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 64), ...}) = 0
readlink("/proc/self", "8083", 4095)    = 4
readlink("/proc/8083/fd/7", "/dev/sde", 4095) = 8
lstat("/dev/sde", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 64), ...}) = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKFLSBUF, 0)                  = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKFLSBUF, 0)                  = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKFLSBUF, 0)                  = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKFLSBUF, 0)                  = 0
ioctl(7, BLKRRPART, 0x3920195238)       = 0
stat("/dev/sde1", {st_mode=S_IFBLK|0600, st_rdev=makedev(8, 65), ...}) = 0
open("/dev/sde", O_RDWR|O_DIRECT)       = 7
readlink("/proc/self", "8083", 4095)    = 4
readlink("/proc/8083/fd/7", "/dev/sde", 4095) = 8
lstat("/dev/sde", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 64), ...}) = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
ioctl(7, BLKSSZGET, 512)                = 0
ioctl(7, BLKGETSIZE64, 2000398934016)   = 0
stat("/dev/sde1", {st_mode=S_IFBLK|0600, st_rdev=makedev(8, 65), ...}) = 0
open("/dev/sde1", O_WRONLY|O_EXCL)      = 7
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
ioctl(4, 0x5a0e, 0x7fffe5a06f70)        = -1 EBUSY (Device or resource busy)
write(2, "cannot replace 10094699726010659"..., 58cannot replace 10094699726010659595 with sde: sde is busy

similar for trying to attach:

[root@librescu ~]# strace zpool attach storage sdg sde 2>&1| egrep '(sde|sdg|= 4|ioctl)'

stat("/dev/sde1", {st_mode=S_IFBLK|0600, st_rdev=makedev(8, 65), ...}) = 0
open("/dev/sde1", O_WRONLY|O_EXCL)      = 7
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
access("/dev/sdg", F_OK)                = 0
ioctl(4, 0x5a0e, 0x7fff8c13d8f0)        = -1 EBUSY (Device or resource busy)
write(2, "cannot attach sde to sdg: sde is"..., 38cannot attach sde to sdg: sde is busy

spl-0.6.0-rc6.x86_64
spl-modules-0.6.0-rc6_2.6.40_4.fc15.x86_64
spl-modules-devel-0.6.0-rc5_2.6.40_4.fc15.x86_64
zfs-0.6.0-rc6.x86_64
zfs-devel-0.6.0-rc6.x86_64
zfs-dracut-0.6.0-rc6.x86_64
zfs-modules-0.6.0-rc6_2.6.40_4.fc15.x86_64
zfs-modules-devel-0.6.0-rc6_2.6.40_4.fc15.x86_64
zfs-test-0.6.0-rc6.x86_64

@behlendorf
Contributor

This is a duplicate of issue #250.

The issue is that zfs opens the devices with the O_EXCL flag as a safety measure to prevent other system utilities from mistakenly using the disk. Unfortunately, several of the zfs utilities themselves try to open the devices as well, resulting in this issue. It's something we know about but just haven't gotten fixed yet.

@bill-mcgonigle
Author

Thanks, Brian. I'm missing something here, though - in Reinis's thread (referenced on issue #250) he says that if the device is out of the pool, he can replace:

http://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/thread/9b90dc6b0e41f3c4#

But in my case the device is out of the pool and I can't replace. I just want to make sure I'm thoroughly reporting what I'm seeing here.

@prakashsurya
Member

Are you still seeing this issue? I'm trying to reproduce it in a test environment but am not having much luck. Briefly looking at the strace output you posted, it's complaining about the sde drive (which isn't yet managed by zfs). I assume that drive isn't being used by another program?

Looking at your packages, you have the rc6 version of everything except spl-modules-devel, which is at rc5. Can you update all the packages to a consistent version and see if the issue still occurs? I'm curious whether building an rc6 ZFS against the rc5 SPL would cause weird behavior.
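A quick way to double-check that everything is on the same release tag (a hedged sketch using rpm, matching the package list above):

# list every installed spl/zfs package; the rc tags should all agree
$ rpm -qa | grep -E '^(spl|zfs)' | sort
# if spl-modules-devel is still at rc5, update just that package, e.g.:
# $ yum update spl-modules-devel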

@bill-mcgonigle
Author

Right, that sde is otherwise unclaimed. I just tried it again, using a git clone from a few days ago on Linux 3.1 now:

[root@librescu ~]# zpool status storage
  pool: storage
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 scan: resilvered 0 in 0h4m with 0 errors on Fri Sep 23 16:02:18 2011
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            10094699726010659595  UNAVAIL      0     0     0  was /dev/sdd1
            sdg                   ONLINE       0     0     0
        logs
          mirror-1                ONLINE       0     0     0
            sdi1                  ONLINE       0     0     0
            sdf1                  ONLINE       0     0     0
        cache
          sdh1                    ONLINE       0     0     0

errors: No known data errors
[root@librescu ~]# zpool replace storage 10094699726010659595 scsi-SATA_Hitachi_HDS5C30_ML0220F30HSKAD
cannot replace 10094699726010659595 with scsi-SATA_Hitachi_HDS5C30_ML0220F30HSKAD: scsi-SATA_Hitachi_HDS5C30_ML0220F30HSKAD is busy
[root@librescu ~]# zpool replace storage 10094699726010659595 sde
cannot replace 10094699726010659595 with sde: sde is busy
[root@librescu ~]# zpool attach storage sdg sde
cannot attach sde to sdg: sde is busy

(same disk, by-id and sd-name)

Let me know what other information would be helpful.

@prakashsurya
Member

Thanks for the update. From what I can tell, the EBUSY error from your strace output corresponds to the ioctl made on line 2396 of libzfs_pool.c:

        if (zcmd_write_conf_nvlist(hdl, &zc, nvroot) != 0)                          
                return (-1);                                                        
                                                                                    
        ret = zfs_ioctl(hdl, ZFS_IOC_VDEV_ATTACH, &zc);                             
                                                                                    
        zcmd_free_nvlists(&zc);                                                     
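As a hedged aside: the request number 0x5a0e in the strace output above is 'Z' (0x5a) in the high byte plus a command index of 0x0e, and counting through the zfs_ioc enum in the source of that era appears to place ZFS_IOC_VDEV_ATTACH at that offset, consistent with the line quoted here. With a zfs checkout handy (path and enum name assumed from the zfsonlinux tree):

# list the start of the zfs_ioc enum to match strace request numbers to commands
$ grep -n -A 20 'enum zfs_ioc' include/sys/zfs_ioctl.h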

Were you originally running on a 3.1 kernel? I have a 2.6.32 RHEL 6.1 box at the moment, although I don't think the differing kernels would make a difference.

@bill-mcgonigle
Author

Originally it was 2.6.38 or 2.6.39 (Fedora 15 updates). Same issue with 3.0 and 3.1, so I agree, not likely version specific.

So, 'zpool replace' works for you? Brian's comment above led me to believe this is a known design issue. I saw a suggestion on the list to boot with a Solaris Express CD to get a pool back in shape until issue #250 is resolved.

@prakashsurya
Member

I don't know if this will help, but could you post the output of zpool history -i storage?

@prakashsurya
Member

Yeah, zpool replace works just fine for me if I am replacing the bad drive with a drive that is not in the pool.

@bill-mcgonigle
Author

I'll send it to you. It has a little bit more info than I'm comfortable posting publicly.

@prakashsurya
Member

Understandable. I couldn't get much out of the history, although it looks like there were a few reboots in there (?) so I'm not 100% sure the drive names are the same in each command (i.e. /dev/sde could correspond to different drives across reboots). The zdev.conf and udev rules could help if you care to use those.

I wasn't able to find the commands you referenced above in the log you sent me. Were they performed on a different pool?

@behlendorf
Contributor

The link to issue #250 has to do specifically with hot spares, which I thought might be related to this since the symptom is the same. However, it might not be; I never dug deeply enough into your specific case to say for certain, or I would have just closed this as a duplicate.

Bill, since you're able to reproduce this and we're not, I would suggest using systemtap to try to get a kernel trace so we can see where things went wrong. After installing systemtap you should be able to do the following:

$ sudo stap /usr/share/doc/systemtap-1.4/examples/general/para-callgraph.stp 'module("zfs").function("*@module/zfs/*.c")' \
'module.function("zfs_ioc_vdev_attach")' -c "zpool replace storage 10094699726010659595 sde"

You might need to tweak the above command a little bit; this was just from memory. Also make absolutely sure sde really isn't in use. If that disk was ever part of an md device, the kernel may have silently opened it O_EXCL, which is very hard to detect from user space.
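A couple of quick, non-exhaustive checks for whether something else already claims the disk (a hedged sketch; as noted above, a kernel-internal exclusive open may still escape all of these):

# is md assembled on the disk or one of its partitions?
$ cat /proc/mdstat
# does another kernel component (md, dm, etc.) hold the disk or its partitions?
$ ls /sys/block/sde/holders /sys/block/sde/sde*/holders 2>/dev/null
# any userspace process with the device node open?
$ fuser -v /dev/sde /dev/sde?*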

@prakashsurya
Member

That's some pretty cool systemtap foo! I just needed to replace module.function with module("zfs").function. So, this should do the trick:

# stap /usr/share/doc/systemtap-1.4/examples/general/para-callgraph.stp \
   'module("zfs").function("*@module/zfs/*.c")' \
   'module("zfs").function("zfs_ioc_vdev_attach")' -c "zpool replace storage 10094699726010659595 sde"

@bill-mcgonigle
Author

I'm beyond my knowledge zone here, but stap told me:

Pass 5: run failed. Try again with another '--vp 00001' option.

So, I added that option, and then it generated a 13MB file. I don't see how to do an attachment here, so:

http://www.bfccomputing.com/downloads/zfs/stap-zpool-replace.gz

I checked the drive with mdadm --examine and it just says:

[root@librescu ~]# mdadm --examine /dev/sde
/dev/sde:
   MBR Magic : aa55
Partition[0] :   3907029167 sectors at            1 (type ee)

so, I don't think the kernel is grabbing it for md. Is there any other way to see if anybody else might have O_EXCL on it?

I do think the drive order has changed over time - I'll be using by-id in the future. The above commands were attempted against this pool.
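For the next reader, a quick way to map the unstable sdX names to the persistent by-id aliases before running a replace (a hedged sketch; the by-id name and guid here are the ones from this report):

# list the persistent aliases that point at the replacement disk (sde in this report)
$ ls -l /dev/disk/by-id/ | grep -w sde
# then use the by-id name in the replace, e.g.:
# $ zpool replace storage 10094699726010659595 /dev/disk/by-id/scsi-SATA_Hitachi_HDS5C30_ML0220F30HSKAD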

@behlendorf
Contributor

The logs are very revealing. They indicate that zfs believes you're trying to replace 10094699726010659595 with an already active zfs disk. The sde device appears to already have a zfs label on it. The label which zfs is reading off the disk indicates that the drive is active and is part of the current pool. Since this isn't a spare device, zfs prevents you from doing this by returning EBUSY.

Prakash is going to try to reproduce this situation locally. But were there any strange circumstances regarding this device being removed from the pool? Was it done immediately after a crash, perhaps? Or ripped out of a running system? I'm sure zeroing out the beginning and end of the device to remove the labels would resolve the issue, but before you do that, can you post your pool configuration and the labels from sde by running the following:

# Dump the configuration information for the entire pool
$ sudo zdb -C storage
...
# Dump the label information for the replacement drive
$ sudo zdb -l /dev/sde
...

@bill-mcgonigle
Author

I'm sorry I don't have great notes on this, but it's fairly likely that this disk was failed out of the pool after a crash (I've managed to find a few of those edge cases...). I have zeroed the disk, though, block 0 to the end.

Of note, however, is that the disk does have a partition table, which I assume zfs put on there when I tried to add it back in (but which did not complete). parted says:

Model: ATA Hitachi HDS5C302 (scsi)
Disk /dev/sde: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2000GB  2000GB               zfs
 9      2000GB  2000GB  8389kB
zdb -l /dev/sde
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
zdb -C storage
MOS Configuration:
        version: 28
        name: 'storage'
        state: 0
        txg: 2751219
        pool_guid: 7011265622648457147
        hostname: 'librescu.bfccomputing.com'
        vdev_children: 2
        vdev_tree:
            type: 'root'
            id: 0
            guid: 7011265622648457147
            children[0]:
                type: 'mirror'
                id: 0
                guid: 6078239822923415079
                whole_disk: 0
                metaslab_array: 31
                metaslab_shift: 33
                ashift: 12
                asize: 2000384688128
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 10094699726010659595
                    path: '/dev/sdd1'
                    whole_disk: 1
                    not_present: 1
                    DTL: 237
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 10893854155280816731
                    path: '/dev/sdg1'
                    whole_disk: 1
                    DTL: 238
                    create_txg: 4
                    resilvering: 1
            children[1]:
                type: 'mirror'
                id: 1
                guid: 7987315353505274420
                whole_disk: 0
                metaslab_array: 234
                metaslab_shift: 26
                ashift: 9
                asize: 10732699648
                is_log: 1
                create_txg: 953043
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 4936997403905323669
                    path: '/dev/sdi1'
                    whole_disk: 0
                    DTL: 243
                    create_txg: 953043
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 18036696030347577095
                    path: '/dev/sdf1'
                    whole_disk: 0
                    DTL: 267
                    create_txg: 953043
                    resilvering: 1

Would it be useful for me to zero the disk again, then run a replace under stap to show the whole process?

@prakashsurya
Member

How about zdb -l /dev/sde1?

@bill-mcgonigle
Author

ah, interesting:

--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 28
    name: 'storage'
    state: 0
    txg: 960329
    pool_guid: 7011265622648457147
    hostid: 17484033
    hostname: 'librescu.bfccomputing.com'
    top_guid: 6078239822923415079
    guid: 10094699726010659595
    vdev_children: 2
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 6078239822923415079
        whole_disk: 0
        metaslab_array: 31
        metaslab_shift: 33
        ashift: 12
        asize: 2000384688128
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10094699726010659595
            path: '/dev/sdd1'
            whole_disk: 1
            DTL: 237
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 10893854155280816731
            path: '/dev/sdg1'
            whole_disk: 1
            DTL: 238
            create_txg: 4
            resilvering: 1
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 28
    name: 'storage'
    state: 0
    txg: 960329
    pool_guid: 7011265622648457147
    hostid: 17484033
    hostname: 'librescu.bfccomputing.com'
    top_guid: 6078239822923415079
    guid: 10094699726010659595
    vdev_children: 2
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 6078239822923415079
        whole_disk: 0
        metaslab_array: 31
        metaslab_shift: 33
        ashift: 12
        asize: 2000384688128
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10094699726010659595
            path: '/dev/sdd1'
            whole_disk: 1
            DTL: 237
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 10893854155280816731
            path: '/dev/sdg1'
            whole_disk: 1
            DTL: 238
            create_txg: 4
            resilvering: 1

@prakashsurya
Member

And the plot thickens... /dev/sde's guid appears to be the same as the guid for mirror-0 children[0]. I'm not yet sure if that's significant, but it seems like it could be.

@behlendorf
Contributor

Yes, very interesting. So the device guid 10094699726010659595 is stored on the /dev/sde device in the zfs label. This same device guid is also being used by /dev/sdd1 in the configuration. That explains a lot. When zfs attempts to replace /dev/sdd with /dev/sde it reads the guid from the label on /dev/sde. It then checks the pool to make sure it doesn't already contain a disk with this guid. Unfortunately, in your case that check fails because of the /dev/sdd entry and you get EBUSY.

In general this is a good safety feature since it prevents you from accidentally using the wrong disk. For example, say you have two symlinks pointing to a single device. You want zfs to be able to detect that and prevent you from doing something unwise.
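A quick way to see this collision directly (a hedged sketch built from the zdb output already posted above):

# guid recorded in the stale label on the candidate replacement disk
$ zdb -l /dev/sde1 | grep -w guid
# guids of the vdevs the pool currently knows about
$ zdb -C storage | grep -w guid
# if the label's guid (10094699726010659595 here) also appears in the pool
# configuration, the replace is refused with EBUSY ("sde is busy")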

Anyway, your system must have gotten into this state due to some past trauma; as you mentioned earlier, it has seen some crashes! Can you try zeroing the device to remove the labels and then try again?

@bill-mcgonigle
Author

Yep, I'll run it overnight and then do a new replace under stap. Just to be clear, this will be the same conditions as when I first filed the bug (other than systemtap and a newer git clone of spl/zfs).

Thanks for all the detailed analysis, guys!

@behlendorf
Contributor

If you want to speed up the process you should be able to just dd zeros over the last portion of the /dev/sde1 partition. If zdb -l reports no labels, you're good to go. Aside from this issue, which looks pretty exotic, and the hot spare issue, do you know of any other disk replacement issues?
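In case it helps the next reader, one way to do that (a hedged sketch: ZFS keeps two 256 KiB labels at the front and two at the back of the vdev, so clearing the last ~1 MiB of the partition removes labels 2 and 3):

# size of the partition in 512-byte sectors
$ SECTORS=$(blockdev --getsz /dev/sde1)
# zero the final 1 MiB (2048 sectors), where the trailing labels live
$ dd if=/dev/zero of=/dev/sde1 bs=512 seek=$((SECTORS - 2048)) count=2048
# verify: all four labels should now fail to unpack
$ zdb -l /dev/sde1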

@bill-mcgonigle
Author

Well, for Pete's sake, that worked. I know I at least started zeroing the disk before attempting this and filing this bug report, but it just occurred to me that, since I was being lazy and zeroing the whole disk, I might have seen a crash before the second label got overwritten. Empirically, it seems likely the second label survived here. I thought that, because I saw only /dev/sde and not /dev/sde1 in /proc/partitions

from scrollback (to prove my sanity to myself):
...
   8       80   19545624 sdf
   8       81   10485760 sdf1
   8       82    9058840 sdf2
   8       64 1953514584 sde
   8      128   19545624 sdi
   8      129   10485760 sdi1
   8      130    9058840 sdi2
...

that the label must have been destroyed, but perhaps that's a faulty assumption. zdb -l sure did find a label in sde1. I don't know why Linux and zdb would disagree here, but relying on Linux's view in /proc/partitions is insufficient, at least.
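If it happens again, forcing the kernel to rescan the partition table should bring /proc/partitions back in line with the on-disk GPT (a hedged sketch):

# ask the kernel to reread the partition table on the disk
$ partprobe /dev/sde          # or: blockdev --rereadpt /dev/sde
$ grep sde /proc/partitions   # sde1 and sde9 should show up now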

:sigh: thanks for all the attention guys, and sorry to bother you with stupid user tricks.

For the next guy: here's how to zero a GPT label quickly: http://www.bfccomputing.com/2011/12/15/zero-a-gpt-label-using-dd/
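The linked post isn't reproduced here, but the usual recipe (a hedged sketch) is to zero both the primary GPT at the start of the disk and its backup copy at the end:

# primary GPT: protective MBR + header + partition entries (first 34 sectors)
$ dd if=/dev/zero of=/dev/sde bs=512 count=34
# backup GPT: last 34 sectors of the disk
$ dd if=/dev/zero of=/dev/sde bs=512 seek=$(( $(blockdev --getsz /dev/sde) - 34 )) count=34

If available, wipefs -a /dev/sde or sgdisk --zap-all /dev/sde should accomplish the same thing.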

@behlendorf
Contributor

OK, well, since you seemed to have a very particular kind of issue and we got it sorted out, I'm going to close this issue.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue May 21, 2018
If kernel lock debugging is enabled, the fs_struct structure exceeds the
typical 1024 byte limit of CONFIG_FRAME_WARN and isn't enabled when it
otherwise should be.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes openzfs#440
@erikruser

I come from the future to say thank you to everybody in this thread for not being like DenverCoder9 (https://xkcd.com/979/). Being a zfs noob, I got myself into nearly exactly the same situation as Bill, and the troubleshooting tips posted here got me fixed up perfectly.
