
Cannot create zpool using links in /dev/disk/by-id #3708

Closed
Gregy opened this issue Aug 27, 2015 · 45 comments

@Gregy commented Aug 27, 2015

ZFS refuses to create a new pool using the links in /dev/disk/by-id. Using device nodes directly from /dev works fine. See snippet:

for disk in `ls /dev/sd[a-n]`; do dd if=/dev/zero of=$disk bs=1000M count=1; done
reboot
ls -l /dev/disk/by-id/
...
lrwxrwxrwx 1 root root 9 Aug 27 15:51 ata-ST2000VN0001-1SF174_Z4H033G5 -> ../../sdb
lrwxrwxrwx 1 root root 9 Aug 27 15:51 ata-ST2000VN0001-1SF174_Z4H047T6 -> ../../sda
...

zpool create -o ashift=12 -f mainStorage mirror /dev/disk/by-id/ata-ST2000VN0001-1SF174_Z4H047T6 /dev/disk/by-id/ata-ST2000VN0001-1SF174_Z4H033G5
cannot create 'mainStorage': one or more devices is currently unavailable

zpool create -o ashift=12 -f mainStorage mirror /dev/sda /dev/sdb
<finishes ok>

Is there some way to make zpool create more verbose?

I am running Debian Jessie with zfsutils 0.6.4-1.2-1.

dmesg | grep -E 'SPL:|ZFS:'
[   46.699468] SPL: Loaded module v0.6.4-1b
[   46.798113] ZFS: Loaded module v0.6.4-1.2-1, ZFS pool version 5000, ZFS filesystem version 5
[  165.222847] SPL: using hostid 0x00000000

Thank you

@ryao (Contributor) commented Aug 27, 2015

Thanks for the bug report. For now, try using the short names, ata-ST2000VN0001-1SF174_Z4H047T6 and ata-ST2000VN0001-1SF174_Z4H033G5.
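For reference, with the pool and disk names from the original report, that would look like (an untested sketch):

zpool create -o ashift=12 -f mainStorage mirror ata-ST2000VN0001-1SF174_Z4H047T6 ata-ST2000VN0001-1SF174_Z4H033G5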

@rlanyi commented Sep 3, 2015

The same is happening for me on Debian Jessie, even with short names, without the "/dev/disk/by-id" prefix.

I also found a workaround: create the pool with plain device names (e.g. sda, sdb), then destroy the pool and recreate it with the original by-id command. This way I could use disk IDs while creating the pool.
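A sketch of that sequence, reusing the names from the original report:

zpool create -o ashift=12 -f mainStorage mirror sda sdb
zpool destroy mainStorage
zpool create -o ashift=12 -f mainStorage mirror /dev/disk/by-id/ata-ST2000VN0001-1SF174_Z4H047T6 /dev/disk/by-id/ata-ST2000VN0001-1SF174_Z4H033G5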

@roosteng commented Sep 4, 2015

This is also happening on openSUSE 13.2 but not on openSUSE 13.1, both compiled from the latest git. It creates partitions 1 and 9 on the drives, but maybe they are not showing up fast enough in the /dev/disk/by-id/ directory?

@dracwyrm commented Sep 8, 2015

@rlanyi Instead of destroying the ZPool, you can export and then import it like this:
$ sudo zpool export tank
$ sudo zpool import -d /dev/disk/by-id tank
That will switch all /dev/sdx drives to the full ID.

Cheers.

@atikir commented Oct 8, 2015

I had a similar issue today. It worked via sdb and sdc but not with disk IDs. I realized that if I used lower case for the pool name it worked, i.e.

zpool create Storage -o autoexpand=on mirror ID1 ID2

did not work, but

zpool create storage -o autoexpand=on mirror ID1 ID2

did work.

@mdsitton

I'm having the same issue. The name of the pool probably makes no difference for me since mine was lowercase already.

Could it be an issue with symbolic links? Since that's what the by-id names are for me.

@mdsitton

What @dracwyrm suggested seems to work though so thanks!

@siberx commented Nov 7, 2015

Just ran into this bug myself today; I wasted an hour fighting to create my first zpool under ZoL using by-id names until I came across this thread and tried the /dev/sd* names instead. That worked fine, and once my migration is done I'll re-import by-id as described by @dracwyrm to clean things up. I'm running Fedora 22, if that helps.

@jbrodriguez

This also happened to me.

Applied the workaround suggested by @dracwyrm and it was ok.

I'm running under Arch Linux (2015.11.01)

@wdennis commented Dec 16, 2015

Another "one or more devices is currently unavailable" sufferer when trying to create zpool with "by-id" devnames... This on Ubuntu 15.04 (ran into it on two separate systems so far.) Again, @dracwyrm 's workaround solved the problem for me. I did have one case where after a 2nd/3rd attempt to create with by-id devname's did work however (fwiw, used the "-f" flag on create...) Didn't work for me on this last system tho.

@dmaziuk commented Jan 11, 2016

Another one here. Adding raidz1-10 (sdae, sdaf, sdag) in a 36-bay AMD Supermicro. CentOS 7, kmod-zfs-0.6.5.3-1.el7.centos.x86_64 -- also fixed by @dracwyrm's workaround.

@fractalram

Another one here..

root@debian:/dev/disk/by-id# zpool create -f mypool2 /dev/disk/by-id/scsi-35000c5008e59858f /dev/disk/by-id/scsi-35000c5008e5ba103
cannot create 'mypool2': no such pool or dataset

Works with plain device names :+1:
root@debian:/home/debian# zpool create -f mypool mirror sdi sdj
root@debian:/home/debian# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
mypool 5.44T 224K 5.44T - 0% 0% 1.00x ONLINE -
root@debian:/home/debian# zfs list
NAME USED AVAIL REFER MOUNTPOINT
mypool 240K 5.27T 96K /mypool
root@debian:/home/debian# zpool status
pool: mypool
state: ONLINE
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
mypool      ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    sdi     ONLINE       0     0     0
    sdj     ONLINE       0     0     0

errors: No known data errors

@dmaziuk commented Feb 5, 2016

Got a different error message today:

# zpool add tank -o ashift=12 raidz1 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z84038QC /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z84038TR /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z8403EB5 
cannot add to 'tank': no such pool or dataset

Adding sda[h-j]1 worked. I hadn't looked closely the last time, but this time it failed after creating -part1 and -part9 on the new disks; after a subsequent export/import:

# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 3.26M in 52h58m with 0 errors on Fri Jan  8 18:19:55 2016
config:

    NAME                                          STATE     READ WRITE CKSUM
    tank                                          ONLINE       0     0     0
      raidz1-0                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EGL4           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EGNK           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EHNP           ONLINE       0     0     0
      raidz1-2                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EJ9F           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EJCD           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EKD5           ONLINE       0     0     0
      raidz1-3                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EKXR           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301EL9Z           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301F0K7           ONLINE       0     0     0
      raidz1-4                                    ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2LVNLYS  ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301DD0E           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301DE6R           ONLINE       0     0     0
      raidz1-5                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301LGR9           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_S301LH9F           ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E4TSKF1D  ONLINE       0     0     0
      raidz1-6                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z3041V61           ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7LLAU1Z  ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z3041VR9           ONLINE       0     0     0
      raidz1-7                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z304NLWM           ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5UXCTN8  ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z304NLJN           ONLINE       0     0     0
      raidz1-8                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z304VZWY           ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NS4ZH4  ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z304X42V           ONLINE       0     0     0
      raidz1-9                                    ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z3057DKN           ONLINE       0     0     0
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0SERR6H  ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z305838A           ONLINE       0     0     0
      raidz1-10                                   ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z3041QD6           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z3041QXP           ONLINE       0     0     0
        ata-ST4000VN000-1H4168_Z3041RG3           ONLINE       0     0     0
      raidz1-11                                   ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z84038QC-part1    ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z84038TR-part1    ONLINE       0     0     0
        ata-ST8000AS0002-1NA17Z_Z8403EB5-part1    ONLINE       0     0     0
    logs
      mirror-1                                    ONLINE       0     0     0
        ata-CT1000BX100SSD1_1506F002DAF8-part3    ONLINE       0     0     0
        ata-CT1000BX100SSD1_1506F002DB7A-part3    ONLINE       0     0     0
    spares
      ata-ST4000VN000-1H4168_S301EHVW             AVAIL

-- note that the latest addition, raidz1-11, got imported as -part1 devices.
(Edit: OK, that's my bad; looking at the command history now, I accidentally added partitions instead of whole devices.)

@fusionstream

Just thought I'd come in and say @dracwyrm's workaround is also working for me on Fedora 22. Thanks!

@Ralithune

This is still happening.

I'm trying to write some software that will work generically across different OS's and systems running ZFS, and using the WWN ID to specify the drives is the best way to accomplish that. Is this being looked at at all?

I'm running CentOS 7.

EDIT: It appears to actually create partitions on the disks, even though the command fails with "no such pool or dataset".

@jdmaloney

I'm seeing this as well on CentOS 7 on zpool creates, but, more critically for me, on zpool replaces. I'm managing some large JBODs and have scripts that walk the disks looking for failures and start the rebuild after the new disk is inserted. I can't get consistent success on the replace, either by hand or in the scripts, when using anything besides the raw /dev/sdxx device. Same error: "no such pool or dataset".

It's not too painful to create zpools, export, and re-import from /dev/disk/by-vdev, but I need to replace disks without having to take the system down to export and re-import.

@partoneoftwo

I just hit this problem myself, on the Proxmox 4.1 distribution of Debian, kernel 4.3, with ZFS on Linux.
The direct solution to the problem is the workaround above. I used this Arch Linux forum post to guide me.

@dmaziuk commented Mar 14, 2016

@thomasfrivold: yes, we know. The problem is downtime, there wasn't supposed to be any.

@Ralithune

And secondly, using the workaround in a utility meant to run on different systems with foreign configs is really sloppy and will be prone to failure.

We need the issue fixed. I don't have a lot of experience digging into open-source software (just writing my own convenience stuff), but I suppose I could take a crack at it.

@behlendorf (Contributor)

As mentioned above, the problem is likely that the partition links under /dev/disk/by-id/ aren't being created in time. When given a whole block device, zpool create|add|replace will partition the device and expect those partitions to be available fairly promptly. If they're not, it will fail as described. Creating those links is something udev takes care of on most platforms.

Unfortunately, I wasn't able to easily reproduce this issue, so I'm hoping someone who can will answer a few questions for me (one way to check is sketched after this list):

  1. After the command fails, are the new partitions visible in the /dev/disk/by-id directory? They will be called something like ata-ST0000VN000-123456_S1234567-part1 and ata-ST0000VN000-123456_S1234567-part9.

  2. If not, are they created after running udevadm trigger?

  3. How long does the zpool create command take to run in the failure case? By default it should wait up to 30 seconds per device for the partition links to be created by udev.
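One quick way to answer those three after a failed create (a sketch; pool name and disk IDs are placeholders):

# ls -l /dev/disk/by-id/ | grep part          # 1. are the -part1/-part9 links there?
# udevadm trigger && udevadm settle           # 2. do they show up after re-triggering udev?
# time zpool create -f tank mirror /dev/disk/by-id/<id1> /dev/disk/by-id/<id2>   # 3. how long until it fails?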

@dmaziuk commented Mar 16, 2016

I'm pretty sure it's less than 30 seconds (and it's zpool add in my case). I will be adding another set of disks to the system with this behaviour in the next month or so, I'm making a note to make a note. ;)

@jdmaloney

  1. Can check tomorrow when I get in to work, I'm not 100% sure about the partitions, though I know the base device at least is in /dev/disk/by-path

  2. Again can double check

  3. I agree with @dmaziuk it's less than 30 seconds, for me it's in the 5-10 second range

My code for replacement is in a repo here on GitHub: https://github.com/jdmaloney/zfs_utils
(zfs_fault_fixer) After placing the new drive in the chassis, I wait 30 seconds to a minute and then kick off the script, though I have found that waiting longer doesn't increase my chances of success. Making sure the access light is long since off, I kick off the script, which creates my new vdev_id.conf file, runs udevadm trigger, sleeps 5 seconds, then builds the zpool replace command and runs it (roughly the flow sketched below). I echo the PCI string to make sure my script can see it, and it does, so Linux should know the drive is there.
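In outline, the replace path of the script is roughly this (the alias and vdev names are placeholders, not the real config):

# vi /etc/zfs/vdev_id.conf        # add an alias line for the new disk's by-path entry
# udevadm trigger
# sleep 5
# zpool replace tank <failed-vdev> <new-alias>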

Can post some exact output tomorrow as I have this system available for testing.

@dmaziuk commented Mar 21, 2016

Out of order:

  3. 2 seconds
]# date ; zpool add tank -o ashift=12 raidz1 /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EX0H /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EY7V /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EZP0 ; date
Mon Mar 21 16:50:00 CDT 2016
cannot add to 'tank': no such pool or dataset
Mon Mar 21 16:50:02 CDT 2016
  1. yes
# ls /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840E*
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EX0H    /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EY7V    /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EZP0
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EX0H-part1  /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EY7V-part1  /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EZP0-part1
/dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EX0H-part9  /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EY7V-part9  /dev/disk/by-id/ata-ST8000AS0002-1NA17Z_Z840EZP0-part9

So 2) does not apply, of course.

Just for completeness:

# zpool add tank -o ashift=12 raidz1 /dev/sdam /dev/sda /dev/sdal

[root@starfish by-id]# zpool status tank
  pool: tank
...
      raidz1-12                                   ONLINE       0     0     0
        sdam                                      ONLINE       0     0     0
        sda                                       ONLINE       0     0     0
        sdal                                      ONLINE       0     0     0
...

behlendorf added this to the 0.7.0 milestone Mar 22, 2016
@dasjoe (Contributor) commented Mar 23, 2016

@behlendorf I've seen this bug before, relevant snippet from an IRC discussion:

23:02 < dasjoe> ryao: I *think* by-id/ gets populated too late after zfs puts a GPT on a disk
23:03 < dasjoe> ryao: there are countermeasures against this, but as far as I understand the code ZFS waits until links to /dev/sdX appear, not for -part1 to become available

@ilovezfs (Contributor)

@behlendorf OK, the problem is the interval between stats needs to be longer here:
https://github.com/zfsonlinux/zfs/blob/master/lib/libzfs/libzfs_pool.c#L4128

diff --git a/lib/libzfs/libzfs_pool.c b/lib/libzfs/libzfs_pool.c
index 9fc4bfc..1e946cd 100644
--- a/lib/libzfs/libzfs_pool.c
+++ b/lib/libzfs/libzfs_pool.c
@@ -4124,8 +4124,8 @@ zpool_label_disk_wait(char *path, int timeout)
     * will exist and the udev link will not, so we must wait for the
     * symlink.  Depending on the udev rules this may take a few seconds.
     */
-   for (i = 0; i < timeout; i++) {
-       usleep(1000);
+   for (i = 0; i < timeout/10; i++) {
+       usleep(10000);

        errno = 0;
        if ((stat64(path, &statbuf) == 0) && (errno == 0))

Not sure what's needed to be as robust as possible without being silly, but that change seems to have been sufficient.

@dasjoe (Contributor) commented Mar 29, 2016

@behlendorf @ilovezfs I thought the race condition was caused by https://github.com/zfsonlinux/zfs/blob/master/lib/libzfs/libzfs_pool.c#L4282 - we seem to be waiting for the disk's symlink to get created, not for the disk's first partition.
The problem with this is that the disk's symlink never gets removed, whereas the -partX links do disappear.

Interestingly the zpool create completes successfully after zapping GPT and MBR.

Here's an udevadm monitor --e during a zpool create -f TEST /dev/disk/by-id/usb-SanDisk_Extreme_AA010805141143052101-0\:0, with partitions 1 and 9 already existing:

UDEV  [440370.389410] remove   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440370.392463] remove   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440370.473817] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440370.524642] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440370.527806] remove   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440370.536155] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440370.544679] remove   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440370.617717] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440370.676901] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440370.679961] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440370.756487] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440370.806316] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440370.817451] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440370.872328] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)

Here the same, using a fresh GPT without partitions 1 and 9:

UDEV  [440661.083752] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440661.116216] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440661.126996] remove   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440661.130858] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440661.142613] remove   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440661.226062] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440661.255215] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440661.259926] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440661.335891] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440661.365770] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440661.371126] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440661.390271] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)

And here after zapping all GPT and MBR structures:

UDEV  [440781.063806] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440781.143256] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440781.178342] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440781.178877] add      /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440781.283868] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb (block)
UDEV  [440781.414088] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb9 (block)
UDEV  [440781.432523] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)
UDEV  [440781.498210] add      /devices/virtual/bdi/zfs-8 (bdi)
UDEV  [440781.527729] change   /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/host6/target6:0:0/6:0:0:0/block/sdb/sdb1 (block)

@behlendorf (Contributor)

@dasjoe I'm not sure I follow you entirely. We should be waiting for the partition symlink (-partX) to be created in the expected place by udev. Are you saying those partition symlinks aren't being created by udev for some reason? Just the device itself?

@dasjoe (Contributor) commented Mar 31, 2016

@behlendorf Never mind, I just realized zfs_append_partition gets called before we wait for the symlinks, so we are correctly waiting on the partition symlink.

However, it is interesting that zpool create fails for disks with a GPT even when no partitions are present. Zapping the GPT/MBR labels makes zpool create succeed, though.
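For anyone trying to reproduce that, one common way to zap the labels before retrying (destructive; /dev/sdX and the disk ID are placeholders) is:

# sgdisk --zap-all /dev/sdX       # or: wipefs -a /dev/sdX
# zpool create -f TEST /dev/disk/by-id/<disk-id>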

@dmaziuk commented Mar 31, 2016

Hmm... IIRC when I started playing with zfs, zpool create wouldn't work unless the disk had a GPT label? I mean, there must've been a reason I started creating those instead of just adding completely blank disks.

@dasjoe (Contributor) commented Mar 31, 2016

@dmaziuk as of now zpool create complains about disks lacking a GPT; you can force the creation with -f.

@dmaziuk commented Mar 31, 2016

Ah, that's what it was. Well, I'm maxed out in the chassis I've been adding disks to & it looks like I won't have an opportunity to test it with -f anytime soon. :(

@jhetrick

I think this is the same situation as above so I can test it for you :)

[root@itf]# ls /dev/disk/by-id/wwn-0x5000cca23b0e93a8*
/dev/disk/by-id/wwn-0x5000cca23b0e93a8

[root@itf]# zpool create dpool05 wwn-0x5000cca23b0e93a8
invalid vdev specification
use '-f' to override the following errors:

/dev/disk/by-vdev/wwn-0x5000cca23b0e93a8 contains a corrupt primary EFI label.
[root@itf]# zpool create -f dpool05 wwn-0x5000cca23b0e93a8
cannot create 'dpool05': no such pool or dataset

[root@itf]# ls /dev/disk/by-id/wwn-0x5000cca23b0e93a8*
/dev/disk/by-id/wwn-0x5000cca23b0e93a8 /dev/disk/by-id/wwn-0x5000cca23b0e93a8-part9
/dev/disk/by-id/wwn-0x5000cca23b0e93a8-part1

[root@itf]# zpool create -f dpool05 wwn-0x5000cca23b0e93a8
cannot create 'dpool05': no such pool or dataset

@emk2203 commented Apr 19, 2016

I am also affected by this bug. The problem is that zpool replace doesn't work even with an attempted workaround.

I am running Ubuntu 16.04LTS with the official zfs modules, at the moment 0.6.5.6-0ubuntu8.

If I try to exchange a disk in a pool (however it was imported), the offlined and removed disk is known to ZFS by ID, even if I imported the pool by /dev name.

An attempt to replace, either with the disk ID or the device name, gives the error. It also doesn't matter whether I erase the GPT and filesystem structures first or not; it just takes a few seconds longer until the error message appears. Forcing with -f has no effect either.

To prepare the disk for ZFS, it should be enough to erase the first and last 100 MB on it, no?
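For what it's worth, a sketch of doing exactly that (destructive; /dev/sdX is a placeholder), though wipefs -a or sgdisk --zap-all is usually enough to clear old labels:

# DISK=/dev/sdX
# SIZE_MB=$(( $(blockdev --getsz "$DISK") / 2048 ))   # --getsz reports 512-byte sectors, so this is MiB
# dd if=/dev/zero of="$DISK" bs=1M count=100
# dd if=/dev/zero of="$DISK" bs=1M count=100 seek=$(( SIZE_MB - 100 ))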

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized the function
will wait.  Once fully initialized, all device links are checked
and allowed to settle for 50ms.  This makes it far more certain
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was implemented which includes a settle time.  In addition, the
kernel modules were updated to include retry logic for this
ENOENT case.  Due to the improved checks in the utilities it
is unlikely this logic will be invoked; however, in the rare
event it is needed it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3708
Issue openzfs#4077
Issue openzfs#4144
Issue openzfs#4214
Issue openzfs#4517
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 22, 2016
behlendorf modified the milestones: 0.6.5.7, 0.7.0 Apr 25, 2016
@behlendorf (Contributor)

This issue has been addressed in master by commit 2d82ea8 and we'll look into backporting it for 0.6.5.7. As always, if you're in a position where you can provide additional verification of the fix applied to master, it would be appreciated. This ended up being a subtle timing issue, so the more real-world validation of the fix the better.

@ierdnah commented May 5, 2016

On Ubuntu 16.04 LTS:

root@ubuntu:# zpool create -f -m /var/backups/ backups ata-ST2000DM001-1ER164_Z4Z4BPGL
root@ubuntu:# zpool destroy backups
root@ubuntu:# zpool create -f -m /var/backups/ backups ata-ST2000DM001-1ER164_Z4Z4BPGL
cannot create 'backups': no such pool or dataset
root@ubuntu:# dd if=/dev/zero of=/dev/disk/by-id/ata-ST2000DM001-1ER164_Z4Z4BPGL count=100 bs=1M
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.544526 s, 193 MB/s
root@ubuntu:# zpool create -f -m /var/backups/ backups ata-ST2000DM001-1ER164_Z4Z4BPGL
root@ubuntu:#

nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized the function
will wait.  Once fully initialized, all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed
it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517
nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
@DanEmord commented Jun 5, 2016

I'm currently on Ubuntu 16.04, where the latest available ZFS is 0.6.5.6 (0.6.5.7 is available in 16.10, but I don't feel like mucking with that right now :-) ). I was able to create pools on SSDs without issue, but the HDDs were too slow and would fail with this error. On a whim, I decided to try maxing out the CPU to induce an artificial delay... and it worked! I can reliably create and destroy the pool with "stress -c 16" running in the background. YMMV.
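For anyone wanting to try the same trick, roughly (pool name and disk IDs are placeholders):

# stress -c 16 &                                  # peg the CPUs to slow the create down
# zpool create -f tank mirror /dev/disk/by-id/<id1> /dev/disk/by-id/<id2>
# kill %1                                         # stop stress afterwards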

ryao pushed a commit to ClusterHQ/zfs that referenced this issue Jun 7, 2016
@HinderSting commented Jun 18, 2016

The following command failed for me:
zpool add tank mirror /dev/disk/by-id/xxxxxxx /dev/disk/by-id/yyyyyyy

New partition links like xxxxxxx-part1 and xxxxxxx-part9 were created after the command ran.

This workaround works for me:
zpool add tank mirror /dev/disk/by-id/xxxxxxx-part1 /dev/disk/by-id/yyyyyyy-part1

PS. I'm running Ubuntu 16.04

@alejos commented Mar 1, 2018

Hello there,
I had the same problem, but it turned out there was just a stray space in my command. Check your syntax.
Hope this helps!
Alejo

@NathanaelA commented Jan 21, 2019

@rlanyi Instead of destroying the ZPool, you can export and then import it like this:
$ sudo zpool export tank
$ sudo zpool import -d /dev/disk/by-id tank
That will switch all /dev/sdx drives to the full ID.

Cheers.

This worked great; for some reason I was totally confused and thought I needed to put the actual disk ID of one of the devices on the command line so it would find the devices, like so:

$ sudo zpool import -d /dev/disk/by-id/SOME_DISK_ID tank
But that is wrong and fails. Do NOT put the disk_id in; just use the
$ sudo zpool import -d /dev/disk/by-id tank
and zpool is smart enough to figure it all out for you based on your pool name. ;-)

So if anyone else gets confused by the instructions elsewhere: just let zpool do the work, don't overcomplicate it. ;-)

@OrlandoNative

It's not just by-id. The zpool utility seems unable to directly handle anything other than "standard" device names (e.g. sdX), except for import. Once you create a pool you can export it and re-import it using just about any format you want: by-id, by-path, by-partid, etc. But if you need to replace a disk, detach a spare, or whatever, none of those subcommands work with that same naming format. You then have to export and re-import the pool specifying /dev, so it gets the "standard" device names for the disks, and then you can run the subcommands. I'm on CentOS 7 using the latest available ZFS, which is a newer release than most of what I've seen in this thread, and it still has this "annoyance". zpool should have some way of determining what naming format its vdevs are using and allow its subcommands to use that format.
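The dance described above, as a sketch (pool and device names are placeholders):

# zpool export tank
# zpool import -d /dev tank                  # back to sdX names so replace/detach work
# zpool replace tank sdX sdY
# zpool export tank
# zpool import -d /dev/disk/by-id tank       # back to by-id names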

@altimmons commented Apr 30, 2022

Disappointing to see this 2016 bug still around.
Adding by ID fails, and I've tried a thousand ways over four days: deleting MBRs, partitions, and whatever else I could think of.

What worked was adding by /dev/sdX (which we are explicitly advised NOT to do) and then using dracwyrm's seven-year-old workaround.

Really, in seven years no one has fixed this?

Specifically, I get this error:

missing link: sdbs was partitioned but /dev/disk/by-vdev/ztest8-part1 is missing

I set up aliases in /etc/zfs/vdev_id.conf, ran udevadm trigger, and then:

❯ zpool create -f ztst mirror ztest8 ztest9
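For context, alias entries in /etc/zfs/vdev_id.conf look roughly like this (the by-id paths are placeholders, not the ones actually used here):

alias ztest8  /dev/disk/by-id/ata-<model>_<serial-8>
alias ztest9  /dev/disk/by-id/ata-<model>_<serial-9>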

Using the /dev/disk/by-id/ names also fails every time, though I think the error differs slightly:

cannot label 'sdbt': failed to detect device partitions on '/dev/sdbt1': 19

System is Slackware (Unraid)
❯ zfs version
zfs-2.1.4-1
zfs-kmod-2.1.4-1
