
Ubuntu: Not all /dev/disk/ symlinks are available when mountall runs #2472

Closed
frymaster opened this issue Jul 8, 2014 · 21 comments

@frymaster

I have a pool where the devices were added using their /dev/disk/by-id aliases. When I reboot, one device is consistently offline with the status UNAVAIL. zpool online always succeeds, and zpool scrub never finds any errors, whether run just before or just after the reboot. There are no indications from the SMART data of any problems (timeouts, reallocated sectors, etc.).

If I export the pool and reimport with -d /dev/disk instead of -d /dev/disk/by-id then I can reboot without issues.

Putting a sleep statement in mountall.conf has no effect (other than delaying boot), even with values of a minute or more.

The system's root drive is an SSD connected to an add-on card and is (almost) the last disk to be initialised - it starts initialising after the offending disk but completes before it. A previous incarnation of this system (Ubuntu 12.04, ZFS 0.6.2) had the root drive on a USB stick, which was initialised much earlier; on that system, the pool didn't mount at all, as too many devices were missing. I don't know the best way to debug before you have a disk to write to, but I was able to output the results of ls /dev/disk/by-id to the screen on that system, and the symlinks were missing even though the actual devices in /dev/ were present.

The workaround was to add wait_for_udev in the initramfs scripts, as if I were trying to have ZFS as my root filesystem (or to use /dev/ instead of /dev/disk/by-id).

So there seems to be some kind of race condition where the mountall script runs before udev has finished adding the symlinks.
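For anyone hitting the same thing, the wait_for_udev workaround amounts to a small initramfs-tools boot script run before the root pivot. A minimal sketch follows; the filename and install path are my choice, not anything prescribed by the packages:

```shell
#!/bin/sh
# Hypothetical initramfs-tools boot script, e.g. installed as
# /etc/initramfs-tools/scripts/init-premount/wait-udev (path is an assumption).
PREREQ="udev"
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

. /scripts/functions
# wait_for_udev wraps "udevadm settle": it blocks (with a timeout in seconds)
# until the initramfs udev has drained its event queue, so the
# /dev/disk/by-id symlinks exist before control passes to upstart.
wait_for_udev 30
```

After adding the script, run update-initramfs -u so it gets baked into the image.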

The title of #1103 seems related, but the description is very different - for one thing, adding a delay makes no difference in my case.

root@gregor:~# dpkg -l | grep zfs
ii  dkms                                2.2.0.3-1.1ubuntu5+zfs9~trusty      all          Dynamic Kernel Module Support Framework
ii  libzfs2                             0.6.3-2~trusty                      amd64        Native ZFS filesystem library for Linux
ii  mountall                            2.53-zfs1                           amd64        filesystem mounting tool
ii  ubuntu-zfs                          8~trusty                            amd64        Native ZFS filesystem metapackage for Ubuntu.
ii  zfs-dkms                            0.6.3-2~trusty                      amd64        Native ZFS filesystem kernel modules for Linux
ii  zfsutils                            0.6.3-2~trusty                      amd64        Native ZFS management utilities for Linux
root@gregor:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04 LTS
Release:        14.04
Codename:       trusty
root@gregor:~# uname -a
Linux gregor 3.13.0-30-generic #55-Ubuntu SMP Fri Jul 4 21:40:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I added the following to the mountall.conf script, just before the exec mountall line:

exec >>/dev/.initramfs/myjob.log 2>&1
set -x

echo "=== Sleeping for 10 seconds"
/bin/sleep 10 || true
echo "=== Output of /dev"
/bin/ls -lh /dev/sd* || true
/bin/ls -lh /dev/disk/by-id || true

The output of this, along with the zdb output (using both /dev and /dev/disk/by-id) are at https://gist.github.com/frymaster/0f864b579943d53b9107

The devices corresponding to the pool are a, b, d, e, f and g. Note the partition symlinks for sdg are missing, but the partitions are present in /dev.

i7 920 CPU, 18GB of RAM. The RAM is non-ECC but I am not reporting a data corruption issue. The system is not running under virtualisation.

@frymaster frymaster changed the title Ubuntu: Not all /dev/disk/ symlinks are avilable when mountall runs Ubuntu: Not all /dev/disk/ symlinks are available when mountall runs Jul 9, 2014
@rdlugosz

Hi - I opened #1103 and should clarify that the sleep 30 only appeared to work for me; it didn't actually solve the issue (I think I hit a lucky streak where the drives happened to all be ready a couple of times in a row). I've edited my comment over there to prevent future confusion.

I've also run through the various troubleshooting tips listed here to no avail. This is a persistent bug, but tough to track down.

FWIW, I've found that simply running a zpool export tank immediately followed by a zpool import tank will resolve the issue... tempted to just drop that in rc.local but that just feels wrong (and dangerous). Be careful that you don't have other services start by default that may want access to your zpool mounts. I've seen issues where the pool isn't available & some process ends up creating a directory at the expected mount point. Then you'll need to clean that up before you'll be able to mount your zpool again.
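For completeness, the rc.local version of that hack might look like the sketch below (the pool name "tank" is assumed; as noted above, this is risky if anything already holds files or directories under the pool's mount points):

```shell
#!/bin/sh -e
# /etc/rc.local sketch: force a clean re-import of the pool late in boot,
# after udev has had time to create all the /dev/disk/by-id symlinks.
# Only safe if no service has started writing under the pool's mount points yet.
zpool export tank && zpool import tank
exit 0
```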

@ryao
Contributor

ryao commented Jul 11, 2014

@frymaster Adding wait_for_udev() would seem to be the right solution here. Perhaps @dajhorn could add it to the mountall package in the PPA.

@dajhorn
Contributor

dajhorn commented Jul 11, 2014

@ryao, mountall already has udev logic to do something like this near https://github.com/zfsonlinux/mountall/blob/master/ubuntu/trusty/src/mountall.c#L2973 and we iterated on similar solutions in zfsonlinux/mountall#1 (plus older tickets in the obsolete repository).

Plus, @rdlugosz reported earlier that he is using duff hardware.

Past that, patches are welcome, but getting a solution without side-effects will be a non-trivial amount of work. This kind of issue should be resolved by implementing #330.

@frymaster
Author

@dajhorn - a 10 minute sleep in the upstart script for mountall doesn't resolve this for me. I don't know how upstart and udev interact, but for whatever reason, creating those symlinks is paused while upstart's "startup" signal is being processed, so in my situation that code in mountall is of no benefit. I either have to add wait_for_udev() in the initramfs script (before upstart starts), or not mount the ZFS pool until afterwards.

My concern would be that if my pool was mounted later via this hotplug solution it might not come up before services which store data on the pool (database, virtual machines etc.)

@dajhorn
Contributor

dajhorn commented Jul 11, 2014

a 10 minute sleep in the upstart script for mountall doesn't resolve this for me

It sounds like you have a secondary problem, like a stale /etc/zfs/zpool.cache file or udev rules that didn't survive the upgrade to trusty. (Ubuntu 14.04 rearranged many things in /dev and /etc.)

If you want to pursue this issue, then please submit the materials bulleted in the FAQ.

My concern would be that if my pool was mounted later via this hotplug solution it might not come up before services which store data on the pool (database, virtual machines etc.)

It is worth noting that an upstart dependency on a mount point can be specified when a storage resource is pathologically slow:

# /etc/init/MyDatabaseServer.conf
start on (local-filesystems and mounted MOUNTPOINT=/tank/MyDatabaseFiles)

Documentation here:

@rdlugosz

In the past I was using questionable hardware, but at this point all ZFS-related drives are attached directly to the motherboard. The issue is manageable enough since I rarely reboot, but every once in a while I go on the hunt for a fix. I will say I'm somewhat glad to see another person having the same issue.

@frymaster
Author

It sounds like you have a secondary problem, like a stale /etc/zfs/zpool.cache file or udev rules that didn't survive the upgrade to trusty. (Ubuntu 14.04 rearranged many things in /dev and /etc.)

Although this is more or less the same hardware as my 12.04 setup (which exhibited the same symptoms), it was a clean install.

Any time I've been exporting the pool in order to troubleshoot, there's been no zpool.cache remaining. I've updated the top post with zdb output from both configurations, and also the output of an "ls" of the /dev and /dev/disk/by-id directories after a 10 second wait in the mountall script.

@dajhorn
Contributor

dajhorn commented Jul 12, 2014

@frymaster, please submit the materials bulleted in the FAQ, particularly the full unmodified dmesg.

@frymaster
Author

@dajhorn The only mention of the dmesg in the FAQ is in relation to getting stack traces for dealing with hung processes. The rest of the details have been provided. Can you be more specific about what information you require?

@dajhorn
Contributor

dajhorn commented Jul 13, 2014

@frymaster
Author

@dajhorn ah, sorry, I was looking at http://zfsonlinux.org/faq.html#HowCanIHelp

kern.log
dmesg

Note I still have a 10 second sleep in the mountall.conf script, hence the 10 second jump after drive detection, just before the SPL and ZFS output.

zpool status - this is after I online'd the UNAVAIL disk (hence the resilver) and then rebooted (hence the UNAVAIL again)
zfs get all

@dajhorn
Contributor

dajhorn commented Jul 13, 2014

@frymaster, this is happening because udev is not mapping the partitions on one of the disks. In particular, these device aliases seem to be missing during pool import:

  • /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_Z3GHJ3DGS-part1
  • /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_Z3GHJ3DGS-part9
  • /dev/disk/by-wwn/wwn-0x5000039ff4d5258b-part1
  • /dev/disk/by-wwn/wwn-0x5000039ff4d5258b-part9

Please gist these things for the Z3GHJ3DGS disk:

# smartctl --all /dev/sdg
# fdisk -l /dev/sdg
# parted /dev/sdg print

Unfortunately, you have a novel problem. This could be caused by a udev misconfiguration, a failing disk, a bad SATA connector, or something else. An easy way to shorten the troubleshooting process here is to replace this disk with a spare.

@frymaster
Author

smartctl
fdisk
parted

Two comments:

  • The device aliases are created at some point after mountall.conf runs - this is how I am able to online the drive without issues
  • In my 12.04 install using the same motherboard and pool disks, the main difference was that I was booting from a USB drive rather than an SSD plugged into a PCIe SATA card. That device got detected much earlier in the boot process, and in that setup the device aliases were missing for three drives during boot, with the result that the pool couldn't mount at all (too many disks missing)

Because of this, I'm not inclined to believe it's an issue with one specific disk or SATA port.

@dajhorn
Contributor

dajhorn commented Jul 13, 2014

@frymaster, okay, the next step is to trace udev. Edit the /etc/init/udev.conf file and change the last line to exec /lib/systemd/systemd-udevd --daemon --debug.

Reboot, and post the /var/log/udev and /var/log/kern.log files. Create a new gist for these files. Let the system run long enough so that udev creates all of the device aliases.

@frymaster
Author

https://gist.github.com/frymaster/0fb49ead9c7ef4a30d53

Let the system run long enough so that udev creates all of the device aliases.

Certainly they're created by the time rc.local runs.

@dajhorn
Contributor

dajhorn commented Jul 13, 2014

@frymaster, I'm stumped. Fixing it will require a local reproducer.

@frymaster
Author

I'm confused as to how it works at all now.

  • udev creates the symlinks
  • the udev task is run when virtual-filesystems is emitted
  • the mountall script is what emits that
  • my ls before mountall shows almost all the symlinks created

And in fact if you look at the udev log, the first mention of the partition symlinks is

UDEV  [17.065605] add      /devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata9/host8/target8:0:0/8:0:0:0/block/sdh/sdh1 (block)
ACTION=add
DEVLINKS=/dev/disk/by-id/ata-SanDisk_SDSSDP064G_141251406216-part1 /dev/disk/by-id/wwn-0x5001b44beed26588-part1 /dev/disk/by-uuid/e68ec506-a886-4507-9fcb-124561df6527

...while kern.log shows the pool is being created earlier

Jul 13 20:35:05 gregor kernel: [   16.568653]  zd0: p1 p2

...as does the udev log

KERNEL[16.963564] add      /devices/virtual/block/zd0 (block)

So, at the time mountall is running, none of the symlinks have been created by the udev instance run by upstart.

*Wild-ass guessing follows. I am **not** an expert in this and it shows.*

But... doesn't initramfs have its own udev? What I think could be happening: the symlinks that do exist for me are created during the initramfs phase, before upstart runs. Then, when my boot device becomes available, initramfs hands off to upstart (stopping its instance of udev from creating the remaining symlinks), and upstart doesn't start its own udev process until too late to create the symlinks in time for ZFS to use them. Is it possible the wait code in mountall only waits for the devices themselves, not for the symlinks?

Does this make sense?
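If that theory is right, a fixed sleep can never help, but an explicit wait on udev's event queue might. A sketch of what could be tried in /etc/init/mountall.conf, just before the exec line (untested; the timeout value is arbitrary):

```shell
# Block until udevd's event queue is empty; unlike a sleep, this waits for
# symlink creation as well as device nodes. "|| true" keeps boot going
# even if the settle times out.
udevadm settle --timeout=30 || true
```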

@ryao
Contributor

ryao commented Jul 13, 2014

@dajhorn Is there any chance #2455 might help?

On Jul 13, 2014, at 4:11 PM, Darik Horn notifications@github.com wrote:

@frymaster, I'm stumped. Fixing it will require a local reproducer.



@dajhorn
Contributor

dajhorn commented Jul 13, 2014

Is there any chance #2455 might help?

@ryao, it might.

@frymaster, try putting a udevadm trigger line above the exec in the /etc/init/mountall.conf file. If that doesn't work, then try installing the zfs-initramfs package afterwards.

But... doesn't initramfs have its own udev?

Yes, it does. The systemd-udev upstart job is the second invocation, so the time stamps in the system log might not be what you expect. We wanted to see whether the DEVLINKS= were happening after the regular system started, but they were happening on time.

Having something like /dev/sdg1 appear without udev reliably creating a corresponding /dev/disk/by-id or /dev/disk/by-wwn alias for it is unusual.

@ryao
Contributor

ryao commented Jul 13, 2014

On Jul 13, 2014, at 5:38 PM, Darik Horn notifications@github.com wrote:

Is there any chance #2455 might help?

@ryao, it might.

@frymaster, try putting a udevadm trigger line above the exec in the /etc/init/mountall.conf file. If that doesn't work, then try installing the zfs-initramfs package afterwards.

@dajhorn udevadm trigger operates asynchronously, so we could race with udevd's creation of symlinks. I included a call to yield() in that patch to try to close the race. In the case of mountall.conf, I think it might be necessary to place sleep 0.1 after udevadm trigger so that we lose the race.
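Combining the two suggestions, the mountall.conf fragment would look something like this (a sketch of the idea above, not a tested fix):

```shell
# Re-run udev rules for block devices so any missing /dev/disk/by-id
# symlinks get another chance to be created...
udevadm trigger --subsystem-match=block
# ...then pause briefly so udevd (which processes the trigger
# asynchronously) finishes before mountall scans for devices.
/bin/sleep 0.1
```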

@behlendorf
Contributor

I'm closing this issue out due to age. It's gotten a bit stale, and with Ubuntu moving to systemd and shipping a version of ZoL with 16.04, this exact issue no longer seems relevant.
