Ubuntu: Not all /dev/disk/ symlinks are available when mountall runs #2472
Hi - I opened #1103 and should clarify that I've also run through the various troubleshooting tips listed here, to no avail. This is a persistent bug, but tough to track down. FWIW, I've found that simply running a
@frymaster Adding wait_for_udev() would seem to be the right solution here. Perhaps @dajhorn could add it to the mountall package in the PPA.
@ryao, mountall already has udev logic to do something like this near https://github.com/zfsonlinux/mountall/blob/master/ubuntu/trusty/src/mountall.c#L2973 and we iterated on similar solutions in zfsonlinux/mountall#1 (plus older tickets in the obsolete repository). Plus, @rdlugosz reported earlier that he is using duff hardware. Past that, patches are welcome, but getting a solution without side-effects will be a non-trivial amount of work. This kind of issue should be resolved by implementing #330.
@dajhorn - a 10 minute sleep in the upstart script for mountall doesn't resolve this for me. I don't know how upstart and udev interact, but for whatever reason, creating those symlinks is paused while processing of upstart's "startup" signal is underway, so in my situation that code in mountall is of no benefit. I either have to add wait_for_udev() in the initramfs script (before upstart starts), or not mount the zfs until afterwards. My concern would be that if my pool were mounted later via this hotplug solution, it might not come up before the services which store data on the pool (databases, virtual machines, etc.).
It sounds like you have a secondary problem, like a stale
If you want to pursue this issue, then please submit the materials bulleted in the FAQ.
It is worth noting that an upstart dependency on a mount point can be specified when a storage resource is pathologically slow:
Documentation here:
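As a sketch of what such a dependency can look like (the job name, mount point, and service path below are illustrative assumptions, not taken from this thread), an upstart job can be started on the "mounted" event that mountall emits for each mount point:

```
# /etc/init/example-db.conf -- illustrative upstart job; all names here are
# assumptions for the sake of the example.
description "example service that must wait for the pool's mount point"

# mountall emits a 'mounted' event for each mount point it brings up, so
# starting on it delays this service until /tank is actually mounted.
start on mounted MOUNTPOINT=/tank
stop on runlevel [016]

exec /usr/local/bin/example-db-daemon
```

This keeps slow storage from racing the services that depend on it, at the cost of tying the job to one specific mount point.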
In the past I was using questionable hardware, but at this point all ZFS-related drives are attached directly to the motherboard. The issue is manageable enough since I rarely reboot, but every once in a while I go on the hunt for a fix. I will say I'm somewhat glad to see another person having the same issue.
Although this is more or less the same hardware as my 12.04 setup (which exhibited the same symptoms), it was a clean install. Any time I've been exporting the pool in order to troubleshoot, there's been no
@frymaster, please submit the materials bulleted in the FAQ, particularly the full unmodified dmesg. |
@dajhorn The only mention of the dmesg in the FAQ is in relation to getting stack traces for dealing with hung processes. The rest of the details have been provided. Can you be more specific about what information you require? |
@dajhorn ah, sorry, I was looking at http://zfsonlinux.org/faq.html#HowCanIHelp
Note I still have a 10 second sleep in the mountall.conf script, hence the 10 second jump after drive detection but just before the SPL and ZFS output.
zpool status - this is after I online'd the UNAVAIL disk (hence the resilver) and then rebooted (hence the UNAVAIL again)
@frymaster, this is happening because udev is not mapping the partitions on one of the disks. In particular, these device aliases seem to be missing during pool import:
Please gist these things for the Z3GHJ3DGS disk:
Unfortunately, you have a novel problem. This could be caused by a udev misconfiguration, a failing disk, a bad SATA connector, or something else. An easy way to shorten the troubleshooting process here is to replace this disk with a spare.
Two comments:
Because of this, I'm not inclined to believe it's an issue with one specific disk or SATA port.
@frymaster, okay, the next step is to trace udev. Edit the
Reboot, and post the
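For reference, a common way to trace udev on Ubuntu systems of this era is to raise its log level in /etc/udev/udev.conf; whether this is the exact file and setting meant above is an assumption:

```
# /etc/udev/udev.conf
# Raise udev's verbosity so device-node and symlink creation is logged.
# This is a common tracing approach; that it matches the instruction in
# this comment is an assumption.
udev_log="debug"
```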
https://gist.github.com/frymaster/0fb49ead9c7ef4a30d53
Certainly they're created by the time rc.local runs.
@frymaster, I'm stumped. Fixing it will require a local reproducer.
I'm confused as to how it works at all now.
And in fact if you look at the udev log, the first mention of the partition symlinks is
...while kern.log shows the pool is being created earlier
...as does the udev log
so, at the time mountall is running, none of the symlinks have been created by the udev run by upstart.

_Wild-ass-guessing follows. I am **not** an expert in this and it shows._ But... doesn't initramfs have its own udev? What I think could be happening here is: the symlinks that do exist for me are created during the initramfs phase, before upstart is run; then, when my boot device becomes available, initramfs runs upstart (stopping its instance of udev from creating the remaining symlinks), and upstart doesn't run its own udev process until it's too late to create the symlinks in time for zfs to use them.

Is it possible the wait code in mountall only waits for the devices themselves, not for the symlinks? Does this make sense?
@dajhorn Is there any chance #2455 might help?
@ryao, it might. @frymaster, try putting a
Yes, it does. The systemd-udev upstart job is the second invocation, so the time stamps in the system log might not be what you expect. We wanted to see whether the
Having something like
I'm closing this issue out due to age. It's gotten a bit stale, and with Ubuntu moving to systemd and shipping a version of ZoL with 16.04, this exact issue no longer seems relevant.
I have a pool where the devices were added using their /dev/disk/by-id aliases. When I reboot, one device is always consistently offline with the status UNAVAIL.
zpool online always succeeds, and zpool scrub never finds any errors, either after or just before the reboot. There are no indications from the SMART data of any problems (timeouts, reallocated sectors, etc.).

If I export the pool and reimport with -d /dev/disk instead of -d /dev/disk/by-id, then I can reboot fine without issues.

Putting a sleep statement in mountall.conf has no effect (other than delaying boot), even if values up to a minute or over are used.

The system's root drive is an SSD connected to an add-on card and is (almost) the last disk to be initialised - it starts after the offending disk but completes before. A previous incarnation of this system (Ubuntu 12, ZFS 0.6.2) had the root drive on a USB stick, which was initialised much earlier, and in that system the pool didn't mount at all, as too many devices were missing. I don't know the best way to debug before you have a disk to write to, but I was able to output the results of ls /dev/disk/by-id to the screen on that system, and the symlinks were missing even though the actual devices in /dev/ were present.

The workaround was to add wait_for_udev in the initramfs scripts, as if I were trying to have ZFS as my root filesystem (or use /dev/ instead of /dev/disk/by-id).
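The idea behind that workaround can be sketched as a small poll loop: wait until the expected by-id symlinks exist before allowing the import to proceed. This is a minimal illustration, not the actual wait_for_udev implementation; the function name, directory, entry names, and timeout are all assumptions:

```shell
#!/bin/sh
# Sketch of the workaround's idea: poll until the named entries exist under
# a directory (e.g. /dev/disk/by-id), or give up after a timeout in seconds.
# Not the real wait_for_udev; all names here are illustrative.
wait_for_symlinks() {
    dir="$1"
    timeout="$2"
    shift 2
    while [ "$timeout" -gt 0 ]; do
        missing=0
        for name in "$@"; do
            # -e follows symlinks, so a dangling link counts as missing.
            [ -e "$dir/$name" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}
```

An initramfs hook could then call something like wait_for_symlinks /dev/disk/by-id 30 ata-EXAMPLE ata-EXAMPLE-part1 (device names hypothetical) before zpool import runs.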
So there seems to be some kind of race condition where the mountall script is run before udev has finished adding the symlinks.
The title of #1103 seems related, but the description is very different - for one thing, adding a delay makes no difference in my case.
I added the following to the mountall.conf script, just before the exec mountall line:

The output of this, along with the zdb output (using both /dev and /dev/disk/by-id), is at https://gist.github.com/frymaster/0f864b579943d53b9107
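As a hedged illustration of this kind of pre-exec debugging hook (the function name and paths are assumptions, not the poster's actual snippet):

```shell
#!/bin/sh
# Illustrative pre-mountall debugging hook (an assumption, not the poster's
# actual snippet): record the state of a device directory to a log file so
# it can be inspected after boot.
snapshot_disks() {
    src="$1"
    out="$2"
    ls -lR "$src" > "$out" 2>&1
}

# In mountall.conf this might run just before 'exec mountall', e.g.:
#   snapshot_disks /dev/disk /run/disk-state.log
```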
The devices corresponding to the pool are a, b, d, e, f and g. Note the partition symlinks for sdg are missing, but the partitions are present in /dev
i7 920 CPU, 18GB of RAM. The RAM is non-ECC but I am not reporting a data corruption issue. The system is not running under virtualisation.