system freezes when zfs is waiting for disks to spin up #3785
Comments
gbooker
commented
Sep 18, 2015
I'm experiencing the exact same thing, though in my case it is a 2-disk mirror on USB disks. Since upgrading to ZFS 0.6.5, I've had 7 hard lockups in 36 hours. In my case, the hdparm command to stop the drives from sleeping does not seem to work. Each time the machine locked up, one drive in the mirror had spun up but the second had not. It's as if ZFS triggered the spin-up on one drive, then progressively locked CPU cores until the machine became unresponsive. Disconnecting the USB doesn't help.

This doesn't bode well for ZoL surviving hardware failure: if a drive becomes unresponsive, will it lock up the computer rather than degrade the pool?

Same OS, ZFS version, and kernel; Core i5-2300, 12 GB of RAM. I use the same HBA on an internal pool, though that pool does not seem to be the problem.
gbooker
commented
Sep 19, 2015
Some more info in the hope it is useful: zpool import/export work even if the drives are not spinning. Exporting the pool and leaving it that way ends the lockups, so I think it is clear the issue involves an imported pool whose drives can spin down. This is a problem with USB disks since they often cannot be prevented from spinning down. I know the arguments against using ZFS on USB disks, but I also know people who use ZFS on USB disks over other filesystems because it interoperates well between Linux and Mac and supports today's disk sizes and files over 4 GB.
@gbooker I'm not sure if it's 100% related to @0xFelix's problem, since his disks appear to all be internal (?). I've also encountered issues with spinning disks up/down over USB, and ran into problems with external disks that I haven't seen with internal (SATA) ones: e.g. the XHCI driver having bugs, leading to hard locks of the system, or resetting the driver & link and assigning a different drive letter to the drive (there's additional complexity when using cryptsetup, lvm, etc.); or the external harddrive enclosure (or the firmware of the HDDs) having a built-in timeout which sends the harddrives into standby.

The last time I read something about timeouts of (broken) disks, there was some work done by @ryao. It would be interesting to know if there's something that could be improved specifically on Linux to prevent lockups, stalls, etc. of the rest of the system during these kinds of situations in conjunction with ZFS.
I just encountered an issue with a spun-down external USB enclosure as well:

cannot receive: specified fs (HGST5K4000/bak_ext/) does not exist

This was shown after the harddrive took a really long time to spin up, and then it ended up failing anyway ... luckily the system didn't lock up. I agree, however, that there should be a timeout setting to decide how long a drive may take to spin up and respond.

Did anything noteworthy change besides the upgrade from ZoL 0.6.4 to 0.6.5? The /sys/module/zfs/parameters/zfs_deadman_synctime_ms setting comes to mind, although I don't know whether that setting can also cause the system to "panic" on ZFSonLinux. It could be worth raising the value of that setting, although according to the manual it would only log a zevent (on ZoL only?).

Referencing #471 (Reduce timeout on disks) and some more:
https://www.illumos.org/issues/1553 ZFS should not trust the layers underneath regarding drive timeouts/failure
http://serverfault.com/questions/682061/very-irregular-disk-write-performance-and-repeated-sata-timeouts Very irregular disk write performance and repeated SATA timeouts [closed]
http://www.spinics.net/lists/linux-ide/msg50979.html [PATCH] libata: increase the timeout when setting transfer mode

The question is what could have caused this change from 0.6.4 to 0.6.5.
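For reference, a minimal sketch of how that setting could be inspected and raised at runtime; the 2000000 ms value is only an example, and whether this parameter is involved at all is questioned further down the thread:

```sh
# Current deadman timeout in milliseconds (ZoL default is 1000000 ms, ~16.7 minutes)
cat /sys/module/zfs/parameters/zfs_deadman_synctime_ms

# Raise it at runtime (takes effect immediately, lost on reboot) -
# assumes the parameter is writable on your build
echo 2000000 | sudo tee /sys/module/zfs/parameters/zfs_deadman_synctime_ms

# Make it persistent across module loads
echo "options zfs zfs_deadman_synctime_ms=2000000" | sudo tee -a /etc/modprobe.d/zfs.conf
```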
0xFelix
commented
Sep 21, 2015
Hi, yes my disks are all internal SATA3. Besides upgrading ZFS I did not change anything. I will try changing that value later.
mountassir
commented
Sep 21, 2015
Same here. I have been using a script (hdparm -Y ...) to spin down the drives when idle for a couple of years now, and it has been working as expected: if the pool is not accessed for a while all the drives spin down, and if I then access the pool I can hear the drives spinning up one after the other, and within a few seconds the pool is live and accessible.

A few days ago I was prompted to upgrade to the latest ZFS build. The drives still spin down correctly after the update, but when I try to access the pool I see CPU usage go to 100% in htop and the system then freezes after a couple of seconds. The only thing left to do after that is a hard reset. Drives not in the ZFS pool still spin down/up without any issues, so I am guessing this is specific to ZFS.

OS: Ubuntu 14.04.3 server
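The exact script isn't shown in the thread; a minimal sketch of this kind of idle spin-down setup, with device names and timeouts purely illustrative, could look like this:

```sh
#!/bin/sh
# Let the drive firmware handle idle standby: hdparm -S 241 = 30 minutes,
# 242 = 60 minutes (values 241-251 count in 30-minute steps)
for disk in /dev/sd[b-g]; do
    hdparm -S 241 "$disk"
done

# Or force an immediate spin-down, which is what "hdparm -Y ..." in the
# script mentioned above does (-Y = sleep, -y = standby)
# hdparm -Y /dev/sdb
```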
gbooker
commented
Sep 21, 2015
I should also add that in my configuration nothing changed between the period when it ran fine and when it was locking up continually, except for the ZFS version. The kernel version didn't even change, and the USB mirror pool had been present for about a year prior without incident.

@kernelOfTruth I'm not sure those values would affect anything. If I'm reading it correctly, it sets a timeout of 1000 seconds (nearly 17 minutes) before it gives up on the drive and panics. As I said, when I had my lockups, one of the USB disks had spun up, and it did so within a few seconds, not on the order of minutes. Also, I was present for two of these events and noticed CPU cores hitting 100% utilization within seconds of access to the pool, with successive cores also becoming unavailable and the entire system locked within 30 or so seconds from the initial point of access. I suspect other processes are hitting a mutex which is locked and never released, resulting in a pause that never resumes.

I've not dug into the code enough to say what changed between 0.6.4 and 0.6.5, but everything I've seen points to that as the culprit. It could also be in SPL instead of ZFS.
This issue surprises me because nothing noteworthy changed between 0.6.4 and 0.6.5 in this regard. ZFS has never done anything special to explicitly spin the drives up or down. Spinning down is left to the standard Linux utilities, and the drives should spin up automatically when ZFS issues an I/O that they need to service. Does anyone have additional debugging they can provide? A back trace from the console perhaps?
behlendorf
added
the
Bug - Point Release
label
Sep 22, 2015
behlendorf
added this to the 0.7.0 milestone
Sep 22, 2015
0xFelix
commented
Sep 23, 2015
I would gladly help, but could you tell me how to capture such a backtrace?
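Not answered directly in the thread, but one common way to capture kernel back traces on a (partially) wedged system is the magic SysRq interface, assuming it is enabled; a rough sketch:

```sh
# Enable SysRq (1 enables everything; distros often ship a restrictive mask)
echo 1 | sudo tee /proc/sys/kernel/sysrq

# Dump the stacks of all tasks to the kernel log / serial console
echo t | sudo tee /proc/sysrq-trigger

# Dump only tasks blocked in uninterruptible (D) state - usually the interesting ones here
echo w | sudo tee /proc/sysrq-trigger

# Read the traces if the box is still responsive enough
dmesg | less
```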
behlendorf
modified the milestones:
0.6.5.2,
0.7.0
Sep 23, 2015
behlendorf
added
Bug - Minor
and removed
Bug - Point Release
labels
Sep 23, 2015
@kernelOfTruth thanks, we may have what we need in #3817. @0xFelix the stacks in #3817 suggest that this might be caused by one of the I/O threads getting wedged waiting for an I/O that will now never complete. This might lead to a more severe issue than in the past because we're more aggressive about managing the number of running threads. If you're able to reproduce this, could you try setting the module option …
added a commit
that referenced
this issue
Sep 25, 2015
Resolved by 5592404, which will be cherry-picked into the 0.6.5.2 release.
behlendorf
closed this
Sep 25, 2015
mountassir
referenced this issue
Sep 30, 2015
Closed
0.6.5.1 - I/O timeout during disk spin up #3856
added a commit
that referenced
this issue
Sep 30, 2015
Bronek
referenced this issue
Oct 2, 2015
Closed
0.6.5 regression: zpool import -d /dev/disk/by-id hangs #3866
0xFelix
commented
Oct 7, 2015
@behlendorf The system freezes are gone after updating to 0.6.5.2, but now I'm getting these errors: … I'm sure the disks are 100% OK!?
@0xFelix are you sure? Just do a few to be really sure. To be honest, that looks suspiciously like an error from the libata driver, controller, cable, etc., and not like failing drives. If it is the drives, it's really bad luck (but I doubt it) - fingers crossed.
0xFelix
commented
Oct 7, 2015
@kernelOfTruth I guess it is unlikely that all 6 disks died at the same time?! The errors come from sd[b-g], i.e. all the disks in this pool. I never had any problems before 0.6.5.
0xFelix
commented
Oct 7, 2015
@kernelOfTruth Tried …
@0xFelix yeah, that's what I meant - it's highly unlikely. I just found #3212 again, where "block: remove artifical max_hw_sectors cap" was mentioned (3.19+). Ensure that you upgrade to at least 3.19.8, which includes https://lkml.org/lkml/2015/8/27/712 , http://www.gossamer-threads.com/lists/linux/kernel/2219390?page=last (that should be covered by recent Ubuntu system updates?).
Need more info on the disks. Search terms suggest it could be related to an NCQ timeout (Seagate firmware bug), USB 3.0, and other factors; the USB driver doesn't apply since yours are SATA-connected ...
0xFelix
commented
Oct 7, 2015
@kernelOfTruth I'm on …
0xFelix
commented
Oct 7, 2015
@kernelOfTruth The Seagate NCQ timeout bug sounds plausible... 5 of these disks are ST3000DM001, but disk …
@0xFelix alright, a quick "fix" would then be to boot the kernel via …
which should disable NCQ, e.g. like so: …

Have backups ready and run a few S.M.A.R.T. tests (short, conveyance [if applicable], long, offline). Your output mentions sdb, sdc, sdd, sde, sdf, sdg; look for further indications of error messages in the dmesg output (also during boot) - that smells really fishy:
https://forums.gentoo.org/viewtopic-t-969756.html?sid=92d287eb3cdf6d9ddb248fe941a7d11b , http://unix.stackexchange.com/questions/99553/does-a-bad-sector-indicate-a-failing-disk

Also consider further drive firmware issues, e.g. with ALPM (some drives have issues with lower power states, so setting it to max_performance should be your best bet if it's not already set). Are the cables fine? The PSU? The power connections? Well, this topic somewhat goes beyond this issue entry, but it would still be good to know whether there is some underlying problem with 3.19+ kernels and newer ZoL.
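The boot parameter and example output above were stripped from the quote. As a hedged sketch, assuming the standard libata NCQ switch is what was meant, together with the suggested checks (sdb-sdg taken from the thread):

```sh
# Disable NCQ for all libata-attached drives by adding this to the kernel command line:
#   libata.force=noncq
# or for a single port/device, e.g.:
#   libata.force=1.00:noncq

# Kick off SMART self-tests on each pool member (check results later with `smartctl -a`)
for d in /dev/sd[b-g]; do
    sudo smartctl -t short "$d"    # follow up with -t long once the short test passes
done

# Check the SATA link power management (ALPM) policy; "max_performance"
# avoids the problematic lower power states mentioned above
cat /sys/class/scsi_host/host*/link_power_management_policy
```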
0xFelix
commented
Oct 7, 2015
@kernelOfTruth Did that work? I'm not sure if the disks on my LSI2008 card got NCQ disabled too...
@0xFelix It doesn't look like it's disabled, but I'm not familiar with that controller. Since 2011 the capability to disable NCQ should exist (http://markmail.org/message/b5gp7jaon47zbrsq). Hm, so it could also be an issue with NCQ (the drives?) and/or the firmware of that controller.

… indicates that it was applied to at least one drive - I can't see which one that is because the previous output is missing from your paste.
0xFelix
commented
Oct 7, 2015
@kernelOfTruth …
First hit when searching: http://hwraid.le-vert.net/wiki/LSIFusionMPT - the tools are … https://bugs.launchpad.net/ubuntu/+source/ecs/+bug/599830 suggests updating the firmware. Disabling it for drives: …
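The per-drive command is cut off above. As a generic alternative (not necessarily what those tools do), NCQ can effectively be disabled per drive by dropping its queue depth to 1; a sketch using the sdb-sdg names from the thread:

```sh
# A queue depth of 1 effectively disables NCQ for that drive,
# independent of what the HBA firmware advertises
for d in sdb sdc sdd sde sdf sdg; do
    echo 1 | sudo tee /sys/block/$d/device/queue_depth
done

# Verify the new setting
cat /sys/block/sd[b-g]/device/queue_depth
```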
0xFelix
commented
Oct 7, 2015
It seems those utils are for the older mptsas driver, not mpt2sas. I will try to update the firmware in the next few days...
@0xFelix have backups ready, just in case ...

That seems to be included in Linux 3.19.8-ckt7 according to http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.19.y , which is the latest kernel (September 25th; it depends on whether the packages are already available). I can't easily see whether Linux 3.19.8-ckt7 equals 3.19.0-30-generic - that's one nontransparent naming scheme: http://www.ubuntuupdates.org/package/canonical_kernel_team/vivid/main/base/linux Does one need to enable the extended stable kernel PPAs? Ubuntu /sigh
0xFelix
commented
Oct 8, 2015
@kernelOfTruth I examined the situation a bit further. The errors occurred again at exactly 3am, when a periodic task called … I don't think 3.19.8-ckt7 equals 3.19.0-30-generic. Well... maybe I have to file another Ubuntu bug.
@0xFelix So it's either an Ubuntu-specific thing (?), a Linux kernel harddrive timeout issue, or ZFS not getting notified that the drives are not ready (there appears to be an issue, but I haven't seen exactly what, or why it would or wouldn't trigger the notification in the first place).

Can you see what timeout is set on the disks and potentially raise it? https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/task_controlling-scsi-command-timer-onlining-devices.html

Are those disks in standby or sleeping? According to hdparm, the latter is deeper and might make the delay even bigger:
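The quoted hdparm excerpt was stripped from the comment above. A minimal sketch of the two checks being suggested; the 180-second value is only an example:

```sh
# Current SCSI command timer per disk, in seconds (the default is usually 30)
cat /sys/block/sd[b-g]/device/timeout

# Raise it, e.g. to 180 s, to give a sleeping drive more time to spin up
for d in sdb sdc sdd sde sdf sdg; do
    echo 180 | sudo tee /sys/block/$d/device/timeout
done

# Is the drive in standby (spun down) or sleeping (deepest state)?
sudo hdparm -C /dev/sdb
```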
0xFelix
commented
Oct 11, 2015
@kernelOfTruth Tried raising the timeout... the errors still occur. The state the drives are in is equal to … I guess it is a kernel bug then? Also: according to … I'll file a bug report for Ubuntu now.
@0xFelix Yes, might be. I hope the devs or people on Launchpad can offer some insight into this issue.
@0xFelix The issue you're seeing definitely looks like a problem occurring below ZFS in the stack. Those read I/O errors are coming from the SCSI driver. ZFS issued a read due to the …
0xFelix
commented
Oct 12, 2015
@behlendorf Seems like SCSI, yeah. But don't worry - thank you for your help and for the great zfsonlinux project! :-) I filed a bug on Launchpad; let's see what the Ubuntu devs have to say. ;-)
0xFelix
commented
Nov 21, 2015
@kernelOfTruth @behlendorf I've rebuilt my server with Debian 8 in the hope that things would get better... but... it's exactly the same bug on Debian as well. I'm again beginning to think it has something to do with ZFS; executing …
@0xFelix That doesn't look like an issue related to ZFS. Is e.g. …
I put the disks to sleep by …
then checked the state, even hours later: …

Any progress on the bug you filed on Launchpad?
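The exact commands and output were stripped from the comment above; a typical sequence for this kind of hdparm-only test would be something like the following (device name illustrative):

```sh
# Force the drive into its lowest power state (-y would use standby instead)
sudo hdparm -Y /dev/sdb

# Check the reported power state: active/idle, standby, or sleeping
sudo hdparm -C /dev/sdb
```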
0xFelix
commented
Nov 21, 2015
Testing like you did, using only hdparm, does not produce the error. Putting a disk to sleep, then writing something to the pool and then checking a disk's status does produce the error. They confirmed the error, but no further action has followed so far.
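A minimal sketch of that reproduction, with the pool name and device purely hypothetical:

```sh
# 1. Spin down one member of the pool
sudo hdparm -Y /dev/sdb

# 2. Write to the pool so ZFS has to wake the drives ("tank" is a placeholder pool name)
dd if=/dev/zero of=/tank/spinup-test bs=1M count=16 conv=fsync

# 3. Check the drive state and the kernel log for the SCSI read errors
sudo hdparm -C /dev/sdb
dmesg | tail -n 30
```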
0xFelix
commented
Nov 24, 2015
@kernelOfTruth I guess I found my bug... http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg14937.html

Apart from not outputting "Device not ready", this seems to be exactly my bug. So it is not a ZFS problem... The id of the commit including the fix is d3ed1731ba6bca32ac0ac96377cae8fd735e16e6; it should have been included since mainline kernel 3.4.11. Let's see if Debian and Ubuntu both included this fix in their kernel sources...
@0xFelix That commit ID seems to be dead or meanwhile invalid. The new one is:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/scsi_error.c?id=14216561e164671ce147458653b1fea06a4ada1e [SCSI] Fix 'Device not ready' issue on mpt2sas

Related:
https://www.redhat.com/archives/dm-devel/2014-September/msg00175.html Re: [dm-devel] [PATCH 1/1] multipath-tools: Change path checker for IBM IPR devices
https://lkml.org/lkml/2014/11/16/122 [GIT PULL] SCSI fixes for 3.18-rc4

I believe I have observed similar behavior with an external USB 3.0 HDD enclosure which I no longer use ...
kobuki
commented
Mar 7, 2016
I'm experiencing similar behavior. I've recently upgraded my ZFS VM from Debian 7.x to 8.3 along with the ZFS stack (0.6.4-1.2-1-wheezy -> 0.6.5.2-2-wheezy). I have my disks sleep after 30 minutes of inactivity (hdparm -y 242). When they're woken up by activity on the pool, I see the following errors in dmesg for all 6 disks of the raidz2 pool: …

But when I simply do the following with the sleeping disks: …

then there are NO errors appearing at all. It appears that ZFS does something differently than simply reading blocks from the disk. These errors never appeared on the old system, with kernel 3.2 and ZFS 0.6.4. I'm using an LSI2008 HBA redirected to a KVM VM using MMIO. Apart from these errors appearing in the logs on wakeup, I experience no odd behavior on my pool. The I/O errors produced by the lower layers do not seem to affect the pool I/O error counters, even though ZED reports them in the syslog and in email. I'm no expert at all, but one might think that the patch linked by @kernelOfTruth previously might be the solution to the problem. I'll try to patch my kernel later to see what happens (using 4.3.0-0.bpo.1-amd64 now).
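The command kobuki refers to was stripped from the comment; a plausible stand-in for "simply reading blocks from the disk" (purely illustrative) would be:

```sh
# Read a few raw blocks directly from each sleeping disk; this wakes the
# drives without involving ZFS and, per the report above, without any errors
for d in /dev/sd[b-g]; do
    sudo dd if="$d" of=/dev/null bs=1M count=16
done
```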
0xFelix
commented
Jul 11, 2016
@kobuki Did you find a solution to the problem? Did the mentioned patch work?
kobuki
commented
Jul 11, 2016
Sorry, I didn't try it. But I have done a few driver upgrades since then and I haven't seen the errors in syslog recently. I will check again. I never had any problems aside from these worrisome log entries.
0xFelix
commented
Jul 11, 2016
What kernel and driver are you running currently? Still Debian 8?
This was referenced Oct 10, 2016
hoppel118
commented
Oct 10, 2016
Hey guys, same for me. Have a look at these issues: #3785 …

Here I reported some things about my situation and my environment: …

Greetings, Hoppel118
thomasvoigt
commented
Dec 6, 2016
Hi there! I can replicate this behavior on Linux-4.8.11 (vanilla) + LSI SAS2008 (mpt3sas) + dmraid + xfs. HTH and best regards,
night199uk
commented
Jun 7, 2017
+1 - kernel 4.8.0-54 (Ubuntu) + LSI SAS2008 (mpt3sas) + ZFS on Seagate ST8000DM002
I don't see any mention of the …
0xFelix commented Sep 16, 2015
I observed the following; I'm not quite 100% sure the problem is the spindown:

My system freezes when ZFS waits for the disks to spin up, after updating to 0.6.5. I have a RAIDZ2 with 6 disks that go to sleep after 1 hour of idling. With 0.6.4 everything worked more or less fine: waiting for the pool's data to become ready after the disks had spun up already took quite some time, but it worked. Now my system freezes when ZFS waits for the disks to spin up, and the only thing I can do is hard-reset the system. Preventing the disks from going to sleep currently works as a workaround (see the sketch after the system info below).

Does ZFS not support the spindown of disks?!
System Info:
E3-1225 v3 Processor
16GB of ECC RAM
M1015 P20 IT
6x 3TB drives
ZFS version: 0.6.5
OS: Ubuntu 14.04.3 LTS (Kernel 3.19)
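A minimal sketch of the workaround mentioned above (keeping the disks from spinning down); the device names are illustrative:

```sh
# Disable the firmware standby timer on each pool member so the drives never spin down
for d in /dev/sd[b-g]; do
    sudo hdparm -S 0 "$d"
done
```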