Pool unexpectedly locked up - zfs 0.6.5-32_g256fa98 #4025
Comments
Here is an extract from the logs from when the pool locked up during a scrub. Potentially this needs to be filed as a separate bug?
|
@wipash thanks, it looks like you posted everything we need to determine what happened. In your case the failed VERIFY is almost certainly responsible for the other hangs. When it hit the VERIFY it was holding a directory lock that those other threads are trying to acquire. My best guess is this is another side effect of a class of system attribute bugs which were fixed in recent releases. You can apply the following patch to handle the error and avoid the panic.
diff --git a/module/zfs/zfs_acl.c b/module/zfs/zfs_acl.c
index a208dea..86c9154 100644
--- a/module/zfs/zfs_acl.c
+++ b/module/zfs/zfs_acl.c
@@ -1836,8 +1836,13 @@ zfs_acl_ids_create(znode_t *dzp, int flag, vattr_t *vap,
 		if (!(flag & IS_ROOT_NODE) && (S_ISDIR(ZTOI(dzp)->i_mode) &&
 		    (dzp->z_pflags & ZFS_INHERIT_ACE)) &&
 		    !(dzp->z_pflags & ZFS_XATTR)) {
-			VERIFY(0 == zfs_acl_node_read(dzp, B_TRUE,
-			    &paclp, B_FALSE));
+			error = zfs_acl_node_read(dzp, B_TRUE,
+			    &paclp, B_FALSE);
+			if (error) {
+				mutex_exit(&dzp->z_lock);
+				mutex_exit(&dzp->z_acl_lock);
+				return (error);
+			}
 			acl_ids->z_aclp = zfs_acl_inherit(zsb,
 			    vap->va_mode, paclp, acl_ids->z_mode, &need_chmod);
 			inherited = B_TRUE;
|
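For readers unfamiliar with the pattern in the patch above, here is a minimal userspace sketch of the same idea. It is not ZFS code: `node_t`, `read_acl`, and `create_ids` are hypothetical names, and pthread mutexes stand in for the kernel mutexes. It only illustrates that the error path drops both locks before returning, so threads waiting on them are not left blocked.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a znode; not the real ZFS structure. */
typedef struct node {
	pthread_mutex_t acl_lock;
	pthread_mutex_t lock;
	int acl_corrupt;	/* nonzero simulates a failing ACL read */
} node_t;

/* Hypothetical stand-in for the ACL read: 0 on success, errno on failure. */
static int
read_acl(node_t *np)
{
	return (np->acl_corrupt ? EIO : 0);
}

/*
 * Take both locks, attempt the read, and on failure release the locks
 * in reverse order before handing the error back to the caller.
 */
int
create_ids(node_t *np)
{
	int error;

	pthread_mutex_lock(&np->acl_lock);
	pthread_mutex_lock(&np->lock);

	error = read_acl(np);
	if (error) {
		pthread_mutex_unlock(&np->lock);
		pthread_mutex_unlock(&np->acl_lock);
		return (error);
	}

	/* ... work that needs the ACL would go here ... */

	pthread_mutex_unlock(&np->lock);
	pthread_mutex_unlock(&np->acl_lock);
	return (0);
}
```

With `acl_corrupt` set, `create_ids()` returns `EIO` and both mutexes can be acquired again immediately afterwards, which is the behaviour the patch aims for in place of a panic with locks held.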
Thanks! |
@wipash how did your follow-up testing go? Did the same hang happen again? What was the fix? |
I'm currently running spl-0.6.5.7 / zfs-0.6.5.7 and have not seen this issue again. |
During normal operation today one of my pools locked up entirely. Reading from the pool just stalled the process. The host was still functioning otherwise.
top showed txg_quiesce and txg_sync using all available CPU. I've attached the kernel logs.
I don't fully understand how to interpret them, so I'm not entirely sure if it's ZFS at fault here or the underlying storage (HighPoint RocketRAID r750).
This system has previously locked up whilst under load from a scrub.
The system is running stable so far after a reboot.
This is my first foray into ZFS debugging, so please let me know what other information I should provide.
This server runs Samba 4.2.5 to serve files to a large number of Windows guests.
System version info:
System CPU/RAM:
Pool details:
Arcstats, taken after reboot, not sure if it's useful
Kernel logs