Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pool unexpectedly locked up - zfs 0.6.5-32_g256fa98 #4025

Closed
wipash opened this issue Nov 19, 2015 · 5 comments
Closed

Pool unexpectedly locked up - zfs 0.6.5-32_g256fa98 #4025

wipash opened this issue Nov 19, 2015 · 5 comments
Labels
Status: Inactive Not being actively updated

Comments

@wipash
Copy link

wipash commented Nov 19, 2015

During normal operation today one of my pools locked up entirely. Reading from the pool just stalled the process. The host was still functioning otherwise.

top showed txg_quiesce and txg_sync using all available CPU.

I've attached the kernel logs.
I'm don't fully understand how to interpret them, so I'm not entirely sure if it's ZFS at fault here or the underlying storage (HighPoint RocketRAID r750).
This system has previously locked up whilst under load from a scrub.
The system is running stable so far after a reboot.

This is my first foray into ZFS debugging, so please let me know what other information I should provide.

This server runs Samba 4.2.5 to serve files to a large number of Windows guests.

System version info:

# cat /etc/centos-release
CentOS Linux release 7.1.1503 (Core)

# modinfo zfs
filename:       /lib/modules/3.10.0-229.20.1.el7.x86_64/weak-updates/zfs.ko
version:        0.6.5-32_g256fa98

# modinfo spl
filename:       /lib/modules/3.10.0-229.20.1.el7.x86_64/weak-updates/spl.ko
version:        0.6.5-11_ge7b75d9

# uname -sr
Linux 3.10.0-229.20.1.el7.x86_64

# smbd -V
Version 4.2.5-SerNet-RedHat-19.el7

System CPU/RAM:

# cat /proc/meminfo
MemTotal:       16233688 kB
MemFree:        11734276 kB
MemAvailable:   11882608 kB
Buffers:           33032 kB
Cached:           283432 kB
SwapCached:            0 kB
Active:           263268 kB
Inactive:         194960 kB
Active(anon):     142460 kB
Inactive(anon):    25656 kB
Active(file):     120808 kB
Inactive(file):   169304 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8347644 kB
SwapFree:        8347644 kB
Dirty:              3172 kB
Writeback:             0 kB
AnonPages:        141868 kB
Mapped:            56064 kB
Shmem:             26268 kB
Slab:            1576048 kB
SReclaimable:      54328 kB
SUnreclaim:      1521720 kB
KernelStack:        7696 kB
PageTables:        35800 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16464488 kB
Committed_AS:     898148 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     2508888 kB
VmallocChunk:   34357135996 kB
HardwareCorrupted:     0 kB
AnonHugePages:     32768 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       92420 kB
DirectMap2M:     3018752 kB
DirectMap1G:    15728640 kB


# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping        : 4
microcode       : 0x428
cpu MHz         : 1537.511
cache size      : 15360 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips        : 4200.12
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Pool details:

# zpool status wlg_live_2
  pool: wlg_live_2
 state: ONLINE
  scan: scrub repaired 0 in 18h33m with 0 errors on Fri Nov 13 18:19:19 2015
config:

        NAME                                             STATE     READ WRITE CKSUM
        wlg_live_2                                       ONLINE       0     0     0
          ata-WDC_WD4000FYYZ-01UL1B2_WD-WCC135xxxxxx     ONLINE       0     0     0
          ata-WDC_WD4000FYYZ-01UL1B2_WD-WCC135xxxxxx     ONLINE       0     0     0
          ata-WDC_WD4000FYYZ-01UL1B2_WD-WCC135xxxxxx     ONLINE       0     0     0
          ata-WDC_WD4000FYYZ-01UL1B2_WD-WCC135xxxxxx     ONLINE       0     0     0
          ata-WDC_WD4000FYYZ-01UL1B2_WD-WMC130xxxxxx     ONLINE       0     0     0
        cache
          ata-Samsung_SSD_850_PRO_256GB_S1SUNSAG2xxxxxx  ONLINE       0     0     0

errors: No known data errors

# zpool get all wlg_live_2
NAME        PROPERTY                    VALUE                       SOURCE
wlg_live_2  size                        18.1T                       -
wlg_live_2  capacity                    22%                         -
wlg_live_2  altroot                     -                           default
wlg_live_2  health                      ONLINE                      -
wlg_live_2  guid                        812656477794199399          default
wlg_live_2  version                     -                           default
wlg_live_2  bootfs                      -                           default
wlg_live_2  delegation                  on                          default
wlg_live_2  autoreplace                 off                         default
wlg_live_2  cachefile                   -                           default
wlg_live_2  failmode                    wait                        default
wlg_live_2  listsnapshots               off                         default
wlg_live_2  autoexpand                  off                         default
wlg_live_2  dedupditto                  0                           default
wlg_live_2  dedupratio                  1.00x                       -
wlg_live_2  free                        14.1T                       -
wlg_live_2  allocated                   4.00T                       -
wlg_live_2  readonly                    off                         -
wlg_live_2  ashift                      0                           default
wlg_live_2  comment                     -                           default
wlg_live_2  expandsize                  -                           -
wlg_live_2  freeing                     0                           default
wlg_live_2  fragmentation               12%                         -
wlg_live_2  leaked                      0                           default
wlg_live_2  feature@async_destroy       enabled                     local
wlg_live_2  feature@empty_bpobj         active                      local
wlg_live_2  feature@lz4_compress        active                      local
wlg_live_2  feature@spacemap_histogram  active                      local
wlg_live_2  feature@enabled_txg         active                      local
wlg_live_2  feature@hole_birth          active                      local
wlg_live_2  feature@extensible_dataset  enabled                     local
wlg_live_2  feature@embedded_data       active                      local
wlg_live_2  feature@bookmarks           enabled                     local
wlg_live_2  feature@filesystem_limits   enabled                     local
wlg_live_2  feature@large_blocks        enabled                     local


# zfs get all wlg_live_2
NAME        PROPERTY              VALUE                  SOURCE
wlg_live_2  type                  filesystem             -
wlg_live_2  creation              Tue Jul 28 20:17 2015  -
wlg_live_2  used                  4.00T                  -
wlg_live_2  available             13.6T                  -
wlg_live_2  referenced            21K                    -
wlg_live_2  compressratio         1.31x                  -
wlg_live_2  mounted               yes                    -
wlg_live_2  quota                 none                   default
wlg_live_2  reservation           none                   default
wlg_live_2  recordsize            128K                   default
wlg_live_2  mountpoint            /wlg_live_2            default
wlg_live_2  sharenfs              off                    default
wlg_live_2  checksum              on                     default
wlg_live_2  compression           lz4                    local
wlg_live_2  atime                 on                     default
wlg_live_2  devices               on                     default
wlg_live_2  exec                  on                     default
wlg_live_2  setuid                on                     default
wlg_live_2  readonly              off                    default
wlg_live_2  zoned                 off                    default
wlg_live_2  snapdir               hidden                 default
wlg_live_2  aclinherit            passthrough            local
wlg_live_2  canmount              on                     default
wlg_live_2  xattr                 sa                     local
wlg_live_2  copies                1                      default
wlg_live_2  version               5                      -
wlg_live_2  utf8only              off                    -
wlg_live_2  normalization         none                   -
wlg_live_2  casesensitivity       sensitive              -
wlg_live_2  vscan                 off                    default
wlg_live_2  nbmand                on                     local
wlg_live_2  sharesmb              off                    default
wlg_live_2  refquota              none                   default
wlg_live_2  refreservation        none                   default
wlg_live_2  primarycache          all                    default
wlg_live_2  secondarycache        all                    default
wlg_live_2  usedbysnapshots       0                      -
wlg_live_2  usedbydataset         21K                    -
wlg_live_2  usedbychildren        4.00T                  -
wlg_live_2  usedbyrefreservation  0                      -
wlg_live_2  logbias               latency                default
wlg_live_2  dedup                 off                    default
wlg_live_2  mlslabel              none                   default
wlg_live_2  sync                  standard               default
wlg_live_2  refcompressratio      1.00x                  -
wlg_live_2  written               21K                    -
wlg_live_2  logicalused           5.23T                  -
wlg_live_2  logicalreferenced     10.5K                  -
wlg_live_2  filesystem_limit      none                   default
wlg_live_2  snapshot_limit        none                   default
wlg_live_2  filesystem_count      none                   default
wlg_live_2  snapshot_count        none                   default
wlg_live_2  snapdev               hidden                 default
wlg_live_2  acltype               posixacl               local
wlg_live_2  context               none                   default
wlg_live_2  fscontext             none                   default
wlg_live_2  defcontext            none                   default
wlg_live_2  rootcontext           none                   default
wlg_live_2  relatime              on                     local
wlg_live_2  redundant_metadata    all                    default
wlg_live_2  overlay               off                    default

Arcstats, taken after reboot, not sure if it's useful

# cat /proc/spl/kstat/zfs/arcstats
6 1 0x01 91 4368 1023664061 481242097065
name                            type data
hits                            4    756252
misses                          4    298646
demand_data_hits                4    33300
demand_data_misses              4    3679
demand_metadata_hits            4    552916
demand_metadata_misses          4    264450
prefetch_data_hits              4    20
prefetch_data_misses            4    3891
prefetch_metadata_hits          4    170016
prefetch_metadata_misses        4    26626
mru_hits                        4    178628
mru_ghost_hits                  4    0
mfu_hits                        4    407588
mfu_ghost_hits                  4    0
deleted                         4    914
mutex_miss                      4    0
evict_skip                      4    16023
evict_not_enough                4    0
evict_l2_cached                 4    0
evict_l2_eligible               4    13523968
evict_l2_ineligible             4    8192
evict_l2_skip                   4    0
hash_elements                   4    94655
hash_elements_max               4    94655
hash_collisions                 4    2459
hash_chains                     4    2179
hash_chain_max                  4    2
p                               4    4155824128
c                               4    8311648256
c_min                           4    33554432
c_max                           4    8311648256
size                            4    1959759696
hdr_size                        4    40193808
data_size                       4    936477696
metadata_size                   4    814782464
other_size                      4    168305728
anon_size                       4    12702720
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    987219968
mru_evictable_data              4    334465024
mru_evictable_metadata          4    488659456
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    751337472
mfu_evictable_data              4    601750016
mfu_evictable_metadata          4    37809664
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    0
l2_misses                       4    298047
l2_feeds                        4    399
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    818417152
l2_writes_sent                  4    398
l2_writes_done                  4    398
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    1
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    1073621504
l2_asize                        4    807250944
l2_hdr_size                     4    0
l2_compress_successes           4    18993
l2_compress_zeros               4    0
l2_compress_failures            4    5178
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    1023282000
arc_meta_limit                  4    6233736192
arc_meta_max                    4    1023283232
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    259735552

Kernel logs

Nov 19 14:35:41 whg-sto-102 kernel: INFO: task smbd:3641 blocked for more than 120 seconds.
Nov 19 14:35:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:35:41 whg-sto-102 kernel: smbd            D ffff88047fd73680     0  3641   4971 0x00000080
Nov 19 14:35:41 whg-sto-102 kernel: ffff8801f9b83c20 0000000000000086 ffff8804583cad80 ffff8801f9b83fd8
Nov 19 14:35:41 whg-sto-102 kernel: ffff8801f9b83fd8 ffff8801f9b83fd8 ffff8804583cad80 ffff8803c2404b70
Nov 19 14:35:41 whg-sto-102 kernel: ffff8803c2404b74 ffff8804583cad80 00000000ffffffff ffff8803c2404b78
Nov 19 14:35:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff816016b3>] lookup_slow+0x33/0xa7
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811d4aa3>] path_lookupat+0x773/0x7a0
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff810f0ad2>] ? from_kgid_munged+0x12/0x20
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811cba1f>] ? cp_new_stat+0x14f/0x180
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811d7ba1>] user_path_at+0x11/0x20
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10
Nov 19 14:35:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b




Nov 19 14:37:41 whg-sto-102 kernel: INFO: task smbd:3641 blocked for more than 120 seconds.
Nov 19 14:37:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:37:41 whg-sto-102 kernel: smbd            D ffff88047fd73680     0  3641   4971 0x00000080
Nov 19 14:37:41 whg-sto-102 kernel: ffff8801f9b83c20 0000000000000086 ffff8804583cad80 ffff8801f9b83fd8
Nov 19 14:37:41 whg-sto-102 kernel: ffff8801f9b83fd8 ffff8801f9b83fd8 ffff8804583cad80 ffff8803c2404b70
Nov 19 14:37:41 whg-sto-102 kernel: ffff8803c2404b74 ffff8804583cad80 00000000ffffffff ffff8803c2404b78
Nov 19 14:37:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff816016b3>] lookup_slow+0x33/0xa7
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d4aa3>] path_lookupat+0x773/0x7a0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff810f0ad2>] ? from_kgid_munged+0x12/0x20
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cba1f>] ? cp_new_stat+0x14f/0x180
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d7ba1>] user_path_at+0x11/0x20
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
Nov 19 14:37:41 whg-sto-102 kernel: INFO: task smbd:12496 blocked for more than 120 seconds.
Nov 19 14:37:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:37:41 whg-sto-102 kernel: smbd            D ffff88047fc33680     0 12496   4971 0x00000080
Nov 19 14:37:41 whg-sto-102 kernel: ffff8803377fbc20 0000000000000082 ffff88046283b8e0 ffff8803377fbfd8
Nov 19 14:37:41 whg-sto-102 kernel: ffff8803377fbfd8 ffff8803377fbfd8 ffff88046283b8e0 ffff8803c2404b70
Nov 19 14:37:41 whg-sto-102 kernel: ffff8803c2404b74 ffff88046283b8e0 00000000ffffffff ffff8803c2404b78
Nov 19 14:37:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff816016b3>] lookup_slow+0x33/0xa7
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d4aa3>] path_lookupat+0x773/0x7a0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff810f0ad2>] ? from_kgid_munged+0x12/0x20
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cba1f>] ? cp_new_stat+0x14f/0x180
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d7ba1>] user_path_at+0x11/0x20
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
Nov 19 14:37:41 whg-sto-102 kernel: INFO: task smbd:15956 blocked for more than 120 seconds.
Nov 19 14:37:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:37:41 whg-sto-102 kernel: smbd            D ffff88047fd13680     0 15956   4971 0x00000080
Nov 19 14:37:41 whg-sto-102 kernel: ffff8802661a3c18 0000000000000086 ffff88031c3ccfa0 ffff8802661a3fd8
Nov 19 14:37:41 whg-sto-102 kernel: ffff8802661a3fd8 ffff8802661a3fd8 ffff88031c3ccfa0 ffff88031c3ccfa0
Nov 19 14:37:41 whg-sto-102 kernel: ffff88030eef23d8 fffffffeffffffff ffff88030eef23e0 ffff88023a6b0c00
Nov 19 14:37:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff816096a9>] schedule+0x29/0x70
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff8160b185>] rwsem_down_read_failed+0xf5/0x165
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff812e2c54>] call_rwsem_down_read_failed+0x14/0x30
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff81608a60>] ? down_read+0x20/0x30
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffffa05066df>] zpl_xattr_get+0x5f/0x120 [zfs]
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffffa0506804>] zpl_xattr_user_get+0x64/0x90 [zfs]
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811eaccc>] generic_getxattr+0x4c/0x70
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811eb118>] vfs_getxattr+0x88/0xb0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811eb25b>] getxattr+0xab/0x1d0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d699d>] ? putname+0x3d/0x60
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811d7b42>] ? user_path_at_empty+0x72/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811c8d11>] ? __sb_end_write+0x31/0x60
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff811ebf84>] SyS_getxattr+0x64/0xc0
Nov 19 14:37:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b


Nov 19 14:39:41 whg-sto-102 kernel: INFO: task smbd:3641 blocked for more than 120 seconds.
Nov 19 14:39:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:39:41 whg-sto-102 kernel: smbd            D ffff88047fd73680     0  3641   4971 0x00000080
Nov 19 14:39:41 whg-sto-102 kernel: ffff8801f9b83c20 0000000000000086 ffff8804583cad80 ffff8801f9b83fd8
Nov 19 14:39:41 whg-sto-102 kernel: ffff8801f9b83fd8 ffff8801f9b83fd8 ffff8804583cad80 ffff8803c2404b70
Nov 19 14:39:41 whg-sto-102 kernel: ffff8803c2404b74 ffff8804583cad80 00000000ffffffff ffff8803c2404b78
Nov 19 14:39:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816016b3>] lookup_slow+0x33/0xa7
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d4aa3>] path_lookupat+0x773/0x7a0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff810f0ad2>] ? from_kgid_munged+0x12/0x20
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cba1f>] ? cp_new_stat+0x14f/0x180
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7ba1>] user_path_at+0x11/0x20
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
Nov 19 14:39:41 whg-sto-102 kernel: INFO: task smbd:12496 blocked for more than 120 seconds.
Nov 19 14:39:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:39:41 whg-sto-102 kernel: smbd            D ffff88047fc33680     0 12496   4971 0x00000080
Nov 19 14:39:41 whg-sto-102 kernel: ffff8803377fbc20 0000000000000082 ffff88046283b8e0 ffff8803377fbfd8
Nov 19 14:39:41 whg-sto-102 kernel: ffff8803377fbfd8 ffff8803377fbfd8 ffff88046283b8e0 ffff8803c2404b70
Nov 19 14:39:41 whg-sto-102 kernel: ffff8803c2404b74 ffff88046283b8e0 00000000ffffffff ffff8803c2404b78
Nov 19 14:39:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816016b3>] lookup_slow+0x33/0xa7
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d4aa3>] path_lookupat+0x773/0x7a0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff810f0ad2>] ? from_kgid_munged+0x12/0x20
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cba1f>] ? cp_new_stat+0x14f/0x180
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7ba1>] user_path_at+0x11/0x20
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
Nov 19 14:39:41 whg-sto-102 kernel: INFO: task smbd:15956 blocked for more than 120 seconds.
Nov 19 14:39:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:39:41 whg-sto-102 kernel: smbd            D ffff88047fd13680     0 15956   4971 0x00000080
Nov 19 14:39:41 whg-sto-102 kernel: ffff8802661a3c18 0000000000000086 ffff88031c3ccfa0 ffff8802661a3fd8
Nov 19 14:39:41 whg-sto-102 kernel: ffff8802661a3fd8 ffff8802661a3fd8 ffff88031c3ccfa0 ffff88031c3ccfa0
Nov 19 14:39:41 whg-sto-102 kernel: ffff88030eef23d8 fffffffeffffffff ffff88030eef23e0 ffff88023a6b0c00
Nov 19 14:39:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816096a9>] schedule+0x29/0x70
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff8160b185>] rwsem_down_read_failed+0xf5/0x165
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff812e2c54>] call_rwsem_down_read_failed+0x14/0x30
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81608a60>] ? down_read+0x20/0x30
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffffa05066df>] zpl_xattr_get+0x5f/0x120 [zfs]
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffffa0506804>] zpl_xattr_user_get+0x64/0x90 [zfs]
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811eaccc>] generic_getxattr+0x4c/0x70
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811eb118>] vfs_getxattr+0x88/0xb0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811eb25b>] getxattr+0xab/0x1d0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d699d>] ? putname+0x3d/0x60
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7b42>] ? user_path_at_empty+0x72/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811c8d11>] ? __sb_end_write+0x31/0x60
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811ebf84>] SyS_getxattr+0x64/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
Nov 19 14:39:41 whg-sto-102 kernel: INFO: task smbd:16287 blocked for more than 120 seconds.
Nov 19 14:39:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:39:41 whg-sto-102 kernel: smbd            D ffff88047fc73680     0 16287   4971 0x00000080
Nov 19 14:39:41 whg-sto-102 kernel: ffff88023084bc20 0000000000000086 ffff88031c3cdb00 ffff88023084bfd8
Nov 19 14:39:41 whg-sto-102 kernel: ffff88023084bfd8 ffff88023084bfd8 ffff88031c3cdb00 ffff8803c2404b70
Nov 19 14:39:41 whg-sto-102 kernel: ffff8803c2404b74 ffff88031c3cdb00 00000000ffffffff ffff8803c2404b78
Nov 19 14:39:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816016b3>] lookup_slow+0x33/0xa7
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d4aa3>] path_lookupat+0x773/0x7a0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811c753e>] ? do_readv_writev+0x19e/0x260
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7ba1>] user_path_at+0x11/0x20
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
Nov 19 14:39:41 whg-sto-102 kernel: INFO: task smbd:16327 blocked for more than 120 seconds.
Nov 19 14:39:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:39:41 whg-sto-102 kernel: smbd            D ffff88047fcd3680     0 16327   4971 0x00000080
Nov 19 14:39:41 whg-sto-102 kernel: ffff88005a867c18 0000000000000082 ffff880461c971c0 ffff88005a867fd8
Nov 19 14:39:41 whg-sto-102 kernel: ffff88005a867fd8 ffff88005a867fd8 ffff880461c971c0 ffff880461c971c0
Nov 19 14:39:41 whg-sto-102 kernel: ffff88030eef23d8 ffffffffffffffff ffff88030eef23e0 ffff880240073000
Nov 19 14:39:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff816096a9>] schedule+0x29/0x70
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff8160b185>] rwsem_down_read_failed+0xf5/0x165
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff812e2c54>] call_rwsem_down_read_failed+0x14/0x30
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81608a60>] ? down_read+0x20/0x30
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffffa05066df>] zpl_xattr_get+0x5f/0x120 [zfs]
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffffa0506910>] zpl_xattr_security_get+0x50/0x70 [zfs]
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811eaccc>] generic_getxattr+0x4c/0x70
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811eb118>] vfs_getxattr+0x88/0xb0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811eb25b>] getxattr+0xab/0x1d0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d699d>] ? putname+0x3d/0x60
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811d7b42>] ? user_path_at_empty+0x72/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811138fd>] ? call_rcu_sched+0x1d/0x20
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff811ebf84>] SyS_getxattr+0x64/0xc0
Nov 19 14:39:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b



Nov 19 14:41:41 whg-sto-102 kernel: INFO: task smbd:3641 blocked for more than 120 seconds.
Nov 19 14:41:41 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 19 14:41:41 whg-sto-102 kernel: smbd            D ffff88047fd73680     0  3641   4971 0x00000080
Nov 19 14:41:41 whg-sto-102 kernel: ffff8801f9b83c20 0000000000000086 ffff8804583cad80 ffff8801f9b83fd8
Nov 19 14:41:41 whg-sto-102 kernel: ffff8801f9b83fd8 ffff8801f9b83fd8 ffff8804583cad80 ffff8803c2404b70
Nov 19 14:41:41 whg-sto-102 kernel: ffff8803c2404b74 ffff8804583cad80 00000000ffffffff ffff8803c2404b78
Nov 19 14:41:41 whg-sto-102 kernel: Call Trace:
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff816016b3>] lookup_slow+0x33/0xa7
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811d4aa3>] path_lookupat+0x773/0x7a0
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff810f0ad2>] ? from_kgid_munged+0x12/0x20
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811cba1f>] ? cp_new_stat+0x14f/0x180
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811d7ba1>] user_path_at+0x11/0x20
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10
Nov 19 14:41:41 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
@wipash
Copy link
Author

wipash commented Nov 19, 2015

Here is an extract from the logs from when the pool locked up during a scrub. Potentially this needs to be filed as a seperate bug?

Nov  4 08:35:00 whg-sto-102 kernel: VERIFY(0 == zfs_acl_node_read(dzp, B_TRUE, &paclp, B_FALSE)) failed
Nov  4 08:35:00 whg-sto-102 kernel: PANIC at zfs_acl.c:1840:zfs_acl_ids_create()
Nov  4 08:35:00 whg-sto-102 kernel: Showing stack for process 19374
Nov  4 08:35:00 whg-sto-102 kernel: CPU: 9 PID: 19374 Comm: smbd Tainted: PF          O--------------   3.10.0-229.7.2.el7.x86_64 #1
Nov  4 08:35:00 whg-sto-102 kernel: Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.0a 12/05/2013
Nov  4 08:35:00 whg-sto-102 kernel: ffffffffa053a7a1 000000007f54bca3 ffff88042c2e38f8 ffffffff81604386
Nov  4 08:35:00 whg-sto-102 kernel: ffff88042c2e3908 ffffffffa018b764 ffff88042c2e3a90 ffffffffa018b82f
Nov  4 08:35:00 whg-sto-102 kernel: ffff88042c2e3978 ffffffff00000028 ffff88042c2e3aa0 ffff88042c2e3a40
Nov  4 08:35:00 whg-sto-102 kernel: Call Trace:
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff81604386>] dump_stack+0x19/0x1b
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa018b764>] spl_dumpstack+0x44/0x50 [spl]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa018b82f>] spl_panic+0xbf/0xf0 [spl]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa04ac10a>] ? sa_size+0xfa/0x110 [zfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa04dc920>] ? zfs_acl_node_read.constprop.17+0xb0/0x310 [zfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa04649ae>] ? dmu_buf_rele+0xe/0x10 [zfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa04de80c>] zfs_acl_ids_create+0x6dc/0x720 [zfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa04dd069>] ? zfs_zaccess_common+0x169/0x1a0 [zfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa04f86fa>] zfs_create+0x20a/0x760 [zfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa0515513>] zpl_create+0xc3/0x180 [zfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff811d2a8d>] vfs_create+0xcd/0x130
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff811d5ccf>] do_last+0xb8f/0x1270
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa02f9c62>] ? xfs_file_buffered_aio_write+0x232/0x260 [xfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff811d6472>] path_openat+0xc2/0x490
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffffa02f9d96>] ? xfs_file_aio_write+0x106/0x150 [xfs]
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff811d7c3b>] do_filp_open+0x4b/0xb0
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff811e4697>] ? __alloc_fd+0xa7/0x130
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff811c5933>] do_sys_open+0xf3/0x1f0
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff811c5a4e>] SyS_open+0x1e/0x20
Nov  4 08:35:00 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b

Nov  4 08:37:30 whg-sto-102 kernel: INFO: task smbd:19374 blocked for more than 120 seconds.
Nov  4 08:37:30 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  4 08:37:30 whg-sto-102 kernel: smbd            D ffff88047fd33680     0 19374  20785 0x00000080
Nov  4 08:37:30 whg-sto-102 kernel: ffff88042c2e38f8 0000000000000082 ffff880451f9ad80 ffff88042c2e3fd8
Nov  4 08:37:30 whg-sto-102 kernel: ffff88042c2e3fd8 ffff88042c2e3fd8 ffff880451f9ad80 ffffffffa053a7a1
Nov  4 08:37:30 whg-sto-102 kernel: 0000000000000730 ffffffffa0527240 ffffffffa054146c 0000000000000000
Nov  4 08:37:30 whg-sto-102 kernel: Call Trace:
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff816096a9>] schedule+0x29/0x70
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa018b855>] spl_panic+0xe5/0xf0 [spl]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa04ac10a>] ? sa_size+0xfa/0x110 [zfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa04dc920>] ? zfs_acl_node_read.constprop.17+0xb0/0x310 [zfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa04649ae>] ? dmu_buf_rele+0xe/0x10 [zfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa04de80c>] zfs_acl_ids_create+0x6dc/0x720 [zfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa04dd069>] ? zfs_zaccess_common+0x169/0x1a0 [zfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa04f86fa>] zfs_create+0x20a/0x760 [zfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa0515513>] zpl_create+0xc3/0x180 [zfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff811d2a8d>] vfs_create+0xcd/0x130
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff811d5ccf>] do_last+0xb8f/0x1270
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa02f9c62>] ? xfs_file_buffered_aio_write+0x232/0x260 [xfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff811d6472>] path_openat+0xc2/0x490
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffffa02f9d96>] ? xfs_file_aio_write+0x106/0x150 [xfs]
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff811d7c3b>] do_filp_open+0x4b/0xb0
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff811e4697>] ? __alloc_fd+0xa7/0x130
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff811c5933>] do_sys_open+0xf3/0x1f0
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff811c5a4e>] SyS_open+0x1e/0x20
Nov  4 08:37:30 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b

Nov  4 08:45:30 whg-sto-102 kernel: INFO: task smbd:2503 blocked for more than 120 seconds.
Nov  4 08:45:30 whg-sto-102 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  4 08:45:30 whg-sto-102 kernel: smbd            D ffff88047fc53680     0  2503  20785 0x00000080
Nov  4 08:45:30 whg-sto-102 kernel: ffff8801de043d70 0000000000000086 ffff8800775f8000 ffff8801de043fd8
Nov  4 08:45:30 whg-sto-102 kernel: ffff8801de043fd8 ffff8801de043fd8 ffff8800775f8000 ffff8802fd0a0008
Nov  4 08:45:30 whg-sto-102 kernel: ffff8802fd0a000c ffff8800775f8000 00000000ffffffff ffff8802fd0a0010
Nov  4 08:45:30 whg-sto-102 kernel: Call Trace:
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff8160a899>] schedule_preempt_disabled+0x29/0x70
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff816085e5>] __mutex_lock_slowpath+0xc5/0x1c0
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff81607a4f>] mutex_lock+0x1f/0x2f
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffffa04f9fb5>] zfs_getattr_fast+0x75/0x190 [zfs]
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffffa051485c>] zpl_getattr+0x4c/0x70 [zfs]
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff811cb3f6>] vfs_getattr+0x46/0x80
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff811cb525>] vfs_fstatat+0x75/0xc0
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff811cbb34>] SYSC_newfstatat+0x24/0x60
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff811c671a>] ? vfs_write+0x17a/0x1e0
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff811cbd7e>] SyS_newfstatat+0xe/0x10
Nov  4 08:45:30 whg-sto-102 kernel: [<ffffffff81614209>] system_call_fastpath+0x16/0x1b

@behlendorf
Copy link
Contributor

@wipash thanks it looks like you posted everything we need to determine what happened. In your case the failed VERIFY is almost certainly responsible for the other hangs. When it hit the VERIFY it was holding a directory lock that those other threads are trying to acquire. My best guess is this is another side effect of a class of system attribute bugs which were fixed in recent releases.

You can apply the following patch to handle the error and avoid the panic.

diff --git a/module/zfs/zfs_acl.c b/module/zfs/zfs_acl.c
index a208dea..86c9154 100644
--- a/module/zfs/zfs_acl.c
+++ b/module/zfs/zfs_acl.c
@@ -1836,8 +1836,13 @@ zfs_acl_ids_create(znode_t *dzp, int flag, vattr_t *vap, 
                if (!(flag & IS_ROOT_NODE) && (S_ISDIR(ZTOI(dzp)->i_mode) &&
                    (dzp->z_pflags & ZFS_INHERIT_ACE)) &&
                    !(dzp->z_pflags & ZFS_XATTR)) {
-                       VERIFY(0 == zfs_acl_node_read(dzp, B_TRUE,
-                           &paclp, B_FALSE));
+                       error = zfs_acl_node_read(dzp, B_TRUE,
+                           &paclp, B_FALSE);
+                       if (error) {
+                               mutex_exit(&dzp->z_lock);
+                               mutex_exit(&dzp->z_acl_lock);
+                               return (error);
+                       }
                        acl_ids->z_aclp = zfs_acl_inherit(zsb,
                            vap->va_mode, paclp, acl_ids->z_mode, &need_chmod);
                        inherited = B_TRUE;

@wipash
Copy link
Author

wipash commented Nov 20, 2015

Thanks!
I've applied the patch, and I'll report back if it falls over again.
Cheers!

@hsepeng
Copy link
Contributor

hsepeng commented Jun 17, 2016

@wipash how about your following test result? did the same stuck error happen again? what's the fix of it

@wipash
Copy link
Author

wipash commented Jun 17, 2016

I'm currently running spl-0.6.5.7 / zfs-0.6.5.7, have not seen this issue again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Inactive Not being actively updated
Projects
None yet
Development

No branches or pull requests

4 participants
@behlendorf @wipash @hsepeng and others