Kernel panic from zio_decompress in lzjb_decompress under heavy I/O #2709

ChromaxS · 2014-09-17T02:53:13Z

I'm experiencing a crash on a system with 8GB of RAM when using compression. I've got a basic mirror setup (2 devices) with no log device but I do have a cache device.

I traced back the code history to where the cache started getting compressed regardless of the volume preferences and tried disabling caching on it by patching the ZFS driver to never compress on the cache device, but it still led to a crash. To do this, I patched include/zfs/sys/dmu_objset.h to have DMU_OS_IS_L2COMPRESSIBLE(os) always return false (0).

Finally, I simply removed the cache device and it seemed like things were stable for a bit but ultimately the system is still crashing.

At this point I'm running with a watchdog that checks for the specific "zio_decompress" kernel panic message to keep the system up... this is just a personal server that doesn't need to be reliable or anything. The watchdog is just so that the system recovers after the kernel panic.

At this point the work around I have is to reduce the amount of I/O. I can get the system to crash within an hour by rsync'ing at full speed to this system, but I can delay the inevitable (by around a few hours to a day) by rsync'ing with a bandwidth limit of 1M.

I've run memtest on this system for around 10 hours and not encountered any reports of bad RAM.

I'm running archlinux with a manually compiled kernel (vanilla with spl/zfs patches):
kernel tag v3.17-rc5 on commit 9e82bf014195d6f0054982c463575cdce24292be
SPL at f9bde4f
ZFS at 2d50158
SPL/ZFS are compiled into the kernel statically with no loaded modules. ZFS userland tools are compiled from the same commit.

Unfortunately due to the nature of the crash, I don't have too many logs of it happening, but they all pretty much end up in lzjb_decompres with zio_decompress at the top of the stack (note that this panic is vanilla and does not contain the no-cache compression patch I made):

[ 3055.450251] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 3055.450308] IP: [<ffffffff815ed3ff>] memset+0x1f/0xb0
[ 3055.450343] PGD 206d14067 PUD 206d18067 PMD 0
[ 3055.450374] Oops: 0002 [#1] SMP
[ 3055.450400] Modules linked in:
[ 3055.450425] CPU: 0 PID: 2095 Comm: txg_sync Not tainted 3.17.0-rc5-antex+ #5
[ 3055.450456] Hardware name: OEM OEM/AB9/AB9RPO(Intel965+ICH8), BIOS 6.00 PG 11/06/2007
[ 3055.450504] task: ffff88020decd7d0 ti: ffff88020b680000 task.ti: ffff88020b680000
[ 3055.450551] RIP: 0010:[<ffffffff815ed3ff>]  [<ffffffff815ed3ff>] memset+0x1f/0xb0
[ 3055.450597] RSP: 0018:ffff88020b683b78  EFLAGS: 00010206
[ 3055.450626] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000800
[ 3055.450658] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 3055.450691] RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000000
[ 3055.450723] R10: 000000000000000d R11: 0000000000000000 R12: ffff880054da52a8
[ 3055.450754] R13: ffff88020c47e150 R14: 0000000000000001 R15: ffff880054da5250
[ 3055.450785] FS:  0000000000000000(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[ 3055.450832] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3055.450862] CR2: 0000000000000000 CR3: 0000000206d16000 CR4: 00000000000007f0
[ 3055.450894] Stack:
[ 3055.450915]  ffffffff8122f076 ffffffff8122fa9e ffff88020c47e150 ffff88020de9c4c8
[ 3055.450962]  ffff880200000001 ffff8800d3f0c4a8 ffff88020de9c4c8 ffff88020c47e168
[ 3055.451011]  0000000400000000 ffff880054da5250 0000000000000292 ffffffff812372b0
[ 3055.451064] Call Trace:
[ 3055.451088]  [<ffffffff8122f076>] ? dbuf_read+0x316/0x7d0
[ 3055.451114]  [<ffffffff8122fa9e>] ? dbuf_hold_impl+0x7e/0x90
[ 3055.451144]  [<ffffffff812372b0>] ? dmu_buf_hold_noread+0xb0/0x150
[ 3055.451175]  [<ffffffff81237398>] ? dmu_buf_hold+0x48/0x80
[ 3055.451205]  [<ffffffff8122bb4d>] ? bpobj_enqueue+0x11d/0x3c0
[ 3055.451235]  [<ffffffff8126d380>] ? spa_sync_props+0x4d0/0x4d0
[ 3055.451263]  [<ffffffff8126d385>] ? bpobj_enqueue_cb+0x5/0x10
[ 3055.451290]  [<ffffffff8122b1e0>] ? bplist_iterate+0x80/0xb0
[ 3055.451318]  [<ffffffff8126fa19>] ? spa_sync+0x6c9/0xae0
[ 3055.451348]  [<ffffffff8112bfaf>] ? __wake_up+0x2f/0x50
[ 3055.451379]  [<ffffffff8127ed00>] ? txg_sync_thread+0x340/0x670
[ 3055.451411]  [<ffffffff8127e9c0>] ? txg_delay+0x100/0x100
[ 3055.451442]  [<ffffffff811b9900>] ? thread_generic_wrapper+0x70/0x80
[ 3055.451471]  [<ffffffff811b9890>] ? __thread_exit+0xa0/0xa0
[ 3055.451500]  [<ffffffff81114c5a>] ? kthread+0xca/0xe0
[ 3055.451528]  [<ffffffff81114b90>] ? kthread_create_on_node+0x170/0x170
[ 3055.451564]  [<ffffffff81a6e6ec>] ? ret_from_fork+0x7c/0xb0
[ 3055.451595]  [<ffffffff81114b90>] ? kthread_create_on_node+0x170/0x170
[ 3055.451625] Code: 1e 44 88 1f c3 90 90 90 90 90 90 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 66 66 66 90 66 66 66 90 66 66
[ 3055.451808] RIP  [<ffffffff815ed3ff>] memset+0x1f/0xb0
[ 3055.451836]  RSP <ffff88020b683b78>
[ 3055.451860] CR2: 0000000000000000
[ 3055.452095] ---[ end trace c74bdc2be8a04fae ]---
[ 3056.324226] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 3056.324435] IP: [<ffffffff81261823>] lzjb_decompress+0xa3/0xd0
[ 3056.324565] PGD 0
[ 3056.324666] Oops: 0002 [#2] SMP
[ 3056.324797] Modules linked in:
[ 3056.324892] CPU: 3 PID: 2018 Comm: z_rd_int/1 Tainted: G      D        3.17.0-rc5-antex+ #5
[ 3056.324981] Hardware name: OEM OEM/AB9/AB9RPO(Intel965+ICH8), BIOS 6.00 PG 11/06/2007
[ 3056.325071] task: ffff88021ebbc7d0 ti: ffff88020b9bc000 task.ti: ffff88020b9bc000
[ 3056.325148] RIP: 0010:[<ffffffff81261823>]  [<ffffffff81261823>] lzjb_decompress+0xa3/0xd0
[ 3056.325255] RSP: 0018:ffff88020b9bfcc8  EFLAGS: 00010282
[ 3056.325311] RAX: 0000000000000001 RBX: ffff880155a09108 RCX: 0000000000004000
[ 3056.325372] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff8801de750402
[ 3056.325434] RBP: ffff880155a09108 R08: 0000000000000000 R09: 0000000000004000
[ 3056.325496] R10: 0000000000000001 R11: 0000000000000084 R12: 0000000000000000
[ 3056.325555] R13: ffff88021ebbc7d0 R14: 0000000000200000 R15: ffff880155a09108
[ 3056.325613] FS:  0000000000000000(0000) GS:ffff88021fd80000(0000) knlGS:0000000000000000
[ 3056.325689] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3056.325748] CR2: 0000000000000000 CR3: 0000000206ca6000 CR4: 00000000000007e0
[ 3056.325808] Stack:
[ 3056.325857]  ffffffff812c1927 ffff88004630d040 ffffffff812c0316 ffff880155a09108
[ 3056.326052]  0000000000000000 ffffffff812c5220 0000000000000004 ffffffff81279601
[ 3056.326247]  ffff880155a09108 0000000000000000 ffff880155a0a0c8 0000000000000000
[ 3056.326459] Call Trace:
[ 3056.326517]  [<ffffffff812c1927>] ? zio_decompress+0x37/0x50
[ 3056.326580]  [<ffffffff812c0316>] ? zio_pop_transforms+0x66/0xa0
[ 3056.326642]  [<ffffffff812c5220>] ? zio_done+0x190/0xd70
[ 3056.326700]  [<ffffffff81279601>] ? spa_config_exit+0x61/0x90
[ 3056.326760]  [<ffffffff812c5888>] ? zio_done+0x7f8/0xd70
[ 3056.326819]  [<ffffffff812c0047>] ? zio_wait_for_children+0x47/0x50
[ 3056.326888]  [<ffffffff812c5888>] ? zio_done+0x7f8/0xd70
[ 3056.326944]  [<ffffffff812c1a63>] ? zio_execute+0xa3/0x140
[ 3056.327003]  [<ffffffff811bb8ec>] ? taskq_thread+0x23c/0x4b0
[ 3056.327063]  [<ffffffff8111d010>] ? wake_up_state+0x10/0x10
[ 3056.327118]  [<ffffffff811bb6b0>] ? taskq_cancel_id+0x1d0/0x1d0
[ 3056.327176]  [<ffffffff81114c5a>] ? kthread+0xca/0xe0
[ 3056.327230]  [<ffffffff81114b90>] ? kthread_create_on_node+0x170/0x170
[ 3056.327290]  [<ffffffff81a6e6ec>] ? ret_from_fork+0x7c/0xb0
[ 3056.327349]  [<ffffffff81114b90>] ? kthread_create_on_node+0x170/0x170
[ 3056.327408] Code: 44 0f b6 41 ff 48 83 c0 01 83 ea 01 49 39 c1 44 88 40 ff 77 e3 49 39 c1 77 86 31 c0 c3 0f 1f 00 0f b6 17 48 83 c0 01 48 83 c7 01 <88> 50 ff 49 39 c1 0f 87 69 ff ff ff eb e1 0f 1f 80 00 00 00 00
[ 3056.328193] RIP  [<ffffffff81261823>] lzjb_decompress+0xa3/0xd0
[ 3056.328193]  RSP <ffff88020b9bfcc8>
[ 3056.328193] CR2: 0000000000000000
[ 3056.328193] ---[ end trace c74bdc2be8a04faf ]---

The text was updated successfully, but these errors were encountered:

dweeezil · 2014-09-18T04:48:55Z

I'll note that you can set the l2arc_nocompress to a nonzero value disable l2arc compression.

That said, I'm rather curious as to the history of this problem. Is it 3.17 specific? Is dedup enabled? The l2arc compression feature has been around for awhile and I'm not aware of other problems being reported related to it.

I'm guessing the memset call is actually a "bzero()" in the source code from the looks of things but the stack trace isn't making it very easy to figure out which one. My best guess is the first one in bpobj_enqueue().

It sounds like this is something you can reproduce pretty easily. Could you try compiling your spl and zfs modules with --enable-debug? That might cause one of the asserts to be tripped.

ChromaxS · 2014-09-19T17:12:59Z

I've been having this problem for a while, it's not just been on the 3.17 kernel. Dedup is turned on as well.

So I enabled --enable-debug and it's now crashing trying to bring the mirror back to a consistent state... I think it finally crashed at the right point that one of the mirror devices got corrupted.

[  869.840185] SPLError: 2517:0:(vdev.c:1765:vdev_dtl_should_excise()) VERIFY3(scn->scn_phys.scn_min_txg < vd->vdev_resilver_txg) failed (205958 < 46267)
[  869.840248] SPLError: 2517:0:(vdev.c:1765:vdev_dtl_should_excise()) SPL PANIC
[  869.840281] SPL: Showing stack for process 2517
[  869.840310] CPU: 2 PID: 2517 Comm: txg_sync Not tainted 3.17.0-rc5-antex+ #6
[  869.840342] Hardware name: OEM OEM/AB9/AB9RPO(Intel965+ICH8), BIOS 6.00 PG 11/06/2007
[  869.840387]  ffffffff81c873f0 ffff88020b01b898 ffffffff81b0667d ffff8800da947010
[  869.840438]  ffff88020b01b8b0 ffffffff811beec0 0000000000000000 ffff88020b01b8d8
[  869.840492]  ffffffff811c00f6 ffffffff81e5016e ffff88020b01b910 ffffffff81eb7d23
[  869.840541] Call Trace:
[  869.840568]  [<ffffffff81b0667d>] dump_stack+0x45/0x56
[  869.840598]  [<ffffffff811beec0>] spl_debug_dumpstack+0x40/0x50
[  869.840629]  [<ffffffff811c00f6>] spl_debug_bug+0x76/0xd0
[  869.840659]  [<ffffffff811cf165>] spl_PANIC+0xa5/0xc0
[  869.840688]  [<ffffffff811c7152>] ? kmem_alloc_debug+0x232/0x4a0
[  869.840718]  [<ffffffff81236993>] ? avl_find+0x63/0x100
[  869.840746]  [<ffffffff812cb1f4>] ? txg_list_add+0x64/0xa0
[  869.840773]  [<ffffffff812c227c>] ? spa_config_held+0x8c/0xb0
[  869.840802]  [<ffffffff812ce696>] vdev_dtl_reassess+0x636/0x680
[  869.840835]  [<ffffffff8123a170>] ? nvlist_add_common.part.51+0x300/0x420
[  869.840870]  [<ffffffff812c8c60>] ? space_map_alloc_delta+0x60/0x60
[  869.840902]  [<ffffffff812ce0cd>] vdev_dtl_reassess+0x6d/0x680
[  869.840930]  [<ffffffff812c17df>] ? log_internal+0x12f/0x170
[  869.840957]  [<ffffffff812c227c>] ? spa_config_held+0x8c/0xb0
[  869.840986]  [<ffffffff812ce0cd>] vdev_dtl_reassess+0x6d/0x680
[  869.841018]  [<ffffffff812c1899>] ? spa_history_log_internal+0x79/0x1a0
[  869.841052]  [<ffffffff812997fe>] dsl_scan_done+0x1ae/0x260
[  869.841083]  [<ffffffff8129dc54>] dsl_scan_sync+0x974/0x1040
[  869.841113]  [<ffffffff81325a23>] ? zio_wait+0x1f3/0x3f0
[  869.841141]  [<ffffffff812b5532>] spa_sync+0x452/0xc50
[  869.841170]  [<ffffffff8114d730>] ? ktime_get_ts64+0x40/0x100
[  869.841201]  [<ffffffff812ca1fa>] txg_sync_thread+0x39a/0x6d0
[  869.841233]  [<ffffffff812c9e60>] ? txg_thread_wait+0xe0/0xe0
[  869.841265]  [<ffffffff811c890e>] thread_generic_wrapper+0x7e/0xc0
[  869.841294]  [<ffffffff811c8890>] ? __thread_exit+0xa0/0xa0
[  869.841323]  [<ffffffff8111a376>] kthread+0xd6/0xf0
[  869.841349]  [<ffffffff8111a2a0>] ? kthread_create_on_node+0x170/0x170
[  869.841381]  [<ffffffff81b0e96c>] ret_from_fork+0x7c/0xb0
[  869.841409]  [<ffffffff8111a2a0>] ? kthread_create_on_node+0x170/0x170

I zero'd out the first and last 100MB of the drive and did a "replace" on the pool with the same device (after running the drive through some diagnostics). The resilver just completed and I am now back to rsync'ing to see if the debug build provides any additional information if it crashes.

behlendorf · 2016-06-23T17:46:33Z

Fix in #4790.

behlendorf · 2016-07-15T20:57:38Z

Closing, fixed by #4790.

behlendorf added the Bug label Sep 22, 2014

behlendorf added this to the 0.7.0 milestone Jun 23, 2016

behlendorf mentioned this issue Jun 23, 2016

FreeBSD rS271776 Persist vdev_resilver_txg changes #4790

Closed

behlendorf closed this as completed Jul 15, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kernel panic from zio_decompress in lzjb_decompress under heavy I/O #2709

Kernel panic from zio_decompress in lzjb_decompress under heavy I/O #2709

ChromaxS commented Sep 17, 2014

dweeezil commented Sep 18, 2014

ChromaxS commented Sep 19, 2014

behlendorf commented Jun 23, 2016

behlendorf commented Jul 15, 2016

Kernel panic from zio_decompress in lzjb_decompress under heavy I/O #2709

Kernel panic from zio_decompress in lzjb_decompress under heavy I/O #2709

Comments

ChromaxS commented Sep 17, 2014

dweeezil commented Sep 18, 2014

ChromaxS commented Sep 19, 2014

behlendorf commented Jun 23, 2016

behlendorf commented Jul 15, 2016