excessively long time importing ZFS cache in 0.7.0 RC2 #5576

Open
Bronek opened this Issue Jan 9, 2017 · 4 comments

Projects

None yet

3 participants

@Bronek
Bronek commented Jan 9, 2017 edited
Type Version/Name
Distribution Name Arch
Linux Kernel 4.8.17
Architecture x64
ZFS Version 0.7.0 rc2
SPL Version 0.7.0 rc2

Importing ZFS cache at startup takes slightly more than 5 minutes. Additionally during this time following was logged:

Jan 09 20:37:11 gdansk systemd[1]: Starting Import ZFS pools by cache file...
Jan 09 20:41:07 gdansk kernel: INFO: task l2arc_feed:377 blocked for more than 120 seconds.
Jan 09 20:41:07 gdansk kernel:       Tainted: P           O    4.8.17-1-ARCH #1
Jan 09 20:41:07 gdansk kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 09 20:41:07 gdansk kernel: l2arc_feed      D ffff881033dcbd08     0   377      2 0x00000000
Jan 09 20:41:07 gdansk kernel:  ffff881033dcbd08 00ad000000000200 ffff8820393d3800 ffff8820386be200
Jan 09 20:41:07 gdansk kernel:  ffffffffa05c91e0 ffff881033dcc000 ffff8820386be200 ffffffffa06a9d04
Jan 09 20:41:07 gdansk kernel:  00000000ffffffff ffffffffa06a9d08 ffff881033dcbd20 ffffffff815c4f85
Jan 09 20:41:07 gdansk kernel: Call Trace:
Jan 09 20:41:07 gdansk kernel:  [<ffffffff815c4f85>] schedule+0x35/0x80
Jan 09 20:41:07 gdansk kernel:  [<ffffffff815c520e>] schedule_preempt_disabled+0xe/0x10
Jan 09 20:41:07 gdansk kernel:  [<ffffffff815c65e3>] __mutex_lock_slowpath+0xb3/0x120
Jan 09 20:41:07 gdansk kernel:  [<ffffffffa048f320>] ? l2arc_evict+0x2c0/0x2c0 [zfs]
Jan 09 20:41:07 gdansk kernel:  [<ffffffff815c666f>] mutex_lock+0x1f/0x30
Jan 09 20:41:07 gdansk kernel:  [<ffffffffa048f485>] l2arc_feed_thread+0x165/0xa30 [zfs]
Jan 09 20:41:07 gdansk kernel:  [<ffffffff8102b76e>] ? __switch_to+0x2ce/0x5b0
Jan 09 20:41:07 gdansk kernel:  [<ffffffff810b46b3>] ? pick_next_task_fair+0x113/0x4c0
Jan 09 20:41:07 gdansk kernel:  [<ffffffffa02261aa>] ? spl_kmem_free+0x2a/0x40 [spl]
Jan 09 20:41:07 gdansk kernel:  [<ffffffff811d4bfb>] ? kfree+0x14b/0x160
Jan 09 20:41:07 gdansk kernel:  [<ffffffffa048f320>] ? l2arc_evict+0x2c0/0x2c0 [zfs]
Jan 09 20:41:07 gdansk kernel:  [<ffffffffa0228571>] thread_generic_wrapper+0x71/0x80 [spl]
Jan 09 20:41:07 gdansk kernel:  [<ffffffffa0228500>] ? __thread_exit+0x20/0x20 [spl]
Jan 09 20:41:07 gdansk kernel:  [<ffffffff81096a78>] kthread+0xd8/0xf0
Jan 09 20:41:07 gdansk kernel:  [<ffffffff8102b76e>] ? __switch_to+0x2ce/0x5b0
Jan 09 20:41:07 gdansk kernel:  [<ffffffff815c8cbf>] ret_from_fork+0x1f/0x40
Jan 09 20:41:07 gdansk kernel:  [<ffffffff810969a0>] ? kthread_worker_fn+0x180/0x180
Jan 09 20:42:25 gdansk systemd[1]: Started Import ZFS pools by cache file.

The system was not shutdown cleanly previously (due to OOM when testing 0.7.0 RC2). The first start of the system after upgrade to 0.7.0 RC2 only needed 10s to import cache (same as version 0.6.5.8). However even after clean restart following excessively slow importing of ZFS cache, the problem persisted.

@behlendorf
Member

It's likely that l2arc cache is actually the victim here. It was block waiting for acquire the mutex. Do you believe this is a new issue with -rc2?

@Bronek
Bronek commented Jan 13, 2017 edited

Yes, it is new. I just checked my logs (from serial port) and since Nov 2015 this was first time ZFS import by cache took more than 2 minutes. There were few occasions when it took over 1 minute, in over a year (machine is restarted daily). The L2ARC is hosted on DC P3700 NVMe (same as SLOG).

@kpande
Member
kpande commented Jan 14, 2017

you've reported a lot of issues that may be resolved by commits after -rc2. can you test with HEAD?

@Bronek
Bronek commented Jan 15, 2017

after upgrade to current HEAD (and linux 4.9.4) I no longer see OOM, but I will try "dirty" shutdown to see if other problems arise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment