
gentoo 0.6.5.4-r2 deadlock #4318

Closed
alexanderhaensch opened this issue Feb 8, 2016 · 7 comments

@alexanderhaensch

This basically crashes the host. Very annoying.
@ryao, is one of the additional patches responsible for this?

[696507.617042] VERIFY(txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool)) failed
[696507.617047] PANIC at dmu_tx.c:1277:dmu_tx_assign()
[696507.617048] Showing stack for process 15718
[696507.617051] CPU: 7 PID: 15718 Comm: dump_list_strat Tainted: P O 3.14.51-hardened #6
[696507.617052] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0b 06/30/2014
[696507.617053] 0000000000000000 8a444e7f241c2441 ffff881617f2afe0 ffffffff816581ce
[696507.617056] ffffffffa0bcc09f ffff881617f2aff0 ffffffffa07987ff ffff881617f2b178
[696507.617058] ffffffffa07988ca ffff881eeffc70c0 ffff881e00000028 ffff881617f2b188
[696507.617060] Call Trace:
[696507.617069] [] dump_stack+0x45/0x56
[696507.617114] [] ? _fini+0x207cd/0x7d6e7 [zfs]
[696507.617123] [] spl_dumpstack+0x3f/0x50 [spl]
[696507.617129] [] spl_panic+0xba/0xf0 [spl]
[696507.617140] [] ? _fini+0x2cee/0x7d6e7 [zfs]
[696507.617144] [] ? _raw_spin_unlock+0x9/0x10
[696507.617150] [] ? tsd_hash_search.isra.0+0x6f/0xa0 [spl]
[696507.617151] [] ? _raw_spin_unlock+0x9/0x10
[696507.617169] [] ? rrw_held+0x7a/0xf0 [zfs]
[696507.617180] [] ? _fini+0x20de6/0x7d6e7 [zfs]
[696507.617195] [] dmu_tx_assign+0x732/0x750 [zfs]
[696507.617208] [] ? dmu_tx_count_dnode+0x53/0xa0 [zfs]
[696507.617221] [] ? dmu_tx_hold_sa+0x1fb/0x230 [zfs]
[696507.617236] [] zfs_inactive+0x152/0x3c0 [zfs]
[696507.617248] [] zpl_evict_inode+0x3e/0x90 [zfs]
[696507.617257] [] ? _fini+0xddae/0x7d6e7 [zfs]
[696507.617262] [] evict+0x9e/0x1b0
[696507.617264] [] dispose_list+0x36/0x50
[696507.617266] [] prune_icache_sb+0x51/0x80
[696507.617269] [] super_cache_scan+0x102/0x170
[696507.617274] [] shrink_slab_node+0xf8/0x180
[696507.617276] [] shrink_slab+0x11a/0x150
[696507.617279] [] do_try_to_free_pages+0x45a/0x580
[696507.617281] [] try_to_free_pages+0xc7/0xf0
[696507.617283] [] __alloc_pages_nodemask+0x630/0xae0
[696507.617287] [] alloc_pages_current+0x9f/0x150
[696507.617290] [] new_slab+0x2ac/0x340
[696507.617292] [] __slab_alloc+0x455/0x5b0
[696507.617298] [] ? spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617314] [] ? vdev_mirror_io_start+0xc3/0x1e0 [zfs]
[696507.617320] [] ? spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617326] [] ? spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617328] [] kmem_cache_alloc+0x9b/0x130
[696507.617334] [] spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617338] [] ? __mutex_init+0x51/0x60
[696507.617355] [] ? refcount_create+0x32/0x120 [zfs]
[696507.617367] [] ? dbuf_cons+0x6e/0x80 [zfs]
[696507.617380] [] zio_create+0x85/0x700 [zfs]
[696507.617392] [] zio_null+0x5c/0x60 [zfs]
[696507.617405] [] zio_root+0x19/0x20 [zfs]
[696507.617417] [] dbuf_read+0x92e/0xd00 [zfs]
[696507.617419] [] ? _raw_spin_unlock+0x9/0x10
[696507.617436] [] ? refcount_remove_many+0x1ba/0x2a0 [zfs]
[696507.617451] [] ? dnode_rele_and_unlock+0x55/0xc0 [zfs]
[696507.617462] [] ? _fini+0x1926/0x7d6e7 [zfs]
[696507.617475] [] dmu_bonus_hold+0xf1/0x360 [zfs]
[696507.617486] [] ? _fini+0x1f7e/0x7d6e7 [zfs]
[696507.617501] [] dsl_dataset_hold_obj+0x4b/0xac0 [zfs]
[696507.617516] [] ? zap_cursor_retrieve+0x185/0x420 [zfs]
[696507.617527] [] ? _fini+0x1f7e/0x7d6e7 [zfs]
[696507.617538] [] ? _fini+0x1f7e/0x7d6e7 [zfs]
[696507.617552] [] dmu_objset_find_dp_impl+0x54c/0x6a0 [zfs]
[696507.617566] [] dmu_objset_find_dp_impl+0x35b/0x6a0 [zfs]
[696507.617580] [] dmu_objset_find_dp+0x178/0x210 [zfs]
[696507.617595] [] ? dataset_name_hidden+0x50/0x50 [zfs]
[696507.617609] [] dump_list_strategy_one+0x6e/0xf0 [zfs]
[696507.617624] [] dump_list_strategy_impl+0x2c4/0x400 [zfs]
[696507.617627] [] ? try_to_wake_up+0xdb/0x2a0
[696507.617641] [] ? dump_list_strategy_impl+0x400/0x400 [zfs]
[696507.617655] [] dump_list_strategy+0x33/0xd0 [zfs]
[696507.617657] [] ? kfree+0xe9/0xf0
[696507.617663] [] thread_generic_wrapper+0x75/0xc0 [spl]
[696507.617669] [] ? __thread_exit+0x20/0x20 [spl]
[696507.617674] [] kthread+0xe8/0x100
[696507.617677] [] ? kthread_create_on_node+0x1b0/0x1b0
[696507.617680] [] ret_from_fork+0x49/0x80
[696507.617682] [] ? kthread_create_on_node+0x1b0/0x1b0

@behlendorf
Contributor

@ryao this does look like a Gentoo-specific issue; dump_list_strategy() doesn't appear in the official ZoL code. You may want to disable direct reclaim for this code path to resolve this.

@alexanderhaensch
Author

The function is in the new API patches. Can I disable direct reclaim with a runtime switch?

@ryao
Contributor

ryao commented Feb 9, 2016

@alexanderhaensch The new API calls into this code under the pool config lock (which I thought was safe). A presumably unmounted dataset was not in memory, so we tried to read it. While allocating the buffer for the read, the system was low on memory, so it entered direct reclaim, which started a DMU transaction in a code path holding the config lock. There is a VERIFY statement in dmu_tx_assign() intended to disallow exactly that, and it is what fired here.
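The failure sequence can be sketched as a tiny userspace model. None of this is ZFS code: the `config_lock_held` flag stands in for dsl_pool_config_held(), and `direct_reclaim()` stands in for inode eviction assigning a TXG_WAIT transaction, which is the state the VERIFY in dmu_tx_assign() forbids.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for dsl_pool_config_held(): true while the caller
 * holds the pool config lock (e.g. inside dmu_objset_find_dp). */
static bool config_lock_held;

/* Set when the model reaches the state the VERIFY in
 * dmu_tx_assign() forbids: a TXG_WAIT assignment under the
 * config lock. A real kernel would deadlock or panic here. */
static bool would_deadlock;

/* Models direct reclaim: evicting an inode calls zfs_inactive(),
 * which assigns a DMU transaction with TXG_WAIT. */
static void direct_reclaim(void)
{
	if (config_lock_held)
		would_deadlock = true;
}

/* Models reading a dataset's bonus buffer: the allocation enters
 * direct reclaim when the system is low on memory. */
static void *alloc_for_read(size_t sz, bool memory_low)
{
	if (memory_low)
		direct_reclaim();
	return (malloc(sz));
}
```

Under memory pressure, any allocation made while the flag is set trips the check; with the lock released (or reclaim disabled for the thread), the same allocation is harmless.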

You can emerge the -r0 version of the userland tools to avoid this code path until I write a fix. I can imagine a few ways of fixing this, but I am still deciding which is the right one. I should settle on one and put it into the tree this week, although I can share the patch sooner if you need it.

@alexanderhaensch
Author

@ryao This happens on Fridays or Saturdays, and I have a cronjob in mind that could be hitting the bug. I will try disabling the non-essential cronjob and wait.

@ryao
Contributor

ryao commented Feb 10, 2016

@behlendorf It looks like this is a direct reclaim bug that can happen with a multitude of ioctls, but it was so rare that it had not been reported until I put the proposed stable API into production in Gentoo. To name a few possible call paths:

dmu_objset_find_dp
spa_check_logs
spa_load_impl
spa_load
spa_load_best
spa_open_common
spa_open
zfs_ioc_{pool_scan,pool_freeze,pool_upgrade,pool_get_history,pool_reguid,vdev_add,vdev_remove,vdev_set_state,vdev_attach,vdev_detach,vdev_split,vdev_setpath,vdev_setfru,pool_set_props,error_log,clear,pool_reopen}

In those call paths, this would only happen on uninitialized pools. The same problem should occur under dmu_objset_find, which appears in several interesting call paths too, such as zfs_ioc_rename.

As it turns out, dsl_dataset_hold_obj has ASSERT(dsl_pool_config_held(dp)); while dmu_tx_assign has ASSERT(txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool));. These are impossible to satisfy simultaneously unless direct reclaim is disabled via spl_fstrans_mark()/spl_fstrans_unmark().

The right solution seems to be to disable direct reclaim on all ioctls. We could try to cherry-pick only the ioctls known to be affected by direct reclaim, but the ioctls run so infrequently that cherry-picking is of dubious benefit.
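The mark/unmark pair can be modeled in userspace. This is a sketch, not ZFS code: spl_fstrans_mark()/spl_fstrans_unmark() are the real SPL names (the real cookie type is fstrans_cookie_t, not bool), but the thread flag, the reclaim counter, and the allocator below are illustrative mocks showing why a marked thread skips direct reclaim.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Models task->flags & PF_FSTRANS for the current thread. */
static __thread bool fstrans_active;

/* Counts how often the mock allocator entered direct reclaim. */
static int reclaim_entries;

/* Same shape as the SPL API: mark returns a cookie (here, the old
 * flag value) that unmark restores, so nested sections compose. */
static bool spl_fstrans_mark(void)
{
	bool cookie = fstrans_active;
	fstrans_active = true;
	return (cookie);
}

static void spl_fstrans_unmark(bool cookie)
{
	fstrans_active = cookie;
}

/* Mock allocator: when memory is "low" it would normally enter
 * direct reclaim, which can evict inodes and re-enter ZFS (the
 * deadlock in this issue) -- unless the thread is marked. */
static void *mock_alloc(size_t sz, bool memory_low)
{
	if (memory_low && !fstrans_active)
		reclaim_entries++;	/* direct reclaim would run here */
	return (malloc(sz));
}
```

Wrapping every ioctl entry point in a mark/unmark pair like this, as proposed above, would make allocations on those paths reclaim-safe without auditing each call site individually.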

We have already disabled direct reclaim on the ZIO threads, the txg sync thread, the eviction callback, the VFS and zvol requests. Once we disable it on the ioctl interface, the only things in the ZFS codebase that have not had it disabled are the txg quiesce thread and the zil clean taskq. However, the txg quiesce thread does a memory allocation that can deadlock us, which I just spotted upon checking it:

kmem_zalloc
spa_txg_history_add
txg_quiesce
txg_quiesce_thread

At the same time, the zil clean taskq looks okay, but only because it does not appear to do any memory allocations. Consequently, leaving direct reclaim enabled there is the same as explicitly disabling it, and if we have disabled it everywhere else, we might as well disable it there too. That would allow some cleanup: disabling direct reclaim at the SPL level would let us remove spl_fstrans_mark()/spl_fstrans_unmark() from the txg_sync thread and the ZIO worker threads, while also disabling direct reclaim on the quiesce thread, which appears to need it.

@alexanderhaensch
Author

I have more information on when this bug is triggered:
On the weekend a job runs that collects disk usage information on rsnapshot directories. It takes a very long time (about 18 hours), so it overlaps with two individual zfs-send jobs and one weekly rsnapshot run.
Disabling the rsnapshot-du cronjob is a workaround for me.

@behlendorf
Contributor

Closing. Gentoo-specific and resolved.
