
gentoo 0.6.5.4-r2 deadlock #4318

Closed
alexanderhaensch opened this issue Feb 8, 2016 · 7 comments

@alexanderhaensch

This basically crashes the host. Very annoying.
@ryao, is one of the additional patches responsible for this?

[696507.617042] VERIFY(txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool)) failed
[696507.617047] PANIC at dmu_tx.c:1277:dmu_tx_assign()
[696507.617048] Showing stack for process 15718
[696507.617051] CPU: 7 PID: 15718 Comm: dump_list_strat Tainted: P O 3.14.51-hardened #6
[696507.617052] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0b 06/30/2014
[696507.617053] 0000000000000000 8a444e7f241c2441 ffff881617f2afe0 ffffffff816581ce
[696507.617056] ffffffffa0bcc09f ffff881617f2aff0 ffffffffa07987ff ffff881617f2b178
[696507.617058] ffffffffa07988ca ffff881eeffc70c0 ffff881e00000028 ffff881617f2b188
[696507.617060] Call Trace:
[696507.617069] [] dump_stack+0x45/0x56
[696507.617114] [] ? _fini+0x207cd/0x7d6e7 [zfs]
[696507.617123] [] spl_dumpstack+0x3f/0x50 [spl]
[696507.617129] [] spl_panic+0xba/0xf0 [spl]
[696507.617140] [] ? _fini+0x2cee/0x7d6e7 [zfs]
[696507.617144] [] ? _raw_spin_unlock+0x9/0x10
[696507.617150] [] ? tsd_hash_search.isra.0+0x6f/0xa0 [spl]
[696507.617151] [] ? _raw_spin_unlock+0x9/0x10
[696507.617169] [] ? rrw_held+0x7a/0xf0 [zfs]
[696507.617180] [] ? _fini+0x20de6/0x7d6e7 [zfs]
[696507.617195] [] dmu_tx_assign+0x732/0x750 [zfs]
[696507.617208] [] ? dmu_tx_count_dnode+0x53/0xa0 [zfs]
[696507.617221] [] ? dmu_tx_hold_sa+0x1fb/0x230 [zfs]
[696507.617236] [] zfs_inactive+0x152/0x3c0 [zfs]
[696507.617248] [] zpl_evict_inode+0x3e/0x90 [zfs]
[696507.617257] [] ? _fini+0xddae/0x7d6e7 [zfs]
[696507.617262] [] evict+0x9e/0x1b0
[696507.617264] [] dispose_list+0x36/0x50
[696507.617266] [] prune_icache_sb+0x51/0x80
[696507.617269] [] super_cache_scan+0x102/0x170
[696507.617274] [] shrink_slab_node+0xf8/0x180
[696507.617276] [] shrink_slab+0x11a/0x150
[696507.617279] [] do_try_to_free_pages+0x45a/0x580
[696507.617281] [] try_to_free_pages+0xc7/0xf0
[696507.617283] [] __alloc_pages_nodemask+0x630/0xae0
[696507.617287] [] alloc_pages_current+0x9f/0x150
[696507.617290] [] new_slab+0x2ac/0x340
[696507.617292] [] __slab_alloc+0x455/0x5b0
[696507.617298] [] ? spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617314] [] ? vdev_mirror_io_start+0xc3/0x1e0 [zfs]
[696507.617320] [] ? spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617326] [] ? spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617328] [] kmem_cache_alloc+0x9b/0x130
[696507.617334] [] spl_kmem_cache_alloc+0x9d/0xc50 [spl]
[696507.617338] [] ? __mutex_init+0x51/0x60
[696507.617355] [] ? refcount_create+0x32/0x120 [zfs]
[696507.617367] [] ? dbuf_cons+0x6e/0x80 [zfs]
[696507.617380] [] zio_create+0x85/0x700 [zfs]
[696507.617392] [] zio_null+0x5c/0x60 [zfs]
[696507.617405] [] zio_root+0x19/0x20 [zfs]
[696507.617417] [] dbuf_read+0x92e/0xd00 [zfs]
[696507.617419] [] ? _raw_spin_unlock+0x9/0x10
[696507.617436] [] ? refcount_remove_many+0x1ba/0x2a0 [zfs]
[696507.617451] [] ? dnode_rele_and_unlock+0x55/0xc0 [zfs]
[696507.617462] [] ? _fini+0x1926/0x7d6e7 [zfs]
[696507.617475] [] dmu_bonus_hold+0xf1/0x360 [zfs]
[696507.617486] [] ? _fini+0x1f7e/0x7d6e7 [zfs]
[696507.617501] [] dsl_dataset_hold_obj+0x4b/0xac0 [zfs]
[696507.617516] [] ? zap_cursor_retrieve+0x185/0x420 [zfs]
[696507.617527] [] ? _fini+0x1f7e/0x7d6e7 [zfs]
[696507.617538] [] ? _fini+0x1f7e/0x7d6e7 [zfs]
[696507.617552] [] dmu_objset_find_dp_impl+0x54c/0x6a0 [zfs]
[696507.617566] [] dmu_objset_find_dp_impl+0x35b/0x6a0 [zfs]
[696507.617580] [] dmu_objset_find_dp+0x178/0x210 [zfs]
[696507.617595] [] ? dataset_name_hidden+0x50/0x50 [zfs]
[696507.617609] [] dump_list_strategy_one+0x6e/0xf0 [zfs]
[696507.617624] [] dump_list_strategy_impl+0x2c4/0x400 [zfs]
[696507.617627] [] ? try_to_wake_up+0xdb/0x2a0
[696507.617641] [] ? dump_list_strategy_impl+0x400/0x400 [zfs]
[696507.617655] [] dump_list_strategy+0x33/0xd0 [zfs]
[696507.617657] [] ? kfree+0xe9/0xf0
[696507.617663] [] thread_generic_wrapper+0x75/0xc0 [spl]
[696507.617669] [] ? __thread_exit+0x20/0x20 [spl]
[696507.617674] [] kthread+0xe8/0x100
[696507.617677] [] ? kthread_create_on_node+0x1b0/0x1b0
[696507.617680] [] ret_from_fork+0x49/0x80
[696507.617682] [] ? kthread_create_on_node+0x1b0/0x1b0

@behlendorf
Contributor

@ryao this does look like a Gentoo-specific issue; dump_list_strategy() doesn't appear in the official ZoL code. You may want to disable direct reclaim for this code path to resolve this.

@alexanderhaensch
Author

The function is in the new API patches. Can I disable direct reclaim with a runtime switch?

@ryao
Contributor

ryao commented Feb 9, 2016

@alexanderhaensch The new API calls into this code under the pool config lock (which I thought was safe). A presumably unmounted dataset was not in memory, so we tried to read it. While allocating the buffer for the read, the system was low on memory, so it entered direct reclaim, which started a DMU transaction in a code path holding the config lock. There is a VERIFY statement in dmu_tx_assign() intended to disallow exactly that, and it is what fired here.
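The failure sequence can be sketched as a tiny userspace model. None of this is ZFS code: the `config_lock_held` flag stands in for dsl_pool_config_held(), and `direct_reclaim()` stands in for inode eviction assigning a TXG_WAIT transaction, which is the state the VERIFY in dmu_tx_assign() forbids.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for dsl_pool_config_held(): true while the caller
 * holds the pool config lock (e.g. inside dmu_objset_find_dp). */
static bool config_lock_held;

/* Set when the model reaches the state the VERIFY in
 * dmu_tx_assign() forbids: a TXG_WAIT assignment under the
 * config lock. A real kernel would deadlock or panic here. */
static bool would_deadlock;

/* Models direct reclaim: evicting an inode calls zfs_inactive(),
 * which assigns a DMU transaction with TXG_WAIT. */
static void direct_reclaim(void)
{
	if (config_lock_held)
		would_deadlock = true;
}

/* Models reading a dataset's bonus buffer: the allocation enters
 * direct reclaim when the system is low on memory. */
static void *alloc_for_read(size_t sz, bool memory_low)
{
	if (memory_low)
		direct_reclaim();
	return (malloc(sz));
}
```

Under memory pressure, any allocation made while the flag is set trips the check; with the lock released (or reclaim disabled for the thread), the same allocation is harmless.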

You can emerge the -r0 version of the userland tools to avoid this code path until I write a fix. I can imagine a few ways of fixing this, but I am still deciding which is the right one. I should settle on one and put it into the tree this week, although I can share the patch sooner if you need it.

@alexanderhaensch
Author

@ryao This happens on Fridays or Saturdays, and I have a cronjob in mind that could be hitting the bug. I will try disabling the non-essential cronjob and wait.

@ryao
Contributor

ryao commented Feb 10, 2016

@behlendorf It looks like this is a direct reclaim bug that can happen with a multitude of ioctls, but it was so rare that it had not been reported until I put the proposed stable API into production in Gentoo. To name a few possible call paths:

dmu_objset_find_dp
spa_check_logs
spa_load_impl
spa_load
spa_load_best
spa_open_common
spa_open
zfs_ioc_{pool_scan,pool_freeze,pool_upgrade,pool_get_history,pool_reguid,vdev_add,vdev_remove,vdev_set_state,vdev_attach,vdev_detach,vdev_split,vdev_setpath,vdev_setfru,pool_set_props,error_log,clear,pool_reopen}

In those call paths, this would only happen on uninitialized pools. The same problem should occur under dmu_objset_find, which appears in several interesting call paths too, such as zfs_ioc_rename.

As it turns out, dsl_dataset_hold_obj has ASSERT(dsl_pool_config_held(dp)); while dmu_tx_assign has ASSERT(txg_how != TXG_WAIT || !dsl_pool_config_held(tx->tx_pool));. These are impossible to satisfy simultaneously unless direct reclaim is disabled via spl_fstrans_mark()/spl_fstrans_unmark().

The right solution seems to be to disable direct reclaim on all ioctls. We could try to cherry-pick only the ioctls known to be affected by direct reclaim, but the ioctls run so infrequently that cherry-picking is of dubious benefit.
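The mark/unmark pair can be modeled in userspace. This is a sketch, not ZFS code: spl_fstrans_mark()/spl_fstrans_unmark() are the real SPL names (the real cookie type is fstrans_cookie_t, not bool), but the thread flag, the reclaim counter, and the allocator below are illustrative mocks showing why a marked thread skips direct reclaim.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Models task->flags & PF_FSTRANS for the current thread. */
static __thread bool fstrans_active;

/* Counts how often the mock allocator entered direct reclaim. */
static int reclaim_entries;

/* Same shape as the SPL API: mark returns a cookie (here, the old
 * flag value) that unmark restores, so nested sections compose. */
static bool spl_fstrans_mark(void)
{
	bool cookie = fstrans_active;
	fstrans_active = true;
	return (cookie);
}

static void spl_fstrans_unmark(bool cookie)
{
	fstrans_active = cookie;
}

/* Mock allocator: when memory is "low" it would normally enter
 * direct reclaim, which can evict inodes and re-enter ZFS (the
 * deadlock in this issue) -- unless the thread is marked. */
static void *mock_alloc(size_t sz, bool memory_low)
{
	if (memory_low && !fstrans_active)
		reclaim_entries++;	/* direct reclaim would run here */
	return (malloc(sz));
}
```

Wrapping every ioctl entry point in a mark/unmark pair like this, as proposed above, would make allocations on those paths reclaim-safe without auditing each call site individually.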

We have already disabled direct reclaim on the ZIO threads, the txg sync thread, the eviction callback, the VFS and zvol requests. Once we disable it on the ioctl interface, the only things in the ZFS codebase that have not had it disabled are the txg quiesce thread and the zil clean taskq. However, the txg quiesce thread does a memory allocation that can deadlock us, which I just spotted upon checking it:

kmem_zalloc
spa_txg_history_add
txg_quiesce
txg_quiesce_thread

At the same time, the zil clean taskq looks okay, but only because it does not appear to do any memory allocations. Consequently, leaving direct reclaim enabled there is the same as explicitly disabling it, and if we have disabled it everywhere else, we might as well disable it there too. That would allow some cleanup: disabling direct reclaim at the SPL level would let us remove spl_fstrans_mark()/spl_fstrans_unmark() from the txg_sync thread and the ZIO worker threads, while also disabling direct reclaim on the quiesce thread, which appears to need it.

@alexanderhaensch
Author

I have more information on when this bug is triggered:
On the weekend a job runs that collects disk usage information on rsnapshot directories. It takes a very long time (about 18 hours), so it overlaps with two individual zfs-send jobs and one weekly rsnapshot run.
Disabling the rsnapshot-du cronjob is a workaround for me.

@behlendorf
Contributor

Closing. Gentoo-specific and resolved.
