Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs hanged when multiple operations on dataset #4404

Closed
zhenghc opened this issue Mar 11, 2016 · 4 comments
Closed

zfs hanged when multiple operations on dataset #4404

zhenghc opened this issue Mar 11, 2016 · 4 comments

Comments

@zhenghc
Copy link

zhenghc commented Mar 11, 2016

when are currently testing flocker with zfs on kubernetes.

when multiple zfs command operate on same pool, all command hanged and was not killable

root 11402 29978 0 14:22 pts/2 00:00:00 zfs list
root 27963 27920 0 12:44 ? 00:00:00 zfs receive flocker-silver/5564c527-6028-4fcd-87fd-58b25453d190.default.82872f21-c667-44ef-a620-757edb09583b
root 27966 16368 0 12:44 ? 00:00:00 zfs rename flocker-silver/af1addb3-46af-4291-af24-dd8973a592e2.default.37565b70-33da-4cee-a1a1-e13c00ea02fe flocker-silver/5a178e58-0fd2-403a-b9a1-2b4a4d075c5e.default.37565b70-33da-4cee-a1a1-e13c00ea02fe
root 27995 15601 0 12:44 ? 00:00:00 zfs get -Hp all flocker-silver/af1addb3-46af-4291-af24-dd8973a592e2.default.6ca0988a-7b1f-4b52-a7ef-518a15f8b607
root 28406 16511 0 12:47 pts/0 00:00:00 zfs list
root 29641 25624 0 12:55 pts/1 00:00:00 zfs list

call stack are
[]# cat /proc/27963/stack
[] cv_wait_common+0x125/0x150 [spl]
[] __cv_wait+0x15/0x20 [spl]
[] rrw_enter_read_impl+0x53/0x170 [zfs]
[] rrw_enter+0x20/0x30 [zfs]
[] dsl_pool_hold+0x5a/0x80 [zfs]
[] dmu_objset_hold+0x35/0xd0 [zfs]
[] dsl_prop_get+0x44/0xa0 [zfs]
[] dsl_prop_get_integer+0x1e/0x20 [zfs]
[] __zvol_create_minor+0xbd/0x640 [zfs]
[] zvol_create_minor+0x33/0x70 [zfs]
[] zvol_create_minors_cb+0xe/0x20 [zfs]
[] dmu_objset_find_impl+0x3a5/0x430 [zfs]
[] dmu_objset_find+0x58/0x90 [zfs]
[] zvol_create_minors+0x2f/0x40 [zfs]
[] zfs_ioc_recv+0x3a2/0xbf0 [zfs]
[] zfsdev_ioctl+0x4bd/0x500 [zfs]
[] do_vfs_ioctl+0x2e5/0x4c0
[] SyS_ioctl+0xa1/0xc0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

[]# cat /proc/27966/stack
[] cv_wait_common+0x125/0x150 [spl]
[] __cv_wait+0x15/0x20 [spl]
[] txg_wait_synced+0xef/0x140 [zfs]
[] dsl_sync_task+0x177/0x270 [zfs]
[] dsl_dir_rename+0x5b/0x80 [zfs]
[] zfs_ioc_rename+0x107/0x110 [zfs]
[] zfsdev_ioctl+0x4bd/0x500 [zfs]
[] do_vfs_ioctl+0x2e5/0x4c0
[] SyS_ioctl+0xa1/0xc0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

[]# cat /proc/27995/stack
[] cv_wait_common+0x125/0x150 [spl]
[] __cv_wait+0x15/0x20 [spl]
[] rrw_enter_read_impl+0x53/0x170 [zfs]
[] rrw_enter+0x20/0x30 [zfs]
[] dsl_pool_hold+0x5a/0x80 [zfs]
[] dmu_objset_hold+0x35/0xd0 [zfs]
[] zfs_ioc_objset_stats+0x30/0x80 [zfs]
[] zfsdev_ioctl+0x4bd/0x500 [zfs]
[] do_vfs_ioctl+0x2e5/0x4c0
[] SyS_ioctl+0xa1/0xc0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

[]# cat /proc/28406/stack
[] cv_wait_common+0x125/0x150 [spl]
[] __cv_wait+0x15/0x20 [spl]
[] rrw_enter_read_impl+0x53/0x170 [zfs]
[] rrw_enter+0x20/0x30 [zfs]
[] dsl_pool_hold+0x5a/0x80 [zfs]
[] dmu_objset_hold+0x35/0xd0 [zfs]
[] zfs_ioc_objset_stats+0x30/0x80 [zfs]
[] zfsdev_ioctl+0x4bd/0x500 [zfs]
[] do_vfs_ioctl+0x2e5/0x4c0
[] SyS_ioctl+0xa1/0xc0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

[]# uname -a
Linux tos-perf1 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[]# rpm -qa spl
spl-0.6.5.4-1.el7.centos.x86_64

[]# rpm -qa zfs
zfs-0.6.5.4-1.el7.centos.x86_64

stack.txt

here is the dmesg

dmesg.txt

@zhenghc
Copy link
Author

zhenghc commented Mar 14, 2016

another kernel thread observed in dmesg log

12564 Mar 11 12:47:14 tos-perf4 kernel: INFO: task txg_sync:6845 blocked for more than 120 seconds.
12565 Mar 11 12:47:14 tos-perf4 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
12566 Mar 11 12:47:14 tos-perf4 kernel: txg_sync D ffffffffa0783cc8 0 6845 2 0x00000080
12567 Mar 11 12:47:14 tos-perf4 kernel: ffff881038c1faf0 0000000000000046 ffff88103bfea280 ffff881038c1ffd8
12568 Mar 11 12:47:14 tos-perf4 kernel: ffff881038c1ffd8 ffff881038c1ffd8 ffff88103bfea280 ffffffffa0783cc0
12569 Mar 11 12:47:14 tos-perf4 kernel: ffffffffa0783cc4 ffff88103bfea280 00000000ffffffff ffffffffa0783cc8
12570 Mar 11 12:47:14 tos-perf4 kernel: Call Trace:
12571 Mar 11 12:47:14 tos-perf4 kernel: [] schedule_preempt_disabled+0x29/0x70
12572 Mar 11 12:47:14 tos-perf4 kernel: [] __mutex_lock_slowpath+0xc5/0x1c0
12573 Mar 11 12:47:14 tos-perf4 kernel: [] mutex_lock+0x1f/0x2f
12574 Mar 11 12:47:14 tos-perf4 kernel: [] zvol_rename_minors+0x6b/0x1b0 [zfs]
12575 Mar 11 12:47:14 tos-perf4 kernel: [] dsl_dir_rename_sync+0x322/0x5d0 [zfs]
12576 Mar 11 12:47:14 tos-perf4 kernel: [] ? __cv_wait+0x15/0x20 [spl]
12577 Mar 11 12:47:14 tos-perf4 kernel: [] dsl_sync_task_sync+0xf2/0x100 [zfs]
12578 Mar 11 12:47:14 tos-perf4 kernel: [] dsl_pool_sync+0x31b/0x430 [zfs]
12579 Mar 11 12:47:14 tos-perf4 kernel: [] spa_sync+0x388/0xb70 [zfs]
12580 Mar 11 12:47:14 tos-perf4 kernel: [] ? autoremove_wake_function+0x2b/0x40
12581 Mar 11 12:47:14 tos-perf4 kernel: [] txg_sync_thread+0x3cc/0x640 [zfs]
12582 Mar 11 12:47:14 tos-perf4 kernel: [] ? txg_fini+0x2a0/0x2a0 [zfs]
12583 Mar 11 12:47:14 tos-perf4 kernel: [] thread_generic_wrapper+0x71/0x80 [spl]
12584 Mar 11 12:47:14 tos-perf4 kernel: [] ? __thread_exit+0x20/0x20 [spl]
12585 Mar 11 12:47:14 tos-perf4 kernel: [] kthread+0xcf/0xe0
12586 Mar 11 12:47:14 tos-perf4 kernel: [] ? kthread_create_on_node+0x140/0x140
12587 Mar 11 12:47:14 tos-perf4 kernel: [] ret_from_fork+0x58/0x90
12588 Mar 11 12:47:14 tos-perf4 kernel: [] ? kthread_create_on_node+0x140/0x140

@zhenghc
Copy link
Author

zhenghc commented Mar 14, 2016

seems this is a duplicated of #3681

@kernelOfTruth
Copy link
Contributor

a0bd735 Add support for asynchronous zvol minor operations

@zhenghc you can try out to apply that commit on top of 0.6.5.5 or use an rpm of master, if that is available to see whether that improves things

Thank for the report

@behlendorf
Copy link
Contributor

This should be resolved in master with the following commits. I'm going to tentatively close this issue and we can reopen it if you're able to reproduce this in mater. This changes are more invasive than we'd normally backport to the release branch so you may need to cherry pick them for now.

1ee159f Fix lock order inversion with zvol_open()
a0bd735 Add support for asynchronous zvol minor operations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants