Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Deadlock with CONFIG_DEBUG_MUTEX (ArchLinux) #167

Closed
delroth opened this Issue Mar 20, 2011 · 27 comments

Comments

Projects
None yet
6 participants

delroth commented Mar 20, 2011

The ZFS module deadlocked while I was rsync-ing a large (35GB) file to a ZFS mountpoint.

I'm using a 2.6.38 kernel with the last Git version of SPL and ZFS.

Here is the backtrace shown in dmesg :

INFO: task z_wr_int/0:4369 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/0      D ffff88018c17ffd8     0  4369      2 0x00000000
 ffff88018c17faf0 0000000000000046 ffff8801adf80c30 ffff8801af05a800
 0000000000000003 ffff88018c3506f0 ffff88018c17ffd8 ffff88018c17ffd8
 ffff88018c17ffd8 ffff88018c3509d0 ffff88018c17ffd8 ffff88018c17e000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3989>] ? mutex_unlock+0x9/0x10
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffff8104701e>] ? __wake_up+0x4e/0x70
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/1:4370 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/1      D 000000010008e03d     0  4370      2 0x00000000
 ffff88018d851af0 0000000000000046 ffff8801af188da0 ffff8801af188da0
 ffff8801af188da0 ffff88018c353e70 ffff88018d851fd8 ffff88018d851fd8
 ffff88018d851fd8 ffff88018c354150 ffff88018d851fd8 ffff88018d850000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3989>] ? mutex_unlock+0x9/0x10
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffff8104701e>] ? __wake_up+0x4e/0x70
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/2:4371 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/2      D 000000010008e02a     0  4371      2 0x00000000
 ffff88018d853af0 0000000000000046 ffff88018d853c7c 000000010008e02a
 0000000000000000 ffff88018c3575f0 ffff88018d853fd8 ffff88018d853fd8
 ffff88018d853fd8 ffff88018c3578d0 ffff88018d853fd8 ffff88018d852000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/3:4372 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/3      D 000000010008e03d     0  4372      2 0x00000000
 ffff88018d855af0 0000000000000046 ffff8801adf80c30 ffff8801af05a800
 0000000000000003 ffff88018c356f00 ffff88018d855fd8 ffff88018d855fd8
 ffff88018d855fd8 ffff88018c3571e0 ffff88018d855fd8 ffff88018d854000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3989>] ? mutex_unlock+0x9/0x10
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffff8104701e>] ? __wake_up+0x4e/0x70
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/4:4373 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/4      D 000000010008e065     0  4373      2 0x00000000
 ffff88018d857af0 0000000000000046 ffff8801adf80c30 000000010008e065
 0000000000000003 ffff88018c353780 ffff88018d857fd8 ffff88018d857fd8
 ffff88018d857fd8 ffff88018c353a60 ffff88018d857fd8 ffff88018d856000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/5:4374 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/5      D 000000010008e03f     0  4374      2 0x00000000
 ffff88018f231af0 0000000000000046 ffff8801adf80c30 ffff8801af05a800
 0000000000000003 ffff88018c350de0 ffff88018f231fd8 ffff88018f231fd8
 ffff88018f231fd8 ffff88018c3510c0 ffff88018f231fd8 ffff88018f230000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3989>] ? mutex_unlock+0x9/0x10
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffff8104701e>] ? __wake_up+0x4e/0x70
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/6:4375 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/6      D ffff88018f233fd8     0  4375      2 0x00000000
 ffff88018f233a90 0000000000000046 ffff88018f233940 ffffffff811e41e6
 ffff88018f2339a0 ffff88018c356810 ffff88018f233fd8 ffff88018f233fd8
 ffff88018f233fd8 ffff88018c356af0 ffff88018f233fd8 ffff88018f232000
Call Trace:
 [<ffffffff811e41e6>] ? elv_set_request+0x16/0x30
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa08056f9>] dsl_dir_diduse_space+0x119/0x160 [zfs]
 [<ffffffffa07faf22>] dsl_dataset_block_born+0x132/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffff8104701e>] ? __wake_up+0x4e/0x70
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/7:4376 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/7      D 000000010008e03d     0  4376      2 0x00000000
 ffff88018f235af0 0000000000000046 ffff8801adf80c30 ffff880100000000
 0000000000000003 ffff88018c3529a0 ffff88018f235fd8 ffff88018f235fd8
 ffff88018f235fd8 ffff88018c352c80 ffff88018f235fd8 ffff88018f234000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/8:4377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/8      D 000000010008e034     0  4377      2 0x00000000
 ffff88018f237af0 0000000000000046 ffff8801adf80c30 ffff880100000000
 0000000000000000 ffff88018c354c50 ffff88018f237fd8 ffff88018f237fd8
 ffff88018f237fd8 ffff88018c354f30 ffff88018f237fd8 ffff88018f236000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffff8104701e>] ? __wake_up+0x4e/0x70
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
INFO: task z_wr_int/9:4378 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
z_wr_int/9      D ffff88018a419fd8     0  4378      2 0x00000000
 ffff88018a419af0 0000000000000046 ffff8801adf80c30 ffff8801af05a800
 0000000000000003 ffff88018c3522b0 ffff88018a419fd8 ffff88018a419fd8
 ffff88018a419fd8 ffff88018c352590 ffff88018a419fd8 ffff88018a418000
Call Trace:
 [<ffffffff813b3bbd>] ? __mutex_lock_slowpath+0x22d/0x310
 [<ffffffff813b3989>] ? mutex_unlock+0x9/0x10
 [<ffffffff813b3ac6>] __mutex_lock_slowpath+0x136/0x310
 [<ffffffff813b3cb1>] mutex_lock+0x11/0x30
 [<ffffffffa07faea4>] dsl_dataset_block_born+0xb4/0x200 [zfs]
 [<ffffffffa07decd4>] dbuf_write_done+0x1a4/0x250 [zfs]
 [<ffffffffa07d6bfa>] arc_write_done+0xda/0x290 [zfs]
 [<ffffffffa086aff2>] zio_done+0x212/0xb40 [zfs]
 [<ffffffffa0771471>] ? kmem_free_debug+0x11/0x20 [spl]
 [<ffffffffa0833f8f>] ? vdev_mirror_map_free+0x1f/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa08359a5>] ? vdev_raidz_map_free_vsd+0x25/0x30 [zfs]
 [<ffffffffa086b43f>] zio_done+0x65f/0xb40 [zfs]
 [<ffffffffa0867609>] zio_execute+0x79/0x100 [zfs]
 [<ffffffff8104701e>] ? __wake_up+0x4e/0x70
 [<ffffffffa0773835>] taskq_thread+0x1d5/0x3b0 [spl]
 [<ffffffff81051d70>] ? default_wake_function+0x0/0x10
 [<ffffffffa0773660>] ? taskq_thread+0x0/0x3b0 [spl]
 [<ffffffff81079446>] kthread+0x96/0xa0
 [<ffffffff8100cbe4>] kernel_thread_helper+0x4/0x10
 [<ffffffff810793b0>] ? kthread+0x0/0xa0
 [<ffffffff8100cbe0>] ? kernel_thread_helper+0x0/0x10
Owner

behlendorf commented Mar 20, 2011

Thanks for the bug report. There is one well known deadlock which can occur but it's unfortunately an upstream kernel bug. If your comfortable with rebuilding your kernel you can try the following patch.

https://bugzilla.kernel.org/attachment.cgi?id=50802

In the meanwhile I've been investigating a way to avoid the issue without patching the kernel.

delroth commented Mar 20, 2011

I do not see any vmalloc in the tracebacks. Are you sure this is related to this kernel bug?

@delroth delroth closed this Mar 20, 2011

Owner

behlendorf commented Mar 20, 2011

I didn't either, but the traces also looked incomplete so it was an educated guess. What was there for the traces didn't clearly show the root cause

@behlendorf behlendorf reopened this Mar 20, 2011

L0cutus commented Mar 25, 2011

the same happens to me 2 times:
http://pastebin.com/GvtGAjMw
zfs list:
NAME USED AVAIL REFER MOUNTPOINT
tank 10,3G 446G 21K /media/tank
tank/fish 10,3G 455G 1,40G -

mount:
tank on /media/tank type zfs (rw)
/dev/zd0p1 on /mnt/tank/fish@pristine1 type ext2 (ro)
/dev/zd16p1 on /mnt/tank/fish1 type ext2 (rw)

action that cause the deadlock:

[kki@l0cutus ~]$ cp /home/kki/.jd/downloads/* /mnt/tank/fish1/
sending incremental file list
file1.iso
4.43G 100% 59.84MB/s 0:01:10 (xfer#1, to-check=267/269)
file2.iso
3.19G 100% 63.85MB/s 0:00:47 (xfer#2, to-check=266/269)
02 Rocce E Scogli/02 Rocce E Scogli.avi
362.86M 100% 41.06MB/s 0:00:08 (xfer#3, to-check=188/269)
02 Rocce E Scogli/02 Rocce E Scogli.part1.rar
93.06M 49% 15.77MB/s 0:00:05

and in that position it stopped, the only solution was a reboot.

kernel: 2.6.37
os: Archlinux
latest git spl/zfs
the test disk was a 500gb usb disk

thanks !

Contributor

clefru commented Apr 8, 2011

I have been following the long long thread on Issue 154. Claims there are that we don't need this kernel patch anymore with recent code.

I tested most recent code from yesterday in my VirtualBox play environment (2GB ram), but I still see this deadlock, simply by running around 3 bonnie++ on a fresh ZFS.

Unfortunately bugzilla is down. Could someone drop the patch from above in a different location? Is that the patch in question http://permalink.gmane.org/gmane.linux.kernel.mm/59658 ?

I would highly appreciate a two line update on that issue. Thank you!

Owner

behlendorf commented Apr 8, 2011

Are you sure you are still seeing this deadlock? There are still a couple slightly differently deadlocks lurking particularly for machines with only 2GB of ram. That's one of the major reasons I'm still calling this a release candidate. If you can open a new issue with the console stacks from your deadlock I'm happy to take a look.

Contributor

clefru commented Apr 8, 2011

I am sorry. Where are my manners? Here is a full log:
http://clemens.endorphin.org/zfs-lockup-issue167.log

It has the same anatomy as the stack trace posted in this issue. The ZFS volume was fresh from zpool create + zfs primarycache=none.

Owner

behlendorf commented Apr 8, 2011

This was determined to be a different deadlock than that described in #154. Your stack matches the one in this bug, but the original root cause determination was wrong. In this case the deadlock looks to be related to dsl_dir_diduse_space(). From your stacks.

z_wr_int/4:1099
    mutex_lock+0x11/0x30
    dsl_dir_diduse_space+0x119/0x160 [zfs]
    dsl_dataset_block_born+0x132/0x200 [zfs]
    dbuf_write_done+0x1a4/0x250 [zfs]
  • dsl_dataset_block_born() takes ds->ds_dir->dd_lock mutex
    • dsl_dir_diduse_space() incorrectly determines mutex is not held and takes it

This results in a deadlock. It's not clear how this is possible since !MUTEX_HELD() works well in the rest of the code. However, that's what the stack clearly shows. We'll make sure we dig in to how this can happen but it may take us a little bit to get to it.

Contributor

clefru commented Apr 8, 2011

Thank you for the ultra quick response.

Just to make sure that I get what you are saying (I have trouble following multiple uses of "this"):

It would be interesting for me, if other users have that problem. I simply have to run 3 concurrent bonnie++-s and I almost immediately run into this deadlock. If your system survives that test, please leave me a note. I would really like to get to the source of a potential workaround.

(My kernel is PREEMPT_NONE btw - if that would help, I can provide my VirtualBox vdi image)

Owner

behlendorf commented Apr 8, 2011

You have it exactly right.

Thus far I've only heard of one or two other people who have encountered this. Since it hasn't been that common it has not moved to the top of the list for things to get fixed. I personally haven't seen this while running bonnie++ so it's likely related to how your kernel is built. I see your running a 2.6.37 kernel with archlinux.

Since this looks like it's related to MUTEX_HELD behaving incorrectly it would be useful to know how HAVE_MUTEX_OWNER is set in the spl_config.h.

delroth commented Apr 8, 2011

I am also using the Arch kernel with PREEMPT_NONE btw. This may be linked somehow.

@delroth delroth closed this Apr 8, 2011

Contributor

clefru commented Apr 8, 2011

I was suspecting the arch kernel too, so I ran inconclusive systematic tests two months ago with the KQ branch, where I ran into this problem too: https://spreadsheets2.google.com/a/endorphin.org/ccc?hl=en&hl=en&key=t9tjFCJ9fngkBzHFBXk9D3w&authkey=CIKig4MK#gid=0

The only configuration I found sorta stable was Ubuntu kernel + Ubuntu config + 1 CPU. At some point however, I found that configuration failing too, and I gave up in frustration.

Unfortunately, I don't have test hardware available to do more systematic test... maybe I can buy some. I want to get this working so desperately.

Owner

behlendorf commented Apr 8, 2011

For what it's worth I'm using Darik's Ubuntu PPA for my home NAS and it's working quite well. He does a nice job tracking the latest fixes which haven't made it in to an -rc tag just yet.

Back to this bug we'll get it fixed but to be honest we haven't focused on it yet. It sounds like what I need to do is install a VM for ARCH Linux and do a little testing there. There are quite a few ARCH users now it would be good to ensure it works for them.

Contributor

clefru commented Apr 9, 2011

I digged some more: http://clemens.endorphin.org/arch-kill-diff - that's the minimal configuration diff I could find that discriminates lockup and no-lockup. It is basically CONFIG_DEBUG_MUTEXES=n which triggers a lot of interesting other defines including: CONFIG_MUTEX_SPIN_ON_OWNER=y, CONFIG_INLINE_(SPIN|READ|WRITE)_UNLOCK(_IRQ)?=y.

Result:

  • 4 parallel bonnie++s don't lock up the system anymore with that config.
  • 25%-50% the bonnie++ processes die in the "Rewrite" step with a segfault after "drastic I/O error (re-write read): Input/output error", while on ext4 bonnie++ runs fine (serially 3 times).

I am not sure, if the more aggressive mutex setting just replaces a lock-up with a race... :/

Owner

behlendorf commented Apr 9, 2011

Thanks for doing the leg work on this. CONFIG_DEBUG_MUTEX makes sense as the likely cause since of what's causing the deadlock. My hunch is that for some reason this causes MUTEX_HELD to return incorrect results which will trigger the deadlock.

As for the segfault that sounds like it may be a different issue. Running ZFS in the kernel would be hard pressed to cause a user application to segfault. All it can do is make a system call behave incorrectly. This could lead to a segfault if the application doesn't correctly handle the system call error.

L0cutus commented Apr 9, 2011

after i've compiled the latest archlinux kernel and latest spl/zfs git, rsync doesn't 'die' anymore, BUT system become unusable (too much cpu&ram intensive), same operation
done with zfs-fuse doesn't make unusable my notebook.
i've an Intel dual core U2300 @ 1.20GHz with 3gb of ram, the test consist into copy some files with rsync to an zfs-compressed-deduped usb disk:
This is done with zfs-fuse while i'm writing this message, the same operation with
zfs 'native' have the same results BUT with my pc frozen until finished.

Number of files: 29940
Number of files transferred: 25892
Total file size: 2.90G bytes
Total transferred file size: 2.90G bytes
Literal data: 2.90G bytes
Matched data: 0 bytes
File list size: 885.79K
File list generation time: 0.030 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 2.91G
Total bytes received: 508.15K

sent 2.91G bytes received 508.15K bytes 6.31M bytes/sec
total size is 2.90G speedup is 1.00

[root@l0cutus ~]# zpool get all test

NAME PROPERTY VALUE SOURCE
test size 464G -
test capacity 0% -
test altroot - default
test health ONLINE -
test guid 12250003360106422442 default
test version 23 default
test bootfs - default
test delegation on default
test autoreplace off default
test cachefile - default
test failmode wait default
test listsnapshots off default
test autoexpand off default
test dedupditto 0 default
test dedupratio 2.11x -
test free 462G -
test allocated 2,18G -

[root@l0cutus ~]# zfs get all test

NAME PROPERTY VALUE SOURCE
test type filesystem -
test creation sab apr 9 19:21 2011 -
test used 4,55G -
test available 455G -
test referenced 4,54G -
test compressratio 1.22x -
test mounted yes -
test quota none default
test reservation none default
test recordsize 128K default
test mountpoint /media/TEST local
test sharenfs off default
test checksum on default
test compression on local
test atime off local
test devices on default
test exec on default
test setuid on default
test readonly off default
test zoned off default
test snapdir hidden default
test aclmode groupmask default
test aclinherit restricted default
test canmount on default
test xattr on default
test copies 1 default
test version 4 -
test utf8only off -
test normalization none -
test casesensitivity sensitive -
test vscan off default
test nbmand off default
test sharesmb off default
test refquota none default
test refreservation none default
test primarycache all default
test secondarycache all default
test usedbysnapshots 0 -
test usedbydataset 4,54G -
test usedbychildren 11,7M -
test usedbyrefreservation 0 -
test logbias latency default
test dedup on local
test mlslabel on -

[root@l0cutus ~]# uname -a

Linux l0cutus 2.6.37-ARCH #1 SMP PREEMPT Fri Mar 25 15:10:00 CET 2011 x86_64 Genuine Intel(R) CPU U2300 @ 1.20GHz GenuineIntel GNU/Linux

Contributor

clefru commented Apr 9, 2011

behlendorf: It's most likely a bug in bonnie++ that's causing the segfault. It's just handling the I/O error incorrectly. And btw I changed to 4GB on bare metal (no VirtualBox) for this testing.

Legendin commented Jun 1, 2011

I can confirm this problem on 2.6.38-ARCH / zfs 0.6.0rc4 (8GB on bare metal with 3x2TB in raidz). I can also confirm that using a kernel configured with clefru's config remedies the situation (thank you clefru!).

@behlendorf behlendorf reopened this Jun 1, 2011

Owner

behlendorf commented Jun 1, 2011

After rereading the history in this bug I'm going to reopen it. While we have a work around I'd rather have a real fix.

delroth commented Jun 1, 2011

Actually I did not realize I closed it with my last comment. May have clicked on "Comment & Close" instead of "Comment" when replying...

Member

gunnarbeutner commented Oct 16, 2011

I can reliably reproduce this with "/usr/local/libexec/zfs/zconfig.sh -c -t 5". The problem is that we're clearing the mutex owner while not holding the mutex:

Thread 1:
mutex_unlock(mp)

Thread 2:
mutex_lock(mp)
-> mutex_set_owner(mp)
--> mp->m.owner = current_thread_info();

Thread 1:
mp->m.owner = NULL;

Thread 2:
MUTEX_HELD(mp) == 0

Owner

behlendorf commented Oct 17, 2011

Indeed, your right... I remember this case. When I was working on this code I wasn't able to 100% close this race in mutex_exit() when (HAVE_MUTEX_OWNER && CONFIG_SMP && CONFIG_DEBUG_MUTEXES) are set. The problem is that when CONFIG_DEBUG_MUTEXES is set mutex_unlock() doesn't clear the owner, so I clear it just after the lock is dropped (hense the race). We could clear it just before the lock is dropped however I think there may be a substantial time interval when the lock is then still technically held but has no owner. Still that may be preferable. If you have a solid reproducer could you try this.

diff --git a/include/sys/mutex.h b/include/sys/mutex.h
index 659214f..9cad228 100644
--- a/include/sys/mutex.h
+++ b/include/sys/mutex.h
@@ -84,8 +84,8 @@ mutex_owner(kmutex_t *mp)
 #ifdef CONFIG_DEBUG_MUTEXES
 # define mutex_exit(mp)                                                 \
 ({                                                                      \
-        mutex_unlock(&(mp)->m);                                         \
         (mp)->m.owner = NULL;                                           \
+        mutex_unlock(&(mp)->m);                                         \
 })
 #else
 # define mutex_exit(mp)                 mutex_unlock(&(mp)->m)
Member

gunnarbeutner commented Oct 17, 2011

Well, this almost works - except for the fact where debug_mutex_unlock() checks the lock's owner (which by that time has been set to NULL):

[  239.156374] ------------[ cut here ]------------
[  239.156621] WARNING: at kernel/mutex-debug.c:78 debug_mutex_unlock+0xdb/0xe0()
[  239.156705] Hardware name: VMware Virtual Platform
[  239.156776] Modules linked in: zfs(+) zcommon znvpair zavl zunicode spl vesafb vmw_balloon zlib_deflate psmouse serio_raw i2c_piix4 shpchp mptspi mptscsih e1000 mptbase floppy [last unloaded: spl]
[  239.157006] Pid: 11930, comm: arc_reclaim Not tainted 2.6.38.8 #4
[  239.157083] Call Trace:
[  239.157147]  [<ffffffff81064cef>] ? warn_slowpath_common+0x7f/0xc0
[  239.157227]  [<ffffffff81064d4a>] ? warn_slowpath_null+0x1a/0x20
[  239.157651]  [<ffffffff8109931b>] ? debug_mutex_unlock+0xdb/0xe0
[  239.157735]  [<ffffffff815cba00>] ? __mutex_unlock_slowpath+0x90/0x140
[  239.157817]  [<ffffffff815cbabe>] ? mutex_unlock+0xe/0x10
[  239.157898]  [<ffffffffa00dfec5>] ? __cv_timedwait_common+0xa5/0x140 [spl]
[  239.157985]  [<ffffffff81087570>] ? autoremove_wake_function+0x0/0x50
[  239.158070]  [<ffffffffa00dff73>] ? __cv_timedwait_interruptible+0x13/0x20 [spl]
[  239.158191]  [<ffffffffa0161ed3>] ? arc_reclaim_thread+0xc3/0x160 [zfs]
[  239.158300]  [<ffffffffa0161e10>] ? arc_reclaim_thread+0x0/0x160 [zfs]
[  239.158386]  [<ffffffffa00dc318>] ? thread_generic_wrapper+0x78/0x90 [spl]
[  239.158472]  [<ffffffffa00dc2a0>] ? thread_generic_wrapper+0x0/0x90 [spl]
[  239.158555]  [<ffffffff81086fbe>] ? kthread+0xbe/0xd0
[  239.158629]  [<ffffffff8104d906>] ? finish_task_switch+0x46/0xe0
[  239.158708]  [<ffffffff8100cd24>] ? kernel_thread_helper+0x4/0x10
[  239.158787]  [<ffffffff81086f00>] ? kthread+0x0/0xd0
[  239.158860]  [<ffffffff8100cd20>] ? kernel_thread_helper+0x0/0x10

I don't think we can safely rely on the m.owner variable in this case. Clearing it before mutex_unlock() would cause warnings - while clearing it afterwards causes deadlocks. Here's another option (i.e. using the m_owner variable):

diff --git a/include/sys/mutex.h b/include/sys/mutex.h
index 659214f..c55104a 100644
--- a/include/sys/mutex.h
+++ b/include/sys/mutex.h
@@ -35,7 +35,7 @@ typedef enum {
         MUTEX_ADAPTIVE = 2
 } kmutex_type_t;

-#if defined(HAVE_MUTEX_OWNER) && defined(CONFIG_SMP)
+#if defined(HAVE_MUTEX_OWNER) && defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES)

 /*
  * We define a 1-field struct rather than a straight typedef to enforce type
@@ -79,17 +79,7 @@ mutex_owner(kmutex_t *mp)

 #define mutex_tryenter(mp)              mutex_trylock(&(mp)->m)
 #define mutex_enter(mp)                 mutex_lock(&(mp)->m)
-
-/* mutex->owner is not cleared when CONFIG_DEBUG_MUTEXES is set */
-#ifdef CONFIG_DEBUG_MUTEXES
-# define mutex_exit(mp)                                                 \
-({                                                                      \
-        mutex_unlock(&(mp)->m);                                         \
-        (mp)->m.owner = NULL;                                           \
-})
-#else
-# define mutex_exit(mp)                 mutex_unlock(&(mp)->m)
-#endif /* CONFIG_DEBUG_MUTEXES */
+#define mutex_exit(mp)                  mutex_unlock(&(mp)->m)

 #ifdef HAVE_GPL_ONLY_SYMBOLS
 # define mutex_enter_nested(mp, sc)     mutex_lock_nested(&(mp)->m, sc)

I believe we can also safely remove the spinlocks in spl_mutex_set_owner/spl_mutex_clear_owner because these functions are only called while holding the mutex. mutex_owner() could be made safe using ACCESS_ONCE (instead of the spinlock).

Another option would be to unify the mutex codebase and get rid of the m.owner special case. But I guess the additional memory accesses for m_owner might have a bit of a performance impact. This would need further testing.

(Oh, and btw: the MUTEX #define in mutex.h:104 is weird - it works as intended due to how the struct is laid-out in memory but should probably be changed to match the kmutex_t struct).

Hmm, I guess I should set up a branch for this and get started on those patches. :)

Owner

behlendorf commented Oct 18, 2011

Ahh yes, it's all coming back to me!

Since we can't seem to safely use the existing m.owner when CONFIG_DEBUG_MUTEXES is set it does seem like the only safe thing to do is fallback to the !HAVE_MUTEX_OWNER compatibility implementation. This is clearly the common case anyway since most default distribution kernels are built with this disabled.

Gunnar, if you can propose and test two patches for these mutex fixes I'm happy to retest and commit them. Proposed changes:

  • Remove explicit CONFIG_DEBUG_MUTEXES case as proposed above
  • Remove spin locks from spl_mutex_set/clear_owner and mutex_owner. I agree this should be safe with the current usage although it also shouldn't be costing us anything either. By definition the spin locks must be uncontended due to the mutex, we should verify this by replacing it with a VERIFY(spin_trylock()) in a debugging patch. Better safe than sorry.

As for the MUTEX define on 104.4 I don't think that's too weird, By design we are relying on how this structure get's organized in memory.

Member

gunnarbeutner commented Oct 18, 2011

Actually, it looks like wait_lock isn't held while the mutex is locked and other threads will lock/unlock wait_lock when they're trying to acquire the mutex - which means spin_trylock() is actually likely to fail in spl_mutex_set_owner/spl_mutex_clear_owner.

I'll probably have the patches finished sometime tomorrow.

Owner

behlendorf commented Oct 19, 2011

Everything looks good, I've reviewed and merged the fixes in to the spl and my usual regression test suite passed on the 12 distributions I always test. Close this issue, thanks Gunnar!

@behlendorf behlendorf closed this Oct 19, 2011

@dajhorn dajhorn referenced this issue in zfsonlinux/pkg-zfs Dec 14, 2011

@gunnarbeutner @behlendorf gunnarbeutner + behlendorf Fix race condition in mutex_exit()
On kernels with CONFIG_DEBUG_MUTEXES mutex_exit() clears the mutex
owner after releasing the mutex. This would cause mutex_owner()
to return an incorrect owner if another thread managed to lock the
mutex before mutex_exit() had a chance to clear the owner.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes ZFS issue #167
3160d4f

@dajhorn dajhorn referenced this issue in zfsonlinux/pkg-zfs Sep 7, 2012

@dechamps @behlendorf dechamps + behlendorf Add DKIOCTRIM for TRIM support.
See dechamps/zfs@cc6cd40 for details.

This harmless addition was merged to simplify testing the ZFS TRIM
support patches.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #167
ac8ca67
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment