Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I/O starvation during VirtualBox VM installation #5867

Closed
krichter722 opened this issue Mar 6, 2017 · 29 comments
Closed

I/O starvation during VirtualBox VM installation #5867

krichter722 opened this issue Mar 6, 2017 · 29 comments
Milestone

Comments

@krichter722
Copy link

krichter722 commented Mar 6, 2017

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 16.10
Linux Kernel 4.8.0-39-generic
Architecture amd64
ZFS Version 0.7.0-rc3_123_gebd9aa8c17
SPL Version 0.7.0-rc3_2_g97048200

Describe the problem you're observing

On a source installation with git clean -x -f -d && ./autogen.sh && ./configure && make && sudo make install && sudo make deb-kmod && sudo dpkg -i *.deb with the above versions, the installation of an Ubuntu 16.10 VM in VirtualBox 5.1.14 r112924 causes extreme I/O starvation which always (100 % of 5 tries with and without reboots in between) starts with 6 z_fr_iss_[n] processes/threads, then a short appearance of > 40 z_fr_iss_[n]s, finally a short period of one txg_sync all taking 100 % of I/O according to iotop. The read and write rate rarely exceed 5 MB/s, mostly remain at a few 100 KB/s, then in a final burst during txg_sync write rates goes up to the limit of the HDD (ca .180 MB/s) for a few seconds and seems to write everything it should have written in the last minutes.

The unresponsiveness of programs is quite minimal, e.g. vlc reading from the ZFS dataset to fill the buffer sometimes gets a few KB every few seconds (has a bar showing the streaming progress, so I can tell). Only VirtualBox is requiring so much throughput that the system go to the edge of collapse for several minutes. Maybe it does something different from "normal" I/O.

The system always recovers from the starvation. The only pool on the system (with 1 HDD and 1 SDD as cache, both based on partitions which don't cover the full device) has been upgraded with zpool upgrade. There's a z_upgrade process/thread which caused some I/O according to iotop and now doesn't seem to do anything anymore (is sleeping constantly).

The symptoms are very similar to #5692, but the blocked for more than 120 seconds message either isn't logged at all or also for different (apparently random processes) with different backtraces (see below). It's possible that those backtraces are unrelated to the problem which also would explain the issue to reoccur after applying the suggested fix in #5692.

I know that the hardware setup isn't ideal for ZFS, but except for VirtualBox the performance is excellent and has improved dramatically over the years. I'd really appreciate if we could sort this out further.

Describe how to reproduce the problem

Install as described above, use VirtualBox as described and wait for the issue to happen.

Include any warning/errors/backtraces from the system logs

relevant part of dmesg output [Mär 6 18:02] INFO: task txg_sync:12713 blocked for more than 120 seconds. [ +0,000008] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000003] txg_sync D ffff96c3e6a2fac8 0 12713 2 0x00000000 [ +0,000017] ffff96c3e6a2fac8 00ff96c408708080 ffff96c42ce75880 ffff96c410debb00 [ +0,000001] 0000000000000000 ffff96c3e6a30000 ffff96c43f319300 7fffffffffffffff [ +0,000002] ffff96c095dcceb8 0000000000000001 ffff96c3e6a2fae0 ffffffffb0e9bd35 [ +0,000002] Call Trace: [ +0,000005] [] schedule+0x35/0x80 [ +0,000002] [] schedule_timeout+0x22a/0x3f0 [ +0,000046] [] ? zio_taskq_dispatch+0x91/0xa0 [zfs] [ +0,000032] [] ? zio_issue_async+0x12/0x20 [zfs] [ +0,000031] [] ? zio_nowait+0xb6/0x150 [zfs] [ +0,000003] [] ? ktime_get+0x41/0xb0 [ +0,000002] [] io_schedule_timeout+0xa4/0x110 [ +0,000007] [] cv_wait_common+0xb2/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000004] [] __cv_wait_io+0x18/0x20 [spl] [ +0,000029] [] zio_wait+0xec/0x190 [zfs] [ +0,000026] [] dsl_pool_sync+0xb8/0x490 [zfs] [ +0,000027] [] spa_sync+0x412/0xd70 [zfs] [ +0,000029] [] txg_sync_thread+0x2c4/0x480 [zfs] [ +0,000029] [] ? txg_delay+0x160/0x160 [zfs] [ +0,000003] [] thread_generic_wrapper+0x71/0x80 [spl] [ +0,000003] [] ? __thread_exit+0x20/0x20 [spl] [ +0,000002] [] kthread+0xd8/0xf0 [ +0,000002] [] ret_from_fork+0x1f/0x40 [ +0,000002] [] ? kthread_create_on_node+0x1e0/0x1e0 [ +0,000034] INFO: task DOM Worker:21357 blocked for more than 120 seconds. [ +0,000002] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000001] DOM Worker D ffff96c3061a7d38 0 21357 17450 0x00000000 [ +0,000002] ffff96c3061a7d38 0000000088bffca3 ffff96c3c2e2d880 ffff96c39fc31d80 [ +0,000001] ffff96c3a80ae740 ffff96c3061a8000 ffff96c4265bd028 ffff96c4265bd000 [ +0,000002] ffff96c4265bd1c8 0000000000000000 ffff96c3061a7d50 ffffffffb0e9bd35 [ +0,000001] Call Trace: [ +0,000002] [] schedule+0x35/0x80 [ +0,000004] [] cv_wait_common+0x110/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000004] [] __cv_wait+0x15/0x20 [spl] [ +0,000029] [] zil_commit.part.11+0x79/0x7a0 [zfs] [ +0,000005] [] ? tsd_hash_search.isra.0+0x46/0xa0 [spl] [ +0,000003] [] ? tsd_set+0x2b4/0x500 [spl] [ +0,000002] [] ? mutex_lock+0x12/0x30 [ +0,000029] [] zil_commit+0x17/0x20 [zfs] [ +0,000029] [] zfs_fsync+0x7a/0xf0 [zfs] [ +0,000029] [] zpl_fsync+0x68/0xa0 [zfs] [ +0,000002] [] vfs_fsync_range+0x4b/0xb0 [ +0,000002] [] ? SyS_futex+0x85/0x170 [ +0,000001] [] do_fsync+0x3d/0x70 [ +0,000002] [] SyS_fsync+0x10/0x20 [ +0,000002] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ +0,000002] INFO: task mozStorage #2:21636 blocked for more than 120 seconds. [ +0,000001] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000001] mozStorage #2 D ffff96c2f29ffd38 0 21636 17450 0x00000000 [ +0,000002] ffff96c2f29ffd38 00ff96c3090e53b0 ffffffffb140d540 ffff96c2f6813b00 [ +0,000001] 00000001802a002a ffff96c2f2a00000 ffff96c4265bd028 ffff96c4265bd000 [ +0,000002] ffff96c4265bd1c8 0000000000000000 ffff96c2f29ffd50 ffffffffb0e9bd35 [ +0,000001] Call Trace: [ +0,000001] [] schedule+0x35/0x80 [ +0,000005] [] cv_wait_common+0x110/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000003] [] __cv_wait+0x15/0x20 [spl] [ +0,000029] [] zil_commit.part.11+0x79/0x7a0 [zfs] [ +0,000004] [] ? tsd_hash_search.isra.0+0x46/0xa0 [spl] [ +0,000004] [] ? tsd_set+0x2b4/0x500 [spl] [ +0,000001] [] ? mutex_lock+0x12/0x30 [ +0,000029] [] zil_commit+0x17/0x20 [zfs] [ +0,000028] [] zfs_fsync+0x7a/0xf0 [zfs] [ +0,000029] [] zpl_fsync+0x68/0xa0 [zfs] [ +0,000002] [] vfs_fsync_range+0x4b/0xb0 [ +0,000001] [] do_fsync+0x3d/0x70 [ +0,000001] [] SyS_fsync+0x10/0x20 [ +0,000002] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ +0,000005] INFO: task DOM Worker:22425 blocked for more than 120 seconds. [ +0,000001] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000000] DOM Worker D ffff96c364a33c18 0 22425 17450 0x00000000 [ +0,000002] ffff96c364a33c18 00ff96c14c33a008 ffff96c42ce72c40 ffff96c38e3349c0 [ +0,000001] 0000000000000001 ffff96c364a34000 ffff96c43f359300 7fffffffffffffff [ +0,000002] ffff96c0962cb748 0000000000000001 ffff96c364a33c30 ffffffffb0e9bd35 [ +0,000001] Call Trace: [ +0,000001] [] schedule+0x35/0x80 [ +0,000002] [] schedule_timeout+0x22a/0x3f0 [ +0,000001] [] ? __wake_up+0x44/0x50 [ +0,000003] [] ? taskq_dispatch_ent+0x55/0x170 [spl] [ +0,000002] [] ? ktime_get+0x41/0xb0 [ +0,000001] [] io_schedule_timeout+0xa4/0x110 [ +0,000004] [] cv_wait_common+0xb2/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000003] [] __cv_wait_io+0x18/0x20 [spl] [ +0,000029] [] zio_wait+0xec/0x190 [zfs] [ +0,000029] [] zil_commit.part.11+0x3c6/0x7a0 [zfs] [ +0,000004] [] ? tsd_hash_search.isra.0+0x46/0xa0 [spl] [ +0,000028] [] zil_commit+0x17/0x20 [zfs] [ +0,000029] [] zfs_fsync+0x7a/0xf0 [zfs] [ +0,000029] [] zpl_fsync+0x68/0xa0 [zfs] [ +0,000001] [] vfs_fsync_range+0x4b/0xb0 [ +0,000002] [] ? SyS_futex+0x85/0x170 [ +0,000001] [] do_fsync+0x3d/0x70 [ +0,000002] [] SyS_fsync+0x10/0x20 [ +0,000001] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [Mär 6 18:08] INFO: task kworker/u16:8:2058 blocked for more than 120 seconds. [ +0,000008] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000003] kworker/u16:8 D ffff96c4222937a0 0 2058 2 0x00000000 [ +0,000013] Workqueue: writeback wb_workfn (flush-zfs-4) [ +0,000014] ffff96c4222937a0 00ff96c3090aa720 ffffffffb140d540 ffff96c421e5c9c0 [ +0,000001] ffffffffb080b26a ffff96c422294000 ffff96c41e50d170 ffff96c41e50d148 [ +0,000002] ffff96c41e50d180 0000000000000000 ffff96c4222937b8 ffffffffb0e9bd35 [ +0,000001] Call Trace: [ +0,000004] [] ? __kmalloc_node+0x1da/0x2a0 [ +0,000003] [] schedule+0x35/0x80 [ +0,000008] [] cv_wait_common+0x110/0x130 [spl] [ +0,000002] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000003] [] __cv_wait+0x15/0x20 [spl] [ +0,000033] [] dmu_tx_wait+0x9b/0x340 [zfs] [ +0,000032] [] zfs_putpage+0x52c/0x550 [zfs] [ +0,000002] [] ? page_mkclean_one+0x44/0xe0 [ +0,000001] [] ? rmap_walk_file+0x19c/0x260 [ +0,000001] [] ? rmap_walk+0x41/0x60 [ +0,000001] [] ? page_mkclean+0x9f/0xb0 [ +0,000001] [] ? __page_check_address+0x1c0/0x1c0 [ +0,000031] [] zpl_putpage+0x38/0x50 [zfs] [ +0,000001] [] write_cache_pages+0x205/0x530 [ +0,000030] [] ? zpl_write_common_iovec+0xe0/0xe0 [zfs] [ +0,000002] [] ? find_next_bit+0x18/0x20 [ +0,000002] [] ? mutex_lock+0x12/0x30 [ +0,000027] [] ? rrw_exit+0x45/0x130 [zfs] [ +0,000041] [] zpl_writepages+0x94/0x170 [zfs] [ +0,000001] [] do_writepages+0x1e/0x30 [ +0,000002] [] __writeback_single_inode+0x45/0x320 [ +0,000001] [] writeback_sb_inodes+0x268/0x5f0 [ +0,000002] [] __writeback_inodes_wb+0x92/0xc0 [ +0,000001] [] wb_writeback+0x278/0x310 [ +0,000002] [] wb_workfn+0x2df/0x410 [ +0,000001] [] ? __switch_to+0x2ce/0x6c0 [ +0,000002] [] process_one_work+0x1fc/0x4b0 [ +0,000001] [] worker_thread+0x4b/0x500 [ +0,000001] [] ? process_one_work+0x4b0/0x4b0 [ +0,000001] [] kthread+0xd8/0xf0 [ +0,000002] [] ret_from_fork+0x1f/0x40 [ +0,000002] [] ? kthread_create_on_node+0x1e0/0x1e0 [ +0,000026] INFO: task txg_sync:12713 blocked for more than 120 seconds. [ +0,000001] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000001] txg_sync D ffff96c3e6a2fac8 0 12713 2 0x00000000 [ +0,000002] ffff96c3e6a2fac8 00ff96c408708080 ffff96c42cfa3b00 ffff96c410debb00 [ +0,000001] 0000000000000000 ffff96c3e6a30000 ffff96c43f399300 7fffffffffffffff [ +0,000001] ffff96c37aedade8 0000000000000001 ffff96c3e6a2fae0 ffffffffb0e9bd35 [ +0,000002] Call Trace: [ +0,000001] [] schedule+0x35/0x80 [ +0,000002] [] schedule_timeout+0x22a/0x3f0 [ +0,000030] [] ? zio_taskq_dispatch+0x91/0xa0 [zfs] [ +0,000029] [] ? zio_issue_async+0x12/0x20 [zfs] [ +0,000029] [] ? zio_nowait+0xb6/0x150 [zfs] [ +0,000002] [] ? ktime_get+0x41/0xb0 [ +0,000002] [] io_schedule_timeout+0xa4/0x110 [ +0,000005] [] cv_wait_common+0xb2/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000004] [] __cv_wait_io+0x18/0x20 [spl] [ +0,000029] [] zio_wait+0xec/0x190 [zfs] [ +0,000024] [] dsl_pool_sync+0xb8/0x490 [zfs] [ +0,000027] [] spa_sync+0x412/0xd70 [zfs] [ +0,000028] [] txg_sync_thread+0x2c4/0x480 [zfs] [ +0,000028] [] ? txg_delay+0x160/0x160 [zfs] [ +0,000004] [] thread_generic_wrapper+0x71/0x80 [spl] [ +0,000003] [] ? __thread_exit+0x20/0x20 [spl] [ +0,000001] [] kthread+0xd8/0xf0 [ +0,000002] [] ret_from_fork+0x1f/0x40 [ +0,000001] [] ? kthread_create_on_node+0x1e0/0x1e0 [ +0,000016] INFO: task bitcoin-msghand:26858 blocked for more than 120 seconds. [ +0,000001] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000001] bitcoin-msghand D ffff96c1b60139a8 0 26858 18042 0x00000000 [ +0,000001] ffff96c1b60139a8 00ffffffc0b0365f ffff96c42c506740 ffff96c3a80a8000 [ +0,000002] ffff96c25759f800 ffff96c1b6014000 ffff96c41e50d170 ffff96c41e50d148 [ +0,000001] ffff96c41e50d180 0000000000000000 ffff96c1b60139c0 ffffffffb0e9bd35 [ +0,000001] Call Trace: [ +0,000002] [] schedule+0x35/0x80 [ +0,000004] [] cv_wait_common+0x110/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000003] [] __cv_wait+0x15/0x20 [spl] [ +0,000022] [] dmu_tx_wait+0x9b/0x340 [zfs] [ +0,000022] [] dmu_tx_assign+0x88/0x490 [zfs] [ +0,000033] [] ? dmu_tx_count_dnode+0x5c/0xa0 [zfs] [ +0,000024] [] ? dmu_tx_hold_sa+0x60/0x170 [zfs] [ +0,000041] [] zfs_dirty_inode+0x105/0x310 [zfs] [ +0,000002] [] ? legitimize_path.isra.26+0x2e/0x60 [ +0,000002] [] ? legitimize_path.isra.26+0x2e/0x60 [ +0,000001] [] ? __d_lookup+0x5f/0x130 [ +0,000001] [] ? unlazy_walk+0x159/0x180 [ +0,000002] [] ? lookup_fast+0x1e4/0x310 [ +0,000001] [] ? dput+0x40/0x270 [ +0,000030] [] zpl_dirty_inode+0x2c/0x40 [zfs] [ +0,000001] [] __mark_inode_dirty+0x17a/0x3a0 [ +0,000002] [] generic_update_time+0x79/0xd0 [ +0,000001] [] touch_atime+0xa8/0xd0 [ +0,000001] [] generic_file_mmap+0x3e/0x60 [ +0,000029] [] zpl_mmap+0x84/0xe0 [zfs] [ +0,000001] [] mmap_region+0x3a5/0x640 [ +0,000001] [] do_mmap+0x42e/0x510 [ +0,000002] [] ? common_mmap+0x48/0x50 [ +0,000001] [] ? apparmor_mmap_file+0x18/0x20 [ +0,000002] [] ? security_mmap_file+0xdd/0xf0 [ +0,000001] [] vm_mmap_pgoff+0xba/0xf0 [ +0,000003] [] SyS_mmap_pgoff+0x1c1/0x290 [ +0,000001] [] SyS_mmap+0x1b/0x30 [ +0,000002] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ +0,000015] INFO: task EMT-3:20808 blocked for more than 120 seconds. [ +0,000001] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000001] EMT-3 D ffff96c342a539f0 0 20808 20688 0x00000000 [ +0,000001] ffff96c342a539f0 00ff96c342a539e8 ffffffffb140d540 ffff96c3a558bb00 [ +0,000002] 0000000000000000 ffff96c342a54000 ffff96c41e50d170 ffff96c41e50d148 [ +0,000001] ffff96c41e50d180 0000000000000000 ffff96c342a53a08 ffffffffb0e9bd35 [ +0,000001] Call Trace: [ +0,000002] [] schedule+0x35/0x80 [ +0,000004] [] cv_wait_common+0x110/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000003] [] __cv_wait+0x15/0x20 [spl] [ +0,000022] [] dmu_tx_wait+0x9b/0x340 [zfs] [ +0,000022] [] dmu_tx_assign+0x88/0x490 [zfs] [ +0,000021] [] ? dmu_tx_count_dnode+0x5c/0xa0 [zfs] [ +0,000029] [] zfs_write+0x60a/0xd40 [zfs] [ +0,000001] [] ? finish_wait+0x55/0x70 [ +0,000013] [] ? rtR0SemEventMultiLnxWait.isra.2+0x350/0x390 [vboxdrv] [ +0,000029] [] zpl_write_common_iovec+0x8c/0xe0 [zfs] [ +0,000029] [] zpl_iter_write+0xb7/0xf0 [zfs] [ +0,000001] [] new_sync_write+0xd5/0x130 [ +0,000002] [] __vfs_write+0x26/0x40 [ +0,000001] [] vfs_write+0xb5/0x1a0 [ +0,000002] [] SyS_write+0x55/0xc0 [ +0,000001] [] ? SyS_ioctl+0x63/0x90 [ +0,000001] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ +0,000002] INFO: task Port0:20828 blocked for more than 120 seconds. [ +0,000001] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000000] Port0 D ffff96c33e28b9f0 0 20828 20688 0x00000000 [ +0,000002] ffff96c33e28b9f0 00ff96c33e28b9e8 ffff96c42ce749c0 ffff96c38816ac40 [ +0,000001] 0000000000000001 ffff96c33e28c000 ffff96c41e50d170 ffff96c41e50d148 [ +0,000001] ffff96c41e50d180 0000000000000000 ffff96c33e28ba08 ffffffffb0e9bd35 [ +0,000002] Call Trace: [ +0,000001] [] schedule+0x35/0x80 [ +0,000004] [] cv_wait_common+0x110/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000004] [] __cv_wait+0x15/0x20 [spl] [ +0,000021] [] dmu_tx_wait+0x9b/0x340 [zfs] [ +0,000022] [] dmu_tx_assign+0x88/0x490 [zfs] [ +0,000021] [] ? dmu_tx_count_dnode+0x5c/0xa0 [zfs] [ +0,000029] [] zfs_write+0x60a/0xd40 [zfs] [ +0,000002] [] ? sched_clock+0x9/0x10 [ +0,000029] [] zpl_write_common_iovec+0x8c/0xe0 [zfs] [ +0,000028] [] zpl_iter_write+0xb7/0xf0 [zfs] [ +0,000002] [] new_sync_write+0xd5/0x130 [ +0,000001] [] __vfs_write+0x26/0x40 [ +0,000001] [] vfs_write+0xb5/0x1a0 [ +0,000002] [] SyS_write+0x55/0xc0 [ +0,000001] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ +0,000001] INFO: task ATA-0:20829 blocked for more than 120 seconds. [ +0,000001] Tainted: P OE 4.8.0-39-generic #42-Ubuntu [ +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0,000001] ATA-0 D ffff96c33e28fa58 0 20829 20688 0x00000000 [ +0,000001] ffff96c33e28fa58 00ffffffc0b0365f ffff96c42ce749c0 ffff96c38816e740 [ +0,000001] ffff96c11db94000 ffff96c33e290000 ffff96c41e50d170 ffff96c41e50d148 [ +0,000002] ffff96c41e50d180 0000000000000000 ffff96c33e28fa70 ffffffffb0e9bd35 [ +0,000001] Call Trace: [ +0,000001] [] schedule+0x35/0x80 [ +0,000004] [] cv_wait_common+0x110/0x130 [spl] [ +0,000001] [] ? wake_atomic_t_function+0x60/0x60 [ +0,000004] [] __cv_wait+0x15/0x20 [spl] [ +0,000021] [] dmu_tx_wait+0x9b/0x340 [zfs] [ +0,000022] [] dmu_tx_assign+0x88/0x490 [zfs] [ +0,000032] [] ? dmu_tx_count_dnode+0x5c/0xa0 [zfs] [ +0,000022] [] ? dmu_tx_hold_sa+0x60/0x170 [zfs] [ +0,000040] [] zfs_dirty_inode+0x105/0x310 [zfs] [ +0,000003] [] ? spl_kmem_free+0x33/0x40 [spl] [ +0,000021] [] ? dmu_buf_rele_array.part.5+0x52/0x60 [zfs] [ +0,000003] [] ? spl_kmem_free+0x33/0x40 [spl] [ +0,000030] [] ? zfs_range_unlock+0x1a2/0x2c0 [zfs] [ +0,000001] [] ? mutex_lock+0x12/0x30 [ +0,000026] [] ? rrw_exit+0x45/0x130 [zfs] [ +0,000025] [] ? rrm_exit+0x46/0x80 [zfs] [ +0,000029] [] zpl_dirty_inode+0x2c/0x40 [zfs] [ +0,000001] [] __mark_inode_dirty+0x17a/0x3a0 [ +0,000002] [] generic_update_time+0x79/0xd0 [ +0,000001] [] touch_atime+0xa8/0xd0 [ +0,000028] [] zpl_iter_read+0xbf/0xf0 [zfs] [ +0,000002] [] new_sync_read+0xd2/0x120 [ +0,000001] [] __vfs_read+0x26/0x40 [ +0,000002] [] vfs_read+0x96/0x130 [ +0,000001] [] SyS_read+0x55/0xc0 [ +0,000002] [] entry_SYSCALL_64_fastpath+0x1e/0xa8
@behlendorf
Copy link
Contributor

@krichter722 could you try patches in #5752. Not only do the multi-thread spa sync but they also address an issue which can result in IO not being aggregated optimally.

@krichter722
Copy link
Author

krichter722 commented Mar 6, 2017

@krichter722 could you try patches in #5752. Not only do the multi-thread spa sync but they also address an issue which can result in IO not being aggregated optimally.

@behlendorf Thanks for the hint. 0.7.0-rc3_127_g238a31530c (HEAD of the PR's branch mt-sync) after installation like above and reboot doesn't change the symptoms.

@krichter722
Copy link
Author

krichter722 commented Mar 8, 2017

I figured, that this might be related to virtualization in general since I'm having the same problem when I start LXC 2.0.5-0ubuntu1 where lxc-start or lxc-attach starve I/O for minutes or deduplication because the two datasets which contain VirtualBox and LXC data have dedup activated. In dmesg I see now

[ +18,431577] INFO: task txg_sync:12627 blocked for more than 120 seconds.
[  +0,000004]       Tainted: P           OE   4.8.0-41-generic #44-Ubuntu
[  +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000002] txg_sync        D ffff8f1d8b347b48     0 12627      2 0x00000000
[  +0,000003]  ffff8f1d8b347b48 00ff8f1d8b347b90 ffff8f1dacfa1d80 ffff8f1da3496740
[  +0,000002]  ffff8f1da635dac0 ffff8f1d8b348000 ffff8f1dbf399300 7fffffffffffffff
[  +0,000002]  ffff8f1b14734a08 0000000000000001 ffff8f1d8b347b60 ffffffffabc9bd35
[  +0,000002] Call Trace:
[  +0,000006]  [<ffffffffabc9bd35>] schedule+0x35/0x80
[  +0,000002]  [<ffffffffabc9f50a>] schedule_timeout+0x22a/0x3f0
[  +0,000048]  [<ffffffffc0f64450>] ? zio_reexecute+0x380/0x380 [zfs]
[  +0,000002]  [<ffffffffab4fcb01>] ? ktime_get+0x41/0xb0
[  +0,000002]  [<ffffffffabc9b524>] io_schedule_timeout+0xa4/0x110
[  +0,000007]  [<ffffffffc0b3cb32>] cv_wait_common+0xb2/0x130 [spl]
[  +0,000002]  [<ffffffffab4c7510>] ? wake_atomic_t_function+0x60/0x60
[  +0,000004]  [<ffffffffc0b3cc08>] __cv_wait_io+0x18/0x20 [spl]
[  +0,000034]  [<ffffffffc0f66fec>] zio_wait+0xec/0x190 [zfs]
[  +0,000031]  [<ffffffffc0f05d63>] spa_sync+0x453/0xd70 [zfs]
[  +0,000031]  [<ffffffffc0f17054>] txg_sync_thread+0x2c4/0x480 [zfs]
[  +0,000032]  [<ffffffffc0f16d90>] ? txg_delay+0x160/0x160 [zfs]
[  +0,000004]  [<ffffffffc0b376d1>] thread_generic_wrapper+0x71/0x80 [spl]
[  +0,000003]  [<ffffffffc0b37660>] ? __thread_exit+0x20/0x20 [spl]
[  +0,000003]  [<ffffffffab4a40d8>] kthread+0xd8/0xf0
[  +0,000002]  [<ffffffffabca071f>] ret_from_fork+0x1f/0x40
[  +0,000001]  [<ffffffffab4a4000>] ? kthread_create_on_node+0x1e0/0x1e0

on Ubuntu 4.8.0-41-generic.

@krichter722
Copy link
Author

@krichter722 could you try patches in #5752. Not only do the multi-thread spa sync but they also address an issue which can result in IO not being aggregated optimally.

@behlendorf Thanks for the hint. 0.7.0-rc3_127_g238a31530c (HEAD of the PR's branch mt-sync) after installation like above and reboot doesn't change the symptoms.

Also not fixed after #5752 is merged, i.e. in zfs-0.7.0-rc3-147-g64fc776208 with spl-0.7.0-rc3-7-gbf8abea4.

@krichter722
Copy link
Author

krichter722 commented Mar 21, 2017

probably related to I/O scheduler, you did not share your zpool create command or pool geometry, either.

@kpande I didn't really figure out what the pool geometry is, please tell me if the following doesn't provide information. The pool consists of one partition of a 2TB HDD which has 5 boot partitions < 1GB besides it. It has been created using sudo zpool create data /dev/sdx1 a cache based on a 60 GB partition on a SSD has been added with sudo zpool add data cache /dev/sdy2. Both HDD and SSD have a sector size of 512 according to gparted`.

@krichter722
Copy link
Author

With 0.7.0-rc3_161_g2035575fd6 on Linux 4.8.0-44-generic the issue persists with new backtraces

[Mär29 12:35] INFO: task txg_sync:13074 blocked for more than 120 seconds.
[  +0,000005]       Tainted: P           OE   4.8.0-44-generic #47-Ubuntu
[  +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000002] txg_sync        D ffff8fa6984d3ac8     0 13074      2 0x00000000
[  +0,000005]  ffff8fa6984d3ac8 00ff8fa6e4e9a080 ffff8fa6ece70ec0 ffff8fa6a67dc9c0
[  +0,000002]  0000000000000000 ffff8fa6984d4000 ffff8fa6ff319300 7fffffffffffffff
[  +0,000002]  ffff8fa4348b4a08 0000000000000001 ffff8fa6984d3ae0 ffffffffa729ad35
[  +0,000001] Call Trace:
[  +0,000007]  [<ffffffffa729ad35>] schedule+0x35/0x80
[  +0,000002]  [<ffffffffa729e50a>] schedule_timeout+0x22a/0x3f0
[  +0,000067]  [<ffffffffc0c3a5e1>] ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[  +0,000036]  [<ffffffffc0c3a622>] ? zio_issue_async+0x12/0x20 [zfs]
[  +0,000036]  [<ffffffffc0c3eb56>] ? zio_nowait+0xb6/0x150 [zfs]
[  +0,000003]  [<ffffffffa6afcb01>] ? ktime_get+0x41/0xb0
[  +0,000002]  [<ffffffffa729a524>] io_schedule_timeout+0xa4/0x110
[  +0,000009]  [<ffffffffc0985b32>] cv_wait_common+0xb2/0x130 [spl]
[  +0,000002]  [<ffffffffa6ac7510>] ? wake_atomic_t_function+0x60/0x60
[  +0,000005]  [<ffffffffc0985c08>] __cv_wait_io+0x18/0x20 [spl]
[  +0,000034]  [<ffffffffc0c3e48c>] zio_wait+0xec/0x190 [zfs]
[  +0,000028]  [<ffffffffc0bc0778>] dsl_pool_sync+0xb8/0x440 [zfs]
[  +0,000030]  [<ffffffffc0bdce02>] spa_sync+0x412/0xd70 [zfs]
[  +0,000031]  [<ffffffffc0bee134>] txg_sync_thread+0x2c4/0x480 [zfs]
[  +0,000032]  [<ffffffffc0bede70>] ? txg_delay+0x160/0x160 [zfs]
[  +0,000004]  [<ffffffffc09806d1>] thread_generic_wrapper+0x71/0x80 [spl]
[  +0,000004]  [<ffffffffc0980660>] ? __thread_exit+0x20/0x20 [spl]
[  +0,000002]  [<ffffffffa6aa40d8>] kthread+0xd8/0xf0
[  +0,000003]  [<ffffffffa729f71f>] ret_from_fork+0x1f/0x40
[  +0,000001]  [<ffffffffa6aa4000>] ? kthread_create_on_node+0x1e0/0x1e0
[  +0,000035] INFO: task mozStorage #1:19500 blocked for more than 120 seconds.
[  +0,000002]       Tainted: P           OE   4.8.0-44-generic #47-Ubuntu
[  +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000001] mozStorage #1   D ffff8fa60f7e7c18     0 19500  16633 0x00000000
[  +0,000002]  ffff8fa60f7e7c18 00ffae08a1de4008 ffff8fa6ece70ec0 ffff8fa673ee5880
[  +0,000002]  0000000000000001 ffff8fa60f7e8000 ffff8fa6ff319300 7fffffffffffffff
[  +0,000001]  ffff8fa530631b28 0000000000000001 ffff8fa60f7e7c30 ffffffffa729ad35
[  +0,000002] Call Trace:
[  +0,000001]  [<ffffffffa729ad35>] schedule+0x35/0x80
[  +0,000002]  [<ffffffffa729e50a>] schedule_timeout+0x22a/0x3f0
[  +0,000001]  [<ffffffffa6ac6ec4>] ? __wake_up+0x44/0x50
[  +0,000004]  [<ffffffffc09811b5>] ? taskq_dispatch_ent+0x55/0x170 [spl]
[  +0,000001]  [<ffffffffa6afcb01>] ? ktime_get+0x41/0xb0
[  +0,000001]  [<ffffffffa729a524>] io_schedule_timeout+0xa4/0x110
[  +0,000005]  [<ffffffffc0985b32>] cv_wait_common+0xb2/0x130 [spl]
[  +0,000001]  [<ffffffffa6ac7510>] ? wake_atomic_t_function+0x60/0x60
[  +0,000004]  [<ffffffffc0985c08>] __cv_wait_io+0x18/0x20 [spl]
[  +0,000034]  [<ffffffffc0c3e48c>] zio_wait+0xec/0x190 [zfs]
[  +0,000033]  [<ffffffffc0c37d76>] zil_commit.part.11+0x3c6/0x7a0 [zfs]
[  +0,000005]  [<ffffffffc0986746>] ? tsd_hash_search.isra.0+0x46/0xa0 [spl]
[  +0,000033]  [<ffffffffc0c38167>] zil_commit+0x17/0x20 [zfs]
[  +0,000033]  [<ffffffffc0c2d79a>] zfs_fsync+0x7a/0xf0 [zfs]
[  +0,000032]  [<ffffffffc0c44ac8>] zpl_fsync+0x68/0xa0 [zfs]
[  +0,000002]  [<ffffffffa6c698fb>] vfs_fsync_range+0x4b/0xb0
[  +0,000002]  [<ffffffffa6c699bd>] do_fsync+0x3d/0x70
[  +0,000002]  [<ffffffffa6c69c50>] SyS_fsync+0x10/0x20
[  +0,000002]  [<ffffffffa729f4f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[  +0,000007] INFO: task mozStorage #1:19759 blocked for more than 120 seconds.
[  +0,000001]       Tainted: P           OE   4.8.0-44-generic #47-Ubuntu
[  +0,000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0,000001] mozStorage #1   D ffff8fa6064dfd38     0 19759  16633 0x00000000
[  +0,000001]  ffff8fa6064dfd38 0000000000000000 ffff8fa6ecfa6740 ffff8fa65db93b00
[  +0,000002]  ffff8fa45bcbc7f0 ffff8fa6064e0000 ffff8fa699919828 ffff8fa699919800
[  +0,000002]  ffff8fa6999199c8 0000000000000000 ffff8fa6064dfd50 ffffffffa729ad35
[  +0,000001] Call Trace:
[  +0,000002]  [<ffffffffa729ad35>] schedule+0x35/0x80
[  +0,000004]  [<ffffffffc0985b90>] cv_wait_common+0x110/0x130 [spl]
[  +0,000001]  [<ffffffffa6ac7510>] ? wake_atomic_t_function+0x60/0x60
[  +0,000004]  [<ffffffffc0985bc5>] __cv_wait+0x15/0x20 [spl]
[  +0,000033]  [<ffffffffc0c37a29>] zil_commit.part.11+0x79/0x7a0 [zfs]
[  +0,000004]  [<ffffffffc0986746>] ? tsd_hash_search.isra.0+0x46/0xa0 [spl]
[  +0,000005]  [<ffffffffc0986fa4>] ? tsd_set+0x2b4/0x500 [spl]
[  +0,000001]  [<ffffffffa729cca2>] ? mutex_lock+0x12/0x30
[  +0,000033]  [<ffffffffc0c38167>] zil_commit+0x17/0x20 [zfs]
[  +0,000032]  [<ffffffffc0c2d79a>] zfs_fsync+0x7a/0xf0 [zfs]
[  +0,000033]  [<ffffffffc0c44ac8>] zpl_fsync+0x68/0xa0 [zfs]
[  +0,000002]  [<ffffffffa6c698fb>] vfs_fsync_range+0x4b/0xb0
[  +0,000002]  [<ffffffffa6c699bd>] do_fsync+0x3d/0x70
[  +0,000001]  [<ffffffffa6c69c50>] SyS_fsync+0x10/0x20
[  +0,000002]  [<ffffffffa729f4f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8

@krichter722
Copy link
Author

Is #5923 the same issue?

@ioquatix
Copy link

ioquatix commented Mar 31, 2017

I feel like I'm still having this issue too:

hinoki% uname -a
Linux hinoki 4.10.5-1-ARCH #1 SMP PREEMPT Wed Mar 22 14:42:03 CET 2017 x86_64 GNU/Linux
hinoki% pacman -Q zfs-linux 
zfs-linux 0.6.5.9_4.10.5_1-1

dmesg

[394856.701815] INFO: task txg_sync:1363 blocked for more than 120 seconds.
[394856.701959]       Tainted: P           O    4.10.5-1-ARCH #1
[394856.702065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[394856.702210] txg_sync        D    0  1363      2 0x00000000
[394856.702216] Call Trace:
[394856.702232]  __schedule+0x22f/0x700
[394856.702239]  schedule+0x3d/0x90
[394856.702245]  schedule_timeout+0x243/0x3d0
[394856.702300]  ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[394856.702340]  ? zio_issue_async+0x12/0x20 [zfs]
[394856.702378]  ? zio_nowait+0x79/0x110 [zfs]
[394856.702385]  ? ktime_get+0x41/0xb0
[394856.702390]  io_schedule_timeout+0xa4/0x110
[394856.702401]  cv_wait_common+0xba/0x140 [spl]
[394856.702406]  ? wake_atomic_t_function+0x60/0x60
[394856.702414]  __cv_wait_io+0x18/0x20 [spl]
[394856.702452]  zio_wait+0xc4/0x150 [zfs]
[394856.702498]  dsl_pool_sync+0xb8/0x480 [zfs]
[394856.702548]  spa_sync+0x37f/0xb40 [zfs]
[394856.702554]  ? default_wake_function+0x12/0x20
[394856.702602]  txg_sync_thread+0x3bc/0x630 [zfs]
[394856.702651]  ? txg_delay+0x180/0x180 [zfs]
[394856.702660]  thread_generic_wrapper+0x72/0x80 [spl]
[394856.702665]  kthread+0x101/0x140
[394856.702672]  ? __thread_exit+0x20/0x20 [spl]
[394856.702677]  ? kthread_create_on_node+0x60/0x60
[394856.702681]  ret_from_fork+0x2c/0x40
[394979.595351] INFO: task txg_sync:1363 blocked for more than 120 seconds.
[394979.595495]       Tainted: P           O    4.10.5-1-ARCH #1
[394979.595601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[394979.595750] txg_sync        D    0  1363      2 0x00000000
[394979.595756] Call Trace:
[394979.595771]  __schedule+0x22f/0x700
[394979.595779]  schedule+0x3d/0x90
[394979.595785]  schedule_timeout+0x243/0x3d0
[394979.595842]  ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[394979.595882]  ? zio_issue_async+0x12/0x20 [zfs]
[394979.595920]  ? zio_nowait+0x79/0x110 [zfs]
[394979.595927]  ? ktime_get+0x41/0xb0
[394979.595933]  io_schedule_timeout+0xa4/0x110
[394979.595944]  cv_wait_common+0xba/0x140 [spl]
[394979.595949]  ? wake_atomic_t_function+0x60/0x60
[394979.595956]  __cv_wait_io+0x18/0x20 [spl]
[394979.595994]  zio_wait+0xc4/0x150 [zfs]
[394979.596040]  dsl_pool_sync+0xb8/0x480 [zfs]
[394979.596091]  spa_sync+0x37f/0xb40 [zfs]
[394979.596097]  ? default_wake_function+0x12/0x20
[394979.596146]  txg_sync_thread+0x3bc/0x630 [zfs]
[394979.596195]  ? txg_delay+0x180/0x180 [zfs]
[394979.596203]  thread_generic_wrapper+0x72/0x80 [spl]
[394979.596208]  kthread+0x101/0x140
[394979.596215]  ? __thread_exit+0x20/0x20 [spl]
[394979.596220]  ? kthread_create_on_node+0x60/0x60
[394979.596224]  ret_from_fork+0x2c/0x40
[395225.382485] INFO: task txg_sync:1363 blocked for more than 120 seconds.
[395225.382627]       Tainted: P           O    4.10.5-1-ARCH #1
[395225.382732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[395225.382877] txg_sync        D    0  1363      2 0x00000000
[395225.382884] Call Trace:
[395225.382899]  __schedule+0x22f/0x700
[395225.382906]  schedule+0x3d/0x90
[395225.382912]  schedule_timeout+0x243/0x3d0
[395225.382966]  ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[395225.383005]  ? zio_issue_async+0x12/0x20 [zfs]
[395225.383043]  ? zio_nowait+0x79/0x110 [zfs]
[395225.383050]  ? ktime_get+0x41/0xb0
[395225.383055]  io_schedule_timeout+0xa4/0x110
[395225.383066]  cv_wait_common+0xba/0x140 [spl]
[395225.383071]  ? wake_atomic_t_function+0x60/0x60
[395225.383079]  __cv_wait_io+0x18/0x20 [spl]
[395225.383116]  zio_wait+0xc4/0x150 [zfs]
[395225.383163]  dsl_pool_sync+0xb8/0x480 [zfs]
[395225.383213]  spa_sync+0x37f/0xb40 [zfs]
[395225.383219]  ? default_wake_function+0x12/0x20
[395225.383268]  txg_sync_thread+0x3bc/0x630 [zfs]
[395225.383316]  ? txg_delay+0x180/0x180 [zfs]
[395225.383325]  thread_generic_wrapper+0x72/0x80 [spl]
[395225.383330]  kthread+0x101/0x140
[395225.383337]  ? __thread_exit+0x20/0x20 [spl]
[395225.383342]  ? kthread_create_on_node+0x60/0x60
[395225.383346]  ret_from_fork+0x2c/0x40
[395348.276004] INFO: task txg_sync:1363 blocked for more than 120 seconds.
[395348.276146]       Tainted: P           O    4.10.5-1-ARCH #1
[395348.276252] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[395348.276396] txg_sync        D    0  1363      2 0x00000000
[395348.276403] Call Trace:
[395348.276418]  __schedule+0x22f/0x700
[395348.276424]  schedule+0x3d/0x90
[395348.276431]  schedule_timeout+0x243/0x3d0
[395348.276484]  ? zio_taskq_dispatch+0x91/0xa0 [zfs]
[395348.276524]  ? zio_issue_async+0x12/0x20 [zfs]
[395348.276562]  ? zio_nowait+0x79/0x110 [zfs]
[395348.276569]  ? ktime_get+0x41/0xb0
[395348.276574]  io_schedule_timeout+0xa4/0x110
[395348.276585]  cv_wait_common+0xba/0x140 [spl]
[395348.276590]  ? wake_atomic_t_function+0x60/0x60
[395348.276598]  __cv_wait_io+0x18/0x20 [spl]
[395348.276636]  zio_wait+0xc4/0x150 [zfs]
[395348.276682]  dsl_pool_sync+0xb8/0x480 [zfs]
[395348.276732]  spa_sync+0x37f/0xb40 [zfs]
[395348.276738]  ? default_wake_function+0x12/0x20
[395348.276787]  txg_sync_thread+0x3bc/0x630 [zfs]
[395348.276836]  ? txg_delay+0x180/0x180 [zfs]
[395348.276844]  thread_generic_wrapper+0x72/0x80 [spl]
[395348.276849]  kthread+0x101/0x140
[395348.276856]  ? __thread_exit+0x20/0x20 [spl]
[395348.276861]  ? kthread_create_on_node+0x60/0x60
[395348.276865]  ret_from_fork+0x2c/0x40

@ioquatix
Copy link

I captured the issue in a video here:

https://youtu.be/zjeOouLB-0w

The bottom right pane, is me logged into the remote server initiating the backup, and all the rest are on the backup server, which is running zfs.

@krichter722
Copy link
Author

krichter722 commented Apr 14, 2017

I captured the issue in a video here:

https://youtu.be/zjeOouLB-0w

That's indeed the issue I'm having, thank you. The number of z_fr_iss_[n] differs, I assume that's because of the number of logical CPU cores (8 in my case).

I tried 0.7.0-rc3_202_g66aca24730 which has 66aca24 which I thought might help, but doesn't, instead I'm seeing

relevant parts of dmesg output [ 363.504525] INFO: task txg_sync:13891 blocked for more than 120 seconds. [ 363.504531] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 363.504532] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.504534] txg_sync D ffff91858f2dfb48 0 13891 2 0x00000000 [ 363.504539] ffff91858f2dfb48 00ff9183f6c24340 ffff9185ece70000 ffff918598f049c0 [ 363.504541] ffff9185d4fb27c0 ffff91858f2e0000 ffff9185ff319300 7fffffffffffffff [ 363.504544] ffff91845a002488 0000000000000001 ffff91858f2dfb60 ffffffffad89ad55 [ 363.504546] Call Trace: [ 363.504552] [] schedule+0x35/0x80 [ 363.504555] [] schedule_timeout+0x22a/0x3f0 [ 363.504612] [] ? zio_reexecute+0x380/0x380 [zfs] [ 363.504616] [] ? ktime_get+0x41/0xb0 [ 363.504618] [] io_schedule_timeout+0xa4/0x110 [ 363.504628] [] cv_wait_common+0xb2/0x130 [spl] [ 363.504630] [] ? wake_atomic_t_function+0x60/0x60 [ 363.504637] [] __cv_wait_io+0x18/0x20 [spl] [ 363.504682] [] zio_wait+0xec/0x190 [zfs] [ 363.504725] [] spa_sync+0x453/0xd70 [zfs] [ 363.504769] [] txg_sync_thread+0x2c4/0x480 [zfs] [ 363.504812] [] ? txg_delay+0x160/0x160 [zfs] [ 363.504818] [] thread_generic_wrapper+0x71/0x80 [spl] [ 363.504823] [] ? __thread_exit+0x20/0x20 [spl] [ 363.504825] [] kthread+0xd8/0xf0 [ 363.504829] [] ret_from_fork+0x1f/0x40 [ 363.504831] [] ? kthread_create_on_node+0x1e0/0x1e0 [ 484.335506] INFO: task txg_sync:13891 blocked for more than 120 seconds. [ 484.335510] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 484.335511] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 484.335512] txg_sync D ffff91858f2dfb48 0 13891 2 0x00000000 [ 484.335516] ffff91858f2dfb48 00ff9183f6c24340 ffff9185ece70000 ffff918598f049c0 [ 484.335518] ffff9185d4fb27c0 ffff91858f2e0000 ffff9185ff319300 7fffffffffffffff [ 484.335520] ffff91845a002488 0000000000000001 ffff91858f2dfb60 ffffffffad89ad55 [ 484.335522] Call Trace: [ 484.335527] [] schedule+0x35/0x80 [ 484.335529] [] schedule_timeout+0x22a/0x3f0 [ 484.335576] [] ? zio_reexecute+0x380/0x380 [zfs] [ 484.335579] [] ? ktime_get+0x41/0xb0 [ 484.335581] [] io_schedule_timeout+0xa4/0x110 [ 484.335588] [] cv_wait_common+0xb2/0x130 [spl] [ 484.335590] [] ? wake_atomic_t_function+0x60/0x60 [ 484.335594] [] __cv_wait_io+0x18/0x20 [spl] [ 484.335628] [] zio_wait+0xec/0x190 [zfs] [ 484.335659] [] spa_sync+0x453/0xd70 [zfs] [ 484.335691] [] txg_sync_thread+0x2c4/0x480 [zfs] [ 484.335722] [] ? txg_delay+0x160/0x160 [zfs] [ 484.335726] [] thread_generic_wrapper+0x71/0x80 [spl] [ 484.335729] [] ? __thread_exit+0x20/0x20 [spl] [ 484.335731] [] kthread+0xd8/0xf0 [ 484.335734] [] ret_from_fork+0x1f/0x40 [ 484.335735] [] ? kthread_create_on_node+0x1e0/0x1e0 [ 605.166519] INFO: task txg_sync:13891 blocked for more than 120 seconds. [ 605.166523] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 605.166524] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.166525] txg_sync D ffff91858f2dfb48 0 13891 2 0x00000000 [ 605.166529] ffff91858f2dfb48 00ff9183f6c24340 ffff9185ece70000 ffff918598f049c0 [ 605.166530] ffff9185d4fb27c0 ffff91858f2e0000 ffff9185ff319300 7fffffffffffffff [ 605.166532] ffff91845a002488 0000000000000001 ffff91858f2dfb60 ffffffffad89ad55 [ 605.166534] Call Trace: [ 605.166539] [] schedule+0x35/0x80 [ 605.166541] [] schedule_timeout+0x22a/0x3f0 [ 605.166587] [] ? zio_reexecute+0x380/0x380 [zfs] [ 605.166589] [] ? ktime_get+0x41/0xb0 [ 605.166591] [] io_schedule_timeout+0xa4/0x110 [ 605.166598] [] cv_wait_common+0xb2/0x130 [spl] [ 605.166600] [] ? wake_atomic_t_function+0x60/0x60 [ 605.166604] [] __cv_wait_io+0x18/0x20 [spl] [ 605.166638] [] zio_wait+0xec/0x190 [zfs] [ 605.166668] [] spa_sync+0x453/0xd70 [zfs] [ 605.166700] [] txg_sync_thread+0x2c4/0x480 [zfs] [ 605.166732] [] ? txg_delay+0x160/0x160 [zfs] [ 605.166736] [] thread_generic_wrapper+0x71/0x80 [spl] [ 605.166739] [] ? __thread_exit+0x20/0x20 [spl] [ 605.166741] [] kthread+0xd8/0xf0 [ 605.166744] [] ret_from_fork+0x1f/0x40 [ 605.166745] [] ? kthread_create_on_node+0x1e0/0x1e0 [ 605.166775] INFO: task DOM Worker:20076 blocked for more than 120 seconds. [ 605.166777] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 605.166777] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.166778] DOM Worker D ffff918506917d58 0 20076 17517 0x00000000 [ 605.166780] ffff918506917d58 00ffffffad0370e9 ffff9185ecfa0000 ffff918506918ec0 [ 605.166782] ffff918506918ec0 ffff918506918000 ffff91856fe24068 ffff91856fe24080 [ 605.166783] ffff918506917d90 0000000000000020 ffff918506917d70 ffffffffad89ad55 [ 605.166785] Call Trace: [ 605.166786] [] schedule+0x35/0x80 [ 605.166788] [] rwsem_down_read_failed+0x103/0x160 [ 605.166790] [] call_rwsem_down_read_failed+0x18/0x30 [ 605.166792] [] down_read+0x20/0x40 [ 605.166794] [] SyS_madvise+0x399/0x8b0 [ 605.166797] [] ? getrusage+0x43/0x70 [ 605.166799] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 605.166800] INFO: task mozStorage #2:20083 blocked for more than 120 seconds. [ 605.166801] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 605.166802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.166803] mozStorage #2 D ffff918502a139a0 0 20083 17517 0x00000000 [ 605.166804] ffff918502a139a0 00ffffffc08aff93 ffff9185ece70000 ffff918511883b00 [ 605.166806] ffff91852558c000 ffff918502a14000 ffff9185e885d170 ffff9185e885d148 [ 605.166808] ffff9185e885d180 0000000000000000 ffff918502a139b8 ffffffffad89ad55 [ 605.166809] Call Trace: [ 605.166810] [] schedule+0x35/0x80 [ 605.166815] [] cv_wait_common+0x110/0x130 [spl] [ 605.166816] [] ? wake_atomic_t_function+0x60/0x60 [ 605.166820] [] __cv_wait+0x15/0x20 [spl] [ 605.166845] [] dmu_tx_wait+0x9b/0x340 [zfs] [ 605.166869] [] dmu_tx_assign+0x87/0x3e0 [zfs] [ 605.166892] [] ? dmu_tx_hold_sa+0x100/0x170 [zfs] [ 605.166926] [] zfs_dirty_inode+0x105/0x310 [zfs] [ 605.166927] [] ? dequeue_entity+0x245/0xab0 [ 605.166929] [] ? put_prev_entity+0x33/0x3e0 [ 605.166931] [] ? __switch_to+0x2ce/0x6c0 [ 605.166932] [] ? pick_next_task_fair+0x116/0x4d0 [ 605.166966] [] zpl_dirty_inode+0x2c/0x40 [zfs] [ 605.166968] [] __mark_inode_dirty+0x17a/0x3a0 [ 605.166969] [] ? __schedule+0x308/0x770 [ 605.166971] [] generic_update_time+0x79/0xd0 [ 605.166973] [] file_update_time+0xbd/0x110 [ 605.166975] [] vma_do_file_update_time+0x22/0x40 [ 605.166977] [] filemap_page_mkwrite+0x48/0xc0 [ 605.166978] [] do_page_mkwrite+0x77/0xf0 [ 605.166980] [] do_wp_page+0x1ed/0x800 [ 605.166981] [] handle_mm_fault+0x8a6/0x13c0 [ 605.166983] [] ? locks_alloc_lock+0x1b/0x70 [ 605.166985] [] __do_page_fault+0x23e/0x4e0 [ 605.166987] [] do_page_fault+0x22/0x30 [ 605.166988] [] page_fault+0x28/0x30 [ 605.166989] INFO: task DOM Worker:20170 blocked for more than 120 seconds. [ 605.166990] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 605.166991] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.166992] DOM Worker D ffff918506bcfa98 0 20170 17517 0x00000000 [ 605.166993] ffff918506bcfa98 00ffffffc08b01a1 ffff9185ece70ec0 ffff91853df82c40 [ 605.166995] 0000000000000000 ffff918506bd0000 ffff9185e885d170 ffff9185e885d148 [ 605.166996] ffff9185e885d180 0000000000000000 ffff918506bcfab0 ffffffffad89ad55 [ 605.166998] Call Trace: [ 605.166999] [] schedule+0x35/0x80 [ 605.167004] [] cv_wait_common+0x110/0x130 [spl] [ 605.167009] [] ? wake_atomic_t_function+0x60/0x60 [ 605.167013] [] __cv_wait+0x15/0x20 [spl] [ 605.167037] [] dmu_tx_wait+0x9b/0x340 [zfs] [ 605.167070] [] zfs_create+0x307/0x790 [zfs] [ 605.167103] [] zpl_create+0xbf/0x170 [zfs] [ 605.167105] [] path_openat+0x132e/0x1450 [ 605.167106] [] do_filp_open+0x91/0x100 [ 605.167108] [] ? hashlen_string+0xa0/0xa0 [ 605.167109] [] ? __alloc_fd+0x46/0x180 [ 605.167111] [] do_sys_open+0x135/0x280 [ 605.167113] [] ? SyS_futex+0x85/0x170 [ 605.167114] [] SyS_open+0x1e/0x20 [ 605.167116] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 605.167119] INFO: task thunderbird:20655 blocked for more than 120 seconds. [ 605.167120] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 605.167121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.167122] thunderbird D ffff9184fb31fa78 0 20655 17517 0x00000000 [ 605.167123] ffff9184fb31fa78 00ffffffc08aff93 ffff9185ece70ec0 ffff91850691e740 [ 605.167125] ffff9184c56a0000 ffff9184fb320000 ffff9185e885d170 ffff9185e885d148 [ 605.167126] ffff9185e885d180 0000000000000000 ffff9184fb31fa90 ffffffffad89ad55 [ 605.167128] Call Trace: [ 605.167129] [] schedule+0x35/0x80 [ 605.167134] [] cv_wait_common+0x110/0x130 [spl] [ 605.167135] [] ? wake_atomic_t_function+0x60/0x60 [ 605.167139] [] __cv_wait+0x15/0x20 [spl] [ 605.167163] [] dmu_tx_wait+0x9b/0x340 [zfs] [ 605.167186] [] dmu_tx_assign+0x87/0x3e0 [zfs] [ 605.167209] [] ? dmu_tx_hold_sa+0x100/0x170 [zfs] [ 605.167242] [] zfs_dirty_inode+0x105/0x310 [zfs] [ 605.167243] [] ? __kmalloc_node+0x1da/0x2a0 [ 605.167247] [] ? spl_kmem_alloc+0x9b/0x170 [spl] [ 605.167250] [] ? spl_kmem_free+0x33/0x40 [spl] [ 605.167283] [] ? zfs_range_unlock+0x1a2/0x2c0 [zfs] [ 605.167285] [] ? mutex_lock+0x12/0x30 [ 605.167314] [] ? rrw_exit+0x45/0x130 [zfs] [ 605.167342] [] ? rrm_exit+0x46/0x80 [zfs] [ 605.167374] [] zpl_dirty_inode+0x2c/0x40 [zfs] [ 605.167376] [] __mark_inode_dirty+0x17a/0x3a0 [ 605.167378] [] generic_update_time+0x79/0xd0 [ 605.167379] [] touch_atime+0xa8/0xd0 [ 605.167411] [] zpl_iter_read+0xbf/0xf0 [zfs] [ 605.167413] [] new_sync_read+0xd2/0x120 [ 605.167415] [] __vfs_read+0x26/0x40 [ 605.167416] [] vfs_read+0x96/0x130 [ 605.167418] [] SyS_read+0x55/0xc0 [ 605.167419] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 605.167448] INFO: task Port0:25987 blocked for more than 120 seconds. [ 605.167449] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 605.167450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.167451] Port0 D ffff918454aa7a10 0 25987 19440 0x00000000 [ 605.167453] ffff918454aa7a10 00000001802a002a ffff9185ece70000 ffff91851199ac40 [ 605.167454] ffff918454aa79f0 ffff918454aa8000 ffff9185e885d170 ffff9185e885d148 [ 605.167456] ffff9185e885d180 0000000000000000 ffff918454aa7a28 ffffffffad89ad55 [ 605.167457] Call Trace: [ 605.167459] [] schedule+0x35/0x80 [ 605.167464] [] cv_wait_common+0x110/0x130 [spl] [ 605.167465] [] ? wake_atomic_t_function+0x60/0x60 [ 605.167469] [] __cv_wait+0x15/0x20 [spl] [ 605.167493] [] dmu_tx_wait+0x9b/0x340 [zfs] [ 605.167516] [] dmu_tx_assign+0x87/0x3e0 [zfs] [ 605.167548] [] zfs_write+0x60a/0xd40 [zfs] [ 605.167550] [] ? dequeue_task_fair+0x62f/0x8f0 [ 605.167551] [] ? put_prev_entity+0x33/0x3e0 [ 605.167553] [] ? __switch_to+0x2ce/0x6c0 [ 605.167586] [] zpl_write_common_iovec+0x8c/0xe0 [zfs] [ 605.167618] [] zpl_iter_write+0xb7/0xf0 [zfs] [ 605.167620] [] new_sync_write+0xd5/0x130 [ 605.167621] [] __vfs_write+0x26/0x40 [ 605.167623] [] vfs_write+0xb5/0x1a0 [ 605.167624] [] SyS_write+0x55/0xc0 [ 605.167626] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 605.167629] INFO: task vlc:26584 blocked for more than 120 seconds. [ 605.167631] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 605.167631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.167632] vlc D ffff918480aa3a78 0 26584 17517 0x00000000 [ 605.167634] ffff918480aa3a78 00ffffffc08aff93 ffff9185ecfa0000 ffff918554d48000 [ 605.167635] ffff918513ae6000 ffff918480aa4000 ffff9185e885d170 ffff9185e885d148 [ 605.167637] ffff9185e885d180 0000000000000000 ffff918480aa3a90 ffffffffad89ad55 [ 605.167638] Call Trace: [ 605.167640] [] schedule+0x35/0x80 [ 605.167644] [] cv_wait_common+0x110/0x130 [spl] [ 605.167645] [] ? wake_atomic_t_function+0x60/0x60 [ 605.167649] [] __cv_wait+0x15/0x20 [spl] [ 605.167673] [] dmu_tx_wait+0x9b/0x340 [zfs] [ 605.167697] [] dmu_tx_assign+0x87/0x3e0 [zfs] [ 605.167719] [] ? dmu_tx_hold_sa+0x100/0x170 [zfs] [ 605.167752] [] zfs_dirty_inode+0x105/0x310 [zfs] [ 605.167755] [] ? spl_kmem_free+0x33/0x40 [spl] [ 605.167778] [] ? dmu_buf_rele_array.part.5+0x52/0x60 [zfs] [ 605.167781] [] ? spl_kmem_free+0x33/0x40 [spl] [ 605.167814] [] ? zfs_range_unlock+0x1a2/0x2c0 [zfs] [ 605.167815] [] ? mutex_lock+0x12/0x30 [ 605.167843] [] ? rrw_exit+0x45/0x130 [zfs] [ 605.167872] [] ? rrm_exit+0x46/0x80 [zfs] [ 605.167905] [] zpl_dirty_inode+0x2c/0x40 [zfs] [ 605.167906] [] __mark_inode_dirty+0x17a/0x3a0 [ 605.167908] [] generic_update_time+0x79/0xd0 [ 605.167909] [] touch_atime+0xa8/0xd0 [ 605.167941] [] zpl_iter_read+0xbf/0xf0 [zfs] [ 605.167943] [] new_sync_read+0xd2/0x120 [ 605.167944] [] __vfs_read+0x26/0x40 [ 605.167946] [] vfs_read+0x96/0x130 [ 605.167947] [] SyS_read+0x55/0xc0 [ 605.167949] [] entry_SYSCALL_64_fastpath+0x1e/0xa8 [ 725.997471] INFO: task DOM Worker:20076 blocked for more than 120 seconds. [ 725.997487] Tainted: P OE 4.8.0-46-generic #49-Ubuntu [ 725.997488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 725.997490] DOM Worker D ffff918506917d58 0 20076 17517 0x00000000 [ 725.997494] ffff918506917d58 00ffffffad0370e9 ffff9185ecfa0000 ffff918506918ec0 [ 725.997496] ffff918506918ec0 ffff918506918000 ffff91856fe24068 ffff91856fe24080 [ 725.997498] ffff918506917d90 0000000000000020 ffff918506917d70 ffffffffad89ad55 [ 725.997499] Call Trace: [ 725.997505] [] schedule+0x35/0x80 [ 725.997507] [] rwsem_down_read_failed+0x103/0x160 [ 725.997509] [] call_rwsem_down_read_failed+0x18/0x30 [ 725.997511] [] down_read+0x20/0x40 [ 725.997525] [] SyS_madvise+0x399/0x8b0 [ 725.997528] [] ? getrusage+0x43/0x70 [ 725.997530] [] entry_SYSCALL_64_fastpath+0x1e/0xa8

on 4.8.0-46-generic, now. The system still recovers, but not faster than with the commit mentioned above.

@gkkovacs
Copy link

gkkovacs commented May 3, 2017

We have been experiencing a similar IO starvation issue on Proxmox 4.x since last year. We even opened Proxmox support threads about it, this is the latest: https://forum.proxmox.com/threads/kvm-guests-on-zfs-freeze-during-backup-restore.34362/

We now think it's an IO scheduling issue in ZFS.
It looks like the following issues are about the same problem: #1538 #5857

Reproduction
Basically, if you have a few KVM guests' RAW disks on ZFS Zvols, and you start to restore another KVM guest to the same host (so heavy write IO is happening for extended periods), the running guests will get starved of IO for so long that their virtual memory subsystem will trip up and all kinds of problems start to appear: CPU soft locks, hung tasks, rcu_scheduler stalls, and all around slowed down disk and network IO (Windows guests regularly lose their RDP connections for several seconds). Extended heavy reads also cause some IO starvation, but it's the writes that really cause problems.

Mitigation
We have tried mitigating this by limiting host and guest swapping, increasing guest vm buffers, and tuning power management, but nothing solved it so far.

We have also recognized that QCOW2 disks with cache=writeback are much less sensitive to the IO starvation from the KVM guests' point of view than ZVOL RAW disks with cache=none, but it's not yet clear why:

  • either ZFS schedules parallel file IO better between files compared to parallel IO between ZVOLs,
  • or simply the writeback cache solves the guests problems by caching their swap efficiently to evade detection.

@krichter722
Copy link
Author

VirtualBox 5.1.22 changed the behaviour from 5.1.20 to hang with 1 CPU core at 100% for ever without any noticable I/O activity of the Virtualbox process or ZFS. There's been enough input about other problems on other systems and step to reproduce, thanks for that.

The issue persist on Ubuntu 17.04 with Linux 4.10.0-20-generic and ZFS 0.7.0-rc3_250_gc17486b217 with SPL 0.7.0-rc3_8_g481762f6.

@krichter722
Copy link
Author

Performance has decreased dramatically with SPL 0.7.0-rc4_1_g8f87971e and ZFS 0.7.0-rc4_12_g3d6da72d18 on Ubuntu 17.04 with Linux 4.10.0-21-generic. Simple processes which I suspect require a lot of I/O operations, but not too much data throughput, like git fetch and compiling gcc, get stuck with the symptoms of the issue making ZFS quite unusable with software other than VirtualBox.

@behlendorf
Copy link
Contributor

@krichter722 there's been some discussion in #6127 about this. Could you try setting the zvol_request_sync module option to 1. You can do this at run time. This should effectively revert to the zvols to synchronous IO which is what they defaulted to in -rc3.

@krichter722
Copy link
Author

there's been some discussion in #6127 about this. Could you try setting the zvol_request_sync module option to 1. You can do this at run time. This should effectively revert to the zvols to synchronous IO which is what they defaulted to in -rc3.

@behlendorf I did set zvol_request_sync to 1 (and confirmed that the value has been set). It causes the time of unresponsiveness to decrease, but heavy I/O during an Ubuntu installation on VirtualBox still affects the responsiveness of multiple programs (e.g. Firefox and LibreOffice for more than a minute). While the programs are not responsive only reading with low data rates (few 100 KB/s) occurs while the write rate remains at 0 B/s according to iotop.

@ioquatix
Copy link

During the time of unresponsiveness, what's your load average?

@ioquatix
Copy link

I found on a small box, even doing mild random IO causes my load average to go through the roof (60ish)

@krichter722
Copy link
Author

During the time of unresponsiveness, what's your load average?

@ioquatix It goes up to 27 and is at 20 average with zvol_request_sync set to 1 and up to 25 with it set to 0, so not that bad, but I've seen 150 in other combinations of Kernel and ZFS version.

@ioquatix
Copy link

That's a pretty crazy high load average. So, the question is, why is that happening? What is ZFS trying to schedule and why is it causing a massive load on the CPU?

@ryao
Copy link
Contributor

ryao commented May 20, 2017

@ioquatix I have a fix in development at work. It will be ready within days.

@ioquatix
Copy link

@ryao That's awesome - I'm really interested to know more about it! Thanks for your hard work.

@ryao
Copy link
Contributor

ryao commented May 20, 2017

@ioquatix I wrote some more here:

#6127 (comment)

For some more background, I am now employed by a company that is using zvols on ZFS to implement a HA block storage solution. The current HA solution require sync=always be set on zvols and the work in #6133 (which is still a work in progress) improves sync=always performance. I have it operating at 50,000 IOPS on rather beefy hardware such that the bottlenecks have shifted to other places that affect everyone using zvols. While I have more bugs to fix in #6133 to make it production ready, I am working on alleviating those more general bottlenecks and that should imply an improvement.

The general issue that has my attention right now is the fact that the zvol IO threads break the throttle function such that we end up throttling where we shouldn't and failing to throttle where we should. This causes inconsistent performance and is a kind of thundering herd issue. zvol_request_sync=1 disables the threads, but then we do blocking on synchronous IO (fixed by #6133) and queuing asynchronous IO for writeback is higher latency than it needs to be. The fix I described does a hybrid approach where we make the bits that must be synchronous to keep the throttle working synchronous while making the other bits asynchronous. I expect this in conjunction with #6133 to eliminate the potential for thundering herds that I see right now..

If virtualbox incorrectly makes every IO synchronous (which I vaguely recall is its default), then you would need both fixes and you might want to wait until I have finished solving regressions triggered by #6133 before applying it. If virtualbox properly honors flush semantics and your virtual machine does not flush often enough to make flushes concurrent, then the fix for #6127 that I am developing should be all that you need.

@krichter722
Copy link
Author

krichter722 commented Jun 6, 2017

I just experienced a very low I/O performance during git clean -x -f -d without any impact on other processes in https://github.com/wxWidgets/Phoenix.git after building with python build.py dox build_wx --jobs=8 etg --nodoc sip build_py. The system load went up to 95 and cleaning took 12 minutes for 642MB (according to du -s -h). I was using git 2.11.0 on Ubuntu 17.04 amd64.

@krichter722
Copy link
Author

With spl 0.7.0-13_ge8474f9 and zfs 0.7.0-94_g0c484ab56 on 4.10.0-35-generic the behaviour has changed so that now 1 dp_sync_taskq process/thread is running at 99% in iotop with very low throughput (down to 20KB/s to 2000KB/s for minutes) when before 6 z_fr_iss_[n] threads where at 99% with not that low throughput (see above). Before the I/O responsiveness drops dramatically ZFS now reaches acceptable throughput of 40MB/s to 80MB/s which is nice for a 1 SSD cache partition+ 1HDD pool partition pool during an installation of Ubuntu in VirtualBox.

@behlendorf behlendorf added this to the 0.8.0 milestone Sep 28, 2017
@krichter722
Copy link
Author

krichter722 commented Oct 20, 2017

On 4.13.0-16-generic with SPL 0.7.0-17_g28920ea3 and ZFS 0.7.0-136_g1cfdb0e6e I'm experiencing with I/O slow down with

[ 6767.341035] INFO: task firefox:25090 blocked for more than 120 seconds.
[ 6767.341039]       Tainted: P           OE   4.13.0-16-generic #19-Ubuntu
[ 6767.341040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6767.341042] firefox         D    0 25090      1 0x00000004
[ 6767.341044] Call Trace:
[ 6767.341050]  __schedule+0x28b/0x890
[ 6767.341053]  schedule+0x36/0x80
[ 6767.341061]  cv_wait_common+0x11e/0x140 [spl]
[ 6767.341065]  ? wait_woken+0x80/0x80
[ 6767.341070]  __cv_wait+0x15/0x20 [spl]
[ 6767.341123]  txg_wait_open+0xb0/0x100 [zfs]
[ 6767.341157]  dmu_tx_wait+0x389/0x3a0 [zfs]
[ 6767.341185]  dmu_tx_assign+0x8f/0x480 [zfs]
[ 6767.341208]  dmu_free_long_range_impl+0x2a6/0x410 [zfs]
[ 6767.341228]  dmu_free_long_range+0x6d/0xb0 [zfs]
[ 6767.341258]  zfs_trunc+0x79/0x220 [zfs]
[ 6767.341287]  ? sa_lookup+0xc9/0x100 [zfs]
[ 6767.341316]  zfs_freesp+0xaa/0x480 [zfs]
[ 6767.341346]  ? zfs_zaccess_common+0xda/0x1f0 [zfs]
[ 6767.341380]  ? zfs_zaccess+0x106/0x3d0 [zfs]
[ 6767.341409]  zfs_setattr+0x982/0x1b50 [zfs]
[ 6767.341413]  ? mutex_lock+0x12/0x40
[ 6767.341415]  ? __slab_alloc+0x20/0x40
[ 6767.341416]  ? __kmalloc_node+0x1f6/0x2a0
[ 6767.341421]  ? spl_kmem_zalloc+0xc2/0x170 [spl]
[ 6767.341449]  zpl_setattr+0xf8/0x170 [zfs]
[ 6767.341452]  notify_change+0x2e5/0x430
[ 6767.341454]  do_truncate+0x73/0xc0
[ 6767.341456]  path_openat+0xf88/0x1630
[ 6767.341457]  ? SyS_getdents+0xd4/0x130
[ 6767.341459]  do_filp_open+0x9b/0x110
[ 6767.341460]  ? __check_object_size+0xaf/0x1b0
[ 6767.341462]  ? __alloc_fd+0x46/0x170
[ 6767.341464]  do_sys_open+0x1bb/0x2b0
[ 6767.341465]  ? do_sys_open+0x1bb/0x2b0
[ 6767.341467]  SyS_openat+0x14/0x20
[ 6767.341468]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 6767.341470] RIP: 0033:0x7f3c839e770c
[ 6767.341470] RSP: 002b:00007ffc1555a0c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[ 6767.341471] RAX: ffffffffffffffda RBX: 00007f3c4e29e62b RCX: 00007f3c839e770c
[ 6767.341472] RDX: 0000000000000241 RSI: 00007f3c4f262508 RDI: ffffffffffffff9c
[ 6767.341473] RBP: 0000000000010000 R08: 0000000000000000 R09: 00007ffc1555a300
[ 6767.341473] R10: 00000000000001a4 R11: 0000000000000293 R12: 00007f3c4e295000
[ 6767.341474] R13: 000000000000a000 R14: 00007f3c4e200000 R15: 00007f3c82700040
[ 6767.341505] INFO: task Port0:12738 blocked for more than 120 seconds.
[ 6767.341507]       Tainted: P           OE   4.13.0-16-generic #19-Ubuntu
[ 6767.341508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6767.341509] Port0           D    0 12738   8186 0x00000000
[ 6767.341511] Call Trace:
[ 6767.341513]  __schedule+0x28b/0x890
[ 6767.341515]  schedule+0x36/0x80
[ 6767.341520]  cv_wait_common+0x11e/0x140 [spl]
[ 6767.341522]  ? wait_woken+0x80/0x80
[ 6767.341526]  __cv_wait+0x15/0x20 [spl]
[ 6767.341554]  txg_wait_open+0xb0/0x100 [zfs]
[ 6767.341577]  dmu_tx_wait+0x389/0x3a0 [zfs]
[ 6767.341598]  dmu_tx_assign+0x8f/0x480 [zfs]
[ 6767.341626]  zfs_write+0x403/0xd30 [zfs]
[ 6767.341627]  ? dequeue_task_fair+0x49f/0x640
[ 6767.341656]  zpl_write_common_iovec+0x8c/0xe0 [zfs]
[ 6767.341683]  zpl_iter_write+0xae/0xe0 [zfs]
[ 6767.341685]  new_sync_write+0xe1/0x130
[ 6767.341687]  __vfs_write+0x26/0x40
[ 6767.341688]  vfs_write+0xb1/0x1a0
[ 6767.341690]  SyS_write+0x55/0xc0
[ 6767.341692]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 6767.341693] RIP: 0033:0x7f45a4c5bcc0
[ 6767.341693] RSP: 002b:00007f451ed4afc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 6767.341694] RAX: ffffffffffffffda RBX: 00007f452557d680 RCX: 00007f45a4c5bcc0
[ 6767.341695] RDX: 0000000000001000 RSI: 00007f45641fb000 RDI: 0000000000000026
[ 6767.341696] RBP: 00007f451ed4b840 R08: 0000000000000000 R09: 0000000000000000
[ 6767.341696] R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000060
[ 6767.341697] R13: 0000000034960320 R14: 00007f451ed4b890 R15: 00007f451ed4b888
[ 6769.580475] kauditd_printk_skb: 10 callbacks suppressed

in dmesg. Strangely, the performance is also bad when I delete a VM and its virtual disk data (which I assume shouldn't be different from deleting a file from the filesystem).

@krichter722
Copy link
Author

I got a huge improvement with zfs-0.7.0-349-g5666a994f and spl spl-0.7.0-30-g3673d03 on Ubuntu 17.10 (without a lot of knowledge about why this might be the case I suspect 5666a99 to have introduced the improvement). Concretely I no longer see the slowdown of throughput to down to 20KB/s for minutes. If I torture the system with two VirtualBox VMs, updatedb and copying the gcc git repository (all involving datasets with dedup=on) the system hangs for a few seconds, but that's acceptable for the task and hardware it runs on.

However, VirtualBox Ubuntu and Debian VMs still experience I/O errors and mount their / filesystem read-only and I'm still seeing

[23199.926000] INFO: task txg_sync:3050 blocked for more than 120 seconds.
[23199.926008]       Tainted: P           OE   4.13.0-21-generic #24-Ubuntu
[23199.926011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23199.926028] txg_sync        D    0  3050      2 0x00000000
[23199.926032] Call Trace:
[23199.926042]  __schedule+0x28b/0x890
[23199.926045]  schedule+0x36/0x80
[23199.926048]  schedule_timeout+0x182/0x350
[23199.926053]  ? call_timer_fn+0x130/0x130
[23199.926055]  io_schedule_timeout+0x1e/0x50
[23199.926057]  ? io_schedule_timeout+0x1e/0x50
[23199.926085]  __cv_timedwait_common+0x12c/0x170 [spl]
[23199.926091]  ? wait_woken+0x80/0x80
[23199.926117]  __cv_timedwait_io+0x19/0x20 [spl]
[23199.926246]  zio_wait+0x113/0x260 [zfs]
[23199.926315]  dsl_pool_sync+0xb8/0x420 [zfs]
[23199.926387]  spa_sync+0x43e/0xd20 [zfs]
[23199.926470]  txg_sync_thread+0x2d4/0x4a0 [zfs]
[23199.926549]  ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[23199.926558]  thread_generic_wrapper+0x72/0x80 [spl]
[23199.926563]  kthread+0x125/0x140
[23199.926569]  ? __thread_exit+0x20/0x20 [spl]
[23199.926572]  ? kthread_create_on_node+0x70/0x70
[23199.926577]  ret_from_fork+0x25/0x30
[23199.926627] INFO: task mozStorage #4:8012 blocked for more than 120 seconds.
[23199.926632]       Tainted: P           OE   4.13.0-21-generic #24-Ubuntu
[23199.926633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

With the improvement it becomes possible again that the blame might be on VirtualBox' side.

I turned dedup=off for a few days without much change to the datasets and now turned it back on for a few days. The effect might only have been temporarily because processes don't have to read so much deduplicated data, but I doubt that give the time the system runs fine.

@behlendorf
Copy link
Contributor

@krichter722 my hunch would be it's in fact commit e9a7729 which helped performance. This fix will be included in 0.7.7.

@krichter722
Copy link
Author

my hunch would be it's in fact commit e9a7729 which helped performance. This fix will be included in 0.7.7.

@behlendorf Sounds reasonable. However, I updated my repository almost on a daily basis and I'm certain that I built a version which I included that commit and ran it for more than a week without noticable change. The large improvement experienced might also be caused by a complex combination of commits together with de- and re-enabling dedup. The performance gains make ZFS usable with dedup again which is just awesome :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants
@ioquatix @behlendorf @ryao @krichter722 @gkkovacs and others