From 51184d20d49ff0905f6d8ad9c910c8db25590cb2 Mon Sep 17 00:00:00 2001 From: loli10K Date: Sat, 14 Sep 2019 03:09:59 +0200 Subject: [PATCH] Scrubbing root pools may deadlock on kernels without elevator_change() (#9321) Originally the zfs_vdev_elevator module option was added as a convenience so the requested elevator would be automatically set on the underlying block devices. At the time this was simple because the kernel provided an API function which did exactly this. This API was then removed in the Linux 4.12 kernel which prompted us to add compatibly code to set the elevator via a usermodehelper. Unfortunately changing the evelator via usermodehelper requires reading some userland binaries, most notably modprobe(8) or sh(1), from a zfs dataset on systems with root-on-zfs. This can deadlock the system if used during the following call path because it may need, if the data is not already cached in the ARC, reading directly from disk while holding the spa config lock as a writer: zfs_ioc_pool_scan() -> spa_scan() -> spa_scan() -> vdev_reopen() -> vdev_elevator_switch() -> call_usermodehelper() While the usermodehelper waits sh(1), modprobe(8) is blocked in the ZIO pipeline trying to read from disk: INFO: task modprobe:2650 blocked for more than 10 seconds. Tainted: P OE 5.2.14 modprobe D 0 2650 206 0x00000000 Call Trace: ? __schedule+0x244/0x5f0 schedule+0x2f/0xa0 cv_wait_common+0x156/0x290 [spl] ? do_wait_intr_irq+0xb0/0xb0 spa_config_enter+0x13b/0x1e0 [zfs] zio_vdev_io_start+0x51d/0x590 [zfs] ? tsd_get_by_thread+0x3b/0x80 [spl] zio_nowait+0x142/0x2f0 [zfs] arc_read+0xb2d/0x19d0 [zfs] ... zpl_iter_read+0xfa/0x170 [zfs] new_sync_read+0x124/0x1b0 vfs_read+0x91/0x140 ksys_read+0x59/0xd0 do_syscall_64+0x4f/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This commit changes how we use the usermodehelper functionality from synchronous (UMH_WAIT_PROC) to asynchronous (UMH_NO_WAIT) which prevents scrubs, and other vdev_elevator_switch() consumers, from triggering the aforementioned issue. Signed-off-by: Brian Behlendorf Signed-off-by: loli10K Issue #8664 Closes #9321 --- module/zfs/vdev_disk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/module/zfs/vdev_disk.c b/module/zfs/vdev_disk.c index 1686ddfce77d..46437f21fb78 100644 --- a/module/zfs/vdev_disk.c +++ b/module/zfs/vdev_disk.c @@ -220,7 +220,7 @@ vdev_elevator_switch(vdev_t *v, char *elevator) char *envp[] = { NULL }; argv[2] = kmem_asprintf(SET_SCHEDULER_CMD, device, elevator); - error = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC); + error = call_usermodehelper(argv[0], argv, envp, UMH_NO_WAIT); strfree(argv[2]); #endif /* HAVE_ELEVATOR_CHANGE */ if (error) {