Improve zvol initialization code #1477

Closed
wants to merge 2 commits into from

2 participants

@ryao

There is an extremely odd bug that can cause zvols to fail to appear on some systems, but not others. Recently, I was able to consistently reproduce this issue over a period of 1 month. I found several things wrong in how zvols were being initialized, so I fixed them with some inspiration from FreeBSD. I can no longer reproduce the missing zvol issue after applying these fixes.

Pawel Jakub Dawidek: Call zvol_create_minors() in spa_open_common() when initializing pool
There is an extremely odd bug that causes zvols to fail to appear on
some systems, but not others. Recently, I was able to consistently
reproduce this issue over a period of 1 month. The issue disappeared
after I applied this change from FreeBSD.

This is from FreeBSD's pool version 28 import, which occurred in
revision 219089.

Ported-by: Richard Yao <ryao@cs.stonybrook.edu>
6a6dcc4
@ryao

Upon further review, it appears that the spa.c changes from FreeBSD should not be necessary because zfs_ioc_pool_import should call zvol_create_minors(). However, the zvols will not appear without the spa.c changes and my trace statements in zfs_ioc_pool_import() to see what zvol_create_minors() is returning (or the error code that prevents it) are not being printed, which seems paradoxical.

This merits investigation, but I currently have other priorities, so I am not sure how much time I will find to put into that. The spa.c changes should be safe to make even without more in-depth investigation. That is because zvol_create_minors() will not create minors for zvols whose minors are already created.

@ryao

These patches are now part of Gentoo's zfs kernel module package.

@behlendorf
Owner

@ryao Where do these changes stand currently? I still need to do a careful review but have they been working without trouble in Gentoo?

@behlendorf

I can get on board with this. Presumably moving the minor creation out of the module init context is what helps here.

spa_import_rootpool() is not used by anything right now and there does not appear to be any mechanism to read zpool.cache as part of module initialization. Provided those two observations are correct, minor creation at module initialization should not have any effect.

There is already code to create the minors when the pool is imported, but it seems to miss a case that this additional code catches. What is currently done appears to be enough for Illumos, but it is not enough for Linux and, presumably, FreeBSD. The addition of this code ensures that all cases are handled in testing. My only guess as to why it is necessary is that a race was happening, but I set that investigation aside before I understood what the race was.

@ryao Cleanup zvol initialization code
The following error will occur on some (possibly all) kernels because
blk_init_queue() will try to take the spinlock before we initialize it.

[    5.538871] BUG: spinlock bad magic on CPU#0, zpool/4054
[    5.538885]  lock: 0xffff88021a73de60, .magic: 00000000, .owner:
<none>/-1, .owner_cpu: 0
[    5.538888] Pid: 4054, comm: zpool Not tainted 3.9.3 #11
[    5.538890] Call Trace:
[    5.538898]  [<ffffffff81478ef8>] spin_dump+0x8c/0x91
[    5.538902]  [<ffffffff81478f1e>] spin_bug+0x21/0x26
[    5.538906]  [<ffffffff812da097>] do_raw_spin_lock+0x127/0x130
[    5.538911]  [<ffffffff81253301>] ? zvol_probe+0x91/0xf0
[    5.538914]  [<ffffffff8147d851>] _raw_spin_lock_irq+0x21/0x30
[    5.538919]  [<ffffffff812c2c1e>] cfq_init_queue+0x1fe/0x350
[    5.538922]  [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[    5.538926]  [<ffffffff812aacb8>] elevator_init+0x78/0x140
[    5.538930]  [<ffffffff812b2677>] blk_init_allocated_queue+0x87/0xb0
[    5.538933]  [<ffffffff81253360>] ? zvol_probe+0xf0/0xf0
[    5.538937]  [<ffffffff812b26d5>] blk_init_queue_node+0x35/0x70
[    5.538941]  [<ffffffff812b271e>] blk_init_queue+0xe/0x10
[    5.538944]  [<ffffffff8125211b>] __zvol_create_minor+0x24b/0x620
[    5.538947]  [<ffffffff81253264>] zvol_create_minors_cb+0x24/0x30
[    5.538952]  [<ffffffff811bd9ca>] dmu_objset_find_spa+0xea/0x510
[    5.538955]  [<ffffffff81253240>] ? zvol_free+0x60/0x60
[    5.538958]  [<ffffffff811bda71>] dmu_objset_find_spa+0x191/0x510
[    5.538962]  [<ffffffff81253240>] ? zvol_free+0x60/0x60
[    5.538965]  [<ffffffff81253ea2>] zvol_create_minors+0x92/0x180
[    5.538969]  [<ffffffff811f8d80>] spa_open_common+0x250/0x380
[    5.538973]  [<ffffffff811f8ece>] spa_open+0xe/0x10
[    5.538977]  [<ffffffff8122817e>] pool_status_check.part.22+0x1e/0x80
[    5.538980]  [<ffffffff81228a55>] zfsdev_ioctl+0x155/0x190
[    5.538984]  [<ffffffff8116a695>] do_vfs_ioctl+0x325/0x5a0
[    5.538989]  [<ffffffff81163f1d>] ? final_putname+0x1d/0x40
[    5.538992]  [<ffffffff8116a950>] sys_ioctl+0x40/0x80
[    5.538996]  [<ffffffff814812c9>] ? do_page_fault+0x9/0x10
[    5.539000]  [<ffffffff81483929>] system_call_fastpath+0x16/0x1b
[    5.541118]  zd0: unknown partition table

We fix this by calling spin_lock_init before blk_init_queue.

The manner in which zvol_init() initializes structures is
susceptible to a race between initialization and a probe on a zvol. We
reorganize zvol_init() to prevent that.

Lastly, calling zvol_create_minors(NULL) in zvol_init() does nothing
because no pools are imported, so we remove it.

Signed-off-by: Richard Yao <ryao@gentoo.org>
caea5cf
@behlendorf
Owner

@ryao #1528 appears to have been caused by these changes. There's a race now where we can accidentally hand out the same minor twice. That's going to need explaining.

@ryao

@behlendorf I do not see how that is possible. Subsequent calls to __zvol_create_minor() should fail with EEXIST. It is also not clear to me that this patch was applied in issue #1528.

@ryao

@behlendorf I seem to have missed your earlier query. No one has reported any problems with these patches in Gentoo.

@ryao

After talking on IRC, it looks like there is some sort of race in issue #1528 and this patch did play a role in it, but I am having trouble seeing how this happened. At this point, I am inclined to think that the compiler did something strange.

@behlendorf

This line actually isn't dead, although perhaps it should be. It causes us to walk the spa config namespace, which contains a list of pool names obtained from the zpool cache file. Each of those pools is then opened via spa_open(), which is why the zvols are created during module load. Doing all of this at module load time is probably not the best idea, but changing it would alter long-standing behavior in ZoL. As a half step we could just move this to the zvol taskq for now.

That is correct. I made a comment in zfsonlinux#1528 (comment) a few days ago, but I have yet to update the commit message to clarify that.

As far as I can tell, Illumos does not have this zvol_create_minors() call and it has no problem making the block devices. With that said, additional attempts to create minors will silently fail, which should make retaining this harmless.

My concern is that it occurs during module init, which could perhaps explain some of the weirdness. At that point almost everything is initialized, but not quite everything, so there is potential for bugs here. I'm thinking about removing it as you suggest, although that would change the current behavior in the sense that block devices would no longer appear immediately after module load. Some userspace command such as zpool list would need to be run so the pools get opened and are not just listed in the configuration cache. Alternately, we could dispatch a work item to the system taskq to asynchronously open all the pools in the configuration cache.

@behlendorf
Owner

@ryao Refreshed as #1564, please review.

@behlendorf behlendorf closed this
Commits on May 28, 2013
  1. @ryao

    Call zvol_create_minors() in spa_open_common() when initializing pool

    Pawel Jakub Dawidek authored ryao committed
Commits on Jun 21, 2013
  1. @ryao

    Cleanup zvol initialization code

    ryao authored
Showing with 26 additions and 12 deletions.
  1. +9 −0 module/zfs/spa.c
  2. +17 −12 module/zfs/zvol.c
9 module/zfs/spa.c
@@ -64,6 +64,7 @@
#include <sys/zfs_ioctl.h>
#include <sys/dsl_scan.h>
#include <sys/zfeature.h>
+#include <sys/zvol.h>
#ifdef _KERNEL
#include <sys/bootprops.h>
@@ -2856,6 +2857,7 @@ spa_open_common(const char *pool, spa_t **spapp, void *tag, nvlist_t *nvpolicy,
spa_load_state_t state = SPA_LOAD_OPEN;
int error;
int locked = B_FALSE;
+ int firstopen = B_FALSE;
*spapp = NULL;
@@ -2879,6 +2881,8 @@ spa_open_common(const char *pool, spa_t **spapp, void *tag, nvlist_t *nvpolicy,
if (spa->spa_state == POOL_STATE_UNINITIALIZED) {
zpool_rewind_policy_t policy;
+ firstopen = B_TRUE;
+
zpool_get_rewind_policy(nvpolicy ? nvpolicy : spa->spa_config,
&policy);
if (policy.zrp_request & ZPOOL_DO_REWIND)
@@ -2953,6 +2957,11 @@ spa_open_common(const char *pool, spa_t **spapp, void *tag, nvlist_t *nvpolicy,
mutex_exit(&spa_namespace_lock);
}
+#ifdef _KERNEL
+ if (firstopen)
+ zvol_create_minors(spa->spa_name);
+#endif
+
*spapp = spa;
return (0);
29 module/zfs/zvol.c
@@ -1217,6 +1217,9 @@ zvol_alloc(dev_t dev, const char *name)
if (zv == NULL)
goto out;
+ spin_lock_init(&zv->zv_lock);
+ list_link_init(&zv->zv_next);
+
zv->zv_queue = blk_init_queue(zvol_request, &zv->zv_lock);
if (zv->zv_queue == NULL)
goto out_kmem;
@@ -1250,9 +1253,6 @@ zvol_alloc(dev_t dev, const char *name)
sizeof (rl_t), offsetof(rl_t, r_node));
zv->zv_znode.z_is_zvol = TRUE;
- spin_lock_init(&zv->zv_lock);
- list_link_init(&zv->zv_next);
-
zv->zv_disk->major = zvol_major;
zv->zv_disk->first_minor = (dev & MINORMASK);
zv->zv_disk->fops = &zvol_ops;
@@ -1563,30 +1563,35 @@ zvol_init(void)
{
int error;
+ list_create(&zvol_state_list, sizeof (zvol_state_t),
+ offsetof(zvol_state_t, zv_next));
+ mutex_init(&zvol_state_lock, NULL, MUTEX_DEFAULT, NULL);
+
zvol_taskq = taskq_create(ZVOL_DRIVER, zvol_threads, maxclsyspri,
zvol_threads, INT_MAX, TASKQ_PREPOPULATE);
if (zvol_taskq == NULL) {
printk(KERN_INFO "ZFS: taskq_create() failed\n");
- return (-ENOMEM);
+ error = -ENOMEM;
+ goto out1;
}
error = register_blkdev(zvol_major, ZVOL_DRIVER);
if (error) {
printk(KERN_INFO "ZFS: register_blkdev() failed %d\n", error);
- taskq_destroy(zvol_taskq);
- return (error);
+ goto out2;
}
blk_register_region(MKDEV(zvol_major, 0), 1UL << MINORBITS,
THIS_MODULE, zvol_probe, NULL, NULL);
- mutex_init(&zvol_state_lock, NULL, MUTEX_DEFAULT, NULL);
- list_create(&zvol_state_list, sizeof (zvol_state_t),
- offsetof(zvol_state_t, zv_next));
-
- (void) zvol_create_minors(NULL);
-
return (0);
+
+out2:
+ taskq_destroy(zvol_taskq);
+out1:
+ mutex_destroy(&zvol_state_lock);
+ list_destroy(&zvol_state_list);
+ return (error);
}
void