
PANIC on already held dp_config_rwlock after zvol_set_volsize #2039

Closed
clefru opened this issue Jan 11, 2014 · 4 comments


clefru commented Jan 11, 2014

With a debug build, I get the following ASSERT reliably triggered by

    zfs set volsize=somebiggernumber pool/zvol

[89598.753361] SPLError: 21252:0:(dsl_pool.c:1032:dsl_pool_config_enter()) ASSERTION(!rrw_held(&dp->dp_config_rwlock, RW_READER)) failed
[89598.756105] SPLError: 21252:0:(dsl_pool.c:1032:dsl_pool_config_enter()) SPL PANIC
[89598.757486] SPL: Showing stack for process 21252
[89598.757489] CPU: 1 PID: 21252 Comm: zfs Tainted: P           O 3.10.25-1-lts #1
[89598.757491] Hardware name:                  /DQ45CB, BIOS CBQ4510H.86A.0079.2009.0414.1340 04/14/2009
[89598.757492]  ffffffffa0363654 ffff88022fa2dbb0 ffffffff814b1e2d ffff88022fa2dbc0
[89598.757496]  ffffffffa01876d7 ffff88022fa2dbe8 ffffffffa0188a2f ffffffffa01a1a51
[89598.757498]  ffff88022fa2dc98 ffff88022ea2ddd8 ffff88022fa2dc10 ffffffffa02a4117
[89598.757501] Call Trace:
[89598.757521]  [] dump_stack+0x19/0x1b
[89598.757528]  [] spl_debug_dumpstack+0x27/0x40 [spl]
[89598.757533]  [] spl_debug_bug+0x7f/0xe0 [spl]
[89598.757553]  [] dsl_pool_config_enter+0x87/0x90 [zfs]
[89598.757570]  [] dsl_pool_hold+0x42/0x50 [zfs]
[89598.757586]  [] dmu_objset_hold+0x26/0xb0 [zfs]
[89598.757604]  [] dsl_prop_get+0x35/0x80 [zfs]
[89598.757620]  [] dsl_prop_get_integer+0x1e/0x20 [zfs]
[89598.757632]  [] zvol_set_volsize+0x1e9/0x3c0 [zfs]
[89598.757639]  [] ? nvpair_value_common.part.21+0xc2/0x150 [znvpair]
[89598.757653]  [] zfs_prop_set_special+0x144/0x4f0 [zfs]
[89598.757668]  [] ? zfs_check_settable.isra.9+0x100/0x450 [zfs]
[89598.757674]  [] ? nvpair_value_uint64+0x33/0x40 [znvpair]
[89598.757680]  [] ? fnvpair_value_uint64+0x1a/0x90 [znvpair]
[89598.757694]  [] zfs_set_prop_nvlist+0x113/0x370 [zfs]
[89598.757709]  [] zfs_ioc_set_prop+0xdf/0x120 [zfs]
[89598.757723]  [] zfsdev_ioctl+0x501/0x570 [zfs]
[89598.757726]  [] ? mutex_unlock+0xe/0x10
[89598.757730]  [] do_vfs_ioctl+0x2dd/0x4b0
[89598.757733]  [] SyS_ioctl+0x81/0xa0
[89598.757736]  [] ? do_page_fault+0xe/0x10
[89598.757739]  [] system_call_fastpath+0x1a/0x1f

Apparently something already holds the reader lock when entering dsl_pool_config_enter, but I couldn't spot anything obvious in the call path down from zfs_ioc_set_prop. I didn't check all the "side calls" that happen before this stack descent.
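
For context, the check that fires is the non-reentrancy assertion in dsl_pool_config_enter(); roughly (a paraphrase of the era's dsl_pool.c, not a verbatim copy):

    void
    dsl_pool_config_enter(dsl_pool_t *dp, void *tag)
    {
            /*
             * dp_config_rwlock is a re-entrant rwlock, but it is deliberately
             * never entered re-entrantly; taking it twice from the same
             * thread trips this assertion in debug builds.
             */
            ASSERT(!rrw_held(&dp->dp_config_rwlock, RW_READER));
            rrw_enter(&dp->dp_config_rwlock, RW_READER, tag);
    }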

I am running zfs 5d862cb and spl 921a35a on linux-lts 3.10.25.


clefru commented Jan 11, 2014

I would assume that dsl_pool_hold also gets called on zfs create pool/foo. I tried that, and it doesn't trigger the assert.

Changing other attributes of the zvol, e.g. compression, doesn't trigger this assert either.


clefru commented Jan 11, 2014

Ah, it's line 344 of zvol.c that's most likely causing the problem:

    VERIFY(dsl_prop_get_integer(name, "readonly", &readonly, NULL) == 0);

I initially mistook the dsl_prop_get_integer invocation in the trace for the one at line 319 above, but the offset into zvol_set_volsize (zvol_set_volsize+0x1e9/0x3c0) makes that unlikely.

Moving the readonly check (lines 344-348) outside the dmu_objset_hold/dmu_objset_rele scope should fix the problem. I'll test that tomorrow.
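
For clarity, the problematic nesting looks roughly like this (a sketch of the era's zvol_set_volsize(), not the literal code): dmu_objset_hold() enters dp_config_rwlock as reader via dsl_pool_hold() and keeps it held until dmu_objset_rele(), while dsl_prop_get_integer() by name goes through dmu_objset_hold() itself:

    error = dmu_objset_hold(name, FTAG, &os);   /* enters dp_config_rwlock */
    if (error)
            return (error);

    /* ... size validation elided ... */

    /* re-enters dp_config_rwlock -> ASSERT(!rrw_held(...)) fires */
    VERIFY(dsl_prop_get_integer(name, "readonly", &readonly, NULL) == 0);

    /* ... */

    dmu_objset_rele(os, FTAG);                  /* drops dp_config_rwlock */

Doing the readonly lookup outside the dmu_objset_hold()/dmu_objset_rele() window, or against the already-held dataset, avoids the recursive entry.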


clefru commented Jan 12, 2014

Fixing that bumps me into another lock-related panic, which is due to zvol_update_volsize calling dmu_tx_assign with TXG_WAIT.

The first panic is properly fixed by using dsl_prop_get_int_ds. The second one I worked around by switching to TXG_NOWAIT; see the pull request.
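
For reference, the workaround amounts to something like the following inside zvol_update_volsize() (a sketch only; error handling simplified and surrounding code paraphrased rather than taken from the actual pull request):

    tx = dmu_tx_create(os);
    dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);

    /*
     * TXG_WAIT may sleep waiting for the next open txg, which is not safe
     * while the dsl pool config lock is held; TXG_NOWAIT returns ERESTART
     * to the caller instead of blocking.
     */
    error = dmu_tx_assign(tx, TXG_NOWAIT);      /* was TXG_WAIT */
    if (error) {
            dmu_tx_abort(tx);
            return (error);
    }

    error = zap_update(os, ZVOL_ZAP_OBJ, "size", 8, 1, &volsize, tx);
    dmu_tx_commit(tx);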

behlendorf added a commit to behlendorf/zfs that referenced this issue Jan 13, 2014
Under Linux the zvol_set_volsize() function was originally written
to use dmu_objset_hold()/dmu_objset_rele().  Subsequently, the
dmu_objset_own()/dmu_objset_disown() interfaces were added but
the ZVOL code wasn't updated to take advantage of them.

This was never an issue but after the dsl_pool_config changes
the code now takes the config lock twice.  The cleanest solution
is to shift to using dmu_objset_own() which takes a long hold
on the dataset and does not hold the dsl pool lock.

This patch also slightly restructures the existing code such
that it more closely resembles the upstream Illumos code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2039
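
In outline, the approach the commit describes looks like this (a sketch under the commit message's description, not the literal patch):

    /*
     * dmu_objset_own() takes a long hold on the dataset and drops the dsl
     * pool config lock before returning, so the property lookup and the
     * transaction assignment below it no longer recurse on dp_config_rwlock.
     */
    error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
    if (error)
            return (error);

    /* ... check readonly, validate volsize, update the on-disk size ... */

    dmu_objset_disown(os, FTAG);
    return (error);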
@behlendorf
Contributor

@clefru Thanks for getting to the root cause of this issue. Your fix does resolve it, but it has some minor downsides, such as switching to TXG_NOWAIT. I'd like to take this opportunity to fix the issue cleanly and sync up with the Illumos version of zvol_set_volsize(). I've opened #2048, which does this; it would be great if you could also review and test the patch.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 9, 2015