Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock between zpool export and zfs list #3137

Closed
wants to merge 1 commit into from
Closed

Fix deadlock between zpool export and zfs list #3137

wants to merge 1 commit into from

Conversation

thegreatgazoo
Copy link

Pool reference count is NOT checked in spa_export_common()
if the pool has been imported readonly=on, i.e. spa->spa_sync_on
is FALSE. Then zpool export and zfs list may deadlock:

  1. Pool A is imported readonly.
  2. zpool export A and zfs list are run concurrently.
  3. zfs command gets reference on the spa, which holds a dbuf on
    on the MOS meta dnode.
  4. zpool command grabs spa_namespace_lock, and tries to evict dbufs
    of the MOS meta dnode. The dbuf held by zfs command can't be
    evicted as its reference count is not 0.
  5. zpool command blocks in dnode_special_close() waiting for the
    MOS meta dnode reference count to drop to 0, with
    spa_namespace_lock held.
  6. zfs command tries to get the spa_namespace_lock with a reference
    on the spa held, which holds a dbuf on the MOS meta dnode.
  7. Now zpool command and zfs command deadlock each other.

Also any further zfs/zpool command will block on spa_namespace_lock
forever.

The fix is to always check pool reference count in spa_export_common(),
no matter whether the pool was imported readonly or not.

Signed-off-by: Isaac Huang he.huang@intel.com
Closes #2034

@behlendorf
Copy link
Contributor

The intent of the patch here is definitely right but we're going to need to settle on the cleanest way to rework the code. This is EBUSY check needs to happen after txg_wait_synced() but before vdev_config_dirty(). How about something like this? Or replicating it in an else block? I'm not sure which I like better.

diff --git a/module/zfs/spa.c b/module/zfs/spa.c
index 861a319..71e563b 100644
--- a/module/zfs/spa.c
+++ b/module/zfs/spa.c
@@ -4272,20 +4272,23 @@ spa_export_common(char *pool, int new_state, nvlist_t **
                 * have to force it to sync before checking spa_refcnt.
                 */
                txg_wait_synced(spa->spa_dsl_pool, 0);
+       }

-               /*
-                * A pool cannot be exported or destroyed if there are active
-                * references.  If we are resetting a pool, allow references by
-                * fault injection handlers.
-                */
-               if (!spa_refcount_zero(spa) ||
-                   (spa->spa_inject_ref != 0 &&
-                   new_state != POOL_STATE_UNINITIALIZED)) {
-                       spa_async_resume(spa);
-                       mutex_exit(&spa_namespace_lock);
-                       return (SET_ERROR(EBUSY));
-               }

+       /*
+        * A pool cannot be exported or destroyed if there are active
+        * references.  If we are resetting a pool, allow references by
+        * fault injection handlers.
+        */
+       if (!spa_refcount_zero(spa) ||
+           (spa->spa_inject_ref != 0 &&
+           new_state != POOL_STATE_UNINITIALIZED)) {
+               spa_async_resume(spa);
+               mutex_exit(&spa_namespace_lock);
+               return (SET_ERROR(EBUSY));
+       }
+
+       if (spa->spa_state != POOL_STATE_UNINITIALIZED && spa->spa_sync_on) {
                /*
                 * A pool cannot be exported if it has an active shared spare.
                 * This is to prevent other pools stealing the active spare

Pool reference count is NOT checked in spa_export_common()
if the pool has been imported readonly=on, i.e. spa->spa_sync_on
is FALSE. Then zpool export and zfs list may deadlock:

1. Pool A is imported readonly.
2. zpool export A and zfs list are run concurrently.
3. zfs command gets reference on the spa, which holds a dbuf on
   on the MOS meta dnode.
4. zpool command grabs spa_namespace_lock, and tries to evict dbufs
   of the MOS meta dnode. The dbuf held by zfs command can't be
   evicted as its reference count is not 0.
5. zpool command blocks in dnode_special_close() waiting for the
   MOS meta dnode reference count to drop to 0, with
   spa_namespace_lock held.
6. zfs command tries to get the spa_namespace_lock with a reference
   on the spa held, which holds a dbuf on the MOS meta dnode.
7. Now zpool command and zfs command deadlock each other.

Also any further zfs/zpool command will block on spa_namespace_lock
forever.

The fix is to always check pool reference count in spa_export_common(),
no matter whether the pool was imported readonly or not.

Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #2034
@thegreatgazoo
Copy link
Author

There seemed no need to check spa ref count when spa->spa_state == POOL_STATE_UNINITIALIZED. So I added a goto to skip the whole branch for that case, which also simplified the conditionals a bit.

@behlendorf
Copy link
Contributor

@thegreatgazoo Good idea, I like this a lot better. My only minor quibble would be that it probably makes sense to pull the 'Objsets may be open...' comment out of the 'if ()' conditional for readability. I'm happy do that in the merge if you like.

@behlendorf behlendorf added this to the 0.6.4 milestone Mar 2, 2015
@behlendorf
Copy link
Contributor

@thegreatgazoo thanks for the fix! I've refreshed the comment and merged this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

umount hangs in dnode_special_close()
3 participants