zfs mount -a race #8450

Closed

c0d3z3r0 opened this issue Feb 24, 2019 · 7 comments

Comments

@c0d3z3r0
Contributor

c0d3z3r0 commented Feb 24, 2019

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  testing
Linux Kernel          4.20.5 and 4.20.10
Architecture          x64
ZFS/SPL Version       master / 0.8.0-rc3-g0a1086319

Describe the problem you're observing

The expected behaviour of zfs mount -a is that it detects dependencies between datasets and waits for each dependency to be available before performing any mounts below it. For example, dpool/data/test depends on dpool/data: with dpool/data mounted on /var/data, dpool/data/test must be mounted on /var/data/test only AFTER /var/data is available to the system.

This is what happens with zfs mount -a:

root@debian:[~]# zfs mount
rpool/ROOT                      /
root@debian:[~]# ls /var/data/
root@debian:[~]# zfs mount -a
root@debian:[~]# zfs mount
rpool/ROOT                      /
dpool/data                       /var/data
dpool/data/test                /var/data/test
# ZFS says mount was ok; but it is not!
root@debian:[~]# ls /var/data/
testfile_dpool_data
# test is missing!
# Unmount everything again
root@debian:[~]# zfs umount dpool/data
umount: /var/data/test: not mounted.
cannot unmount '/var/data/test': umount failed
root@debian:[~]# zfs umount dpool/data/test
umount: /var/data/test: not mounted.
cannot unmount '/var/data/test': umount failed
# Uhm. What!? Check linux mount
root@debian:[~]# mount | grep 'var/data'
dpool/data on /var/data type zfs (rw,relatime,xattr,noacl)
dpool/data/test on /var/data/test type zfs (rw,relatime,xattr,noacl)
# Ok, do a manual unmount
root@debian:[~]# umount /var/data/test
umount: /var/data/test: no mount point specified.
# Hmm, ok, that's caused by the (unwanted) overlay mount. Unmount it first.
root@debian:[~]# umount /var/data     
root@debian:[~]# ls /var/data
test
root@debian:[~]# zfs mount
rpool/ROOT                      /
dpool/data/test                 /var/data/test
# Ok, this is correct. Then let's unmount dpool/data/test
root@debian:[~]# zfs umount dpool/data/test
root@debian:[~]# zfs mount
rpool/ROOT                      /
root@debian:[~]# ls /var/data/
test
root@debian:[~]# ls /var/data/test/
root@debian:[~]#

Doing it manually:

root@debian:[~]# zfs mount
rpool/ROOT                      /
root@debian:[~]# ls /var/data/
root@debian:[~]# zfs mount dpool/data
root@debian:[~]# ls /var/data/
testfile_dpool_data test
root@debian:[~]# ls /var/data/test
root@debian:[~]# zfs mount dpool/data/test
root@debian:[~]# ls /var/data/test
testfile_dpool_data_test
root@debian:[~]# zfs unmount /var/data/test; zfs unmount /var/data
root@debian:[~]# ls /var/data/
root@debian:[~]# 

The problem here, IMHO, is that zfs does not wait for the dependency mount (dpool/data) to actually be available and mistakenly performs an overlay mount when the child dataset (dpool/data/test) becomes available first.

Update:

When setting ZFS_SERIAL_MOUNT=1, everything works fine. So the problem is not a missing wait/sleep of some kind; it is the parallelization of dependent mounts (zfs_foreach_mountpoint).

Update 2: See #8450 (comment)
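
To make the race concrete, here is a minimal, self-contained C sketch (not libzfs code; names and timings are invented for illustration) of what goes wrong when a parent and its child mount are dispatched to independent threads: the child's empty-directory check and mount can both complete before the parent's mount does, matching the debug output further down in this thread.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Simulated mount: an empty-directory check followed by the (slow) mount. */
static void
simulated_mount(const char *mountpoint, unsigned int mount_cost_us)
{
	printf("Check empty: %s\n", mountpoint);
	usleep(mount_cost_us);          /* fork/exec of mount(8), mount(2), ... */
	printf("Do mount:    %s\n", mountpoint);
}

static void *
mount_parent(void *arg)
{
	(void) arg;
	simulated_mount("/var/data", 100 * 1000);   /* parent happens to be slow */
	return (NULL);
}

static void *
mount_child(void *arg)
{
	(void) arg;
	simulated_mount("/var/data/test", 1000);    /* child wins the race */
	return (NULL);
}

int
main(void)
{
	pthread_t a, b;

	/*
	 * Dispatching parent and child concurrently is the bug: the child's
	 * check and mount complete before its parent is mounted, so it lands
	 * on the underlying directory and is shadowed once the parent mounts.
	 */
	pthread_create(&a, NULL, mount_parent, NULL);
	pthread_create(&b, NULL, mount_child, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return (0);
}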

@c0d3z3r0
Contributor Author

@kpande I don't want to do an overlay mount. In my example I want to mount dpool/data on /var/data and dpool/data/test on /var/data/test, but here dpool/data gets mounted after dpool/data/test, which results in an (unwanted) overlay mount.

@c0d3z3r0
Contributor Author

The solution would be to detect those dependencies in zfs_foreach_mountpoint and put them in the same thread, as sketched below.
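
A minimal sketch of that idea, with hypothetical helper names rather than the actual libzfs code: walk the lexicographically sorted mountpoint list and start a new thread only at an entry that is not a descendant of the path that opened the current set, so every parent/child chain stays on one thread. Note that this naive descendant test still ignores two corner cases, duplicate mountpoints and "/", which is exactly what the merged fix at the bottom of this page addresses.

#include <stdio.h>
#include <string.h>

/* True when p2 is a proper descendant of p1 (p1 is a prefix ending at '/'). */
static int
is_descendant(const char *p1, const char *p2)
{
	size_t n = strlen(p1);

	return (strncmp(p1, p2, n) == 0 && p2[n] == '/');
}

int
main(void)
{
	/* Lexicographically sorted, as zfs_foreach_mountpoint() sorts handles. */
	const char *mnts[] = { "/var/data", "/var/data/test", "/var/other" };
	size_t i, j, n = sizeof (mnts) / sizeof (mnts[0]);

	for (i = 0; i < n; i = j) {
		/* Extend the set while entries depend on the set's first path. */
		for (j = i + 1; j < n && is_descendant(mnts[i], mnts[j]); j++)
			;
		printf("one thread mounts indices %zu..%zu (root %s)\n",
		    i, j - 1, mnts[i]);
	}
	return (0);
}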

@c0d3z3r0
Contributor Author

c0d3z3r0 commented Feb 24, 2019

yes, it cannot be an "unwanted" overlay mount, since by default ZFS doesn't mount into non-empty directories.

Yes, it should not mount onto non-empty directories, but it does. This is a race condition caused by parallelization.
I have added some debug output. Here we see that both empty-dir checks happen at the same time, well before anything is mounted at all.

514825 Create mountpoint: /var/data/test
514826 Check empty: /var/data
514910 Check empty: /var/data/test
514916 Do mount: /var/data/test
515100 Do mount: /var/data
515645 mtab: /var/data/test
515748 mtab: /var/data

Delaying the test thread a bit makes the problem go away.
Debug output:

67366 mtab: /var/data
67405 Check empty: /var/data
67408 Do mount: /var/data
67776 Check empty: /var/data/test
67807 Do mount: /var/data/test
74833 mtab: /var/data/test

As I said, some task list must be prepared in which dependent mountpoints end up in the same list.

why not just set canmount=off on dpool/data and leave canmount=on for dpool/data/test? I assume you don't actually want to store data in dpool/data, just use it to organise filesystems?

Well, I'm currently setting up that system from scratch... I am not sure what the final directory structure will be, but yes, maybe I simply should not do it like this. However, the race condition should be fixed anyway.

@c0d3z3r0
Contributor Author

The comment for zfs_mount_task says

 * Thread pool function to mount one file system. On completion, it finds and
 * schedules its children to be mounted. This depends on the sorting done in
 * zfs_foreach_mountpoint(). Note that the degenerate case (**chain of entries
 * each descending from the previous**) will have **no parallelism since we always
 * have to wait for the parent to finish mounting** before we can schedule
 * its children.

This already implements what I meant by a task list...

zfs_foreach_mountpoint:
  for i in non_descendant_idx:
    zfs_dispatch_mount

zfs_dispatch_mount:
  zfs_mount_task

zfs_mount_task:
  mnt_func
  for i in non_descendant_idx:
    zfs_dispatch_mount

zfs_mount_task dispatches the mounting of a mountpoint's children, so it needs to wait until the mountpoint itself is actually mounted, as described above. That wait seems to be missing...
I am not really sure how to check whether the mount has already completed, so for testing I added a usleep(10*1000) (10 ms) after the mnt_func() call. This fixes the issue...

Then I tried the following, but it did not solve the issue, since zfs_is_mounted always returns true there... hmmmm...

diff --git a/lib/libzfs/libzfs_mount.c b/lib/libzfs/libzfs_mount.c
index 649c232aa..42632f02e 100644
--- a/lib/libzfs/libzfs_mount.c
+++ b/lib/libzfs/libzfs_mount.c
@@ -1431,9 +1431,23 @@ zfs_mount_task(void *arg)
        verify(zfs_prop_get(handles[idx], ZFS_PROP_MOUNTPOINT, mountpoint,
            sizeof (mountpoint), NULL, NULL, 0, B_FALSE) == 0);
 
+       printf("mount task %s\n", mountpoint);
+
        if (mp->mnt_func(handles[idx], mp->mnt_data) != 0)
                return;
 
+       //const char * name = zfs_get_name(handles[idx]);
+       //printf("1 hdl: %s, mp: %s, mnt: %d\n", name, mountpoint, zfs_is_mounted(handles[idx], NULL));
+
+       if (zfs_is_mountable(handles[idx], mountpoint, sizeof (mountpoint), NULL)) {
+               //printf("%s mountable\n", name);
+               while (!zfs_is_mounted(handles[idx], NULL))
+                       usleep(1000);
+       }
+
+       //printf("2 hdl: %s, mp: %s, mnt: %d\n", name, mountpoint, zfs_is_mounted(handles[idx], NULL));
+
+
        /*
         * We dispatch tasks to mount filesystems with mountpoints underneath
         * this one. We do this by dispatching the next filesystem with a
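
A possible alternative to polling zfs_is_mounted() here would be to watch for the mountpoint becoming a mount boundary, e.g. by comparing st_dev of the directory and of its parent. A hedged sketch under that assumption (is_mount_boundary is a hypothetical helper, not a libzfs API; each mounted ZFS dataset gets its own device number, so the check applies):

#include <stdio.h>
#include <string.h>
#include <libgen.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 once path is a mount boundary, i.e. its
 * st_dev differs from its parent directory's; 0 if not; -1 on error.
 * Caveat: "/" is a mount point but would report 0 here.
 */
static int
is_mount_boundary(const char *path)
{
	char buf[4096];
	struct stat st, pst;

	/* dirname() may modify its argument, so work on a copy. */
	snprintf(buf, sizeof (buf), "%s", path);
	if (stat(path, &st) != 0 || stat(dirname(buf), &pst) != 0)
		return (-1);
	return (st.st_dev != pst.st_dev);
}

int
main(void)
{
	printf("/proc is a mount boundary: %d\n", is_mount_boundary("/proc"));
	return (0);
}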

@c0d3z3r0 changed the title from "zfs mount -a issues" to "zfs mount -a race" on Mar 8, 2019
@c0d3z3r0 mentioned this issue on Jun 15, 2019
kusumi added a commit to kusumi/zfs that referenced this issue Jun 16, 2019
The strategy of parallel mount is as follows.

1) Initial thread dispatching selects sets of mount points that don't
 have dependencies on other sets, hence threads can/should go lock-less
 and shouldn't race with other threads for other sets. Each thread
 dispatched corresponds to a top-level directory which may or may not
 have datasets mounted on subdirectories.

2) Subsequent recursive thread dispatching for each thread from 1) is
 to mount datasets for each set of mount points. The mount points within
 each set have dependencies (i.e. child directories), so child
 directories are processed only after the parent directory completes.

The problem is that the initial thread dispatching can spawn more than
one thread for datasets with the same mount point, which puts the
threads in a race condition. On ZoL this appeared as a mount issue,
because ZoL has different timing around mount(2) execution due to the
fork(2)/exec(2) of mount(8).

Closes openzfs#8450
Closes openzfs#8833

Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
kusumi added revised versions of this commit to kusumi/zfs, referencing this issue, on Jun 16, Jun 17, Jun 20, Jun 21, Jun 23, Jul 2, Jul 4, and Jul 6, 2019; TulsiJain (TulsiJain/zfs, Jul 20, 2019) and tonyhutter (tonyhutter/zfs, Aug 13 – Sep 18, 2019) then pushed the same fix to their forks, each commit carrying a near-identical message.
tonyhutter pushed a commit that referenced this issue Sep 26, 2019
The strategy of parallel mount is as follows.

1) Initial thread dispatching is to select sets of mount points that
 don't have dependencies on other sets, hence threads can/should run
 lock-less and shouldn't race with other threads for other sets. Each
 thread dispatched corresponds to a top-level directory which may or
 may not have datasets to be mounted on subdirectories.

2) Subsequent recursive thread dispatching for each thread from 1)
 is to mount datasets for each set of mount points. The mount points
 within each set have dependencies (i.e. child directories), so child
 directories are processed only after the parent directory completes.

The problem is that the initial thread dispatching in
zfs_foreach_mountpoint() can be multi-threaded when it needs to be
single-threaded, and this puts the threads in a race condition. The race
appeared as mount/unmount issues on ZoL, because ZoL has different
timing around mount(2) execution due to the fork(2)/exec(2) of mount(8).
`zfs unmount -a`, which expects proper mount order, can't unmount if the
mounts were reordered by the race condition.

There are currently two known patterns of input list `handles` in
`zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

1) #8833 case where input is `/a /a /a/b` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list with two identical top-level directories.
 There is a race between two POSIX threads A and B,
  * ThreadA for "/a" for test1 and "/a/b"
  * ThreadB for "/a" for test0/a
 and in case of #8833, ThreadA won the race. Two threads were created
 because "/a" wasn't considered as `"/a" contains "/a"`.

2) #8450 case where input is `/ /var/data /var/data/test` after sorting.
 The problem is that libzfs_path_contains() can't correctly handle an
 input list containing "/".
 There is a race between two POSIX threads A and B,
  * ThreadA for "/" and "/var/data/test"
  * ThreadB for "/var/data"
 and in case of #8450, ThreadA won the race. Two threads were created
 because "/var/data" wasn't considered as `"/" contains "/var/data"`.
 In other words, if there is (at least one) "/" in the input list,
 the initial thread dispatching must be single-threaded since every
 directory is a child of "/", meaning they all directly or indirectly
 depend on "/".

In both cases, the first non_descendant_idx() call fails to correctly
determine "path1-contains-path2", and as a result the initial thread
dispatching creates another thread when it needs to be single-threaded.
Fix a conditional in libzfs_path_contains() to cover the above two cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8450
Closes #8833
Closes #8878
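
For reference, the fixed predicate reads roughly as follows (a standard-C paraphrase based on the commit message above; consult the merged commit for the authoritative libzfs code). The two added clauses match the two failure patterns exactly: a path contains itself (#8833), and "/" contains every path (#8450).

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * True when path1 "contains" path2: same path, path1 is "/", or path2 is
 * a proper descendant of path1 (prefix match ending at a '/' boundary).
 */
static bool
libzfs_path_contains(const char *path1, const char *path2)
{
	return (strcmp(path1, path2) == 0 ||  /* same path -> one thread (#8833) */
	    strcmp(path1, "/") == 0 ||        /* "/" contains everything (#8450) */
	    (strstr(path2, path1) == path2 && path2[strlen(path1)] == '/'));
}

int
main(void)
{
	printf("%d %d %d\n",
	    libzfs_path_contains("/a", "/a"),         /* 1: same path   */
	    libzfs_path_contains("/", "/var/data"),   /* 1: under "/"   */
	    libzfs_path_contains("/a", "/ab"));       /* 0: not a child */
	return (0);
}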