Strange behaviour of canmount=off #8833
I am not sure, but it seems that systemd.mount could solve this issue: set the ZFS mountpoint of the dataset to legacy, place it in the fstab file, and then tell systemd which mountpoint should be mounted first and which after, by creating a dependency between the mountpoints.
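Concretely, that suggestion might look like the sketch below; the dataset name test1/b and the /a and /a/b mountpoints are taken from the reproduction at the bottom of this issue, and x-systemd.requires-mounts-for is the stock systemd fstab option for ordering one mount after another:

```sh
# Switch the child dataset to legacy mounting so fstab/systemd owns it.
zfs set mountpoint=legacy test1/b

# /etc/fstab entry: mount /a/b only after /a is available.
# test1/b  /a/b  zfs  defaults,x-systemd.requires-mounts-for=/a  0 0
```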
Take a look at the suggestion above. @SoongNoonien, can you see if that resolves your use case?
Thanks for your efforts, but I do not use systemd.
I took a closer look at this. Now I'm pretty sure it is the parallel mounting. I looked in the commit history and found the last commit before "OpenZFS 8115 - parallel zfs mount" (a10d50f), which is simply "Tag 0.8.0-rc2" (af2e841). Version overview:
System on which the issue is not reproducible:
System on which the issue is reproducible:
I hope this helps.
The problem here seems to be the mount ordering under parallel mount. Below is basically the same as what the OP pasted, except that it creates regfiles to make the shadowing clearer.
As shown below, the p2/f2 mount has been shadowed by the p1/f1 mount, hence only one of the regfiles is visible.
Having a small delay before each /bin/mount invocation changes the result.
With the above change, both regfiles are now visible; the last listing confirms it.
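For reference, one way to observe the shadowing locally (a sketch, assuming the /a and /a/b mountpoints from the reproduction at the bottom of this issue; findmnt is just a convenient way to see which filesystem backs each path):

```sh
zfs mount -a

# If /a/b was mounted first and then covered by the mount on /a,
# its regfile will be missing from this listing.
ls -la /a /a/b

# Recursively show which filesystem is actually mounted on each path.
findmnt -R /a
```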
@kusumi do you happen to know if the reason this wasn't reproducible on illumos is simply down to the timing? Adding a delay might make this less likely, but it would be nice to have a properly robust solution.
@behlendorf And yes, a random delay is definitely not what we want to do to fix it. I added it to show that the problem is in the timing of the pthreads which eventually call /bin/mount.
@behlendorf Basically, a dataset whose mountpoint is a descendant of another dataset's mountpoint gets mounted at the cost of extra timing sensitivity, since each mount goes through fork(2)/exec(2) of mount(8) on Linux.
There is an env var that forces serial (non-parallel) mounting. It is undocumented (it only appears in the source code), so it may not be a stable interface in the future.
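If the undocumented variable is ZFS_SERIAL_MOUNT (an assumption on my part, based on the libzfs mount code of that era; verify against your libzfs_mount.c before relying on it), usage would be:

```sh
# Assumption: ZFS_SERIAL_MOUNT is the undocumented switch referenced above.
# Any non-empty value disables parallel mounting for this invocation.
ZFS_SERIAL_MOUNT=1 zfs mount -a
```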
The main reason parallel mount behaves differently on ZoL is that pthreads fork/execs another process for /bin/mount, whereas illumos and FreeBSD calls mount(2) and nmount(2) respectively. By taking a global lock around /bin/mount (and also /bin/umount) call, ZoL will behave similarly to other implementations. See openzfs#8833 for details. Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Parallel mount behaves differently in ZoL by each pthread forking another process to exec /bin/mount, whereas illumos and FreeBSD directly use mount(2) and nmount(2) syscall respectively. This can cause parallel mount to mount datasets in incorrect order depending on the timing /bin/mount runs, and results in shadowing dataset(s) whose mountpoint is descendant of another. Having a global lock around /bin/mount (and also /bin/umount) call adjusts the timing (eliminates the timing difference caused by fork/exec) and makes its behavior similar to illumos and FreeBSD. Note that this isn't to "fix" a race condition by proper locking. Parallel mount by design doesn't guarantee ordering of threads when multiple datasets have the same mount point. Adjusting is all it can do. See openzfs#8833 for details. Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
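The global-lock approach described in these commit messages is small in shape; a sketch of the idea (illustrative, not the actual libzfs patch; the helper below stands in for libzfs's fork/exec of mount(8), which builds an argument vector rather than using system(3)):

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * One process-wide lock so that only one mount(8) invocation is in
 * flight at a time.  This mimics the serialization that illumos and
 * FreeBSD effectively get by calling mount(2)/nmount(2) directly.
 */
static pthread_mutex_t mount_exec_lock = PTHREAD_MUTEX_INITIALIZER;

/* Illustrative stand-in for libzfs running /bin/mount. */
static int
run_mount_locked(const char *dataset, const char *mountpoint)
{
	char cmd[1024];
	int rc;

	(void) snprintf(cmd, sizeof (cmd), "/bin/mount -t zfs %s %s",
	    dataset, mountpoint);

	pthread_mutex_lock(&mount_exec_lock);
	rc = system(cmd);	/* serialized across all mount threads */
	pthread_mutex_unlock(&mount_exec_lock);

	return (rc);
}
```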
I think I've seen the underlying issue on openSUSE a couple of times; sometimes on-boot mounts come up in the wrong order.
Strategy of parallel mount is as follows. 1) Initial thread dispatching selects sets of mount points that don't have dependencies on other sets, hence threads can/should go lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that initial thread dispatching can spawn >1 threads for datasets with the same mount point, and this puts threads under race condition. This appeared as mount issue on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8) on mount. Closes openzfs#8450 Closes openzfs#8833 Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Strategy of parallel mount is as follows. 1) Initial thread dispatching selects sets of mount points that don't have dependencies on other sets, hence threads can/should go lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that initial thread dispatching can spawn >1 threads for datasets with the same top level directory, and this puts threads under race condition. This appeared as mount issue on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8) on mount. Closes openzfs#8450 Closes openzfs#8833 Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Strategy of parallel mount is as follows. 1) Initial thread dispatching selects sets of mount points that don't have dependencies on other sets, hence threads can/should go lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that initial thread dispatching can spawn >1 threads for datasets with the same top level directory, and this puts threads under race condition. This appeared as mount issue on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8) on mount. Fix it by libzfs_path_contains() returning true for same paths. Closes openzfs#8450 Closes openzfs#8833 Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Strategy of parallel mount is as follows. 1) Initial thread dispatching selects sets of mount points that don't have dependencies on other sets, hence threads can/should go lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that initial thread dispatching in zfs_foreach_mountpoint() can spawn >1 threads for datasets which should run in a single thread, and this puts threads under race condition. This appeared as mount issue on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8) on mount. `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition. There are currently two known patterns of `zfs_foreach_mountpoint(..., handles, ...)` input which cause the race condition. 1) openzfs#8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directory. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test0/a and "/a/b" * ThreadB for "/a" for test1 and in case of openzfs#8833, ThreadA won the race. ThreadB was created because "/a" wasn't considered as `"/a" contains "/a"`. 2) openzfs#8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list which contains "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of openzfs#8450, ThreadA won the race. ThreadB was created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, when there is at least one "/" in the input list, it must be single threaded, because every directory is a child of "/", meaning they all depend on "/" either directly or indirectly. In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result create another thread when mounts should be done in a single thread. Fix a conditional in libzfs_path_contains() to consider above two cases. Closes openzfs#8450 Closes openzfs#8833 Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Strategy of parallel mount is as follows. 1) Initial thread dispatching selects sets of mount points that don't have dependencies on other sets, hence threads can/should go lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that initial thread dispatching in zfs_foreach_mountpoint() can spawn >1 threads for datasets which should run in a single thread, and this puts threads under race condition. This appeared as a mount issue on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts are reordered by the race condition. There are currently two known patterns of `zfs_foreach_mountpoint(..., handles, ...)` input which cause the race condition. 1) openzfs#8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directory. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test0/a and "/a/b" * ThreadB for "/a" for test1 and in case of openzfs#8833, ThreadA won the race. ThreadB was created because "/a" wasn't considered as `"/a" contains "/a"`. 2) openzfs#8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list which contains "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of openzfs#8450, ThreadA won the race. ThreadB was created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is at least one "/" in the input list, it must be single threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result create another thread when mounts should be done in a single thread. Fix a conditional in libzfs_path_contains() to consider above two cases. Closes openzfs#8450 Closes openzfs#8833 Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Strategy of parallel mount is as follows. 1) Initial thread dispatching selects sets of mount points that don't have dependencies on other sets, hence threads can/should go lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that initial thread dispatching in zfs_foreach_mountpoint() can spawn >1 threads for datasets which should run in a single thread, and this puts threads under race condition. This appeared as a mount issue on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts are reordered by the race condition. There are currently two known patterns of `zfs_foreach_mountpoint(..., handles, ...)` input which cause the race condition. 1) openzfs#8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directory. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test1 and "/a/b" * ThreadB for "/a" for test0/a and in case of openzfs#8833, ThreadA won the race. ThreadB was created because "/a" wasn't considered as `"/a" contains "/a"`. 2) openzfs#8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list which contains "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of openzfs#8450, ThreadA won the race. ThreadB was created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is at least one "/" in the input list, it must be single threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result create another thread when mounts should be done in a single thread. Fix a conditional in libzfs_path_contains() to consider above two cases. Closes openzfs#8450 Closes openzfs#8833 Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Strategy of parallel mount is as follows. 1) Initial thread dispatching is to select sets of mount points that don't have dependencies on other sets, hence threads can/should run lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets to be mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that the initial thread dispatching in zfs_foreach_mountpoint() can be multi-threaded when it needs to be single-threaded, and this puts threads under race condition. This race appeared as mount/unmount issues on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition. There are currently two known patterns of input list `handles` in `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition. 1) openzfs#8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directories. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test1 and "/a/b" * ThreadB for "/a" for test0/a and in case of openzfs#8833, ThreadA won the race. Two threads were created because "/a" wasn't considered as `"/a" contains "/a"`. 2) openzfs#8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list containing "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of openzfs#8450, ThreadA won the race. Two threads were created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is (at least one) "/" in the input list, the initial thread dispatching must be single-threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result the initial thread dispatching creates another thread when it needs to be single-threaded. Fix a conditional in libzfs_path_contains() to consider above two. Closes openzfs#8450 Closes openzfs#8833 Closes openzfs#8878 Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Strategy of parallel mount is as follows. 1) Initial thread dispatching is to select sets of mount points that don't have dependencies on other sets, hence threads can/should run lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets to be mounted on sub directories. 2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes. The problem is that the initial thread dispatching in zfs_foreach_mountpoint() can be multi-threaded when it needs to be single-threaded, and this puts threads under race condition. This race appeared as mount/unmount issues on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition. There are currently two known patterns of input list `handles` in `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition. 1) openzfs#8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directories. There is a race between two POSIX threads A and B, * ThreadA for "/a" for test1 and "/a/b" * ThreadB for "/a" for test0/a and in case of openzfs#8833, ThreadA won the race. Two threads were created because "/a" wasn't considered as `"/a" contains "/a"`. 2) openzfs#8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list containing "/". There is a race between two POSIX threads A and B, * ThreadA for "/" and "/var/data/test" * ThreadB for "/var/data" and in case of openzfs#8450, ThreadA won the race. Two threads were created because "/var/data" wasn't considered as `"/" contains "/var/data"`. In other words, if there is (at least one) "/" in the input list, the initial thread dispatching must be single-threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/". In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result the initial thread dispatching creates another thread when it needs to be single-threaded. Fix a conditional in libzfs_path_contains() to consider above two. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes openzfs#8450 Closes openzfs#8833 Closes openzfs#8878
Strategy of parallel mount is as follows.

1) Initial thread dispatching is to select sets of mount points that don't have dependencies on other sets, hence threads can/should run lock-less and shouldn't race with other threads for other sets. Each thread dispatched corresponds to top level directory which may or may not have datasets to be mounted on sub directories.

2) Subsequent recursive thread dispatching for each thread from 1) is to mount datasets for each set of mount points. The mount points within each set have dependencies (i.e. child directories), so child directories are processed only after parent directory completes.

The problem is that the initial thread dispatching in zfs_foreach_mountpoint() can be multi-threaded when it needs to be single-threaded, and this puts threads under race condition. This race appeared as mount/unmount issues on ZoL for ZoL having different timing regarding mount(2) execution due to fork(2)/exec(2) of mount(8). `zfs unmount -a` which expects proper mount order can't unmount if the mounts were reordered by the race condition.

There are currently two known patterns of input list `handles` in `zfs_foreach_mountpoint(..,handles,..)` which cause the race condition.

1) #8833 case where input is `/a /a /a/b` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list with two same top level directories. There is a race between two POSIX threads A and B,
* ThreadA for "/a" for test1 and "/a/b"
* ThreadB for "/a" for test0/a
and in case of #8833, ThreadA won the race. Two threads were created because "/a" wasn't considered as `"/a" contains "/a"`.

2) #8450 case where input is `/ /var/data /var/data/test` after sorting. The problem is that libzfs_path_contains() can't correctly handle an input list containing "/". There is a race between two POSIX threads A and B,
* ThreadA for "/" and "/var/data/test"
* ThreadB for "/var/data"
and in case of #8450, ThreadA won the race. Two threads were created because "/var/data" wasn't considered as `"/" contains "/var/data"`.

In other words, if there is (at least one) "/" in the input list, the initial thread dispatching must be single-threaded since every directory is a child of "/", meaning they all directly or indirectly depend on "/".

In both cases, the first non_descendant_idx() call fails to correctly determine "path1-contains-path2", and as a result the initial thread dispatching creates another thread when it needs to be single-threaded.

Fix a conditional in libzfs_path_contains() to consider above two.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com>
Closes #8450
Closes #8833
Closes #8878
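From that final description the amended predicate is straightforward to reconstruct: equal paths count as containment, and "/" contains everything. A sketch of that shape (boolean_t is stubbed so the snippet stands alone; the real function lives in libzfs and takes its types from the system headers):

```c
#include <string.h>

typedef enum { B_FALSE = 0, B_TRUE = 1 } boolean_t;

/*
 * Return B_TRUE if path2 is equal to, or a descendant of, path1, in
 * which case the two mount points must be handled by a single thread.
 */
static boolean_t
libzfs_path_contains(const char *path1, const char *path2)
{
	/* Same top level directory: "/a" contains "/a" (#8833). */
	if (strcmp(path1, path2) == 0)
		return (B_TRUE);

	/* Every path is a descendant of the root (#8450). */
	if (strcmp(path1, "/") == 0)
		return (B_TRUE);

	/* Proper prefix ending at a separator: "/a" contains "/a/b". */
	return (strstr(path2, path1) == path2 &&
	    path2[strlen(path1)] == '/');
}
```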
This issue seems to still exist. I formatted my LUKS-encrypted root partition as described in the official OpenZFS guide. However, mounting all volumes does not have any effect: while mounting some subvolumes (with the property canmount=on) works, those with the property canmount=off fail. Mounting all subvolumes separately by looping over all listed volumes was a temporary solution, but it doesn't work a second time when mounting and chrooting into my filesystem. After unmounting all subvolumes, they are still mounted. Unmounting all listed subvolumes in the same way as I mounted them above results in the hidden volumes (home, opt, …) showing up again!
Can you explain the reason for this strange behavior when mounting subvolumes, and please fix this bug? It is an issue that holds back newcomers like me from using OpenZFS.
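For reference, the loop workaround described above might look like this sketch (it only mounts each dataset individually in name order; it does not address the underlying ordering bug):

```sh
# Mount every dataset one at a time, parents first (sorted by name),
# instead of relying on the parallel `zfs mount -a`.
zfs list -H -o name -s name | while read -r ds; do
    zfs mount "$ds" 2>/dev/null || true
done
```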
System information
Describe the problem you're observing
I'm using the `canmount=off` property to make children of different pools appear in the same directory, as described in the man page. Unfortunately, this seems to be broken since 0.8.0. I've isolated the problem to a short list of commands (see the sketch below). Apparently `zfs mount -a` isn't able to resolve the mountpoints anymore.

I found two different workarounds for this issue. One can simply avoid using canmount to inherit mountpoints by setting the mountpoints manually; for this example, `zfs set mountpoint=none test1` and `zfs set mountpoint=/a/b test1/b`. The stranger solution is swapping the names of the pools, or simply choosing any new name for `test0` that makes it appear after `test1` in `zfs list` (e.g. `test2`). So the name order of the pools seems to have an impact on the disentanglement of the mountpoints.

I was able to reproduce the issue on Gentoo and Arch, but not on OpenIndiana.