
"zfs umount -f" fails when pool is busy #2435

Closed
FransUrbo opened this issue Jun 28, 2014 · 15 comments
Labels
Component: Test Suite (indicates an issue with the test framework or a test case)
Milestone

Comments

@FransUrbo
Contributor

Related to #2434

Way to reproduce:

#!/bin/sh

rm -f /var/tmp/zfs_test-1
truncate -s 25000m /var/tmp/zfs_test-1

zpool create test /var/tmp/zfs_test-1
zfs create test/test
zfs set mountpoint=/var/tmp/test test/test

cd /var/tmp/test
less < /dev/null > /dev/null 2>&1 &

zfs umount test/test
res=$?
echo "zfs umount test/test: '$res'" ; mount | grep zfs ; echo

[ "$res" -ne "0" ] && zfs umount -f test/test
echo "zfs umount test/test: '$?'" ; echo ; mount | grep zfs

Results in:

# /tmp/testme.ksh
umount: /var/tmp/test: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
cannot unmount '/var/tmp/test': umount failed
zfs umount test/test: '1'
test on /test type zfs (rw,relatime,xattr,noacl)
test/test on /var/tmp/test type zfs (rw,relatime,xattr,noacl)

umount2: Device or resource busy
umount: /var/tmp/test: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
cannot unmount '/var/tmp/test': umount failed
zfs umount test/test: '1'

test on /test type zfs (rw,relatime,xattr,noacl)
test/test on /var/tmp/test type zfs (rw,relatime,xattr,noacl)
@kernelOfTruth
Contributor

hm, interesting

thanks for coming up with a reproducer and test for that

this likely is the behavior I saw on my system:

the most probable cause here is gam_server: it doesn't shut down fast enough during shutdown and therefore delays the shutdown of other processes (xfce4, kde4, etc. related)

and thus after the next reboot on import attempt I see:

zpool status
pool: WD30EFRX
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 132K in 0h0m with 0 errors on Thu June 8 18:24:19 2014
config:

NAME              STATE     READ WRITE CKSUM
WD30EFRX          ONLINE       0     0     0
  mirror-0        ONLINE       0     0     0
    wd30efrx      ONLINE       0     0     0
    wd30efrx_002  ONLINE       0     0     5

errors: No known data errors

my (preliminary) "solution" is to kill it off via a script in /etc/local.d/,
which of course can't be the final "fix"

@FransUrbo
Contributor Author

The problem here (in both issues) is that Linux doesn't do force unmounts. The option exists

       -f     Force unmount (in case of an unreachable NFS system).  (Requires kernel 2.1.116 or later.)

but it doesn't work. Not like it does on *BSD/Solaris etc., anyway...

@behlendorf
Copy link
Contributor

Not like it does in *BSD/Solaris etc anyway...

This is the key bit. The expected behavior for --force differs between Illumos and Linux. According to the Illumos man page this option will unmount the filesystem, but it's dangerous and may result in data loss.

     -f

         Forcibly unmount a file system.

         Without this option, umount does not allow a file system
         to  be  unmounted  if a file on the file system is busy.
         Using this option can cause data loss  for  open  files;
         programs  which  access  files after the file system has
         been unmounted will get an error (EIO).

Under Linux the --force option means something slightly different and by convention only applies to network filesystems like NFS. It means to unmount the filesystem even if the server is unreachable, and therefore we may lose data in our writeback cache. For a local filesystem you can never force an unmount while it has processes actively using it.

       -f     Force unmount (in case of an unreachable NFS system).  (Requires
              kernel 2.1.116 or later.)

However, what you can do on Linux is a lazy unmount. This will remove the filesystem from the mounted namespace so no new processes can access it. However, any process which has file handles open will be able to use them until they close them. Once all open file handles for the filesystem are closed the unmount will complete.

There is no equivalent Illumos behavior, but this is implemented for ZoL. Although it looks like we need to document it better and perhaps add a -l option to the zfs unmount command.

       Note  that  a  file  system cannot be unmounted when it is ‘busy’ - for
       example, when there are open files on it, or when some process has  its
       working  directory  there,  or  when  a swap file on it is in use.  The
       offending process could even be umount itself - it opens libc, and libc
       in  its  turn may open for example locale files.  A lazy unmount avoids
       this problem.

So I don't actually think this is a bug. Everything is working as expected under Linux. What we should probably do is update the test cases to use lazy unmounts.

umount -l pool/dataset

@FransUrbo
Contributor Author

However, what you can do on Linux is a lazy unmount.

Does 'zfs umount -f' do a lazy unmount today? If not, it should... A '-l' shouldn't be needed.

As long as '-f' means 'lazy unmount' on Linux (anything else is pointless; we can't forcibly unmount a filesystem) and it's documented as such, all should be fine. The problem with 'zfs unmount -f' today is that it fails, i.e. it does nothing more than 'zfs unmount' (without the '-f'), with an error. It shouldn't. It should do a lazy unmount and, if that succeeds, exit with 'ok'...

So maybe the issue should be retitled to something like:

"zfs unmount -f" should mean "do a lazy unmount"

A "zfs unmount -f" is expected to succeed, not fail...

@behlendorf
Contributor

Maybe. But a lazy unmount isn't equivalent to an Illumos style forced unmount. If we pretend it is this will just cause more confusion. It would be better to remove the force option and add the lazy option since we can never replicate the Illumos force behavior. The Linux VFS enforces this.

For example, a lazy unmount may succeed and remove the filesystem from the namespace. However, the super block associated with the filesystem will not be destroyed until the last filesystem user closes their open file handles. That means some operations such as zfs destroy will still fail.

@FransUrbo
Contributor Author

That means some operations such as zfs destroy will still fail.

Right... Well, then I also vote for removing the -f option (exit with
error if used) and adding a -l instead...

@behlendorf behlendorf added the Bug label Jul 15, 2014
@ioi-c

ioi-c commented Aug 17, 2014

@FransUrbo Hello, I have made a zpool use the sdb device of a disk array over the network.
But the disk array's network sometimes goes down, and then the zpool cannot be unmounted or exported.
How can I make the zpool recognize sdb again, or export and re-import the zpool, when the network recovers?
Thank you!

@FransUrbo
Contributor Author

@0602114042 This is not a support forum. Please ask your question on the mailing list. And do NOT hijack an issue/thread by asking a completely unrelated question!! This is considered very, very rude!

@behlendorf behlendorf modified the milestones: 0.6.5, 0.6.4 Feb 6, 2015
@behlendorf behlendorf added Bug - Minor and removed Bug labels Feb 6, 2015
@behlendorf behlendorf modified the milestones: 0.7.0, 0.6.5 Jul 16, 2015
@zfsbot

zfsbot commented Sep 10, 2015

crazy question, can we change the Linux VFS behaviour?

@kernelOfTruth
Contributor

@zfsbot you mean to (really) force the umount anyway?

@behlendorf
Contributor

One way to mitigate this issue would be to update zpool destroy -f and zfs destroy -f to retry a few times on EBUSY. Many of the ZFS Test Suite tests are disabled because of this, and making this improvement would allow us to enable them.
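A retry wrapper along those lines might look like this minimal sketch. The helper name retry_busy is hypothetical (not the actual zpool/zfs change), and the attempt count and delay are arbitrary; the idea is only that a transient EBUSY from a short-lived holder (a udev scan, for instance) goes away after a brief wait, while a process that genuinely holds a file open keeps the command failing.

```shell
#!/bin/sh
# retry_busy: run a command, retrying a few times if it fails.
# Hypothetical sketch; attempt count and delay chosen arbitrarily.
retry_busy() {
    attempts=5
    i=0
    while [ "$i" -lt "$attempts" ]; do
        "$@" && return 0          # success: stop retrying
        i=$((i + 1))
        [ "$i" -lt "$attempts" ] && sleep 1   # brief pause between tries
    done
    return 1                      # still failing after all attempts
}

# Example use against the dataset from the reproducer:
# retry_busy zfs destroy -f test/test
```

Note this only papers over transient holders; per the discussion above, a process with a file permanently open will make every attempt return EBUSY.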

@Mic92
Contributor

Mic92 commented Aug 10, 2016

@behlendorf do you think retrying a few times helps? Is this a temporary race condition which eventually resolves, or could the process then retry forever? I have a problem with private mount namespaces, which do not receive updates from their parent namespace. systemd creates those to implement private /tmp mounts.

@behlendorf
Contributor

@Mic92 it depends on why the mount point is busy. If some process does have a file open in the filesystem it will always return EBUSY. This is the expected behavior for all Linux filesystems.

@kernelOfTruth
Contributor

kernelOfTruth commented Aug 16, 2016

somewhat related, not umount, but export,

referencing:

openzfs/openzfs#175 7301 zpool export -f should be able to interrupt file freeing

@behlendorf
Contributor

Closing. Without directly modifying the Linux kernel this EBUSY behavior can't be changed. Therefore, the ZFS Test Suite is being systematically updated to expect this behavior on Linux. The helper functions log_must_retry and log_must_busy have been added to the ZFS Test Suite to simplify updating the necessary test cases.
