
bind mounts in fstab work improperly when pointing to ZFS #971

Closed
bhodgens opened this issue Sep 17, 2012 · 20 comments
Labels
Component: Share, Type: Feature


@bhodgens

If a bind mount in fstab points to a location within a zpool, the mount point behaves oddly: it does not list the full/actual contents of the destination (just directories, from what I can tell). Might this be due to fstab being parsed and the source getting mounted before ZFS takes control, with the odd behavior resulting?

If mounted via mount after the system has fully booted (e.g. mount -o bind /zpool/home /home), it behaves properly.

To replicate, make an entry in fstab and reboot (or export the pool, run the mount, and reimport):
/zpool/home /home none bind,defaults 0 0

Distro: Ubuntu 12.04.1
zfsonlinux: 0.6.0rc10 from PPA

I'm using the zfs-mount and zfs-share scripts, not the 'mountall' modified binary from the PPA (which appears to require the mountpoint=legacy option?)

@cwedgwood
Contributor

(I'm using bind mounts as a transitional mechanism without issues.)

When it's not working, can you show what

grep home /proc/mounts

says?
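
For illustration (hypothetical devices and options), a working bind would show the ZFS dataset as its source, while the broken case shows the underlying filesystem that the bind captured before ZFS mounted:

zpool/home /home zfs rw,xattr 0 0         (working: the source is the dataset)
/dev/sda1 /home ext4 rw,relatime 0 0      (broken: the bind grabbed the empty directory on /)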

@ghost

ghost commented Jul 15, 2013

Allow me to shed some light on this.

Let's consider an old-school nfs4 export using a native Linux filesystem, one share called 'pmr':

/dev/groups/pmr /storage/pmr      xfs    inode64,logdev=/dev/ssdcache/pmr,logbufs=8  1 2
/storage/pmr    /exports/pmr      none   rw,bind         0 0
/exports     [nfs4 export root settings]
/exports/pmr [per-share settings]

When the system is booting, the xfs filesystem will be mounted first, followed by a bind mount from /storage/pmr to /exports/pmr. The latter then is exported via /etc/exports using nfs4 and we're all happy.

Now consider a zfs-based scenario.

Since there are no zfs entries in fstab, it becomes:

/storage/pmr    /exports/pmr      none   rw,bind         0 0

When the system boots, a bind-type mount will be created from /storage/pmr to /exports/pmr, which effectively mounts the underlying filesystem (most likely /) at the bind point and exports that. The clients will see the contents of an empty directory, as the exporter uses the bind of /. On the server, the confused administrator will see the actual zfs and will scratch their head.

I don't think this is a bug in zfs; it's rather a race condition between the distribution's native localfs init script and zfs. Perhaps localfs should depend on zfs and not the other way around.

Alternatively, the zfs service should parse some file that will tell it how the binds go and bind after mounting the zfs filesystem. Perhaps a file in /etc/zfs/ like 'binds' would work.

Personally (sysadmin cap on) /etc/zfs/binds would work for me (together with a bit of parsing in /etc/init.d/zfs) as it's sufficiently low-tech and doesn't require changes in the actual zfs stack.

@ghost

ghost commented Jul 15, 2013

Proposed patch (only the LSB script; the others are most likely derivative):

--- etc/init.d/zfs.lsb.in.orig  2013-07-15 12:47:20.055257882 +0100
+++ etc/init.d/zfs.lsb.in       2013-07-15 12:49:44.732137370 +0100
@@ -29,6 +29,7 @@
 ZFS="@sbindir@/zfs"
 ZPOOL="@sbindir@/zpool"
 ZPOOL_CACHE="@sysconfdir@/zfs/zpool.cache"
+ZFS_NFS4_BINDS="@sysconfdir@/zfs/binds"

 # Source zfs configuration.
 [ -r '/etc/default/zfs' ] &&  . /etc/default/zfs
@@ -78,6 +79,26 @@
                log_end_msg $?
        fi

+        # Create (optional) binds to the NFS4 export tree
+        if [ -e "$ZFS_NFS4_BINDS" ] ; then
+                log_begin_msg "Binding NFS4 mounts"
+                sed -e "s/#.*//" -e "/^$/d" $ZFS_NFS4_BINDS | while read LINE
+                do
+                        MODE="`echo $LINE | awk '{print $1}'`"
+                        SRC="`echo $LINE | awk '{print $2}'`"
+                        DEST="`echo $LINE | awk '{print $3}'`"
+                        case $MODE in
+                                bind)   MOUNTPOINT="`zfs get mountpoint $SRC | grep "$SRC" | awk '{print $3}'`"
+                                        mount -o $MODE $MOUNTPOINT $DEST
+                                        log_end_msg $?
+                                        ;;
+                                *)      echo "Unknown bind mode ($MODE) in $ZFS_NFS4_BINDS. Aborting."
+                                        exit 4
+                                        ;;
+                        esac
+                done
+        fi
+
        touch "$LOCKFILE"
 }

@@ -85,6 +106,25 @@
 {
        [ ! -f "$LOCKFILE" ] && return 3

+       if [ -e "$ZFS_NFS4_BINDS" ] ; then
+                log_begin_msg "Detaching NFS4 binds"
+                sed -e "s/#.*//" -e "/^$/d" $ZFS_NFS4_BINDS | while read LINE
+                do
+                        MODE="`echo $LINE | awk '{print $1}'`"
+                        SRC="`echo $LINE | awk '{print $2}'`"
+                        DEST="`echo $LINE | awk '{print $3}'`"
+                        case $MODE in
+                                bind)   MOUNTPOINT="`zfs get mountpoint $SRC | grep "$SRC" | awk '{print $3}'`"
+                                        umount $DEST
+                                        log_end_msg $?
+                                        ;;
+                                *)      echo "Unknown bind mode ($MODE) in $ZFS_NFS4_BINDS. Aborting."
+                                        exit 4
+                                        ;;
+                        esac
+                done
+        fi
+
        log_begin_msg "Unmounting ZFS filesystems"
        "$ZFS" umount -a
        log_end_msg $?

$MODE may look redundant, but perhaps it could be kept for future expansion; maybe there could be other bind types.

The /etc/zfs/binds file would look like this:

#    zpool[/dataset]        mountpoint
bind storage/pmr            /exports/pmr

Of course the distribution source would only contain the first line. I believe this is consistent with other files in /etc/zfs.
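
(As an aside, the mountpoint lookup in the patch could also use zfs get's scripted mode instead of the grep/awk pipeline - assuming -H and -o are available in the zfs version in use:)

MOUNTPOINT="`zfs get -H -o value mountpoint $SRC`"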

Cheers,
grok

@FransUrbo
Contributor

@bhodgens I agree with @jzachwieja's first comment - this is not a fault in ZoL. It simply runs later than the initial mounts, which are basically the first thing that happens at bootup. There's no way we can have ZoL run that early.

@jzachwieja I'm not sure I agree with your solution though. It seems way too 'hackish' - it shouldn't be the responsibility of the ZoL init script to do this. Instead, I think it's up to the system administrator to write sufficient code to solve this in /etc/rc.local (which I think is an old UNIX standard - it's available, but currently empty, on my Debian GNU/Linux Wheezy at least).

I vote to close this as a 'not a ZoL issue'. @behlendorf ?

@ghost

ghost commented Jun 8, 2014

@FransUrbo /etc/rc.local does not exist and is not called in all distributions. Even more, systemd-based distributions (good luck finding one without it these days) won't have it by definition.

Are you suggesting that instead of editing a config file (present, documented) you would rather ask everyone to roll their own code and manually create bind mounts? That doesn't sound like a sane systems management practice.

When a ZoL filesystem needs to be exported over NFS4, a bind mount must be created. No standard mechanism in GNU/Linux will allow for it if the filesystem is not present in /etc/fstab. Since it's ZFS that's 'special', I will argue that it is its own responsibility to provide the functionality required for other parts of the system to continue to function.

If you don't like my solution, that's fine, but please provide a better one or show where exactly I am incorrect. Saying something is 'hackish' and then suggesting that sysadmins 'sort it out in rc.local' isn't constructive.

Regards,
jz

@FransUrbo
Contributor

> @FransUrbo /etc/rc.local does not exist and is not called in all distributions. Even more, systemd-based distributions (good luck finding one without it these days) won't have it by definition.

Fair enough. Don't they have another way of executing local code, much like rc.local?

> Are you suggesting that instead of editing a config file (present, documented) you would rather ask everyone to roll their own code and manually create bind mounts? That doesn't sound like a sane systems management practice.

It is, if the majority isn't using bind mounts. Why should we force stuff upon the majority which only the minority is using?

The init scripts are complicated enough as they are without adding even more complexity (that will have to be maintained - for a very limited number of users).

The majority (?) isn't even using NFS, so why force ZoL to work around kludges in another piece of software (nfs daemon etc.)? Soon(er than later), we'd have to do the same for iSCSI, Samba and what not. ZoL is not 'do everything for everyone'.

> When a ZoL filesystem needs to be exported over NFS4, a bind mount must be created.

Why? This just doesn't compute. What's wrong with 'zfs set sharenfs=on ....'?

> Since it's ZFS that's 'special', I will argue that it is its own responsibility to provide the functionality required for other parts of the system to continue to function.

Perhaps, but it's not our responsibility to correct for bugs/issues in other software.

> please provide a better one or show where exactly I am incorrect.

I did.

> Saying something is 'hackish' and then suggesting that sysadmins 'sort it out in rc.local' isn't constructive.

I disagree - it's VERY constructive. And very correct. Also very 'UNIX'. This isn't point-and-click software...

@ghost

ghost commented Jun 8, 2014

@FransUrbo

What use is a filesystem that cannot be exported over the network?

NFS4 exports are different from NFS3 exports. There is an established standard for creating them in GNU/Linux - a well-documented process that differs from the Solaris-isms still present in ZoL.

I wasn't aware that zfs set sharenfs=on is able to produce NFS4 mounts. Could you please quote options required to make that happen? How do you define the mount tree? This is different from NFS3.

What bugs in other software are you referring to? Exporting NFS4 works perfectly fine in GNU/Linux. Since ZoL provides PV, VG and LV management as well as filesystem mount points in a way that is abstracted from the current device paradigm on Linux, certain steps need to be taken to make those two work together.

While you are free to disagree, I still haven't seen a patch that solves the problem. GNU/Linux nfs-kernel-server (and this is ZFS on Linux) requires mount points bound into a central exports tree. Since binding is done early (and you can't make the zfs init script depend on $localfs) ZoL needs to catch up.

NFS4 provides capabilities like idmapd (how would you propose to integrate zfs set sharenfs with starting idmapd? Are there hooks for that? How do I call them?), caching, subtree checks, consistent filesystem IDs and performance improvements over NFS3.

The logical way to do it (and I have consulted a number of Linux sysadmins before presenting it here) is for the init script to have a mechanism to create the required bind mounts to the exports tree. The section in the init script is self-contained, fails safe (no action if the config file isn't present) and introduces the required compatibility with the host operating system - in one file that is owned by the ZFS package.

If you continue to disagree, please produce a patch that solves the issue for NFS4 and ZoL or provide a way of exporting NFS4, including all the required export options like the following excerpt from a production environment:

/exports     172.5.125.0/24(ro,async,wdelay,insecure,root_squash,no_subtree_check,fsid=0)
/exports     172.5.124.0/25(ro,async,wdelay,insecure,root_squash,no_subtree_check,fsid=0)
/exports/pmr 172.5.125.0/24(rw,async,wdelay,root_squash,no_subtree_check)
/exports/pmr 172.5.124.0/25(rw,sync,wdelay,no_root_squash,no_subtree_check)

Please understand, rc.local is a last resort: it isn't available on all distributions, some don't even have an equivalent script, and requiring systems administrators to perform those steps manually is error-prone. Perhaps one can do it on their home computer but hardly in an enterprise environment where consistency and sustainability are key.

@FransUrbo
Contributor

> What use is a filesystem that cannot be exported over the network?

Much! But that's beside the point, because a ZFS dataset can be exported over the network just fine without any such kludges as bind mounts and/or init script hacks.

> Could you please quote options required to make that happen?

I did. You need to slow down and read what's given to you.

> If you continue to disagree, please produce a patch that solves the issue for NFS4 and ZoL

Don't need to. It works just fine now. There are, however, a few issues with the nfs part of ZoL, which doesn't translate ZFS options to ZoL very well in some cases.

This is on the todo list for us to work out (we haven't decided if we should redesign libshare or move it to the zevent daemon).

NFS (in ZoL) isn't my forte, but start looking at #1029.

If you have further discussion about this, take it to the list. This isn't a support forum, and I consider this beside the point - the issue is about bind mounts not working from fstab. And there's very little we can do about that.

If you have a specific problem/bug/issue with NFS in ZoL (after reading the documentation, checking the Admin Guides AND actually testing your way around), then feel free to create a new issue about it.

Just for the record, I sometimes use NFS (both v3 and v4) for testing on my test rigs, and both work just fine. Although I use rather straightforward rules, nothing like what you showed, so it's perfectly possible that such a complex share won't be possible. But take it to the list and I'm sure someone will give you hints either way.

> Perhaps one can do it on their home computer but hardly in an enterprise environment where consistency and sustainability are key.

"Not our problem" might sound harsh, but we just can't add support for everything and everyone. We have to draw the line somewhere, and I personally (although that is not necessarily the view of ZoL!) think this is where the line should be drawn - not complicating the already [way too!] complicated init procedure for ZoL any further in order to support software outside of our control.

@ghost

ghost commented Jun 8, 2014

@FransUrbo

The issue #1029 you referred me to highlights the problem I've solved: there is no way to correctly set up NFS4 shares using Solaris-isms under Linux.

> > Could you please quote options required to make that happen?
>
> I did. You need to slow down and read what's given to you.

Unless you meant the four dots at the end of zfs set sharenfs=on, I must have missed it.

I'm not going to continue this conversation with you as it's no longer productive.

@FransUrbo
Contributor

@jzachwieja Just because you're incapable of doing it doesn't mean it isn't possible (I just did it again)....

@bhodgens @behlendorf How should we proceed? Should we close this issue or possibly tag it as 'Documentation'?

@behlendorf
Contributor

Unfortunately, I don't think we can wash our hands of blame here so easily. The only reason this is an issue is because ZFS behaves differently than other Linux filesystems. That makes it our problem like it or not.

I also agree with @jzachwieja that forcing people to add something to their rc.local file isn't an acceptable solution. That file is for local non-standard customizations and isn't something we can depend on. We either need to handle this in the init script, the systemd units, or better yet the zfs utilities since we made this problem.

There are two cases:

  1. mountpoint=legacy - The easy case. All of the 'zfs mount/share' code is disabled and administrators are responsible for configuring their /etc/fstab like a conventional Linux system. This should work fine today (see the fstab sketch after this list).

  2. mountpoint=/path - The hard case. In this case we're responsible for managing the bind mounts as NFS4 expects. This could be done in the init scripts or systemd units, but that unfortunately only covers the zfs start/stop cases. If we want 'zfs share|unshare' to work properly for NFS4 mounts, it's going to need to be aware of the bind mounts.
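
A minimal fstab sketch for case 1 - assuming a hypothetical dataset tank/pmr with mountpoint=legacy set, mount(8) then orders the ZFS mount and the bind like any other pair of filesystems:

tank/pmr        /storage/pmr    zfs     defaults        0 0
/storage/pmr    /exports/pmr    none    bind            0 0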

I'm not an NFS4 expert, but I don't see any reason why this logic can't be pulled into the zfs utilities. My suggestion would be to add another property to the dataset called bindpoint. The zfs utilities themselves can then perform the needed bind mounts if required. @jzachwieja your proposed patch could leverage ZFS user properties today rather than rely on a config file, and I think that's probably a reasonable short/medium-term solution.
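
A minimal sketch of that short/medium-term idea, assuming only the hypothetical org.zfsonlinux:bindpoint user property and the stock zfs CLI (illustrative, not shipped code):

# Bind every filesystem that carries the user property to its target.
zfs get -H -t filesystem -o name,value org.zfsonlinux:bindpoint | \
while read DATASET BINDPOINT
do
        [ "$BINDPOINT" = "-" ] && continue      # property not set
        MOUNTPOINT="`zfs get -H -o value mountpoint $DATASET`"
        mkdir -p "$BINDPOINT"
        mount -o bind "$MOUNTPOINT" "$BINDPOINT"
done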

But as @FransUrbo alluded to, longer term we're seriously thinking about moving all of this infrastructure to the ZED. Our hope is that it will make all of this machinery more transparent, flexible, and maintainable. But we still have some infrastructure to build before that's possible. Then we'll want to prototype it out to see if it's a good idea or not. So it's a ways off.

@FransUrbo
Contributor

@behlendorf Wouldn't adding a bindpoint property deviate from the other implementations?

If we insist that this is something we should deal with (and I'm still not convinced), then how about a separate init script that deals with this? I think that's a bad idea, but at least it's better than adding a new property...

@FransUrbo
Contributor

Btw, adding a new property shouldn't be required. We could always do what https://github.com/zfsonlinux/zfs-auto-snapshot does - use a com.sun:bindmount user property, which doesn't require additional code in ZFS.

@behlendorf
Contributor

Yes, if we added a bindpoint property that would be a deviation from the other implementations. However, this isn't really a problem as long as we don't have a naming conflict with one of the other implementations. In fact, we've done it already to handle some other Linux specific concerns such as SELinux.

That said, I think you're right: the best way to go about this for now would be to use a generic user property. That way we can avoid any changes to the ZFS code proper until we figure out the best way to handle this. We should probably use something like org.zfsonlinux:bindpoint. There's nothing preventing us from turning that into just bindpoint at a later date if that ends up being the right thing to do.

zfs set org.zfsonlinux:bindpoint=value dataset
zfs get org.zfsonlinux:bindpoint dataset

I suspect we'll need to add additional NFS-related properties at some point as well. The current translation scheme we're using only works for the simplest configurations. It would be nice to just be able to provide the native Linux options, many of which don't have Illumos equivalents. But I digress.

As for a separate script, that's OK by me. The systemd support was broken up into multiple units, and the Ubuntu packaging ships zfs-mount and zfs-share scripts.

@behlendorf behlendorf removed this from the 0.6.4 milestone Oct 6, 2014
@behlendorf behlendorf added the Difficulty - Medium and Type: Feature labels and removed the Bug label Oct 6, 2014
@behlendorf behlendorf added the Component: Share label Oct 29, 2014
@jbnance

jbnance commented Aug 17, 2017

Is there realistic potential for this issue to be addressed? I'm in the same scenario - NFSv4 + bind mounts - and would very much like to have a reasonable, manageable solution.

@afontenot

afontenot commented Aug 26, 2017

@jbnance IMO, the correct Linux solution to this problem is described at the bottom of this article on the Arch wiki.

Basically, you want to add a mount option in your fstab that will cause the system to wait for the file system to be mounted.

/mnt/zfspool/music	/srv/nfs/music		none	bind,x-systemd.requires=zfs-mount.service	0 0
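
To test without a reboot - assuming the entry above - systemd generates a mount unit from each fstab line, named after the mount point:

systemctl daemon-reload                   # regenerate units from fstab
systemctl restart srv-nfs-music.mount     # unit name derived from /srv/nfs/music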

@jbnance

jbnance commented Aug 27, 2017

@afontenot thanks for the info, I'll try it out. If it works as described on other distros (especially EL7+), I would agree. I've been using the _netdev option thus far to try to deal with this, so the x-systemd stuff is a natural progression.

@neocogent

I had this problem on a media NAS box and the solution suggested by @afontenot worked perfectly. My NFS shares could not reach into the zpool without manual "fixing" after boot, but adding the mount option gave me a clean boot that worked fine the first time.

Adding a mount option (x-systemd.requires=zfs-mount.service) seems to me a simple and sensible solution for this issue. It probably just needs to be documented and better known.

@tfgm-bud

tfgm-bud commented Oct 19, 2018

@afontenot, Just chiming in to say, the approach of changing the bind mounts in /etc/fstab, worked great for me on Ubuntu 16.04 with the HWE kernel (https://wiki.ubuntu.com/Kernel/LTSEnablementStack). Not sure if the HWE kernel is required but highly recommended as the ZFS version as of today on the HWE is v0.7.5.

@gmelikov
Member

I'd prefer to close this issue: 0.8 will have better systemd integration, and there is a workaround in #971 (comment).
