ZFS Solaris Behaviors #274

behlendorf opened this Issue Jun 13, 2011 · 12 comments

5 participants

ZFS on Linux member

This is mainly just a tracking bug to jot some some Solaris behaviors which may not make sense under Linux. In general, I want to keep the Solaris behaviors because quite a bit of thought has gone in to them. Also there are large number of people already familiar with ZFS and they have certain expectations about how things work. However, there are also some behaviors which may not make sense under Linux. These are the issues which we need to take a hard look at consider altering the default behavior.

  • zfs import - By default under Solaris the top level /dev directory is scanned for pools. Under Linux it may be wiser to make the default directory /dev/disk/by-id/ when it exists. The Linux port relies on udev which is the standard Linux mechanism to prevent device reordering. Since we don't want device reordering to cause problems for people it may make sense to make this the default even though this differs from Solaris.

  • Allow mounting over non-empty mount point. Under Solaris ZFS prevents you from mounting over a non-empty mount point, under Linux this is allowed and even expected behavior. There's no harm in permitting it other than consistency with Solaris.

ZFS on Linux member

@behlendorf: For the first bullet, would you mind if I put dajhorn/pkg-zfs@6da0b25 into the PPA? Device enumeration problems are recurring and have caused a non-trivial support load.

It is also worth noting that the behavior described in the second bullet has caused several init bugs to be discovered and has discouraged users from interleaving mounts between different filesystems. (eg: Some of the systemd threads.)

ZFS on Linux member

I'm not opposed to making this change in the ppa or even in the upstream source. However, I suspect the right fix is going to be a little more involved than just changing the define. For example the ability to use the shorr names might depend on that define remaining /dev, but I need to check the source to be sure.

I'm still on vacation for a few more days, hense why I've been so quiet. When I get back latter this weej I'll try and clear some of the back log of pull requests!

ZFS on Linux member

@dajhorn: I took a closer look at what your proposing and I think your right it's not quite as bad as I'd feared. However, what I think your really want is to just update default\_dir in zpool\_find\_import\_impl() to say /dev/disk/by-id/. This way only newly imported pools get impacted which is basically what we want. We can do this by adding a new #define IDISK_ROOT.

Additionally, you could add code to make\_leaf\_vdev() to print a warning if a pool is created using the top level /dev/<device> names. A brief explanation why this is a bad idea might be enough to resolve most of the support issues.


I might add that actually log devices and cache devices have different behaviours:
I've added them both using symlink /dev/disk/by-id/ata-OCZ-VERTEX3_OCZ-XXXXXXXXXXXX-partN .
The symlink of the log device gets resolved.
In zpool status I see:

          sdd1                                        ONLINE       0     0     0
          ata-OCZ-VERTEX3-OCZ-XXXXXXXXXXXXXXX-part2  ONLINE       0     0     0

Tried removing and re-adding many times and using different symlinks. Same behaviour.
In my situation it's not a "problem". I know that now zfs expects log device always in sdd1. but you might like to solve this, if it causes troubles to other people.

ZFS on Linux member

@dajhorn Even better than changing the default to /dev/disk/by-id would be making the default configurable. We're probably long over due for adding support for a /etc/zfs/zfs.conf file. Of the top of my head here's a list of a few things it makes sense to have configurable on a per-system basis. They should be self explanitory and this is a non-exhaustive list.

  • ZPOOL_IMPORT_DIRECTORY = /dev/disk/by-id

  • ZPOOL_IMPORT_AUTO = yes | no

  • ZPOOL_CACHE_FILE = /etc/zfs/zpool.cache | none

  • ZFS_MOUNT_AUTO = yes | no

@maxximino That sounds like a bug we'll need to get sorted out.


PLEASE make the default /dev/disk/by-id, I am tired of my root pool being imported as /dev/sda :-(

ZFS on Linux member

@Rudd-O As mentioned above I'm OK with this but it needs to be tested. Can you try making the following minor change and let us know if you notice any unexpected side effects, there shouldn't be any but that's why we test.

diff --git a/lib/libzfs/libzfs_import.c b/lib/libzfs/libzfs_import.c
index 7048a52..1436dc8 100644
--- a/lib/libzfs/libzfs_import.c
+++ b/lib/libzfs/libzfs_import.c
@@ -995,7 +995,7 @@ zpool_find_import_impl(libzfs_handle_t *hdl, importargs_t *
        size_t pathleft;
        struct stat64 statbuf;
        nvlist_t *ret = NULL, *config;
-       static char *default_dir = DISK_ROOT;
+       static char *default_dir = "/dev/disk/by-id";
        int fd;
        pool_list_t pools = { 0 };
        pool_entry_t *pe, *penext;
ZFS on Linux member

You will (as you noted) need to export, destroy the cache file, and reimport the pool for the default the take effect.

ZFS on Linux member

The first bullet point is #965 which is now fixed in head.

ZFS on Linux member

..and #473 which has been in the tree for a long time fixed the second bullet. Let's close this catch all issue and open new ones for any lingering Solaris'isms which need to be tweaked.

@behlendorf behlendorf closed this Oct 17, 2012
@kernelOfTruth kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Mar 1, 2015
@ryao ryao Fix race in spl_kmem_cache_reap_now()
The current code contains a race condition that triggers when bit 2 in
spl.spl_kmem_cache_expire is set, spl_kmem_cache_reap_now() is invoked
and another thread is concurrently accessing its magazine.

spl_kmem_cache_reap_now() currently invokes spl_cache_flush() on each
magazine in the same thread when bit 2 in spl.spl_kmem_cache_expire is
set. This is unsafe because there is one magazine per CPU and the
magazines are lockless, so it is impossible to guarentee that another
CPU is not using its magazine when this function is called.

The solution is to only touch the local CPU's magazine and leave other
CPU's magazines to other CPUs.

Reported-by: DHE
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #274
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment