ZFS Solaris Behaviors #274

Closed
behlendorf opened this Issue · 12 comments

5 participants

@behlendorf
Owner

This is mainly just a tracking bug to jot down some Solaris behaviors which may not make sense under Linux. In general, I want to keep the Solaris behaviors because quite a bit of thought has gone into them. Also, a large number of people are already familiar with ZFS and have certain expectations about how things work. However, there are also some behaviors which may not make sense under Linux. These are the issues we need to take a hard look at and consider altering the default behavior for.

  • zpool import - By default under Solaris, the top-level /dev directory is scanned for pools. Under Linux it may be wiser to make the default directory /dev/disk/by-id/ when it exists. The Linux port relies on udev, which is the standard Linux mechanism for preventing device reordering. Since we don't want device reordering to cause problems for people, it may make sense to make this the default even though it differs from Solaris.

  • Allow mounting over a non-empty mount point. Under Solaris, ZFS prevents you from mounting over a non-empty mount point; under Linux this is allowed and even expected behavior. There's no harm in permitting it other than losing consistency with Solaris.

@dajhorn
Collaborator

@behlendorf: For the first bullet, would you mind if I put dajhorn/pkg-zfs@6da0b25 into the PPA? Device enumeration problems are recurring and have caused a non-trivial support load.

It is also worth noting that the behavior described in the second bullet has led to several init bugs being discovered and has discouraged users from interleaving mounts between different filesystems. (e.g., some of the systemd threads.)

@behlendorf
Owner

I'm not opposed to making this change in the PPA or even in the upstream source. However, I suspect the right fix is going to be a little more involved than just changing the define. For example, the ability to use the short names might depend on that define remaining /dev, but I need to check the source to be sure.

I'm still on vacation for a few more days, hence why I've been so quiet. When I get back later this week I'll try to clear some of the backlog of pull requests!

@behlendorf
Owner

@dajhorn: I took a closer look at what you're proposing and I think you're right, it's not quite as bad as I'd feared. However, what I think you really want is to just update default_dir in zpool_find_import_impl() to say /dev/disk/by-id/. This way only newly imported pools get impacted, which is basically what we want. We can do this by adding a new #define IDISK_ROOT.

Additionally, you could add code to make_leaf_vdev() to print a warning if a pool is created using the top-level /dev/<device> names. A brief explanation of why this is a bad idea might be enough to resolve most of the support issues.

@maxximino

I might add that log devices and cache devices actually behave differently:
I added both using the symlink /dev/disk/by-id/ata-OCZ-VERTEX3_OCZ-XXXXXXXXXXXX-partN.
The symlink of the log device gets resolved.
In zpool status I see:

        logs
          sdd1                                        ONLINE       0     0     0
        cache
          ata-OCZ-VERTEX3-OCZ-XXXXXXXXXXXXXXX-part2  ONLINE       0     0     0

I tried removing and re-adding them many times, using different symlinks. Same behaviour.
In my situation it's not a "problem"; I know that zfs now always expects the log device at sdd1. But you might like to fix this if it causes trouble for other people.

@behlendorf
Owner

@dajhorn Even better than changing the default to /dev/disk/by-id would be making the default configurable. We're probably long overdue for adding support for an /etc/zfs/zfs.conf file. Off the top of my head, here's a list of a few things it makes sense to have configurable on a per-system basis. They should be self-explanatory, and this is a non-exhaustive list.

  • ZPOOL_IMPORT_DIRECTORY = /dev/disk/by-id

  • ZPOOL_IMPORT_AUTO = yes | no

  • ZPOOL_CACHE_FILE = /etc/zfs/zpool.cache | none

  • ZFS_MOUNT_AUTO = yes | no
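To make the idea concrete, a hypothetical /etc/zfs/zfs.conf using those keys might look like this (the key names come from the list above, but the file format itself is purely illustrative, since no such parser exists yet):

```
# /etc/zfs/zfs.conf -- proposed, illustrative syntax only
ZPOOL_IMPORT_DIRECTORY=/dev/disk/by-id
ZPOOL_IMPORT_AUTO=yes
ZPOOL_CACHE_FILE=/etc/zfs/zpool.cache
ZFS_MOUNT_AUTO=yes
```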

@maxximino That sounds like a bug we'll need to get sorted out.

@Rudd-O

PLEASE make the default /dev/disk/by-id, I am tired of my root pool being imported as /dev/sda :-(

@behlendorf
Owner

@Rudd-O As mentioned above, I'm OK with this but it needs to be tested. Can you try making the following minor change and let us know if you notice any unexpected side effects? There shouldn't be any, but that's why we test.

diff --git a/lib/libzfs/libzfs_import.c b/lib/libzfs/libzfs_import.c
index 7048a52..1436dc8 100644
--- a/lib/libzfs/libzfs_import.c
+++ b/lib/libzfs/libzfs_import.c
@@ -995,7 +995,7 @@ zpool_find_import_impl(libzfs_handle_t *hdl, importargs_t *
        size_t pathleft;
        struct stat64 statbuf;
        nvlist_t *ret = NULL, *config;
-       static char *default_dir = DISK_ROOT;
+       static char *default_dir = "/dev/disk/by-id";
        int fd;
        pool_list_t pools = { 0 };
        pool_entry_t *pe, *penext;
@behlendorf
Owner

You will (as you noted) need to export the pool, destroy the cache file, and reimport for the new default to take effect.
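For example, assuming a pool named tank (the pool name here is just a placeholder), that sequence would look roughly like:

```shell
# Export the pool so it is no longer active.
zpool export tank

# Remove the stale cache file so the old /dev/sdX paths are forgotten.
rm /etc/zfs/zpool.cache

# Reimport, searching the stable by-id namespace.
zpool import -d /dev/disk/by-id tank
```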

@ryao

The first bullet point is #965 which is now fixed in head.

@behlendorf
Owner

...and #473, which has been in the tree for a long time, fixed the second bullet. Let's close this catch-all issue and open new ones for any lingering Solaris-isms that need to be tweaked.

@behlendorf behlendorf closed this
@kernelOfTruth kernelOfTruth referenced this issue from a commit in kernelOfTruth/zfs
@ryao ryao Fix race in spl_kmem_cache_reap_now()
The current code contains a race condition that triggers when bit 2 in
spl.spl_kmem_cache_expire is set, spl_kmem_cache_reap_now() is invoked
and another thread is concurrently accessing its magazine.

spl_kmem_cache_reap_now() currently invokes spl_cache_flush() on each
magazine in the same thread when bit 2 in spl.spl_kmem_cache_expire is
set. This is unsafe because there is one magazine per CPU and the
magazines are lockless, so it is impossible to guarantee that another
CPU is not using its magazine when this function is called.

The solution is to only touch the local CPU's magazine and leave other
CPU's magazines to other CPUs.

Reported-by: DHE
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #274
251e7a7