Complete systemd support, including autotools support, configure support and RPM support #547

Closed
wants to merge 167 commits into
from

8 participants
Contributor

Rudd-O commented Feb 1, 2012

I will let the DEB experts and others add whatever is needed to package the systemd support (which shouldn't be big because Ubuntu is stuck with Upstart).

Tested on my local machines with ZFS as a root file system.

Rudd-O and others added some commits Mar 11, 2011

@Rudd-O Rudd-O initial commit 67352b0
@Rudd-O Rudd-O add things that other people need to do, that are not me. I am happy …
…with how this is.
23296f6
@Rudd-O Rudd-O dox changes 90503cc
@Rudd-O Rudd-O added todo list item c377a18
@Rudd-O Rudd-O remove hardcoded arcsize instructions in favor of documenting modprob…
…e.conf file, and prevent mount of imported datasets
c82a31b
@Rudd-O Rudd-O no need to umount zfs filesystems anymore, since they wont be mounted 41fac95
@Rudd-O Rudd-O simple doc addition 35d8659
@Rudd-O Rudd-O commit autodetection of root file system based on bootfs property dur…
…ing the initramdisk process. also, right before the transition between initrd and real root, only import THE ONE pool that holds the root fs, and only mount THE ONE filesystem that is the root. The rest imported pools can be imported normally using an initscript that will correctly read and obey zpool.cache, and filesystems will be mounted with zfs mount -a in said initscript.
61b05e8
@Rudd-O Rudd-O initscript added 733d27a
@Rudd-O Rudd-O organizing dracut files 95513ad
@Rudd-O Rudd-O organizing the dracut files 32395e8
@Rudd-O Rudd-O fix mangling of slashes in cmdline parser 5373c1a
@Rudd-O Rudd-O eliminate the reliance on /sbin binaries e5558ab
@Rudd-O Rudd-O announce syncing and fix mount options of already-mounted mounted roo…
…t filesystem so they will appear on /etc/mtab
9096300
@Rudd-O Rudd-O there is no uniq in the root file system, only in /usr, which preclud…
…es this initscript from working before /usr has been mounted
008393b
@Rudd-O Rudd-O failure in sort leading to malfunction. repairing. 2454e5e
@Rudd-O Rudd-O read all mounted file systems before continuing on boot to work aroun…
…d zfs returning EPERM when cding into a file system as nonroot before root does
488cbcc
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 8b01984
@Rudd-O Rudd-O added myself to authors file to cherrypick some changes from brian 446f874
@behlendorf @Rudd-O behlendorf Always allow '-o remount,ro'
Allow the mount(8) utility to always operate on all datasets when
remounting them read-only.  This is critical for rc.sysinit/umountroot
which remounts the root filesystem read-only during shutdown to
ensure everything is correctly flushed to disk.

Fix minor typo, the check to set zfsutil should use the bitwise
'&'.  I must have accidentally hit the adjacent '*' and obviously
neither the compiler nor my code review caught this.  Fix it now.
349c9e3
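For readers unfamiliar with why neither the compiler nor a quick review would flag the '*' typo: both '&' and '*' are valid binary operators on integers, so the expression still compiles. A minimal sketch of the difference follows. The flag names and bit values are hypothetical and chosen only for illustration; they are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for this illustration. */
#define OPT_REMOUNT 0x01UL
#define OPT_ZFSUTIL 0x02UL

/* Intended check: bitwise AND isolates the single bit being tested. */
static bool has_zfsutil(unsigned long flags)
{
	return (flags & OPT_ZFSUTIL) != 0;
}

/* The typo: multiplication is nonzero whenever any flag bit is set. */
static bool has_zfsutil_typo(unsigned long flags)
{
	return (flags * OPT_ZFSUTIL) != 0;
}
```

With only OPT_REMOUNT set, has_zfsutil() reports false while has_zfsutil_typo() reports true, which is the kind of silent misbehavior the commit describes.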
@behlendorf @Rudd-O behlendorf Strip 'zfsutil,remount' from /etc/mtab, add SELinux struct entries
When updating /etc/mtab we should be careful and strip certain
options.  In particular, we need to strip 'zfsutil' because if
we don't the mount utility will helpfully provide it to the
mount helper when we issue mount(8) again.  This subverts the
check that the caller is zfs(8) and not mount(8).
291fc95
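The stripping step described in this commit can be sketched as a filter over a comma-separated option string. This is an illustration only, not the code from the patch; the function name strip_mtab_opts and its signature are assumptions made for the example.

```c
#include <string.h>

/*
 * Copy the comma-separated option list `opts` into `out`, dropping the
 * 'zfsutil' and 'remount' tokens.  Returns 0 on success, or -1 if `out`
 * is too small.  (Hypothetical helper, not the function in the patch.)
 */
static int strip_mtab_opts(const char *opts, char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	while (*opts != '\0') {
		size_t len = strcspn(opts, ",");	/* length of this token */
		int drop = len == 7 &&
		    (strncmp(opts, "zfsutil", 7) == 0 ||
		     strncmp(opts, "remount", 7) == 0);

		if (!drop && len > 0) {
			/* Need room for an optional comma, the token, and NUL. */
			size_t need = len + (used > 0 ? 1 : 0);

			if (used + need + 1 > outlen)
				return -1;
			if (used > 0)
				out[used++] = ',';
			memcpy(out + used, opts, len);
			used += len;
			out[used] = '\0';
		}

		opts += len;
		if (*opts == ',')
			opts++;
	}
	return 0;
}
```

Matching on whole tokens rather than substrings means an option that merely contains "zfsutil" as a prefix is left alone.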
@behlendorf @Rudd-O behlendorf Register .sync_fs handler
Register the missing .sync_fs handler.  This is a noop in most cases
because the usual requirement is that sync just be initiated.  As part
of the DMU's normal transaction processing txgs will be frequently
synced.  However, when the 'wait' flag is set the requirement is that
.sync_fs must not return until the data is safe on disk.  With the
addition of the .sync_fs handler this is now properly implemented.
d220b01
@behlendorf @Rudd-O behlendorf Register .remount_fs handler
Register the missing .remount_fs handler.  This handler isn't strictly
required because the VFS does a pretty good job updating most of the
MS_* flags.  However, there's no harm in using the hook to call the
registered zpl callback for various MS_* flags.  Additionally, this
allows us to lay the ground work for more complicated argument parsing
in the future.
fc83910
@behlendorf @Rudd-O behlendorf Add init scripts
To support automatically mounting your zfs filesystem on boot
a basic init script is needed.  Unfortunately, every distribution
has their own idea of the _right_ way to do things.  Rather than
write one very complicated portable init script, which would
invariably be replaced by the distribution's own anyway, I have
instead added support to provide multiple distribution specific
init scripts.

The correct init script for your distribution will be selected
by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT.
During 'make install' the correct script for your system will
be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the
usual /etc/init.d/zfs location.

Currently, there is zfs.fedora and a more generic zfs.lsb init
script.  Hopefully, the distribution maintainers who know best
how they want their init scripts to function will feed back their
approved versions to be included in the project.

This change does not consider upstart jobs but I'm not at all
opposed to adding that sort of thing.
9a315fc
@Rudd-O Rudd-O mountzfs contents of script put in place for zfs.fedora 0abb85d
@Rudd-O Rudd-O Add dracut support (modified to merge correctly)
To simplify the process of using zfs as your root filesystem a
zfs-dracut sub-package has been added.  This sub-package adds a zfs
dracut module which allows your initramfs to be rebuilt with zfs
support.  The process for doing this is still complicated but there
is clearly interest from the community about getting this working
well and documented.  This should help lay some of the groundwork.

Longer term these changes should be pushed in the upstream dracut
package.  Once that occurs this subpackage will no longer be
required for new systems, however we may want to conditionally
build this package in the future for systems running older
dracut versions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
d5d43be
@Rudd-O Rudd-O expanded fix to make recently-mounted filesystems readable by nonroot…
… users
02f2d58

Is this 'ls' needed to handle issue #164?

Owner

Rudd-O replied Mar 21, 2011

Unfortunately it is needed to work around https://github.com/behlendorf/zfs/issues/164 . But remember that it won't handle all the cases -- what about pools and file systems created AFTER the initscript has run? So it remains a bug and cannot be closed. But at least I don't have to drop to a console, log in as root, cd into every directory, every time I boot the machine, before I can log on as my regular user.

OK, I just wanted to make sure. Getting to the root cause of this then and getting it fixed needs to happen.

Rudd-O and others added some commits Mar 22, 2011

@Rudd-O Rudd-O comply with mtab locking and use more robust logic 578321c
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs
Initscript updated with my changes.

Conflicts:
	dracut/90zfs/check
	dracut/90zfs/install
	dracut/90zfs/installkernel
	dracut/90zfs/mount-zfs.sh
	etc/init.d/zfs.fedora
	module/zfs/zpl_super.c
308e018
@Rudd-O Rudd-O Path and return code compliance with Fedora standards: eb89368
@Rudd-O Rudd-O gitignore added for cmds mount.zfs and zvol_id e9f17b1
@Rudd-O Rudd-O Compliance with FCNewInit/Initscripts: sanity-check the existence of …
…ZFS modules and tools within the action function
eab2bd7
@Rudd-O Rudd-O Re-adding atomic.S in asm-generic, lost in commit 488cbcc 9549dd1
@behlendorf @Rudd-O behlendorf Linux 2.6.29 compat, .freeze_fs/.unfreeze_fs
The .freeze_fs/.unfreeze_fs hooks were not added until Linux 2.6.29
Since these hooks are currently unused they are being removed to
allow support of older kernels.
5469b88
@Rudd-O Rudd-O Remove dracut README that was moved by Brian to the root directory 1f09551
@Rudd-O Rudd-O Wrong file deleted. Revert "Remove dracut README that was moved by Br…
…ian to the root directory"

This reverts commit 1f09551.
c7c89b6
@Rudd-O Rudd-O We keep the README file written by Brian, and delete the file written…
… by me, which is obsolete.
2f39b69
@Rudd-O Rudd-O avoid reregistering mounts if mtab points to /proc/mounts, otherwise …
…reregistering will end up in catastrophe
21b2659
@Rudd-O Rudd-O Add support for mounting zvols registered in /etc/fstab bace2de
@Rudd-O Rudd-O fixed evaluating error in readlink detection of proc mounts symlinked…
… from etc mtab
4ece2db
@Rudd-O Rudd-O make it explicit that we now re-register both filesystems and volumes a517f53
@Rudd-O Rudd-O small typo wrote the file system type incorrectly in mtab at reregist…
…ering time
354ca13
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 7070119
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 0adfa52
@Rudd-O Rudd-O Permit both mountpoint=legacy and mountpoint=/ in initrd 08ac479
@Rudd-O Rudd-O Preparation for systemd: does not need local_fs to be a dependency fo…
…r this initscript to be started
c7deb8c
@Rudd-O Rudd-O In Fedora 15, mtab will be a link to /proc/self/mounts rather than /p…
…roc/mounts
bce0d4d
@Rudd-O Rudd-O Merging upstream / behlendorf changes 921ad47
@gunnarbeutner @Rudd-O gunnarbeutner Fixed a use-after-free bug in zfs_zget().
Fixed a bug where zfs_zget could access a stale znode pointer when
the inode had already been removed from the inode cache via iput ->
iput_final -> ... -> zfs_zinactive but the corresponding SA handle
was still alive.
eb72538
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs cb83863
@Rudd-O Rudd-O Be more verbose in what ZFS dracut is doing, and smarter in what moun…
…t command to use when mounting the root filesystem (legacy needs different treatment)
478c29a
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs efa98ad
@Rudd-O Rudd-O improve dracut detection of root volume b73bf46
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 737f491
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 888b449
@pendor @behlendorf pendor Update zfs.gentoo init script
* Update paths to zpool/zfs tools,
* Log less for non-error conditions,
* Don't be fatal if umount fails at shutdown -- final init remount
  will take care of it if /usr or / are in use

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
540d5eb
@pendor @behlendorf pendor Update for Dracut-010
Update Dracut module for Dracut-010 and fix race conditions that
caused boot to fail on MP systems.  Add support for zfs_force flag
and parsing of spl_hostid from kernel command line.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
fa334df
@pendor @behlendorf pendor Document initramfs process
Add documentation for Dracut and the initramfs process.  This includes
detailing the basic boot process and options available.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
10cd09e

Kernel parameters are subject to decimal/octal/hexadecimal interpretation, so this example should be spl_hostid=0x00bab10c.
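To illustrate the point about base detection, here is a userspace sketch using the C library's strtoul() with base 0, which applies the same decimal/octal/hexadecimal prefix rules. That the kernel's own parameter parser behaves identically for every input is an assumption here, and parse_hostid is a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a numeric parameter with automatic base detection: a "0x"
 * prefix selects hex, a leading "0" selects octal, anything else is
 * decimal.  Returns 0 and stores the value on success, or -1 if the
 * string is not a complete number in the detected base.
 * (Hypothetical helper; userspace stand-in for the kernel parser.)
 */
static int parse_hostid(const char *s, unsigned long *value)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 0);
	if (errno != 0 || end == s || *end != '\0')
		return -1;
	*value = v;
	return 0;
}
```

Under these rules "00bab10c" is treated as octal because of its leading zero, and parsing stops at the first 'b', so the string is rejected. Writing "0x00bab10c" makes the hexadecimal intent explicit.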

Rudd-O and others added some commits Jul 6, 2011

@Rudd-O Rudd-O Fix script trying to mount hardcoded pool laptop, and force import of…
… pools otherwise the pool does not get imported since it does not get exported before last reboot
5d8ba13
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs b27e4e2
@Rudd-O Rudd-O Merge remote-tracking branch 'behlendorf/dracut'
Conflicts:
	dracut/90zfs/mount-zfs.sh
	dracut/90zfs/parse-zfs.sh
	dracut/90zfs/zfs-genrules.sh
	dracut/README.dracut.markdown

Resolved by choosing upstream changes.
02bab31
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs
Conflicts:
	dracut/README.dracut.markdown
c23989d
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master' 4842b4e
@Rudd-O Rudd-O Merge remote-tracking branch 'behlendorf/master' 9ddc79b
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master' 93b27e2
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/zfsonlinux/zfs af61384
@behlendorf behlendorf Fix NULL deref in balance_pgdat()
Be careful not to unconditionally clear the PF_MEMALLOC bit in
the task structure.  It may have already been set when entering
zpl_putpage() in which case it must remain set on exit.  In
particular the kswapd thread will have PF_MEMALLOC set in
order to prevent it from entering direct reclaim.  By clearing
it we allow the following NULL deref to potentially occur.

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<ffffffff8109c7ab>] balance_pgdat+0x25b/0x4ff

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #287
80c97dd
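The save-and-restore pattern this fix describes can be sketched in plain C. The struct and function names below are stand-ins invented for the example, and the bit value is arbitrary; only the logic (clear the flag on exit only if this code set it) mirrors the commit message.

```c
/* Stand-in for the kernel's per-task flag word; the value is arbitrary here. */
#define PF_MEMALLOC 0x0800u

struct task {
	unsigned int flags;
};

/* Buggy pattern: clears the bit on exit even if the caller had set it. */
static void putpage_unconditional(struct task *t)
{
	t->flags |= PF_MEMALLOC;
	/* ... write the page ... */
	t->flags &= ~PF_MEMALLOC;
}

/* Fixed pattern: remember the prior state and only undo our own change. */
static void putpage_preserving(struct task *t)
{
	unsigned int was_set = t->flags & PF_MEMALLOC;

	t->flags |= PF_MEMALLOC;
	/* ... write the page ... */
	if (!was_set)
		t->flags &= ~PF_MEMALLOC;
}
```

A task that enters with the bit already set, as kswapd does according to the message, leaves putpage_preserving() with the bit still set, whereas putpage_unconditional() would strip it.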
@Rudd-O Rudd-O update file mode to match upstream 31be97a
@Rudd-O Rudd-O Merge remote-tracking branch 'behlendorf/issue-287' d515386
@Rudd-O Rudd-O ZFS now sports basic systemd support. Yes, that is right, parallel-mo…
…unted file systems at boot, without need for fstab, completely free of configuration.
3812746
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 30d2da9
@Rudd-O Rudd-O added more items to the todo list 6e0b292
@Rudd-O Rudd-O amending one more possibility to the list of to do items 08cb73e
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 887ba97
@Rudd-O Rudd-O Merge remote-tracking branch 'behlendorf/master' c6b0c22
@Rudd-O Rudd-O Add grub2 patches to make menu generation even work. However, I could…
… not for the life of me get /boot on ZFS dataset to work.
c53cec0
@Rudd-O Rudd-O Correct patch to reflect /boot on ZFS. GRUB_DEVICE and GRUB_DEVICE_BO…
…OT variables reflect the devices backing / and /boot respectively, thus they should be adjusted as such.
134a8c1
@Rudd-O Rudd-O Added HOWTO to install Fedora 16 to a root pool 819f686
@Rudd-O Rudd-O formatting fixes for the ZFS on root guide f1047d7
@Rudd-O Rudd-O Formatting changes in HOWTO a6b793f
@Rudd-O Rudd-O systemd todo items updated 3a198a1
@Rudd-O Rudd-O rehashed TO DO list a77e2bf
@Rudd-O Rudd-O temporarily remove unit file to suppress the zfs initscript 74c7452
@Rudd-O Rudd-O added suspension of remount-rootfs.service if zfs is on root c44332b
@Rudd-O Rudd-O detect and gracefully handle if there is no ZFS support or there is n…
…o zpool.cache
60e50a6
@Rudd-O Rudd-O make output to stdout nicer 0b7f8b8
@Rudd-O Rudd-O re-disable the zfs initscript as it may mount undesirable file systems d779802
@Rudd-O Rudd-O in an attempt to make systemd read our overridden remount-rootfs.serv…
…ice, link it into the wants directory
8891dfe
@Rudd-O Rudd-O no need to wait for syslog in order to have an effect 2d4d688
@Rudd-O Rudd-O remount-rootfs.service cannot be overridden so best is just to ignore…
… it by commenting that code
292e44b
@behlendorf @Rudd-O behlendorf Limit maximum ashift value to 12
While we initially allowed you to set your ashift as large as 17
(SPA_MAXBLOCKSIZE) that is actually unsafe.  What wasn't considered
at the time is that each uberblock written to the vdev label ring
buffer will be of this size.  Now the buffer is statically sized
to 128k and we need to be able to fit several uberblocks in it.
With a large ashift that becomes a problem.

Therefore I'm reducing the maximum configurable ashift value to 12.
This is large enough for the 4k sector drives and small enough that
we can still keep the most recent 32 uberblocks in the vdev label
ring buffer.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #425
3050e6d
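The arithmetic behind the new limit can be checked with a short sketch. The 128k ring size comes from the commit message; the 1 KiB minimum slot size and the helper name uberblocks_in_ring are assumptions made for this example.

```c
#include <stdint.h>

#define UBERBLOCK_RING_BYTES (128u * 1024u)	/* "statically sized to 128k" */
#define MIN_UBERBLOCK_SHIFT  10u		/* assumption: 1 KiB minimum slot */

/*
 * Number of uberblock slots that fit in the label ring when each slot
 * is 2^ashift bytes.  Intended for ashift values up to 17.
 */
static uint32_t uberblocks_in_ring(uint32_t ashift)
{
	uint32_t shift = ashift > MIN_UBERBLOCK_SHIFT ?
	    ashift : MIN_UBERBLOCK_SHIFT;

	return UBERBLOCK_RING_BYTES >> shift;
}
```

With ashift=12 the ring holds 128k / 4k = 32 uberblocks, matching the message. With ashift=17 each uberblock would fill the entire ring, leaving room for only one.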
@behlendorf @Rudd-O behlendorf Implement SA based xattrs
The current ZFS implementation stores xattrs on disk using a hidden
directory.  In this directory a file name represents the xattr name
and the file contents are the xattr binary data.  This approach is
very flexible and allows for arbitrarily large xattrs.  However,
it also suffers from a significant performance penalty.  Accessing
a single xattr can require up to three disk seeks.

  1) Lookup the dnode object.
  2) Lookup the dnode's xattr directory object.
  3) Lookup the xattr object in the directory.

To avoid this performance penalty Linux filesystems such as ext3
and xfs try to store the xattr as part of the inode on disk.  When
the xattr is too large to store in the inode then a single external
block is allocated for them.  In practice most xattrs are small
and this approach works well.

The addition of System Attributes (SA) to zfs provides us a clean
way to make this optimization.  When the dataset property 'xattr=sa'
is set then xattrs will be preferentially stored as System Attributes.
This allows tiny xattrs (~100 bytes) to be stored with the dnode and
up to 64k of xattrs to be stored in the spill block.  If additional
xattr space is required, which is unlikely under Linux, they will be
stored using the traditional directory approach.

This optimization results in roughly a 3x performance improvement
when accessing xattrs which brings zfs roughly to parity with ext4
and xfs (see table below).  When multiple xattrs are stored per-file
the performance improvements are even greater because all of the
xattrs stored in the spill block will be cached.

However, by default SA based xattrs are disabled in the Linux port
to maximize compatibility with other implementations.  If you do
enable SA based xattrs then they will not be visible on platforms
which do not support this feature.

----------------------------------------------------------------------
   Time in seconds to get/set one xattr of N bytes on 100,000 files
------+--------------------------------+------------------------------
      |            setxattr            |            getxattr
bytes |  ext4     xfs zfs-dir  zfs-sa  |  ext4     xfs zfs-dir  zfs-sa
------+--------------------------------+------------------------------
1     |  2.33   31.88   21.50    4.57  |  2.35    2.64    6.29    2.43
32    |  2.79   30.68   21.98    4.60  |  2.44    2.59    6.78    2.48
256   |  3.25   31.99   21.36    5.92  |  2.32    2.71    6.22    3.14
1024  |  3.30   32.61   22.83    8.45  |  2.40    2.79    6.24    3.27
4096  |  3.57  317.46   22.52   10.73  |  2.78   28.62    6.90    3.94
16384 |   n/a 2342.39   34.30   19.20  |   n/a   45.44  145.90    7.55
65536 |   n/a 2941.39  128.15  131.32* |   n/a  141.92  256.85  262.12*

Legend:
* ext4      - Stock RHEL6.1 ext4 mounted with '-o user_xattr'.
* xfs       - Stock RHEL6.1 xfs mounted with default options.
* zfs-dir   - Directory based xattrs only.
* zfs-sa    - Prefer SAs but spill in to directories as needed, a
              trailing * indicates overflow in to directories occurred.

NOTE: Ext4 supports 4096 bytes of xattr name/value pairs per file.
NOTE: XFS and ZFS have no limit on xattr name/value pairs per file.
NOTE: Linux limits individual name/value pairs to 65536 bytes.
NOTE: All setxattr/getxattr calls were done after dropping the cache.
NOTE: All tests were run against a single hard drive.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #443
344fbf1
@schakrava @Rudd-O schakrava Allow leading digits in userquota/groupquota names
While setting/getting userquota and groupquota properties, the input
was not treated as a possible username or groupname if it had a
leading digit. While useradd in linux recommends the regexp
[a-z_][a-z0-9_-]*[$]? , it is not enforced. This causes problem for
usernames with leading digits in them. We need to be able to support
getting and setting properties for this unconventional but possible
input category

I've updated the code to validate the username or groupname directly
via the API. Also, note that I moved this validation to the beginning
before the check for SID names with @. This also supports usernames
with @ character in them which are valid. Only when input with @ is
not a valid username, it is interpreted as a potential SID name.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #428
769e160
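The validation order described in this commit (try the input as a real user or group name first, and only then treat an '@' as a possible SID name) can be sketched as follows. The lookup is injected as a function pointer so the example stays self-contained; in real code that role would be played by something like getpwnam() or getgrnam(). All names below, including the demo account list, are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum quota_who { WHO_NAME, WHO_SID, WHO_OTHER };

/* Caller-supplied lookup standing in for getpwnam()/getgrnam(). */
typedef bool (*name_exists_fn)(const char *name);

/* Classify a userquota/groupquota target in the order the commit describes. */
static enum quota_who classify_quota_who(const char *who, name_exists_fn exists)
{
	if (exists(who))		/* valid name wins, even with digits or '@' */
		return WHO_NAME;
	if (strchr(who, '@') != NULL)	/* otherwise '@' suggests a SID name */
		return WHO_SID;
	return WHO_OTHER;		/* e.g. a numeric id, handled elsewhere */
}

/* Hypothetical account database used only for this illustration. */
static bool demo_exists(const char *name)
{
	return strcmp(name, "3com") == 0 || strcmp(name, "ops@site") == 0;
}
```

Here "3com" and "ops@site" both classify as names because the lookup succeeds, while an unknown "alice@domain" falls through to the SID case.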
@prakashsurya @Rudd-O prakashsurya In autoconf v2.68, AC_LANG_PROGRAM must be quoted
This change updates the AC_LANG_PROGRAM autoconf macro invocations to be
wrapped in quotes. As of autoconf version 2.68, the quotes are necessary
to prevent warnings from appearing. Specifically, the autoconf v2.68
Forward Porting Notes specifies:

    It is important to note that you need to ensure that the call to
    AC_LANG_SOURCE is quoted and not expanded, otherwise that will
    cause the warning to appear nonetheless.

Finally, because of the additional quoting we can drop the extra
quotes used by the ZFS_AC_CONFIG_USER_STACK_GUARD autoconf check.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #464
76e66e8
@behlendorf @Rudd-O behlendorf Allow xattrs on symlinks
The Solaris version of ZFS does not allow xattrs to be set on
symlinks due to the way they implemented the attropen() system
call.  Linux however implements xattrs through the lgetxattr()
and lsetxattr() system calls which do not have this limitation.

The only reason this hasn't always worked under ZFS on Linux
is that the xattr handlers were not registered for symlink type
inodes.  This was done simply to be consistent with the Solaris
behavior.

Upon further reflection I believe this should be allowed under
Linux.  The only ill effect would be that the xattrs on symlinks
will not be visible when the pool is imported on a Solaris
system.  This also has the benefit that it allows for SELinux
style security xattr labeling which expects to be able to set
xattrs on all inode types.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #272
5a846b4
@prakashsurya @Rudd-O prakashsurya Fix configure tests to play nice with GCC 4.6
As of GCC 4.6, specific kernel 2.6.32 header files do not compile
cleanly without warnings. One specific example of this is the
arch/x86/include/asm/percpu.h file. Thus, a few of the configure tests
were getting hung up on this and the '-Wno-unused-but-set-variable'
compile option had to be introduced.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #459
ca1fe7b
@gunnarbeutner @Rudd-O gunnarbeutner Added comments for libshare's NFS functions.
Some of the functions' purpose wasn't immediately obvious without
additional explanations. This commit adds these missing comments.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
e0cf7af
@dajhorn @Rudd-O dajhorn Support path_id changes in udev 174.
The /lib/udev/path_id helper became a builtin command in the udev 174
release, so test whether path_id is external in the zpool_id script.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #429
5e44270
@dajhorn @Rudd-O dajhorn Quote variables in the zpool_id script.
For consistency and safety, quote all variables in the zpool_id
script. This accommodates a `-c CONFIG` parameter value with
whitespace in the path name.

Also fix a typo in the usage synopsis for `-h`.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #439
dbe0628
@dajhorn @Rudd-O dajhorn Demote egrep to grep in the zpool_id script.
Direct invocation of GNU egrep is deprecated by its man page, and
its argument in the zpool_id script is not an extended expression.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
a23073b
@dajhorn @Rudd-O dajhorn Demote the whackbang in the zpool_id script.
The zpool_id script is posixly correct and does not use bash
features, so change its whackbang from /bin/bash to /bin/sh.

Debian policy also stipulates that system scripts be dash compatible.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
42c12e9
@dajhorn @Rudd-O dajhorn Source /etc/default/zfs after setting defaults.
Let the administrator override all script variables by sourcing the
/etc/default/zfs file after the default values are set.

The spelling mistake in the old path name makes it unlikely that this
bug affected any users.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #371
e2bcb00
@dajhorn @Rudd-O dajhorn Quote variables in the zfs.lsb script.
For consistency and safety, quote all variables in the zfs.lsb script.
This protects in the unlikely case that any of the file names contain
whitespace.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #439
f2863bf
@behlendorf @Rudd-O behlendorf Update default ARC memory limits
In the upstream OpenSolaris ZFS code the maximum ARC usage is
limited to 3/4 of memory or all but 1GB, whichever is larger.
Because of how Linux's VM subsystem is organized these defaults
have proven to be too large which can lead to stability issues.

To avoid making everyone manually tune the ARC the defaults are
being changed to 1/2 of memory or all but 4GB.  The rationale for
this is as follows:

* Desktop Systems (less than 8GB of memory)

  Limiting the ARC to 1/2 of memory is desirable for desktop
  systems which have highly dynamic memory requirements.  For
  example, launching your web browser can suddenly result in a
  demand for several gigabytes of memory.  This memory must be
  reclaimed from the ARC cache which can take some time.  The
  user will experience this reclaim time as a sluggish system
  with poor interactive performance.  Thus in this case it is
  preferable to leave the memory as free and available for
  immediate use.

* Server Systems (more than 8GB of memory)

  Using all but 4GB of memory for the ARC is preferable for
  server systems.  These systems often run with minimal user
  interaction and have long running daemons with relatively
  stable memory demands.  These systems will benefit most by
  having as much data cached in memory as possible.

These values should work well for most configurations.  However,
if you have a desktop system with more than 8GB of memory you may
wish to further restrict the ARC.  This can still be accomplished
by setting the 'zfs_arc_max' module option.

Additionally, keep in mind these aren't currently hard limits.
The ARC is based on a slab implementation which can suffer from
memory fragmentation.  Because this fragmentation is not visible
from the ARC it may believe it is within the specified limits while
actually consuming slightly more memory.  How much more memory gets
consumed will be determined by how badly fragmented the slabs are.

In the long term this can be mitigated by slab defragmentation code,
which was the OpenSolaris solution.  Preferably, the page cache would
be used to back the ARC under Linux.  See issue #75
for the benefits of more tightly integrating with the page cache.

This change also fixes an issue where the default ARC max was being
set incorrectly for machines with less than 2GB of memory.  The
constant in the arc_c_max comparison must be explicitly cast to
a uint64_t type to prevent overflow and the wrong conditional
branch being taken.  This failure was typically observed in VMs
which are commonly created with less than 2GB of memory.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #75
ca1d2b3
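The new default can be sketched as the larger of "half of memory" and "all but 4GB", which matches the desktop/server split at 8GB described above. This is an illustration under that reading, not the code from the patch; arc_default_max is a hypothetical name.

```c
#include <stdint.h>

/*
 * Build the constant in 64 bits.  Writing something like (4 << 30) with
 * plain int operands overflows a 32-bit int, which is the class of bug
 * the commit message mentions.
 */
#define GiB ((uint64_t)1 << 30)

/* Default ARC ceiling for a machine with `physmem` bytes of memory. */
static uint64_t arc_default_max(uint64_t physmem)
{
	uint64_t half = physmem / 2;
	/* Guard the subtraction so small machines do not wrap around. */
	uint64_t all_but_4g = physmem > 4 * GiB ? physmem - 4 * GiB : 0;

	return half > all_but_4g ? half : all_but_4g;
}
```

Under this sketch a 2GB VM gets a 1GB ceiling, an 8GB machine gets 4GB by either rule, and a 16GB server gets 12GB.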
@behlendorf @Rudd-O behlendorf Set zvol_major/zvol_threads permissions
The zvol_major and zvol_threads module options were being created
with 0 permission bits.  This prevented them from being listed in
the /sys/module/zfs/parameters/ directory, although they were
visible in `modinfo zfs`.  This patch fixes the issue by updating
the permission bits to 0444.  For the moment these options must
be read-only because they are used during module initialization.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #392
66b934b
@Rudd-O Garrett D'Amore Illumos #734: Use taskq_dispatch_ent() interface
It has been observed that some of the hottest locks are those
of the zio taskqs.  Contention on these locks can limit the
rate at which zios are dispatched which limits performance.

This upstream change from Illumos uses a new interface to the
taskqs which allows them to utilize a prealloc'ed taskq_ent_t.
This removes the need to perform an allocation at dispatch
time while holding the contended lock.  This has the effect
of improving system performance.

Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Alexey Zaytsev <alexey.zaytsev@nexenta.com>
Reviewed by: Jason Brian King <jason.brian.king@gmail.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References to Illumos issue:
  https://www.illumos.org/issues/734

Ported-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #482
aacfefc
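The idea of dispatching with a preallocated entry can be sketched with an intrusive queue: the caller embeds the queue entry in its own object, so enqueueing only updates pointers and never allocates. The types and functions below are simplified stand-ins invented for this example and are not the taskq API; locking is indicated by comments only.

```c
#include <stddef.h>

typedef void (*task_fn)(void *arg);

/* Queue entry that lives inside the caller's own object. */
struct task_ent {
	struct task_ent *next;
	task_fn func;
	void *arg;
};

struct task_queue {
	struct task_ent *head;
	struct task_ent *tail;
	/* A real queue would also hold the lock guarding head/tail. */
};

/* Hypothetical request type carrying its own preallocated entry. */
struct io_request {
	int id;
	int done;
	struct task_ent ent;
};

/* Enqueue without allocating: only pointer updates inside the lock. */
static void queue_dispatch_ent(struct task_queue *q, struct task_ent *ent,
    task_fn func, void *arg)
{
	ent->func = func;
	ent->arg = arg;
	ent->next = NULL;
	/* lock would be taken here */
	if (q->tail != NULL)
		q->tail->next = ent;
	else
		q->head = ent;
	q->tail = ent;
	/* lock would be released here */
}

/* Drain the queue in FIFO order; returns the number of tasks run. */
static int queue_run_all(struct task_queue *q)
{
	int ran = 0;

	while (q->head != NULL) {
		struct task_ent *ent = q->head;

		q->head = ent->next;
		if (q->head == NULL)
			q->tail = NULL;
		ent->func(ent->arg);
		ran++;
	}
	return ran;
}

/* Example task: mark the owning request as finished. */
static void mark_done(void *arg)
{
	((struct io_request *)arg)->done = 1;
}
```

Because the entry's storage belongs to the request, the critical section shrinks to a few pointer writes, which is the contention reduction the commit message attributes to the new interface.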
@prakashsurya @Rudd-O prakashsurya Add make rule for building Arch Linux packages
Added the necessary build infrastructure for building packages
compatible with the Arch Linux distribution. As such, one can now run:

    $ ./configure
    $ make pkg     # Alternatively, one can run 'make arch' as well

on the Arch Linux machine to create two binary packages compatible with
the pacman package manager, one for the zfs userland utilities and
another for the zfs kernel modules. The new packages can then be
installed by running:

    # pacman -U $package.pkg.tar.xz

In addition, source-only packages suitable for an Arch Linux chroot
environment or remote builder can also be built using the 'sarch' make
rule.

NOTE: Since the source dist tarball is created on the fly from the head
of the build tree, its MD5 hash signature will be continually in flux.
As a result, the md5sum variable was intentionally omitted from the
PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
or may not have any serious security implications, as the source tarball
is not being downloaded from an outside source.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #491
e48f7fa
@dajhorn @Rudd-O dajhorn Update the character class in the zpool man page.
ZoL and all Solaris derivatives allow pool names to contain the colon
and space characters. Update the man page to reflect current behavior.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #438
475b9fd
@dajhorn @Rudd-O dajhorn Linux 3.2 compat: set_nlink()
Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit

  SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170

Use the new set_nlink() kernel function instead.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #462
c19c252
@dajhorn @Rudd-O dajhorn Add LIBSELINUX to mount_zfs_LDFLAGS.
Regenerating the autotools configuration on Debian and Ubuntu systems
causes compilation to fail with this error message:

  cmd/mount_zfs/../../cmd/mount_zfs/mount_zfs.c:403:
    undefined reference to `is_selinux_enabled'

In the automake template, set "mount_zfs_LDFLAGS = ... $(LIBSELINUX)"
so that the /sbin/mount.zfs utility is linked to libselinux.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
85c3fb4
@prakashsurya @Rudd-O prakashsurya Move Arch Linux's VENDOR check above Ubuntu's
If the lsb-release package is installed on an Arch Linux distribution,
the configure step will incorrectly detect the running distribution as
Ubuntu. This is a result of both distributions providing an
/etc/lsb-release file, and the Ubuntu VENDOR check being performed
first.

Since the Arch Linux test checks for a file more specific to the Arch
Linux distribution, moving Arch Linux's VENDOR check above Ubuntu's
check provides a quick and easy solution.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
578535b
@behlendorf @Rudd-O behlendorf Linux 3.1 compat, super_block->s_shrink
The Linux 3.1 kernel has introduced the concept of per-filesystem
shrinkers which are directly associated with a super block.  Prior
to this change there was one shared global shrinker.

The zfs code relied on being able to call the global shrinker when
the arc_meta_limit was exceeded.  This would cause the VFS to drop
references on a fraction of the dentries in the dcache.  The ARC
could then safely reclaim the memory used by these entries and
honor the arc_meta_limit.  Unfortunately, when per-filesystem
shrinkers were added the old interfaces were made unavailable.

This change adds support to use the new per-filesystem shrinker
interface so we can continue to honor the arc_meta_limit.  The
major benefit of the new interface is that we can now target
only the zfs filesystem for dentry and inode pruning.  Thus we
can minimize any impact on the caching of other filesystems.

In the context of making this change several other important
issues related to managing the ARC were addressed.  They include:

* The dnlc_reduce_cache() function which was called by the ARC
to drop dentries for the Posix layer was replaced with a generic
zfs_prune_t callback.  The ZPL layer now registers a callback to
drop these dentries removing a layering violation which dates
back to the Solaris code.  This callback can also be used by
other ARC consumers such as Lustre.

  arc_add_prune_callback()
  arc_remove_prune_callback()

* The arc_reduce_dnlc_percent module option has been changed to
arc_meta_prune for clarity.  The dnlc functions are specific to
Solaris's VFS and have already been largely eliminated.
The replacement tunable now represents the number of bytes the
prune callback will request when invoked.

* Less aggressively invoke the prune callback.  We used to call
this whenever we exceeded the arc_meta_limit however that's not
strictly correct since it results in overzealous reclaim of
dentries and inodes.  It is now only called once the arc_meta_limit
is exceeded and every effort has been made to evict other data from
the ARC cache.

* More promptly manage exceeding the arc_meta_limit.  When reading
meta data into the cache, if a buffer cannot be recycled,
notify the arc_reclaim thread to invoke the required prune.

* Added arcstat_prune kstat which is incremented when the ARC
is forced to request that a consumer prune its cache.  Remember
this will only occur when the ARC has no other choice.  If it
can evict buffers safely without invoking the prune callback
it will.

* This change is also expected to resolve the unexpected collapses
of the ARC cache.  These would occur because, when just the
arc_meta_limit was exceeded, reclaim pressure would be exerted on the
arc_c value via arc_shrink().  This effectively shrunk the entire cache
when really we just needed to reclaim meta data.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #466
Closes #292
1ad103a
@Rudd-O Rudd-O To match upstream, the old boot from zfs README was removed c3021a2
@dajhorn @Rudd-O dajhorn Avoid using awk in the zpool_id script.
Some implementations of `awk` incorrectly parse the \< and \> regex
symbols, so use a `while read` loop and regular globbing instead.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #259
dad8dc7
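
A minimal sketch of the technique named in this commit, replacing an `awk` word-boundary match with a `while read` loop. The function name `lookup_name` and the two-column "name path" file layout are assumptions made for illustration; the actual zpool_id script is not shown in this conversation.

```shell
#!/bin/sh
# Hypothetical helper: lookup_name CONFIG KEY
# Prints the first field of the first line in CONFIG whose second field is
# exactly KEY. Comparing whole fields avoids relying on awk's \< and \>.
lookup_name() {
    conf="$1"
    key="$2"
    while read -r name path rest; do
        # Plain shell globbing is enough to skip blank lines and comments.
        case "$name" in
            ''|'#'*) continue ;;
        esac
        if [ "$path" = "$key" ]; then
            echo "$name"
            return 0
        fi
    done < "$conf"
    return 1
}
```

Because `read` splits each line into fields, a key ending in `-1` cannot accidentally match an entry ending in `-10`, which is the kind of partial match the word-boundary anchors were guarding against.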
@dajhorn @Rudd-O dajhorn Apply the ZoL coding standard to zpl_xattr.c
Make the indenting in the zpl_xattr.c file consistent with the Sun
coding standard by removing soft tabs.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
e30639f
@rlaager @Rudd-O rlaager Treat /dev/vd* as whole disks
Correctly detect /dev/vd devices as whole disks and attempt to
create an EFI partition table.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
d4e1d69
@Rudd-O Suman Chakravartula Add overlay(-O) mount option support
Linux supports mounting over non-empty directories by default.
In Solaris this is not the case and -O option is required for
zfs mount to mount a zfs filesystem over a non-empty directory.

For compatibility, I've added support for -O option to mount
zfs filesystems over non-empty directories if the user wants
to, just like in Solaris.

I've defined MS_OVERLAY to record it in the flags variable if
the -O option is supplied.  The flags variable passes through
a few functions and it's checked before performing the empty
directory check in zfs_mount function.  If -O is given, the
check is not performed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #473
1d861ac
@behlendorf @Rudd-O behlendorf Linux 3.2 compat, security_inode_init_security()
The security_inode_init_security() API has been changed to include
a filesystem specific callback to write security extended attributes.
This was done to support the initialization of multiple LSM xattrs
and the EVM xattr.

This change updates the code to use the new API when it's available.
Otherwise it falls back to the previous implementation.

In addition, the ZFS_AC_KERNEL_6ARGS_SECURITY_INODE_INIT_SECURITY
autoconf test has been made more rigorous by passing the expected
types.  This is done to ensure we always properly detect the
correct form for the security_inode_init_security() API.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #516
8c0e01e
@prakashsurya @Rudd-O prakashsurya Run ZFS_AC_PACMAN only if $VENDOR is "arch"
Unfortunately, Arch's package manager `pacman` shares its name with a
popular arcade video game. Thus, in order to refrain from executing the
video game when we mean to execute the package manager, ZFS_AC_PACMAN is
now only run when $VENDOR is determined to be "arch".

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #517
5b47eba
@behlendorf @Rudd-O behlendorf Increase link count limit to 2^31-1
Originally, the per-file link limit was set to 65536 because the
exact Linux VFS limit was unclear.  Internally ZFS is able to
support 64-bit link counts.  After a more careful investigation
the limit can be safely raised to 2^31-1.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #514
63b292e
@Rudd-O Rudd-O silly variable removed e55d7e1
@Rudd-O Rudd-O cleanups in the script for the escaper and the generator command line 9afbcdb
@Rudd-O Rudd-O gendir needed to be quoted 07ea36e
@Rudd-O Rudd-O fix escaping of mountpoints and remove swap from the list of mountpoi…
…nts as well
ae1513b
@Rudd-O Rudd-O clean up root pool filesystem exclusion rule code 528bda4
@Rudd-O Rudd-O Add support in the build system for systemd 83a9b61
@Rudd-O Rudd-O minor casing bug causing no rule to build the generator script d5a8b20
@Rudd-O Rudd-O Added the ability to substitute the systemd and generator dir variabl…
…e macros in the build system to the script
976a1f5
@Rudd-O Rudd-O Make the script executable after the rule is done, delete the files u…
…pon clean
f867397
@Rudd-O Rudd-O Add identifier for systemddir inside the generator script 737a2bc
@Rudd-O Rudd-O ZFS spec file updated to build systemd generators 7c62ff7
@Rudd-O Rudd-O Makefile.am now distributes the README 1dc4ef2
@Rudd-O Rudd-O extensive documentation rewrite ccb1d9c
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master' e3b262b
@Rudd-O Rudd-O Added Makefile.in changes stemming from systemd addition f4256cc
@Rudd-O Rudd-O Adding the resulting configure script with extra systemd saucy sauce …
…to the repository
b6c0bcc
Contributor

Rudd-O commented Feb 1, 2012

The systemd support masks out the initscript, so if we do something important in the initscript, it might need to be moved to independent unit files that we must install. For example, the NFS exports should happen after all the relevant unit files have loaded, while the SMB exports should happen only after smb.service has successfully started.

Rudd-O and others added some commits Feb 1, 2012

@Rudd-O Rudd-O Add to-do note about late mounting of pools 0bfdae0
@Rudd-O Rudd-O Merge branch 'master' of git://github.com/behlendorf/zfs 2b04a74
@behlendorf @Rudd-O behlendorf Configure merged with the 3.3 test 8ce7476
@Rudd-O Rudd-O Systemd support altered to use .requires rather than .wants to follow…
… the policy established by Lennart Poettering about filesystems being required for local-fs.target to succeed
372b489
@Rudd-O Rudd-O Make it so the generated files by systemd support are ignored 00fbb11
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master'
Conflicts:
	configure
3a93033
@Rudd-O Rudd-O we use fstab-decode to decode escapes in filesystem now 7d1e262
@Rudd-O Rudd-O Remove addition of zpool.cache to the initramfs c5cda80
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master' cd7db78
@Rudd-O Rudd-O Makefile.in and configure changes using autogen.sh in my tree. e871a25
@Rudd-O Rudd-O Added support for exporting pools during shutdown 9d75454
@Rudd-O Rudd-O Revert "Remove addition of zpool.cache to the initramfs"
This reverts commit c5cda80.
7edb829
@Rudd-O Rudd-O Import pools one by one just in case importation of a pool fails, so …
…ZFS does not abort early. Some filesystems may be missing in that case, but at least the generator won't abort early leaving NO filesystems mounted.
38f300a
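
The idea behind this commit, trying every pool even when one import fails, might look roughly like the sketch below. It is not the generator's actual code: `import_each` and its "importer" argument are hypothetical, introduced so the loop can be exercised without touching real pools.

```shell
#!/bin/sh
# Hypothetical helper: import_each IMPORTER POOL...
# Runs "IMPORTER POOL" once per pool. A failed import is reported on stderr
# but does not stop the loop, so the remaining pools are still attempted.
import_each() {
    importer="$1"
    shift
    failed=0
    for pool in "$@"; do
        if ! "$importer" "$pool"; then
            echo "warning: could not import pool '$pool', continuing" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every import succeeded.
    [ "$failed" -eq 0 ]
}
```

In real use the importer would presumably be a small wrapper function around `zpool import`. The point of the design is that one broken pool costs only its own file systems rather than all of them.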
@Rudd-O Rudd-O Autodetect whether dracut shutdown hooks are available or not bfcf0f5
@Rudd-O Rudd-O Regenerate zpool.cache when generating initramfs 0aae20d
@Rudd-O Rudd-O Patch for grub2 to fix detection of ZFS root updated. It now uses the…
… real root file system taken from mtab. Not guaranteed to work on chroots unless mtab links to /proc/mounts.
3dd487c
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master'
Conflicts:
	config/ltmain.sh
	configure
1373f52
@Rudd-O Rudd-O Fix spurious printout and release of LUKS-backed block devices used b…
…y pools
c751844
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master'
Conflicts:
	configure
25fe901
@Rudd-O Rudd-O Support for auto-mounting /usr during Dracut (for systems with a sepa…
…rate /usr)
34b1495
@Rudd-O Rudd-O Added option to remount root file system without zfsutil option.
Helps with Fedora systemd boot on root ZFS.
7aa1dd2
@Rudd-O Rudd-O Systemd build fixes b30ca36
@Rudd-O Rudd-O Configure updated to include systemd compile-time flags 93230e3
@Rudd-O Rudd-O Updated documentation to reflect the fact that remount-rootfs.service…
… no longer errors out if the root file system isn't set to legacy.
a9cbbf1
@Rudd-O Rudd-O Fix support for auto-mounting /usr in initramfs -- wrong grep command…
… caused it to malfunction
71e6807
@Rudd-O Rudd-O Fedora 17 compatibility for Dracut d1ecba5
@Rudd-O Rudd-O Fedora 17 fix: if no var-run.mount or var-lock.mount are available, s…
…imply don't include them as dependencies of a putative /var ZFS file system
997fb8f
@Rudd-O Rudd-O Fedora 17 improvement: hide error message in grep trying to grep file…
… /usr/share/dracut/dracut-functions that no longer exists in F17
c8b5a40
@Rudd-O Rudd-O Merge remote-tracking branch 'zfsonlinux/master'
Conflicts:
	configure
af2a5ff
Owner

behlendorf commented Jul 23, 2012

I'm not opposed to merging this support. But there are a few requirements which need to be met first.

  1. Rebase. I need the patch rebased as a single commit, or small patch stack, against master. That makes it much easier to review.

  2. Testing. I don't have the time to rigorously test the proposed changes. But if they look good, and a few people seriously test them, then we can get it merged.

  3. Compatibility. Lastly we just need to make sure we don't accidentally break something which currently works today.

Contributor

Rudd-O commented Jul 23, 2012

I will be closing this pull request and opening a new one with the rebased changes.

Rudd-O closed this Jul 23, 2012
