behlendorf released this Jul 26, 2017

New Features

  • Resumable zfs send/receive - Allow an interrupted zfs receive to be resumed if the stream was prematurely terminated (e.g. due to remote system or network failure).

  • Compressed zfs send/receive - Use the zfs send -c option to directly send the compressed data in the ARC or on-disk to another pool without needing to decompress it.

  • Multiple Import Protection - Prevents a shared pool in a fail-over configuration from being imported on different hosts at the same time. When the multihost pool property is on, an activity check is performed before importing the pool to verify that it is not in use by another host.

  • Customized zpool iostat|status columns - Additional columns can be added to the zpool iostat and zpool status output to show more information. Several useful scripts are provided that can report drive temperature, SMART data, enclosure LED status, and more. Administrators and users can add their own scripts to meet their needs.

  • Latency and request size histograms - Use the zpool iostat -l option to show on-the-fly latency statistics and zpool iostat -w to generate histograms of the total latency of each IO. The zpool iostat -r option shows histograms of IO request sizes. These statistics are available per disk to aid in finding misbehaving devices.

  • Scrub Pause - The zpool scrub -p option pauses an active scrub without canceling it; running zpool scrub again resumes it from where it left off.

  • Delegations - The zfs allow and zfs unallow subcommands can be used to delegate ZFS administrative permissions on file systems to non-privileged users.

  • Large dnodes - This feature improves metadata performance by allowing extended attributes, ACLs, and symbolic links with long target names to be stored in the dnode. This benefits workloads such as SELinux, distributed filesystems like Lustre and Ceph, and any application that makes use of extended attributes.

  • User/group object accounting and quota - This feature adds per-object user/group accounting and quota limits to the existing space accounting and quota functionality. The zfs userspace and zfs groupspace subcommands have been extended to set quota limits and report on object usage.

  • Cryptographic checksums - Stronger SHA-512, Skein, or Edon-R checksums are available.

  • JBOD Management

    • Automatic drive online - Newly detected devices which are determined to be part of an imported pool are automatically brought online.
    • Automatic drive replacement - When the autoreplace pool property is on, any new device found in the same physical location as a device that previously belonged to the pool is automatically formatted and replaced.
    • Automatic hot spares - When a device is faulted, a rebuild to a hot spare device is started if one is available.
    • Fault LEDs - Set the fault LED for a device when it is faulted, and clear it when the device has been replaced.
    • Drive health monitoring - Automatically fault a device when an excessive number of read, write, or checksum errors are detected.
    • Force fault - Use zpool offline -f to proactively fault a problematic device.
    • Multipath aware - Can be used with advanced multipath configurations.
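The activity check behind Multiple Import Protection can be modeled as "read the pool's best uberblock, wait, read it again": a host that has the pool imported with multihost=on keeps writing new uberblocks, so the two reads differ. The sketch below is a simplified model, not the actual import code. The `Uberblock` class and function names are hypothetical, the wait time is assumed to be zfs_multihost_import_intervals × zfs_multihost_interval with a one-second floor, and the defaults of 10 and 1000 ms are assumptions to verify against zfs-module-parameters(5) for your version.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Uberblock:
    """Hypothetical stand-in for the uberblock fields the check compares."""
    txg: int
    timestamp: int


# Assumed defaults for zfs_multihost_interval (ms) and zfs_multihost_import_intervals.
MULTIHOST_INTERVAL_MS = 1000
MULTIHOST_IMPORT_INTERVALS = 10


def activity_check_ms(interval_ms: int = MULTIHOST_INTERVAL_MS,
                      intervals: int = MULTIHOST_IMPORT_INTERVALS) -> int:
    """Simplified wait: several mmp write periods, never below one second."""
    return max(interval_ms * intervals, 1000)


def pool_in_use(read_uberblock: Callable[[], Uberblock],
                sleep_ms: Callable[[int], None]) -> bool:
    """Return True if another host appears to be actively writing the pool."""
    before = read_uberblock()
    sleep_ms(activity_check_ms())
    after = read_uberblock()
    # An active host advances the txg/timestamp, so the uberblocks differ.
    return after != before


# Usage with simulated reads instead of real disk I/O.
idle = Uberblock(txg=100, timestamp=1_500_000_000)
print(pool_in_use(lambda: idle, lambda ms: None))  # False: nothing changed
```

Injecting `read_uberblock` and `sleep_ms` keeps the decision logic separate from device I/O, which makes the check easy to exercise without a real pool.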
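The latency histograms from zpool iostat -w group each IO by how long it took. A common way to build such histograms is with power-of-two buckets, where bucket k counts latencies in [2^k, 2^(k+1)) nanoseconds. The sketch below assumes that bucketing scheme; it is an illustration of the idea, not zpool's exact bucket boundaries or output format, and all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def log2_bucket(latency_ns: int) -> int:
    """Bucket k holds latencies in [2**k, 2**(k+1)) nanoseconds."""
    if latency_ns < 1:
        return 0
    return latency_ns.bit_length() - 1


def histogram(latencies_ns: Iterable[int]) -> Counter:
    """Count how many IOs fall into each power-of-two latency bucket."""
    return Counter(log2_bucket(ns) for ns in latencies_ns)


def render(hist: Counter) -> List[str]:
    """One text row per non-empty bucket, labelled by its exclusive upper bound."""
    return [f"{2 ** (k + 1):>12}ns {hist[k]:>6} " + "#" * hist[k]
            for k in sorted(hist)]


# Usage: one slow 2 ms IO stands out from the microsecond-scale ones.
samples = [800, 1000, 1023, 1024, 2_000_000]
for row in render(histogram(samples)):
    print(row)
```

Keeping one histogram per disk, as zpool iostat does, is what lets a single misbehaving device show up as a distinct tail.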
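Drive health monitoring faults a device after "an excessive number" of read, write, or checksum errors. One way to express that rule is a count over a sliding time window. The sketch below assumes a threshold of 10 errors within 10 minutes; both the default values and the `ErrorRateTracker` name are hypothetical and do not describe the ZFS Event Daemon's actual interface. In practice, a separate tracker would be kept per device and per error type.

```python
from collections import deque


class ErrorRateTracker:
    """Hypothetical tracker: fault once `n` errors occur within `window_s` seconds."""

    def __init__(self, n: int = 10, window_s: float = 600.0) -> None:
        self.n = n
        self.window_s = window_s
        self._events = deque()

    def record(self, t: float) -> bool:
        """Record an error at time t (seconds, non-decreasing).

        Returns True if the device should now be faulted.
        """
        self._events.append(t)
        # Forget errors that have aged out of the sliding window.
        while t - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) >= self.n


# Usage: 3 errors within 10 seconds trips the threshold; spread-out errors do not.
tracker = ErrorRateTracker(n=3, window_s=10)
print([tracker.record(t) for t in (0, 5, 20, 21, 22)])
# [False, False, False, False, True]
```

A window rather than a lifetime counter means an old, isolated error does not count against a device that has since been healthy.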

Performance

  • ARC Buffer Data (ABD) - Allocates ARC data buffers using scatter lists of pages instead of virtual memory. This approach minimizes memory fragmentation on the system, allowing for more efficient use of memory. The reduced demand for virtual memory also improves stability and performance on 32-bit architectures.
  • Compressed ARC - Cached file data is compressed by default in memory and decompressed on demand. This allows for a larger effective cache, which improves overall performance.
  • Vectorized RAIDZ - Hardware-optimized RAIDZ that reduces CPU usage.
    Supported SIMD instructions: sse2, ssse3, avx2, avx512f, avx512bw, neon, and neonx2
  • Vectorized checksums - Hardware-optimized Fletcher-4 checksums that reduce CPU usage.
    Supported SIMD instructions: sse2, ssse3, avx2, avx512f, and neon
  • GZIP compression offloading - Hardware-optimized GZIP compression offloaded to an Intel QuickAssist Technology (QAT) accelerator.
  • Metadata performance - Overall improved metadata performance. Optimizations include a multi-threaded allocator, batched quota updates, improved prefetching, and streamlined call paths.
  • Faster RAIDZ resilver - When resilvering, RAIDZ intelligently skips sections of the device that don't need to be rebuilt.
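Vectorized RAIDZ speeds up parity arithmetic. For single-parity RAIDZ1, the P parity column is the byte-wise XOR of the data columns, so any one missing column can be rebuilt by XOR-ing the survivors with P. The scalar sketch below shows only that P-parity idea: RAIDZ2 and RAIDZ3 add Q and R parity computed with Galois-field multiplication, which is not shown, and the real on-disk column layout is not modeled. Function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized columns."""
    if len(a) != len(b):
        raise ValueError("columns must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def p_parity(columns: List[bytes]) -> bytes:
    """Single (P) parity: XOR of all data columns."""
    return reduce(xor_bytes, columns)


def reconstruct(columns: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild exactly one missing column (marked None) from the survivors and P."""
    missing = columns.index(None)
    survivors = [c for c in columns if c is not None]
    rebuilt = reduce(xor_bytes, survivors, parity)
    return columns[:missing] + [rebuilt] + columns[missing + 1:]


# Usage: lose the middle column, then recover it from the others plus parity.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = p_parity(data)
print(p.hex())  # ee22
print(reconstruct([data[0], None, data[2]], p) == data)  # True
```

The SIMD implementations listed above apply the same operations to many bytes per instruction, which is where the CPU savings come from.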
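Fletcher-4 keeps four 64-bit running sums over the data viewed as 32-bit words: each word is added to the first sum, and each sum feeds the next. The scalar reference below shows that arithmetic, which the vectorized checksums are expected to reproduce while processing several words in parallel. It assumes little-endian word order (ZFS also has a byte-swapped variant), and the `fletcher4` function name is hypothetical.

```python
import struct
from typing import Tuple

MASK64 = (1 << 64) - 1  # the sums wrap around like unsigned 64-bit integers


def fletcher4(data: bytes, byteorder: str = "<") -> Tuple[int, int, int, int]:
    """Scalar Fletcher-4 over 32-bit words; returns the four running sums."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack(byteorder + "I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


# Usage: two words, 1 and 2.
print(fletcher4(struct.pack("<2I", 1, 2)))  # (3, 4, 5, 6)
```

Because each sum depends on the previous one, a naive loop is inherently serial; splitting the stream into lanes and combining them at the end is what makes SIMD acceleration possible.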

Changes in Behavior

  • Non-privileged users are allowed to run zpool list, zpool iostat, zpool status, zpool get, zfs list, and zfs get. These commands no longer need to be added to the /etc/sudoers file.
  • The permissions of the /dev/zfs device have changed from 0600 to 0666 so that ZFS performs access control in kernel space, which makes zfs allow and zfs unallow work properly. If you have been changing the permissions or group owner of the device file yourself, that change will no longer work correctly and will break the proper behavior of zfs allow. From this release forward, you should be able to satisfy your use case with the officially supported zfs allow command.
  • By default, task queues are now dynamic: worker threads are created and destroyed as needed. This allows the system to tune itself automatically so that the optimal number of threads is used for the active workload, which can result in a performance improvement.
  • Accessing snapshots over NFS now requires adding the crossmnt option to the /etc/exports file, because the nfsd service is now aware that snapshots are separate filesystems. As a result of this change, older distributions such as CentOS 6.x can no longer provide access to snapshots over NFS.

Supported Kernels

  • Compatible with 2.6.32 - 4.12 Linux kernels.

Module Options

  • The default values for the module options were selected to yield good performance for the majority of workloads and configurations. They should not need to be tuned for most systems but are available for performance analysis and tuning. See the zfs-module-parameters(5) man page for a more complete description of the options and what they control.
  • Added:
    • dbuf_cache_hiwater_pct - Percent over dbuf_cache_max_bytes when dbufs must be evicted
    • dbuf_cache_lowater_pct - Percent below dbuf_cache_max_bytes when dbufs stop being evicted
    • dbuf_cache_max_bytes - Maximum size in bytes of the dbuf cache
    • dbuf_cache_max_shift - Cap the size of the dbuf cache to a log2 fraction of arc size
    • dmu_object_alloc_chunk_shift - CPU-specific allocator grabs 2^N objects at once
    • send_holes_without_birth_time - Ignore hole_birth txg for zfs send
    • zfetch_max_distance - Max bytes to prefetch per stream
    • zfs_abd_scatter_enabled - Toggle whether ABD allocations must be linear
    • zfs_abd_scatter_max_order - Maximum order allocation used for a scatter ABD
    • zfs_arc_dnode_limit - Minimum bytes of dnodes in ARC
    • zfs_arc_dnode_limit_percent - Percent of ARC meta buffers for dnodes
    • zfs_arc_dnode_reduce_percent - Percentage of excess dnodes to try to unpin
    • zfs_arc_meta_limit_percent - Percent of arc size for arc meta limit
    • zfs_arc_pc_percent - Percent of pagecache to reclaim ARC to
    • zfs_compressed_arc_enabled - Disable compressed arc buffers
    • zfs_deadman_checktime_ms - Dead I/O check interval in milliseconds
    • zfs_delete_blocks - Delete files larger than N blocks asynchronously
    • zfs_dmu_offset_next_sync - Enable forcing txg sync to find holes
    • zfs_free_bpobj_enabled - Enable processing of the free_bpobj
    • zfs_metaslab_segment_weight_enabled - Enable segment-based metaslab selection
    • zfs_metaslab_switch_threshold - Metaslab selection max buckets before switching
    • zfs_multihost_fail_intervals - Max allowed period without a successful mmp write
    • zfs_multihost_history - Historical statistics for last N multihost writes
    • zfs_multihost_import_intervals - Number of zfs_multihost_interval periods to wait for activity
    • zfs_multihost_interval - Milliseconds between mmp writes to each leaf
    • zfs_multilist_num_sublists - Number of sublists used in each multilist
    • zfs_per_txg_dirty_frees_percent - Percentage of dirtied blocks from frees in one TXG
    • zfs_sync_taskq_batch_pct - Percentage of CPUs to run an IO worker thread
    • zfs_vdev_mirror_non_rotating_inc - Non-rotating media load increment for non-seeking I/Os
    • zfs_vdev_mirror_non_rotating_seek_inc - Non-rotating media load increment for seeking I/Os
    • zfs_vdev_mirror_rotating_inc - Rotating media load increment for non-seeking I/Os
    • zfs_vdev_mirror_rotating_seek_inc - Rotating media load increment for seeking I/Os
    • zfs_vdev_mirror_rotating_seek_offset - Offset in bytes from the last I/O to trigger seek increment
    • zfs_vdev_queue_depth_pct - Queue depth percentage for each top-level vdev
    • zfs_vdev_raidz_impl - Select RAIDZ implementation
    • zil_slog_bulk - Limit in bytes slog sync writes per commit
    • zio_dva_throttle_enabled - Throttle block allocations in the ZIO pipeline
    • zvol_request_sync - Synchronously handle bio requests
    • zvol_threads - Max number of threads to handle I/O requests
    • zvol_volmode - Default volmode property value
    • spl_max_show_tasks - Max number of tasks shown in taskq proc
    • spl_panic_halt - Cause kernel panic on assertion failures
  • Removed:
    • l2arc_nocompress - Skip compressing L2ARC buffers
    • zfetch_block_cap - Max number of blocks to fetch at a time
    • zfs_arc_num_sublists_per_state - Number of sublists used in each of the ARC state lists
    • zfs_disable_dup_eviction - Disable duplicate buffer eviction
    • zfs_vdev_mirror_switch_us - Switch mirrors every N microseconds
    • zil_slog_limit - Max commit bytes to separate log device
  • Changed:
    • zfs_admin_snapshot - Enable mkdir/rmdir/mv in .zfs/snapshot
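The four dbuf_cache_* options work together: dbuf_cache_max_bytes is capped at the ARC size divided by 2^dbuf_cache_max_shift, eviction is forced once the cache grows more than dbuf_cache_hiwater_pct over that maximum, and eviction continues until the cache drops to dbuf_cache_lowater_pct below it. The sketch below models only that threshold arithmetic. The defaults (100 MiB, shift 5, 10%, 10%) are assumptions to check against zfs-module-parameters(5), the function names are hypothetical, and the interplay between the calling thread and the background eviction thread is not modeled.

```python
MIB = 1 << 20
GIB = 1 << 30

# Assumed defaults for the dbuf cache module options.
DBUF_CACHE_MAX_BYTES = 100 * MIB
DBUF_CACHE_MAX_SHIFT = 5
DBUF_CACHE_HIWATER_PCT = 10
DBUF_CACHE_LOWATER_PCT = 10


def effective_max(arc_max_bytes: int,
                  max_bytes: int = DBUF_CACHE_MAX_BYTES,
                  max_shift: int = DBUF_CACHE_MAX_SHIFT) -> int:
    """Cap the dbuf cache at arc_max / 2**max_shift."""
    return min(max_bytes, arc_max_bytes >> max_shift)


def above_hiwater(size: int, max_bytes: int, pct: int = DBUF_CACHE_HIWATER_PCT) -> bool:
    """True once the cache exceeds its maximum by more than pct percent."""
    return size > max_bytes + max_bytes * pct // 100


def above_lowater(size: int, max_bytes: int, pct: int = DBUF_CACHE_LOWATER_PCT) -> bool:
    """True while the cache is still above the point where eviction may stop."""
    return size > max_bytes - max_bytes * pct // 100


def evict_to_lowater(lru_sizes, max_bytes: int, pct: int = DBUF_CACHE_LOWATER_PCT):
    """Evict oldest-first until the total is at or below the low-water mark."""
    kept = list(lru_sizes)
    total = sum(kept)
    while kept and above_lowater(total, max_bytes, pct):
        total -= kept.pop(0)
    return kept


# Usage: with a 2 GiB ARC the 100 MiB default is capped at 2 GiB / 32 = 64 MiB.
print(effective_max(2 * GIB) // MIB)  # 64
print(evict_to_lowater([30, 30, 30, 30], max_bytes=100))  # [30, 30, 30]
```

Using separate high- and low-water marks gives the cache hysteresis: it does not thrash between evicting and refilling around a single threshold.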