history_event-zfs-list-cacher.sh not skipping receives on existing dataset #17451

@RobertMe

Description
System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | 12 (Bookworm) |
| Kernel Version | 6.1.0-37-amd64 (6.1.140-1) |
| Architecture | x86_64 |
| OpenZFS Version | 2.2.5 (from backports, with a hold) |

Describe the problem you're observing

When sending and receiving a lot of datasets to update snapshots, there are a lot of invocations of history_event-zfs-list-cacher.sh, each of which in turn runs `zfs list ...`. To me these seem unnecessary. The purpose of the zed script is to rebuild the filesystem cache file, and the script already tries hard to minimize how often it does real work: it only continues on certain event types, and where possible it also filters on the event details (e.g. for a `set` or `inherit` event it checks whether the affected property is one that is used in the cache file). It also rules out events on snapshots, based on `ZEVENT_HISTORY_DSNAME` containing an `@`, i.e. the snapshot separator.
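The snapshot filter described above can be sketched roughly as follows (a hypothetical simplification; the function name and structure are my own illustration, not the script's actual code):

```shell
#!/bin/sh
# Hypothetical simplification of the snapshot guard in
# history_event-zfs-list-cacher.sh: events whose dataset name contains
# the '@' snapshot separator are ignored by the cacher.
is_snapshot_event() {
    case "$1" in
        *@*) return 0 ;;  # snapshot: the cacher can skip this event
        *)   return 1 ;;  # filesystem/volume: needs further handling
    esac
}
```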
But my observation is that when receiving snapshots into an existing dataset, the event does not contain a snapshot name in `history_dsname`. As a result the full "cacher" script is executed, including the `zfs list ...` call, which seems to be rather slow / CPU heavy on my system.

As an example, this is a "finish receiving" event that the "cacher" would act on:

Jun 10 2025 19:05:51.578452914 sysevent.fs.zfs.history_event
        version = 0x0
        class = "sysevent.fs.zfs.history_event"
        pool = "tank"
        pool_guid = 0x22115a8ac87b45e2
        pool_state = 0x0
        pool_context = 0x0
        history_hostname = "server"
        history_dsname = "tank/backup/local/rpool/home/root/%recv"
        history_internal_str = "snap=pyznap_2025-06-10_18:45:03_frequent"
        history_internal_name = "finish receiving"
        history_dsid = 0xbd9d
        history_txg = 0x16f1886
        history_time = 0x684865ef
        time = 0x684865ef 0x227a7db2 
        eid = 0xd61e8

My proposal would thus be to stop execution of the "cacher" script when `history_dsname` (possibly combined with `history_internal_str`?) leads to the conclusion that the event is a receive into an existing dataset.
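As a sketch of the proposed extra check (the `%recv` suffix is taken from the event dump above; the function name and exact pattern matching are my own illustration, not a patch against the real script):

```shell
#!/bin/sh
# Hypothetical guard combining the existing snapshot filter with the
# proposed check for the temporary '%recv' dataset name that a receive
# into an existing dataset reports in history_dsname.
should_skip_event() {
    case "$1" in
        *@*)     return 0 ;;  # snapshot event: already skipped today
        */%recv) return 0 ;;  # receive into an existing dataset: proposed skip
        *)       return 1 ;;  # anything else: rebuild the cache file
    esac
}
```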

Describe how to reproduce the problem

  1. Open a process monitor and optionally filter on "zfs list"
  2. (Side by side) send a snapshot to an existing destination location

The expected result is that the process monitor does not show any executions of "zfs list", i.e. the regeneration of the cache file is skipped.
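For step 1, a minimal stand-in for a process monitor could be a polling loop around `pgrep` (assuming a procps `pgrep` that supports `-a`/`-f`, as on Debian; this is just one convenient way to watch for the invocations, not the only one):

```shell
#!/bin/sh
# Poll a few times for processes whose command line matches a pattern,
# printing each match; empty output means no matching process was seen.
watch_procs() {
    pattern="$1"
    polls="${2:-25}"
    i=0
    while [ "$i" -lt "$polls" ]; do
        pgrep -af "$pattern" || true  # -a: print the full command line
        i=$((i + 1))
        sleep 0.2
    done
}

# Example: watch_procs 'zfs list' 50   # run while the send/receive happens
```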

Include any warning/errors/backtraces from the system logs

-

Labels: Type: Defect (incorrect behavior, e.g. crash, hang)