Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: add new option to zfs snapshot subcommand to skip creation of zero sized snapshots #6041

Closed
mailinglists35 opened this issue Apr 19, 2017 · 24 comments
Labels
Type: Feature Feature request or new feature

Comments

@mailinglists35
Copy link

mailinglists35 commented Apr 19, 2017

Let's have this scenario

root@linux:~# zfs list -r -t all test
NAME           USED  AVAIL  REFER  MOUNTPOINT
test           126K  55.9M    23K  /test
test/fs1        46K  55.9M    23K  /test/fs1
test/fs1/fs2    23K  55.9M    23K  /test/fs1/fs2

root@linux:~# zfs snap -r test@snap1

root@linux:~# zfs list -r -t all test
NAME                 USED  AVAIL  REFER  MOUNTPOINT
test                 126K  55.9M    23K  /test
test@snap1              0      -    23K  -
test/fs1              46K  55.9M    23K  /test/fs1
test/fs1@snap1          0      -    23K  -
test/fs1/fs2          23K  55.9M    23K  /test/fs1/fs2
test/fs1/fs2@snap1      0      -    23K  -

root@linux:~# touch /test/fs1/file1

I would like to have a new option, let's call it -z for this example, that when used would produce the new situation:

root@linux:~# zfs snap -r -z test@snap2

root@linux:~# zfs list -r -t all test
NAME                 USED  AVAIL  REFER  MOUNTPOINT
test                 290K  55.7M    23K  /test
test@snap1              0      -    23K  -
test/fs1              59K  55.7M    23K  /test/fs1
test/fs1@snap1        13K      -    23K  -
test/fs1@snap2          0      -    23K  -
test/fs1/fs2          23K  55.7M    23K  /test/fs1/fs2
test/fs1/fs2@snap1      0      -    23K  -

Basically what I'd like to be able to ask 'zfs snapshot' when -z parameter is present is:

"before you create a snapshot, regardless if is requested recursively or not, check if any previous snapshot exists and if it does then check if the newly to-be-created snapshot would reference any data that the previous snapshot doesn't .

if it doesn't reference new data since previous snapshot, silently skip creation of the requested snapshot and report a successful operation"

@mailinglists35
Copy link
Author

because all existing snapshotting tools fail to be able to skip creation of zero sized snapshots and none are planning to do so :(

@DeHackEd
Copy link
Contributor

While I see the reasoning, I'm going to need an override. If you're trying to use zfs send -R to backup your pool (or subset thereof) and first run zfs snapshot -r to make the snapshots you may find yourself in a bad situation if snapshots were skipped as a result of this feature. Or even if you make a snapshot for immediate sending only to find it not created.

Please don't silently not do what the user expects to happen.

@mailinglists35
Copy link
Author

mailinglists35 commented Apr 19, 2017

I fail to see how this optional feature would affect existing workflows.
Like you said, if you run zfs snapshot -r then nothing is changed if this feature is available, as you would get a recursive snapshot for all datasets if the newly proposed option is omitted (which should be default)

@mailinglists35
Copy link
Author

Please don't silently not do what the user expects to happen

but the users expects that to happen! when the user explicitly asks for -z parameter, the user expects that zero sized snapshots are not created.

when the user does not ask for -z parameter, the snapshot is created and that is what the user is expecting.

@DeHackEd
Copy link
Contributor

Ugh, I missed the part where it's a -z parameter. I thought it was presented as being the new default behaviour. Sorry, it's been one of those mornings.

@behlendorf behlendorf added the Type: Feature Feature request or new feature label Apr 21, 2017
@GregorKopka
Copy link
Contributor

While the created snapshots can be culled afterwards in case they're 'empty' (compared to the last one) this takes way longer (and also bloats zpool history) compared to be able to not take them at all. And even when the functionality can be scripted (as mentioned) I suppose a -z option would have a way better runtime.

So for me this would be a very good improvement as it would massively cut down the amount of redundant snapshots, at least on most of the systems I maintain, so I could retire the script (and the related overhead) that currently has to remove them.

@loli10K
Copy link
Contributor

loli10K commented Feb 11, 2018

This can now be achieved using ZFS Channel Programs (zfs program) which was recently merged: should we close this?

@mailinglists35
Copy link
Author

mailinglists35 commented Feb 11, 2018

i tried reading about zcp but i find it very hard to understand what it really does and how can it help for this issue.
could you post an ELI5 example how can this be achieved using zcp?
thanks!

@loli10K
Copy link
Contributor

loli10K commented Feb 11, 2018

The following ZCP script was briefly tested on a throwaway VM, feel free to adapt it to your environment:

arg = ...
argv = arg["argv"]
root = argv[1]
name = argv[2]
function snapshot_r(root, name)
   for child in zfs.list.children(root) do
      snapshot_r(child, name)
   end
   if (zfs.get_prop(root , "written") > 0) then
      zfs.sync.snapshot(root .. "@" .. name)
   end
end
snapshot_r(root, name)

Test output (snapshot is taken only the first time because written > 0):

root@linux:~# truncate -s 512m /var/tmp/file1
root@linux:~# zpool create $POOLNAME -O mountpoint=/$POOLNAME -f /var/tmp/file1
root@linux:~# for i in {1..2}; do
>    zfs create $POOLNAME/$i
>    for j in {1..2}; do
>       zfs create $POOLNAME/$i/$j
>    done
> done
root@linux:~# zfs list -r $POOLNAME -t all
NAME           USED  AVAIL  REFER  MOUNTPOINT
testpool       256K   368M    24K  /testpool
testpool/1      72K   368M    24K  /testpool/1
testpool/1/1    24K   368M    24K  /testpool/1/1
testpool/1/2    24K   368M    24K  /testpool/1/2
testpool/2      72K   368M    24K  /testpool/2
testpool/2/1    24K   368M    24K  /testpool/2/1
testpool/2/2    24K   368M    24K  /testpool/2/2
root@linux:~# zfs get written -r $POOLNAME
NAME          PROPERTY  VALUE    SOURCE
testpool      written   24K      -
testpool/1    written   24K      -
testpool/1/1  written   24K      -
testpool/1/2  written   24K      -
testpool/2    written   24K      -
testpool/2/1  written   24K      -
testpool/2/2  written   24K      -
root@linux:~# cat > zfs_snapshot-r_written_only.zcp <<'EOF'
> arg = ...
> argv = arg["argv"]
> root = argv[1]
> name = argv[2]
> function snapshot_r(root, name)
>    for child in zfs.list.children(root) do
>       snapshot_r(child, name)
>    end
>    if (zfs.get_prop(root , "written") > 0) then
>       zfs.sync.snapshot(root .. "@" .. name)
>    end
> end
> snapshot_r(root, name)
> EOF
root@linux:~# zfs program $POOLNAME zfs_snapshot-r_written_only.zcp -- $POOLNAME snap-$(date +%s)
Channel program fully executed with no return value.
root@linux:~# zfs program $POOLNAME zfs_snapshot-r_written_only.zcp -- $POOLNAME snap-$(date +%s)
Channel program fully executed with no return value.
root@linux:~# zfs program $POOLNAME zfs_snapshot-r_written_only.zcp -- $POOLNAME snap-$(date +%s)
Channel program fully executed with no return value.
root@linux:~# zfs program $POOLNAME zfs_snapshot-r_written_only.zcp -- $POOLNAME snap-$(date +%s)
Channel program fully executed with no return value.
root@linux:~# zfs list -r $POOLNAME -t all
NAME                           USED  AVAIL  REFER  MOUNTPOINT
testpool                       264K   368M    24K  /testpool
testpool@snap-1518347971         0B      -    24K  -
testpool/1                      72K   368M    24K  /testpool/1
testpool/1@snap-1518347971       0B      -    24K  -
testpool/1/1                    24K   368M    24K  /testpool/1/1
testpool/1/1@snap-1518347971     0B      -    24K  -
testpool/1/2                    24K   368M    24K  /testpool/1/2
testpool/1/2@snap-1518347971     0B      -    24K  -
testpool/2                      72K   368M    24K  /testpool/2
testpool/2@snap-1518347971       0B      -    24K  -
testpool/2/1                    24K   368M    24K  /testpool/2/1
testpool/2/1@snap-1518347971     0B      -    24K  -
testpool/2/2                    24K   368M    24K  /testpool/2/2
testpool/2/2@snap-1518347971     0B      -    24K  -
root@linux:~# zfs get written -r $POOLNAME
NAME                          PROPERTY  VALUE    SOURCE
testpool                      written   0        -
testpool@snap-1518347971      written   24K      -
testpool/1                    written   0        -
testpool/1@snap-1518347971    written   24K      -
testpool/1/1                  written   0        -
testpool/1/1@snap-1518347971  written   24K      -
testpool/1/2                  written   0        -
testpool/1/2@snap-1518347971  written   24K      -
testpool/2                    written   0        -
testpool/2@snap-1518347971    written   24K      -
testpool/2/1                  written   0        -
testpool/2/1@snap-1518347971  written   24K      -
testpool/2/2                  written   0        -
testpool/2/2@snap-1518347971  written   24K      -
root@linux:~# 

@GregorKopka
Copy link
Contributor

What is the exact meaning of

Channel programs may only be run with root privileges.

from https://illumos.org/man/1M/zfs-program ?

In case I understand it correnctly: Having -z as an option to the normal zfs snapshot command would allow users to execute it (if zfs allow'ed) - while the channel program workaround is limited to root... ?

@loli10K
Copy link
Contributor

loli10K commented Feb 11, 2018

@GregorKopka exactly.

@mailinglists35
Copy link
Author

if (zfs.get_prop(root , "written") > 0) then
zfs.sync.snapshot(root .. "@" .. name)

hm, so what is the effort difference between above and the following code in a modified zfs-auto-snapshot script? the end user still has to verify if the written property is non-zero. so what's the benefit of zfs program vs the already working modified zfs-auto-snapshot?

+                # Check if size check is > 0
+                size_check_skip=0
+                if [ "$opt_min_size" -gt 0 ]
+                then
+                        bytes_written=`zfs get -Hp -o value written $ii`
+                        kb_written=$(( $bytes_written / 1024 ))
+                        if [ "$kb_written" -lt "$opt_min_size" ]
+                        then
+                                size_check_skip=1
+                                if [ $opt_verbose -gt 0 ]
+                                then
+                                        echo "Skipping target $ii, only $kb_written kB written since last snap. opt_min_size is $opt_min_size"
+                                fi
+                        fi
+                fi
+
+                if [ -n "$opt_do_snapshots" -a "$size_check_skip" -eq 0 ]

my issue was request about simplifying the user effort with a zfs command with a new parameter... that would be very simple for the end user... the recommened approach is no less difficult than the code path in zfs-auto-snapshot...

@GregorKopka
Copy link
Contributor

@loli10K then the channel programs approach dosn't solve it as it dosn't support delegation to normal users (which the original request would). While technically impressive it dosn't help a normal user.

The workaround with pruning a zfs list to create a zfs snapshot command line in userland (plenty of parsing in both directions) seems to be way more costly, plus I suspect there are limits on the length of the command line of a zfs snapshot invocation (while no real limit on the amount of datasets exist, apart from space in the pool), so my guess is that there's a limit on the amount of snapshots that can be created within one atomic zfs snapshot operation (unless using -r, which defeats the point).

Thus I support it as a flag, or through root being able to prepare channel programs that can then be called by normal users (with at least checking their allow'ed abilities vs. the sub-operations taken by the channel progam). A flag would be more practical, imho.

@loli10K
Copy link
Contributor

loli10K commented Feb 11, 2018

so what is the effort difference between above and the following code in a modified zfs-auto-snapshot script? the end user still has to verify if the written property is non-zero. so what's the benefit of zfs program vs the already working modified zfs-auto-snapshot?

From zfs-program(8):

The ZFS channel program interface allows ZFS administrative operations to be run programmatically as a Lua script. The entire script is executed atomically, with no other administrative operations taking effect concurrently. A library of ZFS calls is made available to channel program scripts. Channel programs may only be run with root privileges.

The bold part is the important one.

From my perspective zfs snapshot -z is more difficult to implement (requires both cli args parsing and libzfs modifications) and less powerful than ZCP (would be limited to zero-sized snapshots): given we now have zfs program available i find it difficult to justify the need to work on adding more, less powerful, options to zfs snapshot, but since it doesn't seem to address your specific use case feel free to open a pull request.

As a side now, keep in mind this option would probably prevent sending snapshots with the -R option since 6635624 (#5196).

@GregorKopka
Copy link
Contributor

@kpande Could you please elaborate why this was closed?

@GregorKopka
Copy link
Contributor

@kpande Unless I missed the ZoL release that enabled running ZCP without root privileges: please reopen.

@GregorKopka
Copy link
Contributor

Might be a separate issue, but without being able to delegate to non-root users this isn't solved.

@mailinglists35

This comment has been minimized.

@mailinglists35

This comment has been minimized.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature Feature request or new feature
Projects
None yet
Development

No branches or pull requests

7 participants
@behlendorf @GregorKopka @DeHackEd @mailinglists35 @loli10K and others