
option to convert step-holds to step-bookmarks in the pruner when it tries to destroy a step-held snapshot #288

Open
JMoVS opened this issue Mar 17, 2020 · 22 comments

Comments

@JMoVS
Contributor

JMoVS commented Mar 17, 2020

As requested on IRC, here's my use case:

I have a laptop with an internal disk and zpool. zrepl is configured with a push job for this zpool, pushing to my USB-connected 2-way mirror zpool. My primary concern is that incremental backup is always possible, which means a bookmark to resume from must always be available.

With the new holding logic, I would need the pruner to also clear all holds (as a setting).

The reason is as follows:
If replication happens, it creates the holds. Now when I disconnect and export the external mirror zpool, zrepl will still hold those snapshots. I have configured zrepl to only ever hold 20 snapshots on my laptop to conserve space. And it happens that I'm away for a week or two.

In this use case, I would prefer for zrepl to release the hold, destroy the snapshots, and then, when the mirror target pool becomes available again, delete the resumable send state and send an incremental backup from the bookmark. This way, my local machine would only ever have 20 snapshots.

@problame problame added this to the 0.3 milestone Mar 18, 2020
@problame
Member

I figure the best way to address this need is an option on the pruner that converts any existing replication holds to bookmarks (see the zfs command sketch below). Converting means

  • create a step-bookmark of the step-hold's snapshot
  • release the hold
  • retry the prune-operation
  pruning:
    hold_handling:
      step_holds: convert_to_step_bookmark
    keep_sender: ...
    keep_receiver: ...

Note that the last_received_hold is on the receiving side and cannot be converted to a bookmark because it is used as the from for subsequent replications.
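
In terms of plain zfs commands, the conversion above would look roughly like the following sketch. The dataset, snapshot, bookmark, and hold-tag names are hypothetical placeholders; zrepl would perform the equivalent steps itself, this is only to illustrate the semantics:

# 1. create a step-bookmark of the step-held snapshot
zfs bookmark rpool/data@zrepl_20200317_000000_000 rpool/data#zrepl_STEP_20200317_000000_000
# 2. release the step hold (the hold tag here is a placeholder)
zfs release zrepl_STEP_J_pushjob rpool/data@zrepl_20200317_000000_000
# 3. retry the prune operation - the destroy now succeeds
zfs destroy rpool/data@zrepl_20200317_000000_000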

@problame
Member

Would this address your needs?
(note to myself: rename the issue once it is clear that the above proposal is what we want)

@JMoVS
Contributor Author

JMoVS commented Mar 18, 2020

it is used as from and it might also simply be not accessible and offline by the time. Now is it possible to resume a resumable send from a bookmark?

@problame
Member

it is used as from and it might also simply be not accessible and offline by the time.
What is the it you are referring to there?

Now is it possible to resume a resumable send from a bookmark?
Yes, even if the transfer that got interrupted used a snapshot as from.
(The resume token just contains the fromguid; the resume-send code looks up snapshots, then bookmarks, with that guid.)
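
For illustration, here is roughly what that looks like with plain zfs commands (the dataset names, the bookmark, and the ssh target are hypothetical; zrepl performs the equivalent steps through its own replication logic):

# incremental send that uses a bookmark as the "from" side
zfs send -i rpool/data#zrepl_CURSOR rpool/data@zrepl_new | ssh backuphost zfs recv -s backuppool/data
# if the transfer is interrupted, the receiver keeps resumable state; fetch the token there ...
zfs get -H -o value receive_resume_token backuppool/data
# ... and resume the send with it (the token encodes the fromguid)
zfs send -t 1-xxxxxxxxx-yyy | ssh backuphost zfs recv -s backuppool/data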

@JMoVS
Contributor Author

JMoVS commented Mar 22, 2020

Hmm ok, if resuming a send from a bookmark works, then your strategy of creating a bookmark for every hold and then releasing that hold, enabling the pruner to delete the snapshot would solve this problem. It might even make sense to make this the default option (I don't really see any downsides).

"it" was referring to snapshots/resume tokens that are residing on the sink pool. As I regularly detach my sink pool, "it" would not be accessible most of the time.

@problame problame changed the title Use case and the need for pruning to destroy snapshots that have not been fully sent option to convert step-holds to step-bookmarks in the pruner when it tries to destroy a step-held snapshot Mar 24, 2020
@problame
Member

One more idea about this feature, but possibly, this is a separate feature:
Why can't we just use deferred destroys (zfs destroy -d)?

The implementation would

  • filter out all snapshots that have defer_destroy=on before evaluating the keep filters.
  • run zfs destroy -d on those snapshots that are not kept (see the CLI sketch below)
  • be done.

zrepl only ever keeps at most two holds per send-side dataset.
Any other situation is a (separate) bug in zrepl.


But, if I understand you correctly, deferred destroys wouldn't address your problem, since you don't want the snapshot to stick around any longer, right?
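
For context, this is roughly how deferred destroy behaves on the zfs CLI (dataset and hold-tag names are hypothetical):

zfs destroy rpool/data@zrepl_old            # fails with "dataset is busy" while a hold exists
zfs destroy -d rpool/data@zrepl_old         # marks the snapshot for deferred destruction instead
zfs get -H -o value defer_destroy rpool/data@zrepl_old   # now prints "on"
zfs release zrepl_STEP_J_pushjob rpool/data@zrepl_old    # releasing the last hold destroys it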

@JMoVS
Contributor Author

JMoVS commented Mar 26, 2020

zrepl currently holds something like 200 snapshots per dataset for me. So this might also be a separate bug.

But yes, you understood correctly, I effectively want to make an "offsite" backup which I connect to semi-regularly and because I only connect semiregularly, I don't want to waste the space on my main machine. That's why I want the snapshots to be destroyed and only bookmarks to be left.

@problame
Member

All right, wrt the 200 holds, keep an eye on #282 - I'll be pushing some code there soon, although @InsanePrawn's WIP implementation in that branch already works.

@problame
Member

@JMoVS in #293, there are now subcommands for removing stale holds (zrepl holds release-stale).

There should also be some minor performance improvements (fewer zfs CLI invocations).

Would you mind testing that PR so we can get it merged soon? Please provide any feedback on the PR in the comments there, unless it relates to this issue.

@problame
Member

problame commented May 1, 2020

We discussed this in voice chat tonight:

  • Absence of keep: not_replicated should be taken literally: no exceptions for snapshots that are not replicated, including those that are part of an ongoing send and thus have a step hold. Thus, in the absence of keep: not_replicated, we should convert the step holds into step bookmarks, regardless of whether it is an initial or incremental send. That is what the user wants if they don't specify this option.

  • Introduce keep: step_holds which keeps step holds (and step bookmarks, but we don't prune those).

    • Step holds are by definition included in `keep: not_replicated`, since a step hold is always younger than the replication cursor used to determine whether a snapshot is replicated or not.
    • However, this keep rule can also be specified separately if a user just wants to keep step holds for resumability, but wants to avoid stacking up an infinite amount of snapshots on the sender, as they would with keep: not_replicated.
  • This boils down to the following decision tree, which should be part of the docs:

<JMoVS> Decision tree:
<JMoVS> Q1: Do you want ALL SNAPSHOTS that are created ON YOUR SOURCE to ALL end up on the destination?
<JMoVS> => Activate keep-non-replicated and STOP here
<JMoVS> Q2: Do you want started backups to resume at all times? (e.g. spotty network connection) This means that we would keep the snapshot that is currently being replicated on the source and resume the backup when the connection is available again.
<JMoVS> => Activate keep-step-holds and STOP here
<JMoVS> Q3: You really don't care and just want the last 20 snapshots to be retained on your source no matter what?
<JMoVS> => Just leave the config as is without any of the other options mentioned above

@problame
Member

problame commented May 1, 2020

To clarify: the current situation is a bug: if "keep-non-replicated" is not set, step holds created by zrepl must be released by zrepl so the snapshots can be pruned, because the user really didn't want snapshots to accumulate on the sending side.

A single hold violating this policy might not be all that bad, but hold leaks such as #316 exacerbate the problem.

@InsanePrawn
Contributor

I think ideally we need something better than a 'keep all not-replicated snapshots' rule: a different set of pruning rules.
(Otherwise a remote that is down for maintenance might cause massive amounts of snapshots to build up if short snap intervals are configured, which is not something zfs handles too well.)

Also, not_replicated or whatever succeeds it should optionally take a list of JobIDs to support unconventional/unsupported setups, e.g. pruning in snap jobs or multiple jobs replicating from one source dataset.

@problame
Member

problame commented May 2, 2020

(Otherwise a remote that is down for maintenance might cause massive amounts of snapshots to build up if short snap intervals are configured, which is not something zfs handles too well.)

That's what a configuration that specifies keep: step_holds but no keep: not_replicated would solve AFAICT.

Also, not_replicated or whatever succeeds it should optionally take a list of JobIDs to support unconventional/unsupported setups, e.g. pruning in snap jobs or multiple jobs replicating from one source dataset.

  • Yes, that's very important, and the current state of master (i.e. per-job replication cursors) would indeed break existing snapjob + replication-only-job setups like yours.

@problame
Member

problame commented May 7, 2020

Also, not_replicated or whatever succeeds it should optionally take a list of JobIDs to support unconventional/unsupported setups, e.g. pruning in snap jobs or multiple jobs replicating from one source dataset.

Yes, that's very important, and the current state of master (i.e. per-job replication cursors) would indeed break existing snapjob + replication-only-job setups like yours.

Huh, actually, I'm not sure what your (@InsanePrawn) setup looks like. Which job does the pruning? The snap job or the replicating job?

@InsanePrawn
Contributor

Which job does the pruning? The snap job or the replicating job?

Both. It depends. It's complicated!

My abbreviated experimental configs for the three instances I'm using to test the 0.3 work are as follows, exactly as they are on disk right now. I've played with the interval lengths a couple of times; these might be stupid for a number of reasons, I can't remember.

node 1 -> node 2 -> node 3

  • node 1:
    • source job:
      • no pruning, snapshotting: manual
    • snap job:
      • 3m snap interval
      • prunes locally with a grid + keeps non-zrepl snaps
  • node 2:
    • pull (from node 1) job:
      • 5m interval
      • pruning: keep everything on the sender, grid on the receiver aka locally (hint: keep=all for the newest 30m to smooth over races, my zfs is slow af) (matches all snapshots, not just zrepl_.*) <- could technically be split into a snap job I guess, but that's job-unaware right now; I see no benefit at the cost of added raciness.
    • source job:
      • snapshotting: "manual"
      • offers FS with snaps pulled from node 1
  • node 3:
    • pull (from node 2) job:
      • pruning: like node 2: don't touch the sender, just prune the receiver with a grid; this grid has a much lower time resolution though, think second-level backup. Could also be split out, but again, why?

(Yes, this replicates from the source to the primary backup and from the primary to the secondary. Some blogs might tell you that's a bad thing to do, because corruption occurring on the primary replicates onto the secondary, etc. Still unsure whether you can have one of those nasty uncorrectable blocks be successfully sent and received by ZFS.)
Now we're going a little OT, but if I were to make up one example for pruning rules optionally taking multiple JobIDs, it would help if you were to serve one source FS to a number of clients, such as your primary and secondary backup server[s?]. (Or maybe your replication strategy is 'a little more horizontal'.)
Then you might define a source job per client and keep those pruned Just Right by offloading their pruning to a central snap job that applies its specific rule sets for replicated and non-replicated snapshots each, accounting for all those JobIDs in Just The Right Way.
For the first iteration of this feature, form the semantic union of those jobs while evaluating keep rules.
Stretch goal for zrepl 2025.04 Cluster Edition: allow some form of logical AND and OR groupings between jobs? Complexity to the max!

@problame
Member

@JMoVS: I was talking to @InsanePrawn yesterday, and he suggested that, for the 0.3 release, it might be sufficient to have a per-job option that disables step holds for incremental sends:

jobs:
- type: push
  ...
  send:
    step_holds:
      disable_incremental: true # defaults to false

Suppose you yank the external drive during an incremental @from -> @to step:

  • restarting that step or future incrementals @from -> @to_later will be possible
    • because the replication cursor bookmark points to @from until the step is complete
  • resuming @from -> @to will work as long as the pruner on your internal pool doesn't come around to destroy @to.
    • in that case, the replication algorithm should determine that the resumable state on the receiving side is useless because @to no longer exists on the sending side, consequently clear it, and restart an incremental step @from -> @to_later (a rough CLI sketch of that cleanup follows below)

Do you agree?
The advantage of this solution is that it doesn't require (substantial) changes to the pruner at this point, which I would appreciate given that I wanted to release 0.3 (= what is in problame/develop) about two months ago :D
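
For reference, the "clear useless resumable state" step corresponds roughly to the following on the receiving side (backuppool/data is a hypothetical dataset; zrepl would do this as part of replication planning rather than requiring manual intervention):

# check whether the receiving filesystem has saved partial-receive state
zfs get -H -o value receive_resume_token backuppool/data
# discard that partial state so a fresh incremental @from -> @to_later can start
zfs receive -A backuppool/data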

@JMoVS
Contributor Author

JMoVS commented Jun 1, 2020

Yes, that would work, I think. As long as there is always a bookmark to increment from. ;-)

problame added a commit that referenced this issue Jun 1, 2020
This is a stop-gap solution until we re-write the pruner to support
rules for removing step holds.

Note that disabling step holds for incremental sends does not affect
zrepl's guarantee that incremental replication is always possible:

Suppose you yank the external drive during an incremental @from -> @to step:

* restarting that step or future incrementals @from -> @to_later will be possible
  because the replication cursor bookmark points to @from until the step is complete
* resuming @from -> @to will work as long as the pruner on your internal pool doesn't come around to destroy @to.
    * in that case, the replication algorithm should determine that the resumable state
      on the receiving side is useless because @to no longer exists on the sending side,
      and consequently clear it, and restart an incremental step @from -> @to_later

refs #288
@problame
Member

problame commented Jun 1, 2020

@JMoVS problame/develop bb9444a now contains a commit with the config syntax proposed above.
The new option has platformtest coverage and I tested your use-case manually.
I think it is safe to test on your setup.
Please report back ASAP whether the code works for you; I'll then document the option and the use case.

EDIT
Please make sure to actually test the situation where you yank out the HDD in the middle of replication, then see whether there are step holds or not (there shouldn't be for incrementals). You can list all zfs abstractions on filesystem dataset/path with zrepl zfs-abstraction list --fs dataset/path.

@problame
Member

problame commented Jun 1, 2020

Side note on the actual topic of this issue: the WIP branch problame/pruner_rewrite_step_holds contains a pruner rewrite that is

  • local to a given side of a replication setup
  • supports keep: step_holds
    • that keep rule should actually support an option younger_than: or sth for the following use case:
      • resumability is desirable, but only for a day or so, afterwards it'd be better to drop the hold and restart in the future
      • useful for the laptop + external HDD use case ("I plug in the HDD for backup every other day => set the option to 4 days as a compromise")
      • useful for flaky networks: replication every 4h, mid-way network failures 50% of the time => set it to 32h and have a greatly reduced probability of needing to restart a transfer

problame added a commit that referenced this issue Jun 1, 2020
This is a stop-gap solution until we re-write the pruner to support rules for removing step holds.

refs #288
problame added a commit that referenced this issue Jun 14, 2020
This is a stop-gap solution until we re-write the pruner to support rules for removing step holds.

refs #288
@problame problame modified the milestones: 0.3, 0.4 Jun 14, 2020
@problame problame modified the milestones: 0.4, 0.5 Feb 20, 2021
@candlerb
Contributor

I'm not sure whether the problem I've just noticed is the same, but I'll report it here anyway just in case.

Problem as seen by zrepl status (for several datasets, this is one example):

Pruning Sender:
...
zfs/lxd/containers/extproxy    ERROR: destroys failed @zrepl_20220217_095249_000: dataset is busy

Reproducing the problem manually:

root@nuc2:~# zfs list -r -t snap zfs/lxd/containers/extproxy
NAME                                                    USED  AVAIL     REFER  MOUNTPOINT
zfs/lxd/containers/extproxy@zrepl_20220217_095249_000   241M      -      848M  -
zfs/lxd/containers/extproxy@zrepl_20220719_134614_000  75.7M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220720_134614_000  72.6M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220721_134614_000  55.9M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220722_134614_000  55.9M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220723_134614_000  4.11M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220724_074620_000  4.08M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220725_134614_000  56.8M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220726_134614_000  53.7M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220726_194614_000  1.16M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220727_014614_000  1.14M      -     1.09G  -
zfs/lxd/containers/extproxy@zrepl_20220727_074614_000  27.3M      -     1.09G  -
root@nuc2:~# zfs destroy zfs/lxd/containers/extproxy@zrepl_20220217_095249_000
cannot destroy snapshot zfs/lxd/containers/extproxy@zrepl_20220217_095249_000: dataset is busy

I found the clue to the solution here:

root@nuc2:~# zfs holds zfs/lxd/containers/extproxy@zrepl_20220217_095249_000
NAME                                                   TAG                         TIMESTAMP
zfs/lxd/containers/extproxy@zrepl_20220217_095249_000  zrepl_STEP_J_nuc2_storage1  Thu Feb 17 15:09 2022
root@nuc2:~# zfs release zrepl_STEP_J_nuc2_storage1 zfs/lxd/containers/extproxy@zrepl_20220217_095249_000
root@nuc2:~# zfs destroy zfs/lxd/containers/extproxy@zrepl_20220217_095249_000
root@nuc2:~#

I wondered if 17th Feb 2022 was around the time I last upgraded zrepl. But it was actually over a month earlier that I did so, according to /var/log/apt/history.log.6.gz:

Start-Date: 2022-01-09  13:11:44
Commandline: apt-get dist-upgrade
Upgrade: zrepl:amd64 (0.4.0, 0.5.0)
End-Date: 2022-01-09  13:11:50

@problame
Member

@candlerb I think @JMoVS is asking for a feature, but your writing suggests that you think something is not working right. If so, what do you feel is not working right? Step holds are an integral part of zrepl, see https://zrepl.github.io/configuration/overview.html#how-replication-works

@candlerb
Contributor

candlerb commented Aug 7, 2022

I think something's not working right: a stale hold was left in place for an ancient (17 Feb 2022) snapshot, meaning that snapshot pruning gave an error when trying to destroy the snapshot, so this snapshot was left around indefinitely.

NAME                                                    USED  AVAIL     REFER  MOUNTPOINT
zfs/lxd/containers/extproxy@zrepl_20220217_095249_000   241M      -      848M  -
                                  ^^^^^^^^
zfs/lxd/containers/extproxy@zrepl_20220719_134614_000  75.7M      -     1.09G  -
...

I don't know why this hold was left there. Perhaps zrepl was prematurely aborted mid-replication. However, in such situations I would expect the stale hold to be picked up and removed either by a subsequent replication or by the snapshot pruning process.
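
For completeness, the zrepl-level commands mentioned earlier in this thread can be used to inspect and clean up such stale holds without resorting to raw zfs release (exact subcommand names and flags may differ between zrepl versions):

# list zrepl's holds and bookmarks for the affected filesystem
zrepl zfs-abstraction list --fs zfs/lxd/containers/extproxy
# release holds that zrepl itself considers stale (subcommand introduced in #293)
zrepl holds release-stale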
