automatic conflict resolution #408

Open
2 tasks
problame opened this issue Jan 3, 2021 · 2 comments
problame commented Jan 3, 2021

zrepl should be able to resolve replication conflicts automatically.

Background / Use Cases

Replication conflicts are situations where there is something on the receiving side that prevents replication of one or more file systems from the sender to the receiver.
Examples and use cases:

UX

  • zrepl should continue to opt for non-destructive-by-default behavior
  • Renames should be detected automatically, e.g., by tracking GUIDs that are invariant to replication (cc #70, "Handle filesystem destroy / rename").
    Renames are thus not actually part of the conflict resolution UX.
  • With the daemon, no user interaction is required for conflict resolution.
  • Instead, a conflict resolution policy defined in the config is applied automatically to all filesystems matched by a job.
  • If the conflict resolution gives up, the user is informed via logs, metrics and zrepl status.
  • Optional: the zrepl signal (or similar) command allows the user to override the conflict resolution policy on a per-filesystem basis for the next replication attempt.
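
To make the rename-detection idea concrete, here is a minimal sketch in Python. It is not zrepl code. It assumes that we remember, per dataset name, the set of snapshot GUIDs seen at the last replication attempt, and that a dataset which vanished under one name and appeared under another with overlapping snapshot GUIDs was renamed. The function name `detect_renames` and the input shape are hypothetical.

```python
from typing import Dict, Set


def detect_renames(
    previous: Dict[str, Set[int]],
    current: Dict[str, Set[int]],
) -> Dict[str, str]:
    """Map old dataset names to new ones based on shared snapshot GUIDs.

    `previous` and `current` map dataset names to the snapshot GUIDs
    observed for that dataset (hypothetical bookkeeping, not a zrepl API).
    """
    # Names that disappeared and names that are new since the last attempt.
    vanished = {n: g for n, g in previous.items() if n not in current}
    appeared = {n: g for n, g in current.items() if n not in previous}

    renames: Dict[str, str] = {}
    for old_name, old_guids in vanished.items():
        for new_name, new_guids in appeared.items():
            # Any shared snapshot GUID is taken as evidence of a rename.
            if old_guids & new_guids:
                renames[old_name] = new_name
                break
    return renames
```

With this kind of matching, a rename never reaches the conflict resolution policy at all: it would be handled before a filesystem is classified as diverged.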

Config Syntax:

type: push # active side
...
conflict_resolution:

  # what to do if the receiver diverged and we can't replicate:
  receiver_diverged:

    # if there are (only) local modifications but no snapshots:
    unsnapshotted_modifications:
      # Policy Option #1: fail filesystems affected by this
      policy:
        type: fail
      # Policy Option #2: rollback filesystem to latest snapshot
      policy:
        type: rollback_to_latest_snapshot
        max_written: 10MiB # max value for the `written` zfs property of the dataset (fail otherwise)

    # if there are snapshots on the receiver that prevent replication
    receiver_only_snapshots:
      # Policy Option #1: fail filesystems affected by this
      policy:
        type: fail

      # Policy Option #2: destroy the receiver-only snapshots that prevent replication
      policy:
        type: destroy
        max_time: 1d # the max time window that is allowed to be rolled back
        max_count: 10 # the max number of snapshots that are allowed to be rolled back. -1 for infinity, 0 is invalid
        snapshots: # regexes that match the snapshot names (NOTE: syntax should be the same as for snapshot filtering)
          - regex: "^zrepl_.*"
            action: rollback
          - regex: ".*"
            action: prohibit

      # Policy Option #3: rename the conflicting dataset and start over using initial replication
      policy:
        type: conflict_copy
        clone: # whether to try cloning instead of initial replication if there is still a common snapshot between sender and receiver

          # Policy Option #1: don't clone
          type: no

          # Policy Option #2: try it within some limits, otherwise do what `otherwise` says
          type: try
          max_time: 1d # the max time window for which we should consider cloning
          max_count: 10 # the max number of snapshots that we should go back before giving up on trying to clone
          max_written: 1GiB # max value for the written@<snapshot> property relative to the clone origin candidate
          otherwise: fail | copy # copy does full initial replication
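
As a rough illustration of how the `destroy` policy for `receiver_only_snapshots` could be evaluated, here is a sketch in Python. It is not an implementation proposal. The assumptions are mine: rules are checked in order and the first matching regex wins, a snapshot that matches no rule is treated as `prohibit`, and `max_time` is compared against the age of each conflicting snapshot. The names `Snapshot`, `DestroyPolicy` and `may_destroy` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Snapshot:
    name: str
    created: datetime


@dataclass
class DestroyPolicy:
    max_time: timedelta
    max_count: int                 # -1 means unlimited, 0 is invalid
    rules: List[Tuple[str, str]]   # (regex, action) pairs, checked in order


def may_destroy(policy: DestroyPolicy, conflicting: List[Snapshot], now: datetime) -> bool:
    """Return True only if every conflicting snapshot may be rolled back."""
    if policy.max_count == 0:
        raise ValueError("max_count: 0 is invalid")
    if policy.max_count != -1 and len(conflicting) > policy.max_count:
        return False
    for snap in conflicting:
        # Assumption: the time window is measured from the snapshot's creation.
        if now - snap.created > policy.max_time:
            return False
        action = "prohibit"  # assumed default when no rule matches
        for pattern, rule_action in policy.rules:
            if re.search(pattern, snap.name):
                action = rule_action  # first match wins (assumption)
                break
        if action != "rollback":
            return False
    return True
```

If `may_destroy` returns False, the filesystem would fall through to the same outcome as `type: fail`, which keeps the behavior non-destructive by default.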

Security Considerations

  • Future use cases such as append-only sink cc @mdtancsa
    • What could an attacker accomplish if they take over the sender but not the receiver? Our goal should be that the conflict resolution can be configured such that the attacker cannot provoke deletion of snapshots through the conflict resolution policy.
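
One way to approach the append-only goal is a check that rejects configs whose conflict resolution could delete snapshots. The sketch below is Python and purely illustrative. It assumes the config has already been parsed into nested dicts shaped like the syntax proposed above, and it assumes that `destroy` is the only policy type that deletes snapshots. Both the function name and that classification are my assumptions, not settled design.

```python
from typing import Any, List, Tuple

# Assumption: only this policy type deletes receiver-side snapshots.
SNAPSHOT_DELETING_TYPES = {"destroy"}


def snapshot_deleting_policies(node: Any, path: Tuple[str, ...] = ()) -> List[str]:
    """Return dotted config paths whose policy type may delete snapshots."""
    found: List[str] = []
    if isinstance(node, dict):
        policy = node.get("policy")
        if isinstance(policy, dict) and policy.get("type") in SNAPSHOT_DELETING_TYPES:
            found.append(".".join(path))
        # Recurse into the remaining keys to find nested conflict cases.
        for key, value in node.items():
            if key != "policy":
                found.extend(snapshot_deleting_policies(value, path + (key,)))
    return found
```

An append-only sink could refuse to start if this list is non-empty, so that a compromised sender cannot trigger snapshot deletion through the policy.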

Implementation

  • Open question, let's discuss this after we have settled on features.

problame (Member, Author) commented

Additional case

type: push # active side
...
conflict_resolution:
  receiver_diverged:
    # if there are no common snapshots between sender and receiver
    no_common_snapshots:
      # Option 1: fail
      policy:
        type: fail
      # Option 2
      policy:
        type: conflict_copy
        ...
      # Option 3
      policy:
        type: destroy #perhaps 'replace' would be a better name?

problame (Member, Author) commented

Additional use case

type: push
...
conflict_resolution:
  # what should happen if there is no corresponding receive-side file system
  initial_replication:
    # Option 1: fail
    # Option 2: send most recent snapshot (current behavior)
    # Option 3: send all snapshots on the sender

grahamc added a commit to grahamc/zrepl that referenced this issue Apr 15, 2021
…hots

By configuring your push or pull job with:

```yaml
conflict_resolution:
  initial_replication:
    replicate_all_snapshots: true
```

when a dataset is transferred for the first time, all snapshots on
that dataset will be transferred. This is disabled by default, and
only the most recent snapshot will be transferred.

Based on the names in zrepl#408 (comment)
though the more complicated options suggested in that issue are
more Go than I know.
problame pushed a commit that referenced this issue Apr 10, 2022
```yaml
conflict_resolution:
  initial_replication:
    replicate_all_snapshots: true
```

when a dataset is transferred for the first time, all snapshots on
that dataset will be transferred. This is disabled by default, and
only the most recent snapshot will be transferred.

Based on the names in #408 (comment)
though the more complicated options suggested in that issue are
more Go than I know.