Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix handling of tenative cursor presence if protection strategy doesn't use it #714

Merged

Conversation

problame
Copy link
Member

@problame problame commented Jun 6, 2023

Before this PR, we would panic in the check phase of endpoint.Send()'s TryBatchDestroy call in the following cases: the current protection strategy does NOT produce a tentative replication cursor AND

  • FromVersion is a tentative cursor bookmark
  • FromVersion is a snapshot, and there exists a tentative cursor bookmark for that snapshot
  • FromVersion is a bookmark != tentative cursor bookmark, but there exists a tentative cursor bookmark for the same snapshot as the FromVersion bookmark

In those cases, the check concluded that we would delete FromVersion.
It came to that conclusion because the tentative cursor isn't part of obsoleteAbs if the protection strategy doesn't produce a tentative replication cursor.

The scenarios above can happen if the user changes the protection strategy from "with tentative cursor" to one "without tentative replication cursor", while there is a tentative replication cursor on disk.
The workaround was to rename the tentative cursor.

In all cases above, TryBatchDestroy would have destroyed the tentative cursor.

In case 1, that would fail the Send step and potentially break replication if the cursor is the last common bookmark. The check conclusion was correct.

In cases 2 and 3, deleting the tentative cursor would have been fine because FromVersion was a different entity than the tentative cursor. So, destroying the tentative cursor would be the right call.

The solution in this PR is as follows:

  • add the FromVersion to the liveAbs set of live abstractions
  • rewrite the check closure to use the full dataset path (fullpath) to identify the concrete ZFS object instead of the zfs.FilesystemVersionEqualIdentity, which is only identified by matching GUID.
    • Holds have no dataset path and are not the FromVersion in any case, so disregard them.

fixes #666

The new test `ReplicationIncrementalHandlesFromVersionEqTentativeCursorCorrectly` fails with

	panic: impl error: "#zrepl_CURSORTENTATIVE__G_15cbdccbc1f99b2f_J_sender-job" would be destroyed because it is considered stale but it is part of of sendArgs=zfs.ZFSSendArgsValidated

The updated existing test `ReplicationIncrementalCleansUpStaleAbstractionsWithCacheOnSecondReplication`, which, despite what its name suggests, invalidates the cache, fails with

	panic: impl error: "@3" would be destroyed because it is considered stale but it is part of of sendArgs=zfs.ZFSSendArgsValidated

The test that doesn't invalidate the cache doesn't fail.
That makes sense because without the cursor in the cache, we can't make the overly cautious decision that causes the panic.

 with '#' will be ignored, and an empty message aborts the commit.
@problame problame force-pushed the problame/iss-666-panic-multiple-abstractions-same-identity branch from 3e952c5 to ebaf976 Compare July 4, 2023 16:54
@problame problame changed the title fix #666: panic if tenative & regular replication cursors present for the same snapshot fix handling of tenative cursor presence if protection strategy doesn't use it Jul 4, 2023
@problame problame marked this pull request as ready for review July 4, 2023 18:16
@problame problame merged commit bbdc6f5 into master Jul 4, 2023
11 checks passed
@problame problame deleted the problame/iss-666-panic-multiple-abstractions-same-identity branch July 4, 2023 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

panic if tenative & regular replication cursors present for the same snapshot
1 participant