Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid fast past update restart race with concurrently created replica #11561

Conversation

vekterli
Copy link
Member

@geirst please review. This fix was hammered out in a cramped seat at 35,000 feet (occasionally interrupted by a high fidelity surround sound setup of screaming infants) so I will likely polish these changes a bit later...!

After the recent change to allow safe path updates to be restarted
as fast path updates iff all observed document timestamps are equal,
a race condition regression was introduced. If the bucket that the
update operation was scheduled towards got a new replica concurrently
created between the time that safe path Gets were sent and received,
it was possible for updates to be sent to inconsistent replicas. This
is because the Get and Update operations use the current database
state at their start time, not a stable snapshot state from the start
time of the two phase update operation itself.

Add an explicit check that the replica state between sending Gets and
Updates is unchanged. If it has changed, a fast path restart is not
permitted.

After the recent change to allow safe path updates to be restarted
as fast path updates iff all observed document timestamps are equal,
a race condition regression was introduced. If the bucket that the
update operation was scheduled towards got a new replica concurrently
created _between_ the time that safe path Gets were sent and received,
it was possible for updates to be sent to inconsistent replicas. This
is because the Get and Update operations use the current database
state at _their_ start time, not a stable snapshot state from the start
time of the two phase update operation itself.

Add an explicit check that the replica state between sending Gets and
Updates is unchanged. If it has changed, a fast path restart is _not_
permitted.
@vekterli vekterli requested a review from geirst December 13, 2019 10:14
@aressem aressem changed the title Avoid fast past update restart race with concurrently created replica Avoid fast past update restart race with concurrently created replica Dec 16, 2019
Copy link
Member

@geirst geirst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@geirst geirst assigned vekterli and unassigned geirst Dec 16, 2019
@vekterli vekterli changed the title Avoid fast past update restart race with concurrently created replica Avoid fast past update restart race with concurrently created replica Dec 16, 2019
@vekterli vekterli merged commit 6f5128b into master Dec 16, 2019
@vekterli vekterli deleted the vekterli/avoid-fast-path-update-race-with-concurrent-replica-creation branch December 16, 2019 15:39
vekterli added a commit that referenced this pull request Dec 20, 2019
Even with the fix in #11561 we are still observing replica
divergence warnings in the logs. Disabling this feature entirely
until the issue has been fully investigated and a complete fix
has been implemented.

Also emit a log message when the distributor has forced convergence
of a detected inconsistent update.
vekterli added a commit that referenced this pull request Dec 20, 2019
Even with the fix in #11561 we are still observing replica
divergence warnings in the logs. Disabling this feature entirely
until the issue has been fully investigated and a complete fix
has been implemented.

Also emit a log message when the distributor has forced convergence
of a detected inconsistent update.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants