Use fast updates when replica metadata is out of sync but document itself is in sync #11319

vekterli commented Nov 15, 2019

@toregge please review
@geirst FYI

When a bucket has replicas with mismatching metadata (i.e. they are out of sync),
the distributor initiates a write-repair for updates to avoid divergence of
replica content. It first sends a Get to all diverging replica sets, picks the
document version with the highest timestamp and applies the update locally. The
updated document is then sent out as a Put. This can be very expensive if document
Put operations are disproportionately more expensive than partial updates, and it
also puts the distributor thread on a contended critical path.
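
The write-repair flow described above can be sketched roughly as follows. This is a simplified model, not the distributor's actual code: `Doc`, `write_repair`, and the dict-based replica map are hypothetical names introduced for illustration, and returning `None` when no replica has the document is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a stored document version.
    timestamp: int
    fields: Dict[str, int]


def write_repair(
    replica_docs: Dict[str, Optional[Doc]],
    update: Callable[[Dict[str, int]], Dict[str, int]],
    new_timestamp: int,
) -> Optional[Dict[str, Doc]]:
    """Model of the slow path: Get from all replicas, update locally, Put everywhere.

    replica_docs maps a node name to the document returned by its Get
    (None if that replica does not have the document).
    """
    found = [doc for doc in replica_docs.values() if doc is not None]
    if not found:
        # Assumption of this sketch: nothing to update if no replica has the document.
        return None
    # Pick the newest version across the diverging replicas.
    newest = max(found, key=lambda doc: doc.timestamp)
    # Apply the partial update locally, on a copy of the newest fields.
    updated = Doc(new_timestamp, update(dict(newest.fields)))
    # The full updated document is then sent as a Put to every replica.
    return {node: updated for node in replica_docs}
```

Note that every replica receives a full document Put here, which is the cost the description refers to.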

This commit lets `TwoPhaseUpdateOperation` restart an update as a "fast path"
update (partial updates sent directly to the nodes) if the initial read phase
returns the same timestamp for the document across all replicas.
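
A minimal sketch of the restart decision, assuming the read phase yields one timestamp per replica group and that a missing document is reported as timestamp 0. The function names and the string return values are hypothetical and are not taken from the actual implementation.

```python
from typing import Dict


def can_restart_as_fast_path(read_phase_timestamps: Dict[str, int]) -> bool:
    """True if every replica group returned the same document timestamp.

    read_phase_timestamps maps a replica group id to the timestamp its Get
    returned (0 is assumed to mean "document not found").
    """
    # Exactly one distinct timestamp means the document itself is in sync,
    # even though the bucket metadata differs between replicas.
    return len(set(read_phase_timestamps.values())) == 1


def choose_update_path(read_phase_timestamps: Dict[str, int]) -> str:
    """Return 'fast_path' (send partial updates directly) or 'safe_path' (write-repair)."""
    if can_restart_as_fast_path(read_phase_timestamps):
        return "fast_path"
    return "safe_path"
```

For example, `choose_update_path({"group_a": 1000, "group_b": 1000})` selects the fast path, while differing timestamps fall back to the Get, update, Put sequence.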

It also removes an old (but now presumed unsafe) optimization where Get
operations are only sent to replicas marked "trusted" even if other replicas
are out of sync with them. Since trustedness is a transient state that does not
persist across restarts or bucket handoffs, it's not robust enough to be
used for such purposes. Gets will now be sent to all out-of-sync replica
groups regardless of trusted status.
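
The new Get target selection might look like the sketch below. It assumes replicas are grouped by a metadata checksum and that one Get per distinct group is enough. `Replica`, `get_targets`, and the choice of the lowest node index as the group's representative are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    # Hypothetical view of a bucket replica as seen by the distributor.
    node: int
    checksum: int   # stands in for the replica's bucket metadata
    trusted: bool   # transient flag; deliberately ignored below


def get_targets(replicas: List[Replica]) -> List[int]:
    """Pick one node per group of replicas with identical metadata.

    The trusted flag plays no part in the selection, so untrusted groups
    that diverge from the trusted ones are still read.
    """
    groups: Dict[int, List[Replica]] = {}
    for replica in replicas:
        groups.setdefault(replica.checksum, []).append(replica)
    # One representative per group; lowest node index is an arbitrary choice.
    return [min(group, key=lambda r: r.node).node for group in groups.values()]
```

With replicas on nodes 0 and 1 sharing one checksum (node 0 trusted) and node 2 holding a different one, this returns `[0, 2]`, whereas the removed optimization would have read from the trusted replica only.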

vekterli merged commit 4059e8d into master Nov 15, 2019
vekterli deleted the vekterli/restart-two-phase-updates-in-fast-path-if-docs-in-sync branch Nov 15, 2019