Better handling for mediator time jumps in datashard #2342
Changelog entry
Better handling for mediator time jumps in datashard
Changelog category
Additional information
While investigating G2-item and G-single-item anomalies detected with Jepsen, it was discovered that datashards didn't handle mediator time jumps very well. When a mediator is restarted, it may replay a transaction stream that has not been acknowledged yet. This in turn could cause the time cast atomic variable to jump backwards and lead to confusion, where a chosen mvcc version wouldn't later produce the intended side effects and edge promotions. For example, a write version may choose the current mediator step, but later the current step jumps backwards and `PromoteCompleteEdge` is not called, because the write is "in the future". This could theoretically cause later reads to incorrectly choose an earlier version (based on a concurrent distributed transaction) than intended. It's unclear whether there's an actual bug though, since `PromoteImmediatePostExecuteEdges` currently calls `MarkPlannedLogicallyCompleteUpTo` (which also calls `PromoteCompleteEdge` for all earlier inflight distributed transactions). However, we may want to remove that call later (to avoid unintended writes when performing reads concurrently with distributed transactions), and the current code is not robust enough.

This PR has two fixes. The first is to never allow the atomic time cast variable to go backwards (it's too difficult to reason about code correctness otherwise). The second is to choose mvcc versions unambiguously: new reads always include all previously replied immediate writes, and new writes always happen after all previously performed immediate writes.
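
The first fix, never letting the atomic time cast variable go backwards, can be illustrated with a compare-and-swap loop. This is only a minimal sketch of the general technique, not the code from this PR: `AdvanceMonotonic` and its parameters are hypothetical names, and the real time cast variable and its update path in datashard may look quite different.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not the datashard API): move `current` forward to
// `candidate`, but never backwards. Returns the value left in `current`
// as seen by this call.
inline uint64_t AdvanceMonotonic(std::atomic<uint64_t>& current, uint64_t candidate) {
    uint64_t observed = current.load(std::memory_order_acquire);
    while (observed < candidate) {
        // On failure compare_exchange_weak reloads `observed`, so the loop
        // condition is re-checked against the latest stored value.
        if (current.compare_exchange_weak(observed, candidate,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
            return candidate;
        }
    }
    // `candidate` is stale (for example, a replayed step): keep the newer value.
    return observed;
}
```

With such a helper, a replayed step of 120 arriving after the variable has already reached 150 leaves it at 150, so a version chosen at step 150 is never again "in the future" relative to the observed time.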
Fixes KIKIMR-21065.