WT-8747 Reading between commit_timestamp and durable_timestamp can produce inconsistency #7485
Conversation
autobuild: Can one of the admins verify this patch?
This version produces WT_ROLLBACK if you read at the wrong time; that's not necessarily the best behavior, as retrying won't succeed. Also, it probably ought to stop prohibiting the read once stable moves forward, and this version doesn't. And I think it's also necessary to check on-disk values as well, since they might be evicted without being fully stable yet. So I've marked this a draft, but it's probably good enough to run preliminary tests to see if it blows up the world.

Update by KB:
1. Don't need to reject reads once stable reaches durable. (Whether or not the value is checkpointed; once it's stable nobody can commit and become durable before it does, so it's ok for them to read it.) This limits the possible problems to things happening shortly after commits.
2. Do need to check on-disk values; they can easily get evicted before they're stable.
Either move reads that fall between the commit and durable timestamps to the durable timestamp or later, or assert that reading in between fails, or both.
(and if anyone's wondering, it does explode the world)
A whole pile of these failed, but fortunately they all have pretty much the same structure and needed the same fix.
Don't bother running tests on this push (dcf4fdf); there's still a bit missing (on-disk values with nondurable stop times), which causes at least one known failure, and fixing it may introduce more.
(This makes reads of nondurable removes fail once they've been evicted.) Also add a specific test for this issue that, among other things, makes sure the restrictions don't interfere with RTS.
There's still a way to get inconsistent behavior: read from past the initial commit's durable timestamp, but commit with no timestamp, which becomes durable immediately, then checkpoint before stable reaches the initial commit's durable timestamp and crash.

I'm inclined to think that reading with a read timestamp and then committing without any timestamp (in one transaction) isn't valid, though it's not prohibited and nothing in the docs says not to. Unfortunately, you can do the same thing by reading with no timestamp and committing with no timestamp, which is harder to argue about and doesn't seem to directly violate any of the strictures about reading timestamped tables with no timestamp.

Not sure what to do about either of these cases, but the changes here do at least fix the fully timestamped part of the problem. (And they should be ready to go now.)
This makes sense to me, but I'm not a great reviewer for it. @kommiharibabu, if you think we should have an additional reviewer on this, please select someone.
src/docs/timestamp-prepare.dox
Outdated
reading its writes in a second transaction and then committing such that the
second transaction becomes durable before the first can produce data
inconsistency. To help prevent this, reads between the commit and durable
timestamp will fail with ::WT_ROLLBACK if the global stable timestamp has not
As per the recent conversation with @keithbostic, MongoDB read operations don't expect a WT_ROLLBACK error. MongoDB may not hit this scenario, but in case it happens, can it cause some problem? How about returning WT_NOTFOUND instead if this is the only update for the key, or returning a historical version?
Based on this change the entire approach could change. @keithbostic, can you provide your views on the same?
Interesting... I'd prefer to go with rollback if it's possible. My concern is that notfound could be interpreted by the caller as the row physically isn't there, which I think could itself lead to interesting and hard-to-debug failures.

I agree the list of failures in the patch build was unexpected. I guess my plan is to make this our next SERVER ticket once the WT-8745/SERVER-63166 pair is resolved, and understand the failures. Hopefully they're fixable, but if not, maybe notfound is our fallback position. I'm going to run a patch build with notfound, and see if that makes a difference.

Let me know if I'm misunderstanding the implications of returning notfound, please.
Returning the previous version (or WT_NOTFOUND if there is no previous version) is mostly equivalent to rounding the commit timestamp up to the durable timestamp. I suspect that's not what we want, since there's then no point distinguishing the commit timestamp from the durable timestamp.
(I say "mostly equivalent" because if this applies only when stable hasn't moved up to the durable timestamp yet, then it's equivalent to that only before that point. But changing what you see from one value to another when reading in that interval, depending on where stable is, seems like a bad plan also.)
Always returning WT_NOTFOUND in that interval seems bad because it's providing wrong data rather than no data, so I'd rather not go that way if possible.
There's another completely different approach, which is that what we're doing here is trying to avoid generating dependent transactions. But dependent transactions aren't catastrophic and this is a minor manifestation of them; we could just track the dependence. I'll make a top-level post about this so it doesn't get buried.
Thanks for the details @keithbostic and @sauclovian-wt.
- Returning WT_ROLLBACK has problems with MDB as they don't handle it. If MDB supports handling WT_ROLLBACK for all the read operations it shouldn't be a problem.
- Returning WT_NOTFOUND has problems as described by David.
I am thinking of another return type, WT_PREPARE_CONFLICT, instead of either WT_ROLLBACK or WT_NOTFOUND, as this issue occurs only with prepared transactions. I believe that MDB should already have the code to handle the WT_PREPARE_CONFLICT error for read operations.
That's an interesting idea -- thank you, @kommiharibabu.
Let's run a patch check, and see if that passes the MDB server tests.
MDB patch build has a lot of failures. I checked one failure, where dbcheck failed with a conflict operation; I suspect it is due to our WT_ROLLBACK error.
Patch build returning notfound is running. Update: it's a little better, but not significantly; still a large number of failures.
What we're trying to do with this patch is avoid generating dependent transactions: basically, if txn1 commits and txn2 reads what it wrote before it becomes stable, then txn2 depends on txn1 in the sense that it must not become durable before txn1. The current patchset prevents this by not allowing txn2 to read txn1's commits until either they're stable or we're guaranteed via other constraints that txn2 cannot become durable before txn1.

We could instead just remember the dependency and fail txn2 (or round up its durable timestamp) if it tries to commit too early. Unlike typical dependent-transaction scenarios, where txn2 is prevented from /committing/ before txn1 (which means that if txn1 aborts, txn2 must also abort, which requires a lot of machinery to handle), all we really need to do here is track txn2's earliest legal durable timestamp based on what it reads. This requires code at all the same places as the current patchset, but instead of a visibility check we just track the maximum durable timestamp we've seen, and then at commit time check it against the proposed durable (or commit) timestamp and the value of stable at that point.

Downside: we'd need to round up rather than fail, because failure at that point will panic, though it's possible this could be worked around by checks at prepare time. Upside: it handles the cases where txn2 commits without a timestamp. The other downside is that it's kind of against the system architecture (which isn't set up to track reads in detail), so the changes will likely be kinda messy.
Separately, I saw something in an old Jira ticket yesterday that suggests that MongoDB uses all_durable to pick new commit timestamps. If that's true (and it's always true rather than mostly true) then they won't ever actually hit this problem and it might make sense to give them a flag to silence the checks. (Nope, that's too optimistic; all_durable plus a little isn't necessarily ahead of txn1's durable. Never mind this bit.)
(Don't bother testing what I just pushed; I just want a non-broken changeset for archival before I start on a completely different approach to the problem.)
Instead of prohibiting reads of nondurable data, track the durable timestamps of data read by a transaction and prohibit committing with a durable timestamp (or commit timestamp for non-prepared transactions) earlier than that. Keep the test changes; most of them are about only reading nondurable data when we explicitly mean to. It seems best to avoid doing that otherwise.
Don't read g.ts_stable until it's actually been set by set_stable(); otherwise we can try to set the oldest timestamp after the stable timestamp.
…edtiger#7577) The cache size is now set to 1GB instead of 100MB by default.
Add more documentation around the ignore_prepare configuration.
Here is the different approach. However, the conclusion for the time being is that this is also reasonably likely to break MongoDB, and everyone's tired of thinking about it. So the plan for now is to close WT-8747 by documenting the behavior (and merging the test changes that avoid reading nondurable data except on purpose) in a separate pull request, then archiving this one and maybe coming back to it in a while.
Update: we closed WT-8747 (via another pull request) by documenting the issue as a hazard. That includes the test cleanup, so I've merged that here. I opened a new ticket, WT-8881, to track the fact that we should really eventually figure out a way to prohibit these commits; it points back here for archival purposes since these changes are probably still useful.

Also, I've ascertained that it is potentially necessary to track the durable timestamps of values read from history, which the code here so far doesn't do, and the new test in here doesn't check. Checking it is a bit of a nuisance, unfortunately. I'm going to close this pull request.
Prohibit reading for updates that are committed but not stable.