Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gh 8654 memtx mvcc abort readers of prepared 2.11 #8796

Conversation

alyapunov
Copy link
Contributor

A huge refactoring of MVCC engine.

Closes #8654
Closes #8648
Closes #8326

(backport from master).

That's strange, but in this test in a group of simple test cases
there are test cases that checks replaces, updates and deletes,
but occasionally there's no test case that checks inserts.

Fix it and add simple test cases for inserts.

No logical changes.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=new test case
NO_CHANGELOG=new test case

(cherry picked from commit 37b4561)
If addition of a tuple is rolled back while the corresponding
story is needed for something else (for example it stores a read
set of another transaction) - the story cannot be deleted.
Now there's a special flag `rollbacked` that is set to true
for such stories, and the flag must be considered in places
where history chains are scanned. That approach also requires
psn to be set for rolled-back transactions, which surprisingly
not as simple as it to say. All that makes the code complicated
and hard to maintain.

There's another approach for managing rolled back stories: simply
set their del_psn to a low enough value (lower than any existing
transaction's PSN) and (if necessary) push them to the end of
history chain. Such a story would be invisible to any transaction
due to already existing mechanisms, that's what is needed.

In order to provide "low enough" del_psn it will be natural to
assign real PSN starting from some predefined value, so any value
below that predefined value will be less that any existing PSN and
thus "low enough".

Implement this more simple approach.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit ba394a5)
Conflict trackers are used to store information that if some
transaction is committed then some another transaction must be
aborted. This happens when the first transaction writes some
key while the other reads the same key. On the other hand there
are another trackers - read trackers - that are designed to
handle exactly the same situation. That's why conflict trackers
can be simply replaced with read trackers.

That would allow to remove conflict trackers as not needed
anymore.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit a6c2b9f)
@alyapunov alyapunov added the full-ci Enables all tests for a pull request label Jun 21, 2023
Remove runtime allocation error handling and use panic-on-fail
versions of allocation functions. Reasons for that:
* Memory error handling was never tested an probably doesn't work.
* Some return codes was ignored so the code had obvious flaws.
* Rollback in case of memory error made some code overcomplicated.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=no new functionality added
NO_TEST=no new functionality added
NO_CHANGELOG=no new functionalily added

(cherry picked from commit c951c9d)
Before this patch there were several different places in the code
that deal with referencing tuple in space, setting in_index member
and marking the story as retained or not. But logically all above
is about the same - about placing a story to the top of a chain,
i.e. the first story in version list to which index points.

This commit refactors these things a bit. This mostly relates to
two functions - memtx_tx_story_new and memtx_tx_story_link_top.

Changes in memtx_tx_story_new are based on the fact that if a story
is created by tuple, it is or immediately will be at the top of
chain. Considering this we can omit argument `is_referenced_to_pk`
and always create a story ready to be in top of chain. If a story
is already in the top - nothing else is needed; if it is to become
the top - memtx_tx_story_link_top must be called after.

Further, linking to top of chain is needed exactly in two cases:
* if a story just created by memtx_tx_story_new must become a top
* if a chain is reordered involving the top story (the top and the
  next stories are swapped)
These two cases are logically very close but still different.
Even more, previously there were two functions for that:
memtx_tx_story_link_top_light and memtx_tx_story_link_top
correspondingly. This commit introduces one function for that
(although with one more argument) that also incapsulates
activities about referencing tuples and marking stories as
retained.

After this patch the rules are logical and simple:
* if a tuple is inserted - call _story_new and _link_top(.. true).
* if a story of existing clean tuple is needed - call _story_new.
* if a chain is reordered involving top story - _link_top(.. false).

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 202340b)
There's a special 'point hole' mechanism in mvcc transactional
manager that manages point gap reads by full key when no raw
tuple was found in the index. For instance, it's the only way
to collect gap reads for non-tree indexes.

Once a new tuple is inserted to the index, the read records are
transferred to the normal read set in the corresponding story.
Actually after that the 'point hole' record in no more needed.

So let's remove it.

While we are here, drop unused point_holes_size, improve names
and comments.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 62c6563)
By a mistake in 8a56514 a shortcut was added to procedure
that handles gap write: it was considered that if the writing
transaction is the same as reading - there is no actual conflict
that must be stored further. That was a wrong decision: if such
a transaction yields and another transaction comes and commits
a value with the same key - the first one must go to conflicted
state since it has read no more possible state.

Another similar mistake was made in e6f5090, where writing
after full scan of the same transaction was not tracked as read.
Obviously that was wrong: if some other transaction overwrites
the key and commits - this transaction must go to read view since
it did not see anything by this key which is not so anymore.

Fix it, reverting the first commit and an modifying the second and
add a test.

Closes tarantool#8326

NO_DOC=bugfix

(cherry picked from commit b41c454)
The only place where this static function is used is more general
static function - memtx_tx_track_read_story. This is a bit
confusing - usually slow variant stand for public inline 'fast'
method.

So merge both functions in one.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 32e41b7)
In our SQL implementation temporary spaces are used. They come to
MVCC engine in two variants - NULL or ephemeral. In both cases
MVCC engine must not do anything, there are several checks for
that in different parts of code.

Normalize these checks and make them similar to each other.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 86a8155)
In further commit this list will be used for tracking all read
gaps, not only 'nearby'. Since this rename has rather huge
diff, let's make it in separate commit.

No logical changes.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit d3feb69)
Now read trackers are used both for cases when a transaction has
read an existing value and it has read nothing (read by key but
there was no visible tuple in this place). For latter case an
additional index_mask was used to identify from which index the
read was done. Along with that there was per-index interval gap
trackers.

This patch divides area of responsibility between read trackers
and gap tracker in the following way:
* Reads of existing visible values are stored in read trackers.
* Reads of non-existing or non-visible values are store in gap
  trackers.

This new approach allows to provide new invariants: gap trackers
are stored only at top of chain, and read trackers are stored
only at topmost committed story in chain.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 7b8b78b)
Now there are three kinds of very close trackers:
* The transaction have read some tuple that is not committed and
  thus not visible. This kind is now stored as gap_item with
  is_nearby = false.
* The transaction made a select or range scan, reading a key or
  range between two adjacent tuples of the index. This kind is
  stored as gap_iteam with is_nearby = true.
* A transaction completed a full scan of unordered index. This
  kind is stored as full_scan_item.

All these trackers serve for the same thing: to record a fact
that a transaction read something but didn't see anything.

There are some problems with the current solution:
* gap_item with is_nearby = false has several unused members that
  just consume space.
* bool is_nearby flag for type descriptin is an ugly solution.
* full_scan_item is separated from logically close items.

This commit joins all these trackers under one base (that is
struct gap_item_base) and solves problems above.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit f8d97a2)
The function memtx_tx_story_delete is expected to delete fully
unlinked stories and thus should not try to unlink something by
itself. So remove unlink and add asserts instead.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 0706740)
It was an ugly solution when MVCC engine requires outside engine
to set this flag which is not convenient.

Remove it and use mode arguments to set up proper read trackers.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 7414973)
The latter flag is a bit wider: it reveals not only inserting
statements after deleting by the same transaction, but also
replacing and deleting statements after all kinds of previois
changes but the same transaction. This extended behavior will
be used in further commits.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 3cfa675)
Almost completely rewrite, simplify and comment this part of code.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit 63da3be)
Almost completely rewrite, simplify and comment this part of code.

Part of tarantool#8648
Part of tarantool#8654

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

(cherry picked from commit e901507)
Rollback is rather complicated part if MVCC implementation that
is meant to handle two kinds of rollback:
* rollback from in-progress state, if box.rollback() is called.
* rollback from prepared state, when WAL fails.
Unfortunately the last one was not properly tested and surely
has at least one flaw. When an inserting transaction becomes
prepared its stories could be linked as deleted (via del_stmt
pointer) by other in-progress transactions in order to maintain
correct visibility. The problem is that in case of rollback of
such prepared transaction those links remained. Broken links
breaks the chain structure, and some older changes (that were
linked as deleted before that hapless preparation) can become
visible to other transaction.

Refactor, simplify a bit that part of code and fix the issue
described above; cover with tests.

Closes tarantool#8648

NO_DOC=bugfix

(cherry picked from commit 85569d9)
There's case when a transaction is rolled back from prepared
state. This happens when WAL fails, synchronized replication
confirmation failure or perhaps in other similar cases. By design
other RW transactions and transactions with READ_COMMITTED
isolation level are allowed to read prepared state. All these
transactions must be aborted in case of rollback of prepared
transaction since they definitely have read no more possible
database state.

This patch implements this abortion.

Closed tarantool#8654

NO_DOC=bugfix

(cherry picked from commit 5498690)
@alyapunov alyapunov force-pushed the gh_8654_memtx_mvcc_abort_readers_of_prepared-2.11 branch from e1ac9e0 to 79482d0 Compare June 21, 2023 16:11
@coveralls
Copy link

Coverage Status

coverage: 85.778% (+0.05%) from 85.726% when pulling 79482d0 on alyapunov:gh_8654_memtx_mvcc_abort_readers_of_prepared-2.11 into 98d892a
on tarantool:release/2.11
.

@alyapunov alyapunov merged commit aca3092 into tarantool:release/2.11 Jun 22, 2023
86 checks passed
@alyapunov alyapunov deleted the gh_8654_memtx_mvcc_abort_readers_of_prepared-2.11 branch November 24, 2023 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-ci Enables all tests for a pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants