Gh 8654 memtx mvcc abort readers of prepared 2.11 #8796

alyapunov · 2023-06-21T15:44:20Z

A huge refactoring of MVCC engine.

Closes #8654
Closes #8648
Closes #8326

(backport from master).

That's strange, but in this test in a group of simple test cases there are test cases that checks replaces, updates and deletes, but occasionally there's no test case that checks inserts. Fix it and add simple test cases for inserts. No logical changes. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=new test case NO_CHANGELOG=new test case (cherry picked from commit 37b4561)

If addition of a tuple is rolled back while the corresponding story is needed for something else (for example it stores a read set of another transaction) - the story cannot be deleted. Now there's a special flag `rollbacked` that is set to true for such stories, and the flag must be considered in places where history chains are scanned. That approach also requires psn to be set for rolled-back transactions, which surprisingly not as simple as it to say. All that makes the code complicated and hard to maintain. There's another approach for managing rolled back stories: simply set their del_psn to a low enough value (lower than any existing transaction's PSN) and (if necessary) push them to the end of history chain. Such a story would be invisible to any transaction due to already existing mechanisms, that's what is needed. In order to provide "low enough" del_psn it will be natural to assign real PSN starting from some predefined value, so any value below that predefined value will be less that any existing PSN and thus "low enough". Implement this more simple approach. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit ba394a5)

Conflict trackers are used to store information that if some transaction is committed then some another transaction must be aborted. This happens when the first transaction writes some key while the other reads the same key. On the other hand there are another trackers - read trackers - that are designed to handle exactly the same situation. That's why conflict trackers can be simply replaced with read trackers. That would allow to remove conflict trackers as not needed anymore. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit a6c2b9f)

Remove runtime allocation error handling and use panic-on-fail versions of allocation functions. Reasons for that: * Memory error handling was never tested an probably doesn't work. * Some return codes was ignored so the code had obvious flaws. * Rollback in case of memory error made some code overcomplicated. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=no new functionality added NO_TEST=no new functionality added NO_CHANGELOG=no new functionalily added (cherry picked from commit c951c9d)

Before this patch there were several different places in the code that deal with referencing tuple in space, setting in_index member and marking the story as retained or not. But logically all above is about the same - about placing a story to the top of a chain, i.e. the first story in version list to which index points. This commit refactors these things a bit. This mostly relates to two functions - memtx_tx_story_new and memtx_tx_story_link_top. Changes in memtx_tx_story_new are based on the fact that if a story is created by tuple, it is or immediately will be at the top of chain. Considering this we can omit argument `is_referenced_to_pk` and always create a story ready to be in top of chain. If a story is already in the top - nothing else is needed; if it is to become the top - memtx_tx_story_link_top must be called after. Further, linking to top of chain is needed exactly in two cases: * if a story just created by memtx_tx_story_new must become a top * if a chain is reordered involving the top story (the top and the next stories are swapped) These two cases are logically very close but still different. Even more, previously there were two functions for that: memtx_tx_story_link_top_light and memtx_tx_story_link_top correspondingly. This commit introduces one function for that (although with one more argument) that also incapsulates activities about referencing tuples and marking stories as retained. After this patch the rules are logical and simple: * if a tuple is inserted - call _story_new and _link_top(.. true). * if a story of existing clean tuple is needed - call _story_new. * if a chain is reordered involving top story - _link_top(.. false). Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 202340b)

There's a special 'point hole' mechanism in mvcc transactional manager that manages point gap reads by full key when no raw tuple was found in the index. For instance, it's the only way to collect gap reads for non-tree indexes. Once a new tuple is inserted to the index, the read records are transferred to the normal read set in the corresponding story. Actually after that the 'point hole' record in no more needed. So let's remove it. While we are here, drop unused point_holes_size, improve names and comments. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 62c6563)

By a mistake in 8a56514 a shortcut was added to procedure that handles gap write: it was considered that if the writing transaction is the same as reading - there is no actual conflict that must be stored further. That was a wrong decision: if such a transaction yields and another transaction comes and commits a value with the same key - the first one must go to conflicted state since it has read no more possible state. Another similar mistake was made in e6f5090, where writing after full scan of the same transaction was not tracked as read. Obviously that was wrong: if some other transaction overwrites the key and commits - this transaction must go to read view since it did not see anything by this key which is not so anymore. Fix it, reverting the first commit and an modifying the second and add a test. Closes tarantool#8326 NO_DOC=bugfix (cherry picked from commit b41c454)

The only place where this static function is used is more general static function - memtx_tx_track_read_story. This is a bit confusing - usually slow variant stand for public inline 'fast' method. So merge both functions in one. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 32e41b7)

In our SQL implementation temporary spaces are used. They come to MVCC engine in two variants - NULL or ephemeral. In both cases MVCC engine must not do anything, there are several checks for that in different parts of code. Normalize these checks and make them similar to each other. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 86a8155)

In further commit this list will be used for tracking all read gaps, not only 'nearby'. Since this rename has rather huge diff, let's make it in separate commit. No logical changes. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit d3feb69)

Now read trackers are used both for cases when a transaction has read an existing value and it has read nothing (read by key but there was no visible tuple in this place). For latter case an additional index_mask was used to identify from which index the read was done. Along with that there was per-index interval gap trackers. This patch divides area of responsibility between read trackers and gap tracker in the following way: * Reads of existing visible values are stored in read trackers. * Reads of non-existing or non-visible values are store in gap trackers. This new approach allows to provide new invariants: gap trackers are stored only at top of chain, and read trackers are stored only at topmost committed story in chain. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 7b8b78b)

Now there are three kinds of very close trackers: * The transaction have read some tuple that is not committed and thus not visible. This kind is now stored as gap_item with is_nearby = false. * The transaction made a select or range scan, reading a key or range between two adjacent tuples of the index. This kind is stored as gap_iteam with is_nearby = true. * A transaction completed a full scan of unordered index. This kind is stored as full_scan_item. All these trackers serve for the same thing: to record a fact that a transaction read something but didn't see anything. There are some problems with the current solution: * gap_item with is_nearby = false has several unused members that just consume space. * bool is_nearby flag for type descriptin is an ugly solution. * full_scan_item is separated from logically close items. This commit joins all these trackers under one base (that is struct gap_item_base) and solves problems above. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit f8d97a2)

The function memtx_tx_story_delete is expected to delete fully unlinked stories and thus should not try to unlink something by itself. So remove unlink and add asserts instead. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 0706740)

It was an ugly solution when MVCC engine requires outside engine to set this flag which is not convenient. Remove it and use mode arguments to set up proper read trackers. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 7414973)

The latter flag is a bit wider: it reveals not only inserting statements after deleting by the same transaction, but also replacing and deleting statements after all kinds of previois changes but the same transaction. This extended behavior will be used in further commits. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 3cfa675)

Almost completely rewrite, simplify and comment this part of code. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 63da3be)

Almost completely rewrite, simplify and comment this part of code. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit e901507)

Rollback is rather complicated part if MVCC implementation that is meant to handle two kinds of rollback: * rollback from in-progress state, if box.rollback() is called. * rollback from prepared state, when WAL fails. Unfortunately the last one was not properly tested and surely has at least one flaw. When an inserting transaction becomes prepared its stories could be linked as deleted (via del_stmt pointer) by other in-progress transactions in order to maintain correct visibility. The problem is that in case of rollback of such prepared transaction those links remained. Broken links breaks the chain structure, and some older changes (that were linked as deleted before that hapless preparation) can become visible to other transaction. Refactor, simplify a bit that part of code and fix the issue described above; cover with tests. Closes tarantool#8648 NO_DOC=bugfix (cherry picked from commit 85569d9)

There's case when a transaction is rolled back from prepared state. This happens when WAL fails, synchronized replication confirmation failure or perhaps in other similar cases. By design other RW transactions and transactions with READ_COMMITTED isolation level are allowed to read prepared state. All these transactions must be aborted in case of rollback of prepared transaction since they definitely have read no more possible database state. This patch implements this abortion. Closed tarantool#8654 NO_DOC=bugfix (cherry picked from commit 5498690)

coveralls · 2023-06-21T16:27:43Z

coverage: 85.778% (+0.05%) from 85.726% when pulling 79482d0 on alyapunov:gh_8654_memtx_mvcc_abort_readers_of_prepared-2.11 into 98d892a
on tarantool:release/2.11.

alyapunov added 3 commits June 21, 2023 18:42

alyapunov added the full-ci Enables all tests for a pull request label Jun 21, 2023

alyapunov added 16 commits June 21, 2023 19:11

memtx: simplify and refactor memtx_tx_history_add_stmt

bc36260

Almost completely rewrite, simplify and comment this part of code. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit 63da3be)

memtx: simplify and refactor memtx_tx_history_prepare_stmt

271f7ad

Almost completely rewrite, simplify and comment this part of code. Part of tarantool#8648 Part of tarantool#8654 NO_DOC=refactoring NO_TEST=refactoring NO_CHANGELOG=refactoring (cherry picked from commit e901507)

alyapunov force-pushed the gh_8654_memtx_mvcc_abort_readers_of_prepared-2.11 branch from e1ac9e0 to 79482d0 Compare June 21, 2023 16:11

alyapunov mentioned this pull request Jun 21, 2023

memtx: mvcc abort readers of prepared #8665

Merged

alyapunov merged commit aca3092 into tarantool:release/2.11 Jun 22, 2023
86 checks passed

alyapunov deleted the gh_8654_memtx_mvcc_abort_readers_of_prepared-2.11 branch November 24, 2023 14:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gh 8654 memtx mvcc abort readers of prepared 2.11 #8796

Gh 8654 memtx mvcc abort readers of prepared 2.11 #8796

alyapunov commented Jun 21, 2023

coveralls commented Jun 21, 2023

Gh 8654 memtx mvcc abort readers of prepared 2.11 #8796

Gh 8654 memtx mvcc abort readers of prepared 2.11 #8796

Conversation

alyapunov commented Jun 21, 2023

coveralls commented Jun 21, 2023