Currently, on initial join we send the current vinyl state. To do that,
we open a read iterator over a space's primary index and send statements
returned by it. Such an approach has a number of inherent problems:
- An open read iterator blocks compaction, which is unacceptable for
such a long operation as join. To avoid blocking compaction, we open
the iterator in dirty mode, i.e. it skims over the tops. This,
however, introduces a different problem: it blurs the threshold
between the initial and final join phases - statements sent on final
join may or may not be among those sent during the initial join, and
there's no efficient way to tell them apart without sending extra
information.
- The replica expects LSNs to be growing monotonically. This constraint
is imposed by the lsregion allocator used for storing statements in
memory, but the read iterator returns statements ordered by key, not
by LSN. Currently, the replica simply crashes if statements happen to
be sent in non-chronological order, which renders vinyl
replication unusable. In the scope of the current model, we can't fix
this by assigning fake LSNs to statements received on initial join,
because there's no strict LSN threshold between initial and final
join phases (see the previous paragraph).
- In the initial join phase, the replica is only aware of spaces that
were created before the last snapshot, while vinyl sends statements
from spaces that exist now. As a result, if a space was created after
the most recent snapshot, the replica won't be able to receive its
tuples and will fail.
To address the above-mentioned problems, we make vinyl initial join send
the latest snapshot, just like in the case of memtx. We implement this by
loading the vinyl state from the last snapshot of the metadata log and
sending statements of all runs from the snapshot as is (including
deletes and updates), to be applied by the replica. To make lsregion at
the receiving end happy, we assign fake monotonically growing LSNs to
statements received on initial join. This is OK, because

  any LSN from final join > max real LSN from initial join
  max real LSN from initial join >= max fake LSN

hence

  any LSN from final join > any fake LSN from initial join
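
The fake LSN argument can be illustrated with a minimal sketch. It is
not the code from this patch: the Statement type and assign_fake_lsns
helper are hypothetical, and it assumes the fake counter starts at 1
and that every statement sent on initial join consumed its own real
LSN, which is what keeps max fake LSN <= max real LSN.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Statement:
    key: int
    lsn: int  # real LSN on the master, or a fake one after rewriting


def assign_fake_lsns(stmts: Iterable[Statement]) -> Iterator[Statement]:
    """Replace each statement's LSN with a counter that grows by one
    per statement, preserving arrival order (hypothetical helper)."""
    for fake_lsn, stmt in enumerate(stmts, start=1):
        yield Statement(stmt.key, fake_lsn)


# Statements as a run would deliver them: ordered by key, not by LSN.
initial_join = [Statement(1, 5), Statement(2, 2), Statement(3, 9)]
rewritten = list(assign_fake_lsns(initial_join))

max_real = max(s.lsn for s in initial_join)
max_fake = max(s.lsn for s in rewritten)
# Assumed for illustration: final join rows start past the snapshot.
first_final_lsn = max_real + 1
```

With these inputs the rewritten LSNs grow monotonically even though the
real ones do not, and every final join LSN stays above every fake one.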
Besides fixing vinyl replication, this patch also enables the
replication test suite for the vinyl engine (except for hot_standby)
and makes engine/replica_join cover the following test cases:
- secondary indexes
- delete and update statements
- keys added in an order different from LSN
- recreate space after checkpoint
Closes #1911
Closes #2001