[xCluster] Support Atomic and Ordered multi-shard Transactions #10976

Status: Closed
rahuldesirazu opened this issue Jan 3, 2022 · 1 comment
Labels: area/docdb (YugabyteDB core features), kind/new-feature (This is a request for a completely new feature), priority/medium (Medium priority issue), xCluster (Label for xCluster related issues/improvements)

rahuldesirazu (Contributor) commented Jan 3, 2022

Jira Link: [DB-310](https://yugabyte.atlassian.net/browse/DB-310)

Product-Level Description and Requirements for the First Iteration

The Use Cases Supported

For the use case where only one cluster is taking writes at a given time, we will support the following transaction semantics on the target cluster:

  1. Multi-shard atomic reads: A transaction spanning tablets A and B will either be fully readable or not readable at all.
  2. Ordered multi-shard reads within a single session: If timestamp(txn1) < timestamp(txn2) on the source universe, then the user will never be able to read txn2 before txn1 in a single session.
  3. Consistent cutover on disaster recovery: When the target is promoted, it will have a clean cut of the system and preserve atomic and ordered reads.

For the above, multi-shard can refer to single-table, multi-table, or table-index operations. This encapsulates guarantees for transactional updates to a main table and its index, as well as for two separate transactions to a parent and then a child foreign-key table.
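A minimal sketch of the visibility rule the above guarantees reduce to, assuming the target picks a single consistent read timestamp for the whole universe. The types and names below are illustrative assumptions, not YugabyteDB code:

```cpp
#include <cstdint>
#include <string>

// Hypothetical types for illustration only.
using HybridTime = uint64_t;

struct ReplicatedWrite {
  std::string tablet_id;   // e.g. tablet A or tablet B touched by the same txn
  HybridTime commit_ht;    // commit hybrid time assigned on the source
  std::string key, value;
};

// Atomicity: every write of a transaction carries the same commit_ht, so
// either all of them satisfy commit_ht <= read_ht or none do, and a
// transaction spanning tablets A and B is read all-or-nothing.
// Ordering (single session): if commit_ht(txn1) < commit_ht(txn2), any
// read_ht that exposes txn2 also exposes txn1.
bool IsVisible(const ReplicatedWrite& w, HybridTime read_ht) {
  return w.commit_ht <= read_ht;
}
```

Because all writes of a transaction share one commit hybrid time, this single comparison yields both the atomic and the single-session ordering guarantees above.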

Product-level Changes and Semantics

  1. Reads on the target will occur at a timestamp bounded by the laggiest tablet in the system. If the lag is 1s, the read timestamp for any read will be at least 1s in the past (see the sketch after this list).
  2. This read time will be applied on all tables on the target, whether or not they are under replication.
  3. Each server will periodically get an updated version of the read time from the master leader. So if the master process is down or there is a master-tserver partition, the read time will not advance and reads will become increasingly stale.
  4. In the case of DR cutover, all tablets will cut over to the timestamp of the laggiest tablet. In other words, if the laggiest tablet is behind by 1s, every tablet will lose 1s worth of data.
  5. A toggle will be introduced between reading at a consistent but past timestamp and reading at the current time. The tradeoff is the atomic and ordered guarantees of an older timestamp versus the lower latency, and smaller data loss on cutover, of a real-time read.
  6. When target-side read staleness exceeds the history retention cutoff (currently 15m), the target will reject reads to avoid serving incorrect data.
  7. Transaction status WAL retention will need to be increased on the source, which will increase space utilization.
  8. The user will need to do a full backup/restore of all tables in case the transaction status replication stream can’t catch up. This is because the transaction status table has no B/R mechanism, and therefore the only way to resolve transactions on the target for user tables is to do a B/R for those tables.
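A minimal sketch of how a consistent read time bounded by the laggiest tablet (item 1) could be derived. The structure and names are assumptions for illustration, not the actual xCluster implementation:

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

using HybridTime = uint64_t;  // hypothetical alias for illustration

// Last hybrid time up to which each target tablet has applied replicated
// records from the source. The minimum across tablets is the laggiest
// tablet's time, so reading at it is safe for every replicated tablet.
std::optional<HybridTime> ComputeConsistentReadTime(
    const std::map<std::string, HybridTime>& applied_ht_by_tablet) {
  if (applied_ht_by_tablet.empty()) return std::nullopt;
  HybridTime read_ht = applied_ht_by_tablet.begin()->second;
  for (const auto& [tablet_id, applied_ht] : applied_ht_by_tablet) {
    read_ht = std::min(read_ht, applied_ht);
  }
  return read_ht;
}
```

Per item 3, the master leader would periodically recompute such a value and push it to the tservers; if that propagation stops, the cached read time stops advancing and reads grow increasingly stale.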

Situations not supported

  1. Active-active use cases (both sides taking independent writes) will not preserve transactional guarantees. For this case, we will continue to use last-writer-wins semantics.
  2. Ordered multi-shard reads across sessions: If timestamp(txn1) < timestamp(txn2) on the source universe, then the user may still be able to read txn2 before txn1 across multiple sessions.
  3. 1 : N, N : 1, or triangle formations
  4. Local transaction status tables on the source cluster

Subtasks

Open Problems

  1. What do we do when we need to bootstrap the txn status table? Since there is no rocksdb for this table and all committed txns are eventually gc'ed, we may need to bootstrap the entire cluster over again to resolve lost transactions.
  2. How do we ensure that two versions of ht_read on two different servers are bounded? On a single cluster, we use clock skew to bound the read time on any two servers. However, ht_read doesn't have any bound between two consecutive versions.
  3. Do we need to change history retention at all? Given we're reading at a timestamp in the past, if ht_read falls below history retention, we may no longer be able to serve reads.
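For open problem 3 (and item 6 above), a minimal sketch of the staleness check, assuming the target tracks its history retention cutoff as a hybrid time; names are hypothetical:

```cpp
#include <cstdint>

using HybridTime = uint64_t;  // hypothetical alias for illustration

// Once the consistent read time falls behind the history retention cutoff,
// older row versions may already have been compacted away, so serving the
// read could return incorrect data; rejecting it is the safe choice (item 6).
bool CanServeConsistentRead(HybridTime read_ht, HybridTime history_cutoff_ht) {
  return read_ht >= history_cutoff_ht;
}
```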
bmatican (Contributor) commented:

Removing priority label as this is a higher level tracking task.

@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels May 11, 2022
@rahuldesirazu rahuldesirazu self-assigned this May 11, 2022
@lingamsandeep lingamsandeep added the xCluster Label for xCluster related issues/improvements label Jun 30, 2022
@yugabyte-ci yugabyte-ci assigned hari90 and unassigned rahuldesirazu Jun 30, 2022
@yugabyte-ci yugabyte-ci added kind/new-feature This is a request for a completely new feature and removed kind/bug This issue is a bug labels Jul 9, 2022
@hari90 hari90 closed this as completed Sep 26, 2023
xCluster replication automation moved this from To do to Done Sep 26, 2023