[xCluster] Handle Large transaction Apply on consumer #16826

hari90 · 2023-04-12T18:43:16Z

Jira Link: DB-6176

Description

Large transactions apply batching....
Flag txn_max_apply_batch_records

Notes:
What is the impact of this on large transactions that involve moving a lot of intents to the regular DB?
Impact: Replication stalls during this step.
Notes: For large transactions we apply only a limited number of intents, then write a checkpoint entry in the regular RocksDB and apply the rest of the intents asynchronously. There is just one APPLYING op, but the regular DB will have special checkpoints (intent key or key for the reverse index).
Problem: 1. The amount of data can be so large that it affects cluster stability, so we need to split it into batches. 2. Raft DoReplicated holds Raft locks, which blocks Raft heartbeats and causes lease losses.
This SHOULD be disabled on the consumer side. We should keep the batching and async apply, but the WriteRequest RPC should wait for all batches to complete.
Make sure the intent-aware iterator can deal with intents moving to the regular DB during a scan.
Next steps: Test this! Use >100k KVs (with packed). Copy with a million rows. Without any changes, does it break consistency or bring down the cluster?
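
The batching scheme in the notes above can be illustrated with a toy model. This is a minimal sketch, not YugabyteDB code: the dict-based stores, the `CHECKPOINT_KEY` name, and the functions `apply_one_batch` and `apply_all` are all hypothetical. Only the idea of applying a bounded number of intents per step and recording a checkpoint in the regular DB comes from the notes. The model copies intents and leaves their cleanup out.

```python
from typing import Dict, Optional

# Hypothetical key under which this toy model stores apply progress.
CHECKPOINT_KEY = "__apply_checkpoint__"


def apply_one_batch(intents: Dict[str, str], regular: Dict[str, str],
                    max_batch_records: int) -> bool:
    """Copy up to max_batch_records intents into the regular store.

    Returns True once every intent has been applied. Returns False when
    more batches remain; a checkpoint is then left in the regular store.
    """
    if max_batch_records < 1:
        raise ValueError("max_batch_records must be at least 1")
    last_applied: Optional[str] = regular.get(CHECKPOINT_KEY)
    # Resume after the checkpointed key, walking keys in sorted order.
    pending = sorted(k for k in intents
                     if last_applied is None or k > last_applied)
    batch = pending[:max_batch_records]
    for key in batch:
        regular[key] = intents[key]
    if len(batch) < len(pending):
        regular[CHECKPOINT_KEY] = batch[-1]  # remember where to resume
        return False
    regular.pop(CHECKPOINT_KEY, None)  # finished: drop the checkpoint
    return True


def apply_all(intents: Dict[str, str], regular: Dict[str, str],
              max_batch_records: int) -> int:
    """Run batches until the apply is complete; return the batch count."""
    batches = 0
    while True:
        batches += 1
        if apply_one_batch(intents, regular, max_batch_records):
            return batches
```

Because each step reads its starting point from the checkpoint, the remaining batches can run later (for example asynchronously) and still pick up exactly where the previous batch stopped.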

Unit test this with the batch size set to 10 and the async apply delayed by 1 min. This validates correctness.
QA test this! This validates heartbeat delays.
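
A unit test along these lines could be prototyped against a toy model before touching the real code path. The sketch below is an assumption-laden illustration: `ToyTablet` and `write_and_wait` are hypothetical names, the stores are plain dicts, and the delay is a short sleep standing in for the 1 min mentioned above. It shows the first batch applied synchronously, the rest applied by a delayed background task, and the write call blocking until every batch is done.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class ToyTablet:
    """Hypothetical stand-in for a consumer tablet; not a YugabyteDB API."""

    def __init__(self, batch_size: int, async_delay_s: float) -> None:
        self.batch_size = batch_size
        self.async_delay_s = async_delay_s
        self.regular: Dict[str, str] = {}
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=1)

    def _apply_batch(self, batch: List[Tuple[str, str]]) -> None:
        with self._lock:
            self.regular.update(batch)

    def _apply_rest(self, rest: List[Tuple[str, str]]) -> None:
        # Simulated delay before the asynchronous part of the apply.
        time.sleep(self.async_delay_s)
        for start in range(0, len(rest), self.batch_size):
            self._apply_batch(rest[start:start + self.batch_size])

    def write_and_wait(self, intents: Dict[str, str]) -> None:
        """Apply all intents and return only when every batch is done."""
        items = sorted(intents.items())
        self._apply_batch(items[:self.batch_size])  # first batch inline
        future = self._executor.submit(self._apply_rest,
                                       items[self.batch_size:])
        future.result()  # blocks; re-raises any error from the worker

    def close(self) -> None:
        self._executor.shutdown(wait=True)
```

A test would then construct `ToyTablet(batch_size=10, async_delay_s=...)`, call `write_and_wait` with more than 10 records, and assert that the regular store equals the input as soon as the call returns.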

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.

…nal intents
Summary: This change uses RocksDB DirectWriter for applying the external batch, which helps avoid creating additional copy of data being copied from IntentsDB to RegularDB. It will help reduce the latency of apply operation as well as the transient memory usage for xcluster operations.
Jira: DB-6176
Test Plan: Jenkins
Reviewers: xCluster, slingam, sergei, mlillibridge
Reviewed By: slingam
Subscribers: mlillibridge, rthallam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D25843

… applying external intents
Summary: Original commit: 4d21be5 / D25843
This change uses RocksDB DirectWriter for applying the external batch, which helps avoid creating additional copy of data being copied from IntentsDB to RegularDB. It will help reduce the latency of apply operation as well as the transient memory usage for xcluster operations.
Jira: DB-6176
Test Plan: Jenkins
Reviewers: xCluster, slingam, sergei, mlillibridge
Reviewed By: slingam
Subscribers: ybase, rthallam, mlillibridge
Differential Revision: https://phorge.dev.yugabyte.com/D25955

lingamsandeep · 2023-09-19T23:52:47Z

xCluster uses a different path (ExternalIntentsWriter) to copy the records from the intents DB to the regular DB, so we never split a large transaction into batches. The xCluster copy is also faster because the combined external intents in IntentsDB reduce the number of keys to traverse there.
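
The point about key traversal can be shown with a toy comparison. This is only a sketch under stated assumptions: the key encoding, the dict-based stores, and every function name here are hypothetical and do not reflect the actual IntentsDB format. It contrasts one store entry per written key with a single combined entry per transaction and counts how many entries the apply step has to visit.

```python
from typing import Dict, List, Tuple

KeyValue = Tuple[str, str]


def per_key_intents(txn_id: str, writes: List[KeyValue]) -> Dict[str, str]:
    """One intents-store entry per written key (hypothetical encoding)."""
    return {f"{txn_id}/{key}": value for key, value in writes}


def combined_intents(txn_id: str,
                     writes: List[KeyValue]) -> Dict[str, List[KeyValue]]:
    """A single intents-store entry holding every write of the transaction."""
    return {txn_id: list(writes)}


def apply_per_key(store: Dict[str, str], regular: Dict[str, str]) -> int:
    """Copy to the regular store; return the number of entries visited."""
    visited = 0
    for intent_key, value in store.items():
        visited += 1
        regular[intent_key.split("/", 1)[1]] = value
    return visited


def apply_combined(store: Dict[str, List[KeyValue]],
                   regular: Dict[str, str]) -> int:
    """Copy to the regular store; return the number of entries visited."""
    visited = 0
    for writes in store.values():
        visited += 1
        regular.update(writes)
    return visited
```

In this model a transaction with 1000 writes visits 1000 entries in the per-key layout and one entry in the combined layout, while both produce the same regular-store contents.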

hari90 added the kind/enhancement, area/docdb, and xCluster labels Apr 12, 2023

hari90 self-assigned this Apr 12, 2023

yugabyte-ci added the priority/medium label Apr 12, 2023

yugabyte-ci assigned karan-yb and unassigned hari90 May 2, 2023

lingamsandeep closed this as not planned Sep 19, 2023
