
[xCluster] Handle Large transaction Apply on consumer #16826

Closed
1 task done
hari90 opened this issue Apr 12, 2023 · 1 comment
Labels
area/docdb (YugabyteDB core features), kind/enhancement (This is an enhancement of an existing feature), priority/medium (Medium priority issue), xCluster (Label for xCluster related issues/improvements)

Comments

hari90 (Contributor) commented Apr 12, 2023

Jira Link: DB-6176

Description

Large transaction apply batching.
Flag: txn_max_apply_batch_records

Notes:
What's the impact of this on large transactions that involve moving a lot of intents to RegularDB?
Impact: Replication stalls during this step.
Notes: For large transactions, we apply only a limited number of intents, then write a checkpoint entry in regular RocksDB and apply the rest of the intents asynchronously. There is just one APPLYING op, but RegularDB will contain special checkpoint entries (the intent key or the key for the reverse index).
Problems:
1. The amount of data can be large enough to affect cluster stability, so it needs to be split into batches.
2. Raft DoReplicated holds Raft locks, which block Raft heartbeats and cause lease losses.
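The batch-and-checkpoint scheme in these notes can be sketched with a toy in-memory model. This is a minimal sketch under stated assumptions, not YugabyteDB's implementation: `ToyTablet`, `apply_batch`, `apply_all`, and `CHECKPOINT_KEY` are hypothetical names, IntentsDB is modeled as a sorted list of key/value pairs, RegularDB as a dict, and the checkpoint stores the intent key at which the next batch resumes. Only the flag name txn_max_apply_batch_records comes from this issue; the value used here is made up.

```python
import bisect
from dataclasses import dataclass, field

# Hypothetical stand-in for the txn_max_apply_batch_records flag (value made up).
TXN_MAX_APPLY_BATCH_RECORDS = 3

# Hypothetical RegularDB key holding the resume point of an in-progress apply.
CHECKPOINT_KEY = "__apply_checkpoint__"


@dataclass
class ToyTablet:
    # Committed intents, sorted by key, as they would be read from IntentsDB.
    intents: list
    # Toy RegularDB: applied rows plus, while applying, the checkpoint entry.
    regular_db: dict = field(default_factory=dict)


def apply_batch(tablet: ToyTablet,
                batch_size: int = TXN_MAX_APPLY_BATCH_RECORDS) -> bool:
    """Applies at most batch_size intents; returns True once all are applied."""
    keys = [key for key, _ in tablet.intents]
    resume_key = tablet.regular_db.get(CHECKPOINT_KEY)
    start = 0 if resume_key is None else bisect.bisect_left(keys, resume_key)
    end = min(start + batch_size, len(keys))
    for key, value in tablet.intents[start:end]:
        tablet.regular_db[key] = value
    if end < len(keys):
        # Work remains: record the intent key where the next batch resumes.
        tablet.regular_db[CHECKPOINT_KEY] = keys[end]
        return False
    # All intents applied: the checkpoint entry is no longer needed.
    tablet.regular_db.pop(CHECKPOINT_KEY, None)
    return True


def apply_all(tablet: ToyTablet,
              batch_size: int = TXN_MAX_APPLY_BATCH_RECORDS) -> int:
    """Stands in for the async applier; returns the number of batches run."""
    batches = 1
    while not apply_batch(tablet, batch_size):
        batches += 1
    return batches
```

With seven intents and a batch size of 3, `apply_all` runs three batches. In the scheme described above, only the first would run as part of the APPLYING op; the rest would be resumed from the checkpoint entry.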
This SHOULD be disabled on the consumer side. We should keep the batching and async apply, but the WriteRequest RPC should wait for all batches to complete.
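The proposed consumer-side behaviour (keep batching and async apply, but hold the WriteRequest response until every batch is done) can be sketched with `concurrent.futures`. This is a hypothetical toy, not YugabyteDB's RPC path: `handle_write_request` and `apply_in_batches` are made-up names, the first batch is applied inline and the remainder is handed to an executor, and `wait_for_all_batches=True` models the consumer side.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def apply_in_batches(db: dict, records: list, batch_size: int) -> None:
    """Writes records into db one batch at a time (hypothetical async applier)."""
    for start in range(0, len(records), batch_size):
        for key, value in records[start:start + batch_size]:
            db[key] = value


def handle_write_request(executor: ThreadPoolExecutor, db: dict, records: list,
                         batch_size: int, wait_for_all_batches: bool) -> Future:
    """Hypothetical WriteRequest handler.

    The first batch is applied inline and the remainder asynchronously. On the
    consumer side (wait_for_all_batches=True) the handler blocks until the
    async remainder has finished, so the RPC only responds once every batch
    is in db.
    """
    first, rest = records[:batch_size], records[batch_size:]
    for key, value in first:
        db[key] = value
    future = executor.submit(apply_in_batches, db, rest, batch_size)
    if wait_for_all_batches:
        future.result()  # re-raises any error from the async apply
    return future
```

With `wait_for_all_batches=False` (the producer-side behaviour), the returned future may still be running when the handler returns, which is the window the issue proposes closing on the consumer.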
Make sure the intent-aware iterator can handle intents moving to RegularDB during a scan.
Next steps: Test this! >100k KVs (with packed rows), e.g. a COPY with a million rows. Without any changes, does it break consistency or bring down the cluster?

  • Unit test this with the batch size set to 10 and the async apply delayed by 1 min. This validates correctness.
  • QA test this. This validates heartbeat delays.
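The correctness check in the first bullet, together with the intent-aware iterator concern above, can be illustrated with a toy model in which reads consult pending intents before RegularDB. This is a hypothetical sketch: `intent_aware_read`, `move_batches`, and `check_consistency` are made-up names, point reads stand in for a full scan, and the 1-minute delay is replaced by a consistency check after each batch. Writing each batch to RegularDB before removing its intents means no key is ever invisible to a reader in this model.

```python
from typing import Callable


def intent_aware_read(key: str, intents: dict, regular_db: dict):
    """Toy read path: a pending intent shadows the RegularDB value."""
    if key in intents:
        return intents[key]
    return regular_db.get(key)


def move_batches(intents: dict, regular_db: dict, batch_size: int,
                 on_batch_done: Callable[[], None]) -> None:
    """Moves intents to RegularDB in batches, calling a hook after each batch."""
    keys = sorted(intents)
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        for key in batch:
            regular_db[key] = intents[key]  # write to RegularDB first...
        for key in batch:
            del intents[key]                # ...then drop the intent
        on_batch_done()


def check_consistency(num_records: int = 25, batch_size: int = 10) -> int:
    """Returns how many batches were verified; raises on an inconsistent read."""
    expected = {f"key{i:03d}": i for i in range(num_records)}
    intents = dict(expected)
    regular_db: dict = {}
    batches_checked = 0

    def verify() -> None:
        nonlocal batches_checked
        for key, value in expected.items():
            if intent_aware_read(key, intents, regular_db) != value:
                raise AssertionError(f"inconsistent read for {key}")
        batches_checked += 1

    move_batches(intents, regular_db, batch_size, verify)
    return batches_checked
```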

hari90 added the kind/enhancement, area/docdb, and xCluster labels Apr 12, 2023
hari90 self-assigned this Apr 12, 2023
yugabyte-ci added the priority/medium label Apr 12, 2023
yugabyte-ci assigned karan-yb and unassigned hari90 May 2, 2023
karan-yb added a commit that referenced this issue Jun 2, 2023
…nal intents

Summary:
This change uses RocksDB DirectWriter to apply the external batch, which avoids creating an additional copy of the data being copied from IntentsDB to RegularDB. This should reduce both the latency of the apply operation and the transient memory usage of xCluster operations.
Jira: DB-6176

Test Plan: Jenkins

Reviewers: xCluster, slingam, sergei, mlillibridge

Reviewed By: slingam

Subscribers: mlillibridge, rthallam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D25843
karan-yb added a commit that referenced this issue Jun 5, 2023
… applying external intents

Summary:
Original commit: 4d21be5 / D25843
This change uses RocksDB DirectWriter to apply the external batch, which avoids creating an additional copy of the data being copied from IntentsDB to RegularDB. This should reduce both the latency of the apply operation and the transient memory usage of xCluster operations.
Jira: DB-6176

Test Plan: Jenkins

Reviewers: xCluster, slingam, sergei, mlillibridge

Reviewed By: slingam

Subscribers: ybase, rthallam, mlillibridge

Differential Revision: https://phorge.dev.yugabyte.com/D25955
lingamsandeep (Contributor) commented:

xCluster uses a different path (ExternalIntentsWriter) to copy records from IntentsDB to RegularDB, so a large transaction is never split into batches. xCluster's copy is also faster because external intents are combined in IntentsDB, which reduces the number of keys to traverse in IntentsDB.
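The key-count argument in this comment can be sketched with a toy model. It is hypothetical and does not reproduce YugabyteDB's on-disk encoding: `layout_separate`, `layout_combined`, and `apply_to_regular` are made-up names, and a "combined" entry is modeled simply as one IntentsDB key whose value holds every key/value pair of a replicated batch.

```python
def layout_separate(txn_id: str, pairs: list) -> dict:
    """Toy IntentsDB with one entry per replicated key/value pair."""
    return {(txn_id, i): pair for i, pair in enumerate(pairs)}


def layout_combined(txn_id: str, batch_id: int, pairs: list) -> dict:
    """Toy IntentsDB with one entry holding the whole replicated batch."""
    return {(txn_id, batch_id): list(pairs)}


def apply_to_regular(intents_db: dict, regular_db: dict) -> int:
    """Copies intents into regular_db; returns how many IntentsDB keys were visited."""
    keys_traversed = 0
    for key in sorted(intents_db):
        keys_traversed += 1
        value = intents_db[key]
        # A combined entry holds a list of pairs; a separate entry holds one pair.
        pairs = value if isinstance(value, list) else [value]
        for k, v in pairs:
            regular_db[k] = v
    return keys_traversed
```

Both layouts leave RegularDB identical; only the number of IntentsDB keys the apply has to traverse differs.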

lingamsandeep closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 19, 2023