Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] Cleanup transactions from memory before CDC streams changes #21290

Closed
1 task done
es1024 opened this issue Mar 4, 2024 · 0 comments
Closed
1 task done

[DocDB] Cleanup transactions from memory before CDC streams changes #21290

es1024 opened this issue Mar 4, 2024 · 0 comments
Assignees
Labels
2024.1 Backport Required 2024.1.0_blocker area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@es1024
Copy link
Contributor

es1024 commented Mar 4, 2024

Jira Link: DB-10206

Description

We currently keep transaction metadata (RunningTransaction) in memory until apply op id <= cdc_checkpoint_op_id, to ensure that intents needed for CDC are not deleted before they get streamed by CDC. This means we have a fair amount of memory per transaction, and cannot set the CDC retention interval too high before memory usage becomes prohibitive.

We should instead clear this metadata from memory once the transaction has been applied, and rely on on-disk metadata for CDC.

Context: #15533

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@es1024 es1024 added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Mar 4, 2024
@es1024 es1024 self-assigned this Mar 4, 2024
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Mar 4, 2024
es1024 added a commit that referenced this issue Mar 14, 2024
…tely

Summary:
TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata.

This diff makes the following changes:
1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`.
2. The transaction is immediately cleared from memory.
3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case.
  - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`.

These changes are behind two newly added gflags:
- `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it
- `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on

These gflags are on in debug and off in release by default.

**Regarding CDC unit tests:**

After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken).

Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors.

**Upgrade/Rollback safety:**

This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents.

These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off.

This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup).

This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry.

Jira: DB-10206

Test Plan: Jenkins.

Reviewers: sergei, skumar, mbautin, rthallam

Reviewed By: sergei, rthallam

Subscribers: yql, timur, ycdcxcluster, ybase, bogdan, rthallam

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31900
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue Mar 18, 2024
… immediately

Summary:
TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata.

This diff makes the following changes:
1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`.
2. The transaction is immediately cleared from memory.
3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case.
  - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`.

These changes are behind two newly added gflags:
- `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it
- `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on

These gflags are on in debug and off in release by default.

**Regarding CDC unit tests:**

After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken).

Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors.

**Upgrade/Rollback safety:**

This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents.

These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off.

This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup).

This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry.

Jira: DB-10206

Test Plan: Jenkins.

Reviewers: sergei, skumar, mbautin, rthallam

Reviewed By: sergei, rthallam

Subscribers: yql, timur, ycdcxcluster, ybase, bogdan, rthallam

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31900
es1024 added a commit that referenced this issue May 2, 2024
…om memory immediately

Summary:
Original commit: 559b2b0 / D31900
TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata.

This diff makes the following changes:
1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`.
2. The transaction is immediately cleared from memory.
3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case.
  - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`.

These changes are behind two newly added gflags:
- `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it
- `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on

These gflags are on in debug and off in release by default.

**Regarding CDC unit tests:**

After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken).

Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors.

**Upgrade/Rollback safety:**

This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents.

These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off.

This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup).

This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry.

Jira: DB-10206

Test Plan: Jenkins.

Reviewers: sergei, skumar, mbautin, rthallam

Reviewed By: rthallam

Subscribers: rthallam, bogdan, ybase, ycdcxcluster, timur, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33174
ZhenYongFan pushed a commit to ZhenYongFan/yugabyte-db that referenced this issue Jun 15, 2024
…r CDC from memory immediately

Summary:
Original commit: 559b2b0 / D31900
TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata.

This diff makes the following changes:
1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`.
2. The transaction is immediately cleared from memory.
3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case.
  - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`.

These changes are behind two newly added gflags:
- `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it
- `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on

These gflags are on in debug and off in release by default.

**Regarding CDC unit tests:**

After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken).

Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors.

**Upgrade/Rollback safety:**

This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents.

These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off.

This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup).

This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry.

Jira: DB-10206

Test Plan: Jenkins.

Reviewers: sergei, skumar, mbautin, rthallam

Reviewed By: rthallam

Subscribers: rthallam, bogdan, ybase, ycdcxcluster, timur, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33174
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2024.1 Backport Required 2024.1.0_blocker area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

3 participants