[DocDB] Cleanup transactions from memory before CDC streams changes #21290

es1024 · 2024-03-04T16:34:42Z

Description

We currently keep transaction metadata (RunningTransaction) in memory until apply op id <= cdc_checkpoint_op_id, to ensure that intents needed for CDC are not deleted before they get streamed by CDC. This means we have a fair amount of memory per transaction, and cannot set the CDC retention interval too high before memory usage becomes prohibitive.

We should instead clear this metadata from memory once the transaction has been applied, and rely on on-disk metadata for CDC.

Context: #15533

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.

…tely Summary: TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata. This diff makes the following changes: 1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`. 2. The transaction is immediately cleared from memory. 3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case. - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`. These changes are behind two newly added gflags: - `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it - `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on These gflags are on in debug and off in release by default. **Regarding CDC unit tests:** After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken). Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors. **Upgrade/Rollback safety:** This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents. These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off. This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup). This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry. Jira: DB-10206 Test Plan: Jenkins. Reviewers: sergei, skumar, mbautin, rthallam Reviewed By: sergei, rthallam Subscribers: yql, timur, ycdcxcluster, ybase, bogdan, rthallam Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31900

… immediately Summary: TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata. This diff makes the following changes: 1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`. 2. The transaction is immediately cleared from memory. 3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case. - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`. These changes are behind two newly added gflags: - `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it - `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on These gflags are on in debug and off in release by default. **Regarding CDC unit tests:** After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken). Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors. **Upgrade/Rollback safety:** This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents. These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off. This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup). This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry. Jira: DB-10206 Test Plan: Jenkins. Reviewers: sergei, skumar, mbautin, rthallam Reviewed By: sergei, rthallam Subscribers: yql, timur, ycdcxcluster, ybase, bogdan, rthallam Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31900

…om memory immediately Summary: Original commit: 559b2b0 / D31900 TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata. This diff makes the following changes: 1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`. 2. The transaction is immediately cleared from memory. 3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case. - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`. These changes are behind two newly added gflags: - `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it - `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on These gflags are on in debug and off in release by default. **Regarding CDC unit tests:** After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken). Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors. **Upgrade/Rollback safety:** This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents. These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off. This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup). This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry. Jira: DB-10206 Test Plan: Jenkins. Reviewers: sergei, skumar, mbautin, rthallam Reviewed By: rthallam Subscribers: rthallam, bogdan, ybase, ycdcxcluster, timur, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D33174

…r CDC from memory immediately Summary: Original commit: 559b2b0 / D31900 TransactionParticipant currently keeps a transaction in memory until `apply_op_id <= cdc_checkpoint_op_id`, in order to avoid cleaning up intents for changes not yet streamed by CDC. This includes a fair amount of metadata (RunningTransaction, 720 bytes per transaction, on RF number of tservers) and makes memory requirements for large CDC retention windows infeasible. For even a modest 100 transaction/second workload and the current 4 hour window, this means we may at worst case dedicate ~1 GB * RF memory solely to keeping this metadata. This diff makes the following changes: 1. When attempting to clean up a transaction where `apply_op_id > cdc_checkpoint_op_id`, a post-apply transaction metadata entry is written to intentsdb, containing the apply op id/commit ht/apply ht. The key for this entry is `[transaction id] 00`. 2. The transaction is immediately cleared from memory. 3. TransactionParticipant no longer triggers per-transaction intent cleanup in most cases if CDC is active, as there is no longer information in memory about whether it is safe to do so. We instead now rely on cleanup of entire SST files to clean up these intents in this case. - Exception: per-transaction intent cleanup occurs during compaction, which has been modified to locate the post-apply transaction metadata and thus can load the apply op id to compare against `cdc_checkpoint_op_id`. These changes are behind two newly added gflags: - `cdc_write_post_apply_metadata` controls whether the new post-apply transaction metadata entry should be written if a transaction is cleaned up before CDC has streamed it - `cdc_immediate_transaction_cleanup` controls whether to apply the new cleanup logic. `cdc_write_post_apply_metadata` must be on if this is on These gflags are on in debug and off in release by default. **Regarding CDC unit tests:** After these changes, CDC unit tests that check for intent count must be handled with more caution. If the unit tests do not wait for the post-apply transaction metadata to be written, intent counts may be non-deterministic (post-apply metadata may be written before or after counts are taken). Furthermore, as per-transaction cleanup is disabled except for compaction, intents are now generally cleaned up when CDC changes are fetched and the checkpoint op id is updated (which then triggers a full SST file cleanup). If the writing of the post-apply metadata is not waited for, then it may happen after CDC changes are fetched and the transactions intents are cleaned. In a production system, this would just get cleaned up in the next compaction or when the next SST file is deleted, but in a test counting number of intents, this may lead to off by one errors. **Upgrade/Rollback safety:** This change adds the PostApplyTransactionMetadata protobuf, which is written to intentsdb when cleaning a transaction from memory while CDC still needs its intents. These changes are guarded behind the `cdc_write_post_apply_metadata` gflag, which is default off. This flag should only be enabled after upgrade has finished, to ensure that we do not send post-apply transaction metadata to nodes that do not understand it during RBS. Post-upgrade, we handle the case where this metadata is not present safely (assume that it is unsafe to cleanup). This flag will eventually need to be converted into an auto flag, as the change is not rollback safe: once post-apply transaction metadata is present in intents, rolling back will cause this metadata to be incorrectly treated as a malformed reverse index entry. Jira: DB-10206 Test Plan: Jenkins. Reviewers: sergei, skumar, mbautin, rthallam Reviewed By: rthallam Subscribers: rthallam, bogdan, ybase, ycdcxcluster, timur, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D33174

es1024 added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Mar 4, 2024

es1024 self-assigned this Mar 4, 2024

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Mar 4, 2024

es1024 mentioned this issue Mar 4, 2024

[DocDB] Move opid from util to common #21291

Closed

1 task

rthallamko3 added the 2024.1 Backport Required label Mar 19, 2024

This was referenced Mar 19, 2024

[CDCSDK] Address performance impact of increasing retention period of intents #21580

Closed

[CDCSDK] Long term retention of intents #15533

Closed

es1024 mentioned this issue Mar 21, 2024

[CDCSDK] Catchup on lagging CDC stream takes a long time #21611

Open

1 task

es1024 added the 2024.1.0_blocker label May 2, 2024

rthallamko3 closed this as completed May 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DocDB] Cleanup transactions from memory before CDC streams changes #21290

[DocDB] Cleanup transactions from memory before CDC streams changes #21290

es1024 commented Mar 4, 2024 •

edited by jira bot

Loading

[DocDB] Cleanup transactions from memory before CDC streams changes #21290

[DocDB] Cleanup transactions from memory before CDC streams changes #21290

Comments

es1024 commented Mar 4, 2024 • edited by jira bot Loading

Description

Issue Type

Warning: Please confirm that this issue does not contain any sensitive information

es1024 commented Mar 4, 2024 •

edited by jira bot

Loading