New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Support CREATE_REPLICATION_SLOT command #19211
Labels
area/cdcsdk
CDC SDK
area/ysql
Yugabyte SQL (YSQL)
kind/enhancement
This is an enhancement of an existing feature
priority/high
High Priority
Comments
dr0pdb
added
kind/enhancement
This is an enhancement of an existing feature
area/ysql
Yugabyte SQL (YSQL)
priority/high
High Priority
labels
Sep 20, 2023
dr0pdb
added a commit
that referenced
this issue
Oct 6, 2023
Summary: Add support for creating CDCSDK stream for a namespace. This diff also adds support to get a CDC stream via `cdcsdk_ysql_replication_slot_name`. This is the first step in supporting a SQL syntax for CDC via the PG logical replication model. We already have support to create a CDCSDK stream for a namespace but it is driven from `CDCServiceImpl::CreateCDCStreamForNamespace`. This diff adds it to master. The logic present in cdc_service will get deprecated in future diffs for YSQL. **Create Stream Path** Request validation - when namespace_id is populated, it is required that the following fields are passed: 1. `cdcsdk_ysql_replication_slot_name` if the namespace is PGSQL 2. `sourceType` in request options: Must be `cdcsdk` 3. `idType` must be `kNamespaceId`: It was an assumption within the code for CDCSDK stream. This diff makes the assumption explicit Details: 1. The namespace_id must be populated 2. For creating namespace level CDCSDK stream - Tables within the namespace which aren't supported (without primary key) get skipped A few helper methods were also added for code reuse. **Follow Ups** 1. YSQL layer changes **Upgrade/Rollback safety** This diff modifies the sys-catalog entry stored in yb-master, the changes are not rollback safe. As a result, these changes will be disabled during an upgrade via an autoflag (yb_enable_replication_commands - `LocalPersisted`) and only enabled after the upgrade is finalized. The responsibility of checking the autoflag `yb_enable_replication_commands` is on the clients of the `CreateCDCStream` and `GetCDCStream` RPC. The client is the YSQL layer commands of Replication Slot which will be added in future diffs. Jira: DB-8008 Test Plan: Newly added ``` ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespace ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceCql ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidDuplicationSlotName ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidIdTypeOption ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceMissingReplicationSlotName ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestListCDCStreamsCDCSDKWithReplicationSlot ``` Existing CDC related tests are also relevant. Reviewers: hsunder, aagrawal, skumar, asrinivasan Reviewed By: hsunder Subscribers: ycdcxcluster, ybase, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D28678
dr0pdb
added a commit
that referenced
this issue
Oct 17, 2023
… consecutive CreateCDCStream and DeleteCDCStream calls Summary: This diff fixes two problems: 1. A race condition between `CreateCDCStream` and the CatalogManager background cleanup task. Scenario is outlined below 2. Initial checkpoint of the tables in the cdc_state_table for CDCSDK must be Invalid (-1,-1) instead of (0,0). The cdc_service sets it to (-1,-1) correctly but while moving the logic of yb-master in https://phorge.dev.yugabyte.com/D28678, I missed doing the same. Race condition scenario: 1. CreateCDCStream with slot name `test_slot` 2. DeleteCDCStream with slot name `test_slot`: This marks the stream as deleting but the actual deletion happens in the background task 3. CreateCDCStream with slot name `test_slot`: This fails with error message - "CDC stream with the given replication slot name already exists" if the cleanup task has not yet been executed This diff handles step 3 now by checking if the stream is in a `DELETING` state and if so, it deletes the entry from the `cdcsdk_replication_slots_to_stream_map_` map eagerly. Jira: DB-8008, DB-8009 Test Plan: ``` ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateDropCDCStreamWithReplicationSlotName ``` Reviewers: hsunder, skumar Reviewed By: hsunder Subscribers: ycdcxcluster, ybase, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29271
dr0pdb
added a commit
that referenced
this issue
Oct 29, 2023
…mands can be issued to a walsender.' Summary: Import PG 11 commit - 'Fix limitations on what SQL commands can be issued to a walsender.'. This diff is needed to be able to detect whether a command is a replication command or not without an error. It'll be used in the replication slot support to be introduced as part of https://phorge.dev.yugabyte.com/D29194. Original Commit (PG 15): postgres/postgres@6aa5186 Batchpatch Commit (PG 11): postgres/postgres@4ec5449 Original Description ``` In logical replication mode, a WalSender is supposed to be able to execute any regular SQL command, as well as the special replication commands. Poor design of the replication-command parser caused it to fail in various cases, notably: * semicolons embedded in a command, or multiple SQL commands sent in a single message; * dollar-quoted literals containing odd numbers of single or double quote marks; * commands starting with a comment. The basic problem here is that we're trying to run repl_scanner.l across the entire input string even when it's not a replication command. Since repl_scanner.l does not understand all of the token types known to the core lexer, this is doomed to have failure modes. We certainly don't want to make repl_scanner.l as big as scan.l, so instead rejigger stuff so that we only lex the first token of a non-replication command. That will usually look like an IDENT to repl_scanner.l, though a comment would end up getting reported as a '-' or '/' single-character token. If the token is a replication command keyword, we push it back and proceed normally with repl_gram.y parsing. Otherwise, we can drop out of exec_replication_command() without examining the rest of the string. (It's still theoretically possible for repl_scanner.l to fail on the first token; but that could only happen if it's an unterminated single- or double-quoted string, in which case you'd have gotten largely the same error from the core lexer too.) In this way, repl_gram.y isn't involved at all in handling general SQL commands, so we can get rid of the SQLCmd node type. (In the back branches, we can't remove it because renumbering enum NodeTag would be an ABI break; so just leave it sit there unused.) I failed to resist the temptation to clean up some other sloppy coding in repl_scanner.l while at it. The only externally-visible behavior change from that is it now accepts \r and \f as whitespace, same as the core lexer. Per bug #17379 from Greg Rychlewski. Back-patch to all supported branches. Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org ``` Jira: DB-8008, DB-8009 Test Plan: NA, new code that isn't enabled yet. Tests along with YB specific changes will be added as part of https://phorge.dev.yugabyte.com/D29194 Reviewers: aagrawal, skumar, dsrinivasan Reviewed By: aagrawal Subscribers: jason, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29723
dr0pdb
added a commit
that referenced
this issue
Oct 29, 2023
… of replication slots Summary: Introduce support for creating, viewing, and dropping replication slots in YSQL. This is part of the project to support Publication/Replication slot API in YSQL (#18724). There are two interfaces for the support for create and drop: 1. Functions: - `pg_create_logical_replication_slot` - `pg_drop_replication_slot` 2. Walsender commands: - CREATE_REPLICATION_SLOT - DROP_REPLICATION_SLOT Both create and drop statements go to yb-master directly. Most of the PG code isn't applicable to YSQL yet and hence it is skipped. For viewing replication slots, we have a view `pg_replication_slots` which is backed by the function `pg_get_replication_slots`. The schema of the view has been modified by adding an extra yb-specific column `yb_stream_id` which is a text. Limitations: 1. Only `yboutput` plugin is supported. It'll only be relevant once we add support for consuming replication slots but we are enforcing it from this diff onwards Apart from the above, this diff fixes two issues: 1. #19509 - Cleanup of held locks in case of an `ereport(elevel >= ERROR)`. This diff fixes that by making sure that we call `LWLockReleaseAll` in `src/postgres/src/backend/access/transam/xact.c` in case of an error. Thanks to Timothy Elgersma. 2. Skipping cache refresh in case of an error in the execution of a replication command. `src/postgres/src/backend/tcop/postgres.c`. This is ok because we only cache DMLs and none of the replication commands are DMLs. We need to do that because the check `yb_is_dml_command` tries to parse the query to check whether it is a DML or not but it doesn't support replication commands. So any `ereport(elevel >= ERROR)` in the execution of a replication command would lead to a syntax error. TODOs for future: 1. This diff creates a CDC stream with CDCRecordType as `CHANGE`. We need to extend the `pg_create_logical_replication_slot` and `CREATE_REPLICATION_SLOT` syntax to take the CDCRecordType. It'll be done in a future diff 2. DROP_REPLICATION_SLOT commands allows waiting for a slot to become inactive before dropping it. It is unsupported currently and will be done in a future diff 3. Temporary replication slots are unsupported. Will be added in future once we also support consumption via Walsender Upgrade\Rollback safety: These changes rely on sys-catalog changes done in yb-master. As a result, all the commands are disabled during upgrade using an autoflag yb_enable_replication_commands (LocalPersisted) and will only be enabled once the user has committed to the new version. The autoflag was introduced during the Publication syntax support and is being reused here since these are both part of the same project: https://phorge.dev.yugabyte.com/D28721 Jira: DB-8008, DB-8009, DB-8305 Test Plan: New unit test ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot' ``` New Regress test ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressReplicationSlot' ``` I've also updated most of the CDCSDK tests to now use the ReplicationSlot commands to create a CDCSDK stream instead of an RPC. Remaining tests will be updated in future diffs Reviewers: dsrinivasan, skumar, asrinivasan, aagrawal Reviewed By: dsrinivasan Subscribers: ycdcxcluster, bogdan, ybase, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29194
dr0pdb
added a commit
that referenced
this issue
Nov 2, 2023
…atabase oid from output of pg_replication_slots Summary: This diff fixes the brittle TestPgRegressReplicationSlot test by removing the database oid from the output of `pg_replication_slots`. Jira: DB-8008, DB-8009 Test Plan: Jenkins: test regex: .*TestPgRegressReplicationSlot.* ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressReplicationSlot' Reviewers: jason, dsrinivasan, myang Reviewed By: jason, myang Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29799
dr0pdb
added a commit
that referenced
this issue
Nov 3, 2023
…Fix limitations on what SQL commands can be issued to a walsender.'" Summary: This reverts commit 0b05735 added as part of https://phorge.dev.yugabyte.com/D29723. We need to revert this commit because we need to import another PG commit (c39983600b284466692f39355536158770527e2f) that was added before this. We'll add it back after importing `c39983600b284466692f39355536158770527e2f`. Apart from reverting the commit, this diff removes the changes that were added in https://phorge.dev.yugabyte.com/D29194 which depend on this commit. Those changes were present in two places: 1. `postgres.c` 2. `TestPgReplicationSlot.java` These will be added again once we import both `c39983600b284466692f39355536158770527e2f` and `4ec54498c5eae338bbc8ce391df4297c314624ff`. Jira: DB-8008, DB-8009 Test Plan: Existing replication slot tests Reviewers: aagrawal Reviewed By: aagrawal Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29895
dr0pdb
added a commit
that referenced
this issue
Nov 3, 2023
… commands in pg_stat_activity.' Summary: This diff imports the upstream PG commit `Make walsenders show their replication commands in pg_stat_activity.` - `c39983600b284466692f39355536158770527e2f`. PG commit: postgres/postgres@c399836 PG commit description: ``` A walsender process that has executed a SQL command left the text of that command in pg_stat_activity.query indefinitely, which is quite confusing if it's in RUNNING state but not doing that query. An easy and useful fix is to treat replication commands as if they were SQL queries, and show them in pg_stat_activity according to the same rules as for regular queries. While we're at it, it seems also sensible to set debug_query_string, allowing error logging and debugging to see the replication command. While here, clean up assorted silliness in exec_replication_command: * Clean up SQLCmd code path, and fix its only-accidentally-not-buggy memory management. * Remove useless duplicate call of SnapBuildClearExportedSnapshot(). * replication_scanner_finish() was never called. Back-patch of commit f560209 into v10-v13. I'd originally felt that this didn't merit back-patching, but subsequent confusion while debugging walsender problems suggests that it'll be useful. Also, the original commit has now aged long enough to provide some comfort that it won't cause problems. Discussion: https://postgr.es/m/2673480.1624557299@sss.pgh.pa.us Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us ``` Jira: DB-8008, DB-8009 Test Plan: Existing tests on replication slots are sufficient. Also the feature is not released yet. Reviewers: aagrawal Reviewed By: aagrawal Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29937
dr0pdb
added a commit
that referenced
this issue
Nov 6, 2023
…mands can be issued to a walsender.' Summary: Import PG 11 commit - 'Fix limitations on what SQL commands can be issued to a walsender.'. This revision is needed to be able to detect whether a command is a replication command or not without an error. It'll be used in the replication slot support to be introduced as part of https://phorge.dev.yugabyte.com/D29194. This revision also imports a 1-liner PG commit `Remember to reset yy_start state when firing up repl_scanner.l.` - postgres/postgres@449a696. It is being imported as part of this revision since it is a trivial change that doesn't warrant a separate diff. Original Commit (PG 15): postgres/postgres@6aa5186 Batchpatch Commit (PG 11): postgres/postgres@4ec5449 Original Description ``` In logical replication mode, a WalSender is supposed to be able to execute any regular SQL command, as well as the special replication commands. Poor design of the replication-command parser caused it to fail in various cases, notably: * semicolons embedded in a command, or multiple SQL commands sent in a single message; * dollar-quoted literals containing odd numbers of single or double quote marks; * commands starting with a comment. The basic problem here is that we're trying to run repl_scanner.l across the entire input string even when it's not a replication command. Since repl_scanner.l does not understand all of the token types known to the core lexer, this is doomed to have failure modes. We certainly don't want to make repl_scanner.l as big as scan.l, so instead rejigger stuff so that we only lex the first token of a non-replication command. That will usually look like an IDENT to repl_scanner.l, though a comment would end up getting reported as a '-' or '/' single-character token. If the token is a replication command keyword, we push it back and proceed normally with repl_gram.y parsing. Otherwise, we can drop out of exec_replication_command() without examining the rest of the string. (It's still theoretically possible for repl_scanner.l to fail on the first token; but that could only happen if it's an unterminated single- or double-quoted string, in which case you'd have gotten largely the same error from the core lexer too.) In this way, repl_gram.y isn't involved at all in handling general SQL commands, so we can get rid of the SQLCmd node type. (In the back branches, we can't remove it because renumbering enum NodeTag would be an ABI break; so just leave it sit there unused.) I failed to resist the temptation to clean up some other sloppy coding in repl_scanner.l while at it. The only externally-visible behavior change from that is it now accepts \r and \f as whitespace, same as the core lexer. Per bug #17379 from Greg Rychlewski. Back-patch to all supported branches. Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org ``` Jira: DB-8008, DB-8009 Test Plan: Existing replication slot tests. Reviewers: aagrawal Reviewed By: aagrawal Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29946
dr0pdb
added a commit
that referenced
this issue
Nov 6, 2023
…mand Summary: This revision adds back the changes to identify replication commands in `yb_is_dml_command`. These changes were first introduced in https://phorge.dev.yugabyte.com/D29194 and had to be reverted in https://phorge.dev.yugabyte.com/D29895 while reverting the upstream PG import. Jira: DB-8008, DB-8009 Test Plan: `./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot'` Reviewers: aagrawal Reviewed By: aagrawal Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29994
dr0pdb
added a commit
that referenced
this issue
Nov 7, 2023
…cation slots Summary: This diff defines whether a replication slot (CDC stream) is active or inactive. PG Definition: In Postgres, a replication slot is considered as active if there is an backend (walsender) process actively streaming changes for it. This is implemented through the shared memory array of replication slots where the field `active_pid` contains the process ID of the backend (if any). YB definition: In the first phase of the project (CRUD), there is no walsender support. Clients will consume changes using the Debezium connector which makes RPC calls to cdc_service. Hence, the PG mechanism isn't applicable to us. It is also not applicable in future when we introduce support for consumption via walsender. This is because a replication slot could be consumed from different nodes. Hence, In YB, a replication slot (CDC stream) will be considered active if it has been consumed (via the GetChanges RPC) within the past `ysql_cdc_active_replication_slot_window_ms` duration which is a Tserver GFlag. To determine whether a stream has been consumed in the past `ysql_cdc_active_replication_slot_window_ms`, all the rows of the CDC state table for the stream_id are read and the maximum `active_time` across them is considered as the last_active_time of the CDC stream. **This diff adds support to read the CDC state table in the `pg_client_service` layer.** This definition of active/inactive is used in two scenarios: 1. In the `pg_replication_slots` view, it is shown in one of the columns similar to PG 2. While dropping a replication slot, we throw an error if the stream is active similar to PG Determining whether a stream is active or not is expensive but both the scenarios above are considered to be **infrequent**, hence we should be okay. **Misc** This diff also fixes a bug in the test-only function `WaitForGetChangesToFetchRecords`. The `actual_count` variable wasn't being reset back to zero for future iterations which should be done in the case of CDC streams with explicit checkpointing. **Upgrade/Rollback safety:** These changes are safe because: 1. The commands are already protected using `yb_enable_replication_commands` auto flag 2. The proto changes of this diff are between PG and local Tserver. Jira: DB-8008, DB-8009 Test Plan: ``` ./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter "*TestReplicationSlotDropWithActiveInvalid*" ``` Reviewers: dsrinivasan, skumar, aagrawal Reviewed By: dsrinivasan Subscribers: ybase, yql, ycdcxcluster Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29776
dr0pdb
added a commit
that referenced
this issue
Nov 9, 2023
…ation_slots Summary: PG limits the number of replication slots (CDC stream) using a GUC `max_replication_slots`. This revision adds this to YSQL as well by adding a GFlag wrapping over the GUC and uses it in yb-master to limit the number of CDCSDK streams. A new error code `REPLICATION_SLOT_LIMIT_REACHED` and StatusCode `LimitReached` has been added. The status code is used by the YSQL layer to show the same error message as shown by PG ("all replication slots are in use") to the user. **Upgrade/Rollback safety:** This change introduces new Status codes between RPCs. At the moment, the newly added code is only used as part of creation of replication slot via the YSQL commands which are already protected by the autoflag `yb_enable_replication_commands`. Jira: DB-8008 Test Plan: New test cases ``` ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceLimitReached ./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter "*TestCreateCDCStreamReplicationSlotLimit*" ``` Reviewers: hsunder, dsrinivasan, skumar Reviewed By: dsrinivasan Subscribers: yql, ybase, ycdcxcluster, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29897
Spikatrix
pushed a commit
to Spikatrix/yugabyte-db
that referenced
this issue
Nov 21, 2023
…mands a TEST flag Summary: This revision makes the already existing autoflag `yb_enable_replication_commands` a `TEST` flag. This is safe to do because the revision https://phorge.dev.yugabyte.com/D28721 that introduced the flag is only on the master branch. The flag should have been a 'TEST' flag from the start since we want to make further changes before enabling the replication slots feature by default. Now the behavior of `CreateCDCStream` RPC in yb-master is as follows: 1. `yb_enable_replication_commands` - true - YSQL commands for replication slots - supported and the namespace id is read from the `namespace_id` field - CDC service (yb-admin) - supported and the namespace id is read from the `namespace_id` field 2. `yb_enable_replication_commands` - false - YSQL commands - disallowed - CDC service (yb-admin) - supported using the old mechanism of reading the namespace id from the `table_id` field YSQL commands are identified with the presence of the `cdcsdk_ysql_replication_slot_name` in the request. As part of this work, we also cleanup the `kNamespaceId` mode from the `CreateNewCDCStreamMode` since the cdc_service just forwards the request to yb-master now, hence we no longer to support the older flow. Jira: DB-8008, DB-7751 Test Plan: New tests ``` ./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter "*TestCDCStreamCreationViaCDCServiceWithReplicationCommands*" ``` Existing CDC and master_xrepl tests Reviewers: asrinivasan, skumar, hsunder, xCluster Reviewed By: hsunder Subscribers: yql, ybase, ycdcxcluster, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30234
pull bot
pushed a commit
to muzammilar/yugabyte-db
that referenced
this issue
Nov 24, 2023
…to_stream_map_ if they are deleted from cdc_stream_map_ Summary: `cdc_stream_map_` stores the metadata of every CDC stream in yb-master. It is a map from `stream_id` to the metadata. `cdcsdk_replication_slots_to_stream_map_` stores the mapping from the replication slot name to stream id in yb-master. Whenever an entry is deleted from the `cdc_stream_map_`, we must also check and delete the corresponding entry from `cdcsdk_replication_slots_to_stream_map_`. This revision ensures we do that in the flow where a CDC stream is cleaned up if all tables that belong to it are dropped. Jira: DB-8008 Test Plan: New test case `./yb_build.sh --cxx-test cdcsdk_stream-test --gtest_filter CDCSDKStreamTest.TestPgReplicationSlotCreateWithDropTable` Reviewers: hsunder, skumar, xCluster, asrinivasan Reviewed By: skumar Subscribers: ybase, ycdcxcluster, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30398
dr0pdb
added a commit
that referenced
this issue
Dec 13, 2023
Summary: Presently, the `CreateCDCStream` RPC on yb-master has two temporary sleep statements. They are for: 1. Waiting for the CDCStateTable creation 2. Waiting for the `AlterTable` calls for each table to finish These sleep statements were added as part of consistent snapshot support: https://phorge.dev.yugabyte.com/D29987 This revision removes the first sleep and updates the second sleep statement to wait for the AlterTable calls for each table to finish. Jira: DB-8008 Test Plan: Jenkins: no java, test regex: .*CDCSDKConsistentSnapshotTest.* Existing CDCSDK tests Reviewers: asrinivasan, skumar, hsunder Reviewed By: hsunder Subscribers: ybase, ycdcxcluster, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30606
dr0pdb
added a commit
that referenced
this issue
Dec 13, 2023
…n for namespace Summary: **Backport Description** The merge was clean but a few additional changes have been made. These changes are needed to ensure we start the changes with a TEST flag. On the master branch, we started off with an autoflag and later changed it to TEST flag (https://phorge.dev.yugabyte.com/D30234). Two changes have been made: 1. Test flag declared in `pg_wrapper.cc` - Taken from https://phorge.dev.yugabyte.com/D30234 2. Disabled the changes if the flag is false - Temporary and will be removed once https://phorge.dev.yugabyte.com/D30234 is backported. **Original Description** Original commit: 830fb9a / D28678 Add support for creating CDCSDK stream for a namespace. This diff also adds support to get a CDC stream via `cdcsdk_ysql_replication_slot_name`. This is the first step in supporting a SQL syntax for CDC via the PG logical replication model. We already have support to create a CDCSDK stream for a namespace but it is driven from `CDCServiceImpl::CreateCDCStreamForNamespace`. This diff adds it to master. The logic present in cdc_service will get deprecated in future diffs for YSQL. **Create Stream Path** Request validation - when namespace_id is populated, it is required that the following fields are passed: 1. `cdcsdk_ysql_replication_slot_name` if the namespace is PGSQL 2. `sourceType` in request options: Must be `cdcsdk` 3. `idType` must be `kNamespaceId`: It was an assumption within the code for CDCSDK stream. This diff makes the assumption explicit Details: 1. The namespace_id must be populated 2. For creating namespace level CDCSDK stream - Tables within the namespace which aren't supported (without primary key) get skipped A few helper methods were also added for code reuse. **Follow Ups** 1. YSQL layer changes **Upgrade/Rollback safety** This diff modifies the sys-catalog entry stored in yb-master, the changes are not rollback safe. As a result, these changes will be disabled during an upgrade via an autoflag (yb_enable_replication_commands - `LocalPersisted`) and only enabled after the upgrade is finalized. The responsibility of checking the autoflag `yb_enable_replication_commands` is on the clients of the `CreateCDCStream` and `GetCDCStream` RPC. The client is the YSQL layer commands of Replication Slot which will be added in future diffs. Jira: DB-8008 Test Plan: Newly added ``` ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespace ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceCql ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidDuplicationSlotName ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidIdTypeOption ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceMissingReplicationSlotName ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestListCDCStreamsCDCSDKWithReplicationSlot ``` Existing CDC related tests are also relevant. Reviewers: hsunder, aagrawal, skumar, asrinivasan, xCluster Reviewed By: skumar Subscribers: bogdan, ybase, ycdcxcluster Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30919
dr0pdb
added a commit
that referenced
this issue
Dec 13, 2023
…exists error in consecutive CreateCDCStream and DeleteCDCStream calls Summary: **Backport Description** The merge was clean. The only difference here is that the changes are under a TEST flag. **Original Description** Original commit: 3924d41 / D29271 This diff fixes two problems: 1. A race condition between `CreateCDCStream` and the CatalogManager background cleanup task. Scenario is outlined below 2. Initial checkpoint of the tables in the cdc_state_table for CDCSDK must be Invalid (-1,-1) instead of (0,0). The cdc_service sets it to (-1,-1) correctly but while moving the logic of yb-master in https://phorge.dev.yugabyte.com/D28678, I missed doing the same. Race condition scenario: 1. CreateCDCStream with slot name `test_slot` 2. DeleteCDCStream with slot name `test_slot`: This marks the stream as deleting but the actual deletion happens in the background task 3. CreateCDCStream with slot name `test_slot`: This fails with error message - "CDC stream with the given replication slot name already exists" if the cleanup task has not yet been executed This diff handles step 3 now by checking if the stream is in a `DELETING` state and if so, it deletes the entry from the `cdcsdk_replication_slots_to_stream_map_` map eagerly. Jira: DB-8008, DB-8009 Test Plan: ``` ./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateDropCDCStreamWithReplicationSlotName ``` Reviewers: hsunder, skumar, xCluster Reviewed By: skumar Subscribers: bogdan, ybase, ycdcxcluster Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31005
dr0pdb
added a commit
that referenced
this issue
Dec 18, 2023
…CDCSDK streams Summary: **Backport Description** Merge was clean. **Original Description** Original commit: ce0726e / D30606 Presently, the `CreateCDCStream` RPC on yb-master has two temporary sleep statements. They are for: 1. Waiting for the CDCStateTable creation 2. Waiting for the `AlterTable` calls for each table to finish These sleep statements were added as part of consistent snapshot support: https://phorge.dev.yugabyte.com/D29987 This revision removes the first sleep and updates the second sleep statement to wait for the AlterTable calls for each table to finish. Jira: DB-8008 Test Plan: Jenkins: no java, test regex: .*CDCSDKConsistentSnapshotTest.* Existing CDCSDK tests Reviewers: asrinivasan, skumar, hsunder, xCluster Reviewed By: skumar Subscribers: bogdan, ycdcxcluster, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31145
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/cdcsdk
CDC SDK
area/ysql
Yugabyte SQL (YSQL)
kind/enhancement
This is an enhancement of an existing feature
priority/high
High Priority
Jira Link: DB-8008
Description
Support creating logical replication slot via:
Tasks:
Default
andNothing
#19260max_replication_slots
#19264Unsupported features:
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: