Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Support CREATE_REPLICATION_SLOT command #19211

Open
1 task done
dr0pdb opened this issue Sep 20, 2023 · 0 comments
Open
1 task done

[YSQL] Support CREATE_REPLICATION_SLOT command #19211

dr0pdb opened this issue Sep 20, 2023 · 0 comments
Assignees
Labels
area/cdcsdk CDC SDK area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/high High Priority

Comments

@dr0pdb
Copy link
Contributor

dr0pdb commented Sep 20, 2023

Jira Link: DB-8008

Description

Support creating logical replication slot via:

  1. CREATE_REPLICATION_SLOT
  2. pg_create_logical_replication_slot

Tasks:

Unsupported features:

  1. Temporary replication slot

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@dr0pdb dr0pdb added kind/enhancement This is an enhancement of an existing feature area/ysql Yugabyte SQL (YSQL) priority/high High Priority labels Sep 20, 2023
@dr0pdb dr0pdb added the area/cdcsdk CDC SDK label Sep 22, 2023
@dr0pdb dr0pdb self-assigned this Sep 22, 2023
dr0pdb added a commit that referenced this issue Oct 6, 2023
Summary:
Add support for creating CDCSDK stream for a namespace. This diff also adds support to get a CDC stream via `cdcsdk_ysql_replication_slot_name`. This is the first step in supporting a SQL syntax for CDC via the PG logical replication model.

We already have support to create a CDCSDK stream for a namespace but it is driven from `CDCServiceImpl::CreateCDCStreamForNamespace`. This diff adds it to master. The logic present in cdc_service will get deprecated in future diffs for YSQL.

**Create Stream Path**
Request validation - when namespace_id is populated, it is required that the following fields are passed:
1. `cdcsdk_ysql_replication_slot_name` if the namespace is PGSQL
2. `sourceType` in request options: Must be `cdcsdk`
3. `idType` must be `kNamespaceId`: It was an assumption within the code for CDCSDK stream. This diff makes the assumption explicit

Details:
1. The namespace_id must be populated
2. For creating namespace level CDCSDK stream
  - Tables within the namespace which aren't supported (without primary key) get skipped

A few helper methods were also added for code reuse.

**Follow Ups**
1. YSQL layer changes

**Upgrade/Rollback safety**
This diff modifies the sys-catalog entry stored in yb-master, the changes are not rollback safe. As a result, these changes will be disabled during an upgrade via an autoflag (yb_enable_replication_commands - `LocalPersisted`) and only enabled after the upgrade is finalized.

The responsibility of checking the autoflag `yb_enable_replication_commands` is on the clients of the `CreateCDCStream` and `GetCDCStream` RPC. The client is the YSQL layer commands of Replication Slot which will be added in future diffs.
Jira: DB-8008

Test Plan:
Newly added

```
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespace
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceCql
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidDuplicationSlotName
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidIdTypeOption
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceMissingReplicationSlotName
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestListCDCStreamsCDCSDKWithReplicationSlot
```

Existing CDC related tests are also relevant.

Reviewers: hsunder, aagrawal, skumar, asrinivasan

Reviewed By: hsunder

Subscribers: ycdcxcluster, ybase, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D28678
dr0pdb added a commit that referenced this issue Oct 17, 2023
… consecutive CreateCDCStream and DeleteCDCStream calls

Summary:
This diff fixes two problems:

1.  A race condition between `CreateCDCStream` and the CatalogManager background cleanup task. Scenario is outlined below
2. Initial checkpoint of the tables in the cdc_state_table for CDCSDK must be Invalid (-1,-1) instead of (0,0). The cdc_service sets it to (-1,-1) correctly but while moving the logic of yb-master in https://phorge.dev.yugabyte.com/D28678, I missed doing the same.

Race condition scenario:
1. CreateCDCStream with slot name `test_slot`
2. DeleteCDCStream with slot name `test_slot`: This marks the stream as deleting but the actual deletion happens in the background task
3. CreateCDCStream with slot name `test_slot`: This fails with error message - "CDC stream with the given replication slot name already exists" if the cleanup task has not yet been executed

This diff handles step 3 now by checking if the stream is in a `DELETING` state and if so, it deletes the entry from the `cdcsdk_replication_slots_to_stream_map_` map eagerly.
Jira: DB-8008, DB-8009

Test Plan:
```
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateDropCDCStreamWithReplicationSlotName
```

Reviewers: hsunder, skumar

Reviewed By: hsunder

Subscribers: ycdcxcluster, ybase, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29271
dr0pdb added a commit that referenced this issue Oct 29, 2023
…mands can be issued to a walsender.'

Summary:
Import PG 11 commit - 'Fix limitations on what SQL commands can be issued to a walsender.'. This diff is needed to be able to detect whether a command is a replication command or not without an error. It'll be used in the replication slot support to be introduced as part of https://phorge.dev.yugabyte.com/D29194.

Original Commit (PG 15): postgres/postgres@6aa5186
Batchpatch Commit (PG 11): postgres/postgres@4ec5449

Original Description
```
In logical replication mode, a WalSender is supposed to be able
to execute any regular SQL command, as well as the special
replication commands.  Poor design of the replication-command
parser caused it to fail in various cases, notably:

* semicolons embedded in a command, or multiple SQL commands
sent in a single message;

* dollar-quoted literals containing odd numbers of single
or double quote marks;

* commands starting with a comment.

The basic problem here is that we're trying to run repl_scanner.l
across the entire input string even when it's not a replication
command.  Since repl_scanner.l does not understand all of the
token types known to the core lexer, this is doomed to have
failure modes.

We certainly don't want to make repl_scanner.l as big as scan.l,
so instead rejigger stuff so that we only lex the first token of
a non-replication command.  That will usually look like an IDENT
to repl_scanner.l, though a comment would end up getting reported
as a '-' or '/' single-character token.  If the token is a replication
command keyword, we push it back and proceed normally with repl_gram.y
parsing.  Otherwise, we can drop out of exec_replication_command()
without examining the rest of the string.

(It's still theoretically possible for repl_scanner.l to fail on
the first token; but that could only happen if it's an unterminated
single- or double-quoted string, in which case you'd have gotten
largely the same error from the core lexer too.)

In this way, repl_gram.y isn't involved at all in handling general
SQL commands, so we can get rid of the SQLCmd node type.  (In
the back branches, we can't remove it because renumbering enum
NodeTag would be an ABI break; so just leave it sit there unused.)

I failed to resist the temptation to clean up some other sloppy
coding in repl_scanner.l while at it.  The only externally-visible
behavior change from that is it now accepts \r and \f as whitespace,
same as the core lexer.

Per bug #17379 from Greg Rychlewski.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
```
Jira: DB-8008, DB-8009

Test Plan: NA, new code that isn't enabled yet. Tests along with YB specific changes will be added as part of https://phorge.dev.yugabyte.com/D29194

Reviewers: aagrawal, skumar, dsrinivasan

Reviewed By: aagrawal

Subscribers: jason, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29723
dr0pdb added a commit that referenced this issue Oct 29, 2023
… of replication slots

Summary:
Introduce support for creating, viewing, and dropping replication slots in YSQL. This is part of the project to support Publication/Replication slot API in YSQL (#18724).

There are two interfaces for the support for create and drop:
1. Functions:
    - `pg_create_logical_replication_slot`
    -  `pg_drop_replication_slot`
2. Walsender commands:
    - CREATE_REPLICATION_SLOT
    - DROP_REPLICATION_SLOT

Both create and drop statements go to yb-master directly. Most of the PG code isn't applicable to YSQL yet and hence it is skipped.

For viewing replication slots, we have a view `pg_replication_slots` which is backed by the function `pg_get_replication_slots`. The schema of the view has been modified by adding an extra yb-specific column `yb_stream_id` which is a text.

Limitations:
1. Only `yboutput` plugin is supported. It'll only be relevant once we add support for consuming replication slots but we are enforcing it from this diff onwards

Apart from the above, this diff fixes two issues:
1. #19509 - Cleanup of held locks in case of an `ereport(elevel >= ERROR)`. This diff fixes that by making sure that we call `LWLockReleaseAll` in `src/postgres/src/backend/access/transam/xact.c` in case of an error. Thanks to Timothy Elgersma.
2. Skipping cache refresh in case of an error in the execution of a replication command. `src/postgres/src/backend/tcop/postgres.c`. This is ok because we only cache DMLs and none of the replication commands are DMLs. We need to do that because the check `yb_is_dml_command` tries to parse the query to check whether it is a DML or not but it doesn't support replication commands. So any `ereport(elevel >= ERROR)` in the execution of a replication command would lead to a syntax error.

TODOs for future:
1. This diff creates a CDC stream with CDCRecordType as `CHANGE`. We need to extend the `pg_create_logical_replication_slot` and `CREATE_REPLICATION_SLOT` syntax to take the CDCRecordType. It'll be done in a future diff
2. DROP_REPLICATION_SLOT commands allows waiting for a slot to become inactive before dropping it. It is unsupported currently and will be done in a future diff
3. Temporary replication slots are unsupported. Will be added in future once we also support consumption via Walsender

Upgrade\Rollback safety:
These changes rely on sys-catalog changes done in yb-master. As a result, all the commands are disabled during upgrade using an autoflag yb_enable_replication_commands (LocalPersisted) and will only be enabled once the user has committed to the new version.

The autoflag was introduced during the Publication syntax support and is being reused here since these are both part of the same project: https://phorge.dev.yugabyte.com/D28721
Jira: DB-8008, DB-8009, DB-8305

Test Plan:
New unit test

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot'
```

New Regress test
```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressReplicationSlot'
```

I've also updated most of the CDCSDK tests to now use the ReplicationSlot commands to create a CDCSDK stream instead of an RPC. Remaining tests will be updated in future diffs

Reviewers: dsrinivasan, skumar, asrinivasan, aagrawal

Reviewed By: dsrinivasan

Subscribers: ycdcxcluster, bogdan, ybase, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29194
dr0pdb added a commit that referenced this issue Nov 2, 2023
…atabase oid from output of pg_replication_slots

Summary:
This diff fixes the brittle TestPgRegressReplicationSlot test by removing the database oid from the output of `pg_replication_slots`.
Jira: DB-8008, DB-8009

Test Plan:
Jenkins: test regex: .*TestPgRegressReplicationSlot.*

./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressReplicationSlot'

Reviewers: jason, dsrinivasan, myang

Reviewed By: jason, myang

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29799
dr0pdb added a commit that referenced this issue Nov 3, 2023
…Fix limitations on what SQL commands can be issued to a walsender.'"

Summary:
This reverts commit 0b05735 added as part of https://phorge.dev.yugabyte.com/D29723.

We need to revert this commit because we need to import another PG commit (c39983600b284466692f39355536158770527e2f) that was added before this. We'll add it back after importing `c39983600b284466692f39355536158770527e2f`.

Apart from reverting the commit, this diff removes the changes that were added in https://phorge.dev.yugabyte.com/D29194 which depend on this commit. Those changes were present in two places:
1. `postgres.c`
2. `TestPgReplicationSlot.java`

These will be added again once we import both `c39983600b284466692f39355536158770527e2f` and `4ec54498c5eae338bbc8ce391df4297c314624ff`.
Jira: DB-8008, DB-8009

Test Plan: Existing replication slot tests

Reviewers: aagrawal

Reviewed By: aagrawal

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29895
dr0pdb added a commit that referenced this issue Nov 3, 2023
… commands in pg_stat_activity.'

Summary:
This diff imports the upstream PG commit `Make walsenders show their replication commands in pg_stat_activity.` - `c39983600b284466692f39355536158770527e2f`.

PG commit: postgres/postgres@c399836
PG commit description:
```
A walsender process that has executed a SQL command left the text of
that command in pg_stat_activity.query indefinitely, which is quite
confusing if it's in RUNNING state but not doing that query.  An easy
and useful fix is to treat replication commands as if they were SQL
queries, and show them in pg_stat_activity according to the same rules
as for regular queries.  While we're at it, it seems also sensible to
set debug_query_string, allowing error logging and debugging to see
the replication command.

While here, clean up assorted silliness in exec_replication_command:
* Clean up SQLCmd code path, and fix its only-accidentally-not-buggy
  memory management.
* Remove useless duplicate call of SnapBuildClearExportedSnapshot().
* replication_scanner_finish() was never called.

Back-patch of commit f560209 into v10-v13.  I'd originally felt
that this didn't merit back-patching, but subsequent confusion
while debugging walsender problems suggests that it'll be useful.
Also, the original commit has now aged long enough to provide some
comfort that it won't cause problems.

Discussion: https://postgr.es/m/2673480.1624557299@sss.pgh.pa.us
Discussion: https://postgr.es/m/880181.1600026471@sss.pgh.pa.us
```
Jira: DB-8008, DB-8009

Test Plan: Existing tests on replication slots are sufficient. Also the feature is not released yet.

Reviewers: aagrawal

Reviewed By: aagrawal

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29937
dr0pdb added a commit that referenced this issue Nov 6, 2023
…mands can be issued to a walsender.'

Summary:
Import PG 11 commit - 'Fix limitations on what SQL commands can be issued to a walsender.'. This revision is needed to be able to detect whether a command is a replication command or not without an error. It'll be used in the replication slot support to be introduced as part of https://phorge.dev.yugabyte.com/D29194.

This revision also imports a 1-liner PG commit `Remember to reset yy_start state when firing up repl_scanner.l.` - postgres/postgres@449a696. It is being imported as part of this revision since it is a trivial change that doesn't warrant a separate diff.

Original Commit (PG 15): postgres/postgres@6aa5186
Batchpatch Commit (PG 11): postgres/postgres@4ec5449

Original Description
```
In logical replication mode, a WalSender is supposed to be able
to execute any regular SQL command, as well as the special
replication commands.  Poor design of the replication-command
parser caused it to fail in various cases, notably:

* semicolons embedded in a command, or multiple SQL commands
sent in a single message;

* dollar-quoted literals containing odd numbers of single
or double quote marks;

* commands starting with a comment.

The basic problem here is that we're trying to run repl_scanner.l
across the entire input string even when it's not a replication
command.  Since repl_scanner.l does not understand all of the
token types known to the core lexer, this is doomed to have
failure modes.

We certainly don't want to make repl_scanner.l as big as scan.l,
so instead rejigger stuff so that we only lex the first token of
a non-replication command.  That will usually look like an IDENT
to repl_scanner.l, though a comment would end up getting reported
as a '-' or '/' single-character token.  If the token is a replication
command keyword, we push it back and proceed normally with repl_gram.y
parsing.  Otherwise, we can drop out of exec_replication_command()
without examining the rest of the string.

(It's still theoretically possible for repl_scanner.l to fail on
the first token; but that could only happen if it's an unterminated
single- or double-quoted string, in which case you'd have gotten
largely the same error from the core lexer too.)

In this way, repl_gram.y isn't involved at all in handling general
SQL commands, so we can get rid of the SQLCmd node type.  (In
the back branches, we can't remove it because renumbering enum
NodeTag would be an ABI break; so just leave it sit there unused.)

I failed to resist the temptation to clean up some other sloppy
coding in repl_scanner.l while at it.  The only externally-visible
behavior change from that is it now accepts \r and \f as whitespace,
same as the core lexer.

Per bug #17379 from Greg Rychlewski.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
```
Jira: DB-8008, DB-8009

Test Plan: Existing replication slot tests.

Reviewers: aagrawal

Reviewed By: aagrawal

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29946
dr0pdb added a commit that referenced this issue Nov 6, 2023
…mand

Summary:
This revision adds back the changes to identify replication commands in `yb_is_dml_command`.

These changes were first introduced in https://phorge.dev.yugabyte.com/D29194 and had to be reverted in https://phorge.dev.yugabyte.com/D29895 while reverting the upstream PG import.

Jira: DB-8008, DB-8009

Test Plan: `./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot'`

Reviewers: aagrawal

Reviewed By: aagrawal

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29994
dr0pdb added a commit that referenced this issue Nov 7, 2023
…cation slots

Summary:
This diff defines whether a replication slot (CDC stream) is active or inactive.

PG Definition: In Postgres, a replication slot is considered as active if there is an backend (walsender) process actively streaming changes for it. This is implemented through the shared memory array of replication slots where the field `active_pid` contains the process ID of the backend (if any).

YB definition: In the first phase of the project (CRUD), there is no walsender support. Clients will consume changes using the Debezium connector which makes RPC calls to cdc_service. Hence, the PG mechanism isn't applicable to us. It is also not applicable in future when we introduce support for consumption via walsender. This is because a replication slot could be consumed from different nodes.

Hence, In YB, a replication slot (CDC stream) will be considered active if it has been consumed (via the GetChanges RPC) within the past `ysql_cdc_active_replication_slot_window_ms` duration which is a Tserver GFlag.

To determine whether a stream has been consumed in the past `ysql_cdc_active_replication_slot_window_ms`, all the rows of the CDC state table for the stream_id are read and the maximum `active_time` across them is considered as the last_active_time of the CDC stream. **This diff adds support to read the CDC state table in the `pg_client_service` layer.**

This definition of active/inactive is used in two scenarios:
1. In the `pg_replication_slots` view, it is shown in one of the columns similar to PG
2. While dropping a replication slot, we throw an error if the stream is active similar to PG

Determining whether a stream is active or not is expensive but both the scenarios above are considered to be **infrequent**, hence we should be okay.

**Misc**
This diff also fixes a bug in the test-only function `WaitForGetChangesToFetchRecords`. The `actual_count` variable wasn't being reset back to zero for future iterations which should be done in the case of CDC streams with explicit checkpointing.

**Upgrade/Rollback safety:**
These changes are safe because:
1. The commands are already protected using `yb_enable_replication_commands` auto flag
2. The proto changes of this diff are between PG and local Tserver.
Jira: DB-8008, DB-8009

Test Plan:
```
./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter "*TestReplicationSlotDropWithActiveInvalid*"
```

Reviewers: dsrinivasan, skumar, aagrawal

Reviewed By: dsrinivasan

Subscribers: ybase, yql, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29776
dr0pdb added a commit that referenced this issue Nov 9, 2023
…ation_slots

Summary:
PG limits the number of replication slots (CDC stream) using a GUC `max_replication_slots`.

This revision adds this to YSQL as well by adding a GFlag wrapping over the GUC and uses it in yb-master to limit the number of CDCSDK streams.

A new error code `REPLICATION_SLOT_LIMIT_REACHED` and StatusCode `LimitReached` has been added. The status code is used by the YSQL layer to show the same error message as shown by PG ("all replication slots are in use") to the user.

**Upgrade/Rollback safety:**
This change introduces new Status codes between RPCs. At the moment, the newly added code is only used as part of creation of replication slot via the YSQL commands which are already protected by the autoflag `yb_enable_replication_commands`.
Jira: DB-8008

Test Plan:
New test cases

```
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceLimitReached
./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter "*TestCreateCDCStreamReplicationSlotLimit*"
```

Reviewers: hsunder, dsrinivasan, skumar

Reviewed By: dsrinivasan

Subscribers: yql, ybase, ycdcxcluster, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D29897
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue Nov 21, 2023
…mands a TEST flag

Summary:
This revision makes the already existing autoflag `yb_enable_replication_commands` a `TEST` flag.

This is safe to do because the revision https://phorge.dev.yugabyte.com/D28721 that introduced the flag is only on the master branch. The flag should have been a 'TEST' flag from the start since we want to make further changes before enabling the replication slots feature by default.

Now the behavior of `CreateCDCStream` RPC in yb-master is as follows:
1. `yb_enable_replication_commands` - true
  - YSQL commands for replication slots - supported and the namespace id is read from the `namespace_id` field
  - CDC service (yb-admin) - supported and the namespace id is read from the `namespace_id` field
2. `yb_enable_replication_commands` - false
  - YSQL commands - disallowed
  - CDC service (yb-admin) - supported using the old mechanism of reading the namespace id from the `table_id` field

YSQL commands are identified with the presence of the `cdcsdk_ysql_replication_slot_name` in the request.

As part of this work, we also cleanup the `kNamespaceId` mode from the `CreateNewCDCStreamMode` since the cdc_service just forwards the request to yb-master now, hence we no longer to support the older flow.
Jira: DB-8008, DB-7751

Test Plan:
New tests

```
./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter "*TestCDCStreamCreationViaCDCServiceWithReplicationCommands*"
```

Existing CDC and master_xrepl tests

Reviewers: asrinivasan, skumar, hsunder, xCluster

Reviewed By: hsunder

Subscribers: yql, ybase, ycdcxcluster, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30234
pull bot pushed a commit to muzammilar/yugabyte-db that referenced this issue Nov 24, 2023
…to_stream_map_ if they are deleted from cdc_stream_map_

Summary:
`cdc_stream_map_` stores the metadata of every CDC stream in yb-master. It is a map from `stream_id` to the metadata.
`cdcsdk_replication_slots_to_stream_map_` stores the mapping from the replication slot name to stream id in yb-master.

Whenever an entry is deleted from the `cdc_stream_map_`, we must also check and delete the corresponding entry from `cdcsdk_replication_slots_to_stream_map_`.

This revision ensures we do that in the flow where a CDC stream is cleaned up if all tables that belong to it are dropped.
Jira: DB-8008

Test Plan:
New test case

`./yb_build.sh --cxx-test cdcsdk_stream-test --gtest_filter CDCSDKStreamTest.TestPgReplicationSlotCreateWithDropTable`

Reviewers: hsunder, skumar, xCluster, asrinivasan

Reviewed By: skumar

Subscribers: ybase, ycdcxcluster, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30398
dr0pdb added a commit that referenced this issue Dec 13, 2023
Summary:
Presently, the `CreateCDCStream` RPC on yb-master has two temporary sleep statements. They are for:

1. Waiting for the CDCStateTable creation
2. Waiting for the `AlterTable` calls for each table to finish

These sleep statements were added as part of consistent snapshot support: https://phorge.dev.yugabyte.com/D29987

This revision removes the first sleep and updates the second sleep statement to wait for the AlterTable calls for each table to finish.
Jira: DB-8008

Test Plan:
Jenkins: no java, test regex: .*CDCSDKConsistentSnapshotTest.*

Existing CDCSDK tests

Reviewers: asrinivasan, skumar, hsunder

Reviewed By: hsunder

Subscribers: ybase, ycdcxcluster, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30606
dr0pdb added a commit that referenced this issue Dec 13, 2023
…n for namespace

Summary:
**Backport Description**
The merge was clean but a few additional changes have been made. These changes are needed to ensure we start the changes with a TEST flag. On the master branch, we started off with an autoflag and later changed it to TEST flag (https://phorge.dev.yugabyte.com/D30234).

Two changes have been made:
1. Test flag declared in `pg_wrapper.cc` - Taken from https://phorge.dev.yugabyte.com/D30234
2. Disabled the changes if the flag is false - Temporary and will be removed once https://phorge.dev.yugabyte.com/D30234 is backported.

**Original Description**
Original commit: 830fb9a / D28678
Add support for creating CDCSDK stream for a namespace. This diff also adds support to get a CDC stream via `cdcsdk_ysql_replication_slot_name`. This is the first step in supporting a SQL syntax for CDC via the PG logical replication model.

We already have support to create a CDCSDK stream for a namespace but it is driven from `CDCServiceImpl::CreateCDCStreamForNamespace`. This diff adds it to master. The logic present in cdc_service will get deprecated in future diffs for YSQL.

**Create Stream Path**
Request validation - when namespace_id is populated, it is required that the following fields are passed:
1. `cdcsdk_ysql_replication_slot_name` if the namespace is PGSQL
2. `sourceType` in request options: Must be `cdcsdk`
3. `idType` must be `kNamespaceId`: It was an assumption within the code for CDCSDK stream. This diff makes the assumption explicit

Details:
1. The namespace_id must be populated
2. For creating namespace level CDCSDK stream
  - Tables within the namespace which aren't supported (without primary key) get skipped

A few helper methods were also added for code reuse.

**Follow Ups**
1. YSQL layer changes

**Upgrade/Rollback safety**
This diff modifies the sys-catalog entry stored in yb-master, the changes are not rollback safe. As a result, these changes will be disabled during an upgrade via an autoflag (yb_enable_replication_commands - `LocalPersisted`) and only enabled after the upgrade is finalized.

The responsibility of checking the autoflag `yb_enable_replication_commands` is on the clients of the `CreateCDCStream` and `GetCDCStream` RPC. The client is the YSQL layer commands of Replication Slot which will be added in future diffs.
Jira: DB-8008

Test Plan:
Newly added

```
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespace
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceCql
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidDuplicationSlotName
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceInvalidIdTypeOption
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateCDCStreamForNamespaceMissingReplicationSlotName
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestListCDCStreamsCDCSDKWithReplicationSlot
```

Existing CDC related tests are also relevant.

Reviewers: hsunder, aagrawal, skumar, asrinivasan, xCluster

Reviewed By: skumar

Subscribers: bogdan, ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30919
dr0pdb added a commit that referenced this issue Dec 13, 2023
…exists error in consecutive CreateCDCStream and DeleteCDCStream calls

Summary:
**Backport Description**
The merge was clean. The only difference here is that the changes are under a TEST flag.

**Original Description**
Original commit: 3924d41 / D29271
This diff fixes two problems:

1.  A race condition between `CreateCDCStream` and the CatalogManager background cleanup task. Scenario is outlined below
2. Initial checkpoint of the tables in the cdc_state_table for CDCSDK must be Invalid (-1,-1) instead of (0,0). The cdc_service sets it to (-1,-1) correctly but while moving the logic of yb-master in https://phorge.dev.yugabyte.com/D28678, I missed doing the same.

Race condition scenario:
1. CreateCDCStream with slot name `test_slot`
2. DeleteCDCStream with slot name `test_slot`: This marks the stream as deleting but the actual deletion happens in the background task
3. CreateCDCStream with slot name `test_slot`: This fails with error message - "CDC stream with the given replication slot name already exists" if the cleanup task has not yet been executed

This diff handles step 3 now by checking if the stream is in a `DELETING` state and if so, it deletes the entry from the `cdcsdk_replication_slots_to_stream_map_` map eagerly.
Jira: DB-8008, DB-8009

Test Plan:
```
./yb_build.sh --cxx-test master_xrepl-test --gtest_filter MasterTestXRepl.TestCreateDropCDCStreamWithReplicationSlotName
```

Reviewers: hsunder, skumar, xCluster

Reviewed By: skumar

Subscribers: bogdan, ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31005
dr0pdb added a commit that referenced this issue Dec 18, 2023
…CDCSDK streams

Summary:
**Backport Description**
Merge was clean.

**Original Description**
Original commit: ce0726e / D30606
Presently, the `CreateCDCStream` RPC on yb-master has two temporary sleep statements. They are for:

1. Waiting for the CDCStateTable creation
2. Waiting for the `AlterTable` calls for each table to finish

These sleep statements were added as part of consistent snapshot support: https://phorge.dev.yugabyte.com/D29987

This revision removes the first sleep and updates the second sleep statement to wait for the AlterTable calls for each table to finish.
Jira: DB-8008

Test Plan:
Jenkins: no java, test regex: .*CDCSDKConsistentSnapshotTest.*

Existing CDCSDK tests

Reviewers: asrinivasan, skumar, hsunder, xCluster

Reviewed By: skumar

Subscribers: bogdan, ycdcxcluster, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31145
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cdcsdk CDC SDK area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/high High Priority
Projects
None yet
Development

No branches or pull requests

1 participant