
Incomplete cleanup in case of a create stream failure due to a late alter table #20725

Closed
yugabyte-ci opened this issue Jan 22, 2024 · 0 comments
Labels: area/cdcsdk, jira-originated, kind/bug, priority/low

yugabyte-ci commented Jan 22, 2024

Jira Link: DB-9727

@yugabyte-ci yugabyte-ci added area/cdcsdk CDC SDK jira-originated kind/bug This issue is a bug priority/low Low priority status/awaiting-triage Issue awaiting triage labels Jan 22, 2024
asrinivasanyb added a commit that referenced this issue Jan 24, 2024
…e to a late alter table

Summary:
When a create stream fails, a cleanup is performed.

The UpdatePeersAndMetrics background thread is involved in this cleanup.
As part of the rollback, the cdc_state table entries associated with
the stream have their checkpoint set to OpId::Max. This is the indication
to UpdatePeersAndMetrics to perform cleanup for the stream in question.
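The max-checkpoint sentinel described above can be sketched as follows. This is a minimal illustration, not the actual YugabyteDB types or API; `NeedsCleanup` is a hypothetical name for the check UpdatePeersAndMetrics performs.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch: in cdc_state, setting a tablet's checkpoint to
// OpId::Max is the sentinel telling UpdatePeersAndMetrics that the
// stream's resources should be released.
struct OpId {
  int64_t term;
  int64_t index;
  static OpId Max() {
    return {std::numeric_limits<int64_t>::max(),
            std::numeric_limits<int64_t>::max()};
  }
  bool operator==(const OpId& o) const {
    return term == o.term && index == o.index;
  }
};

// UpdatePeersAndMetrics treats a max checkpoint as "clean this entry up".
bool NeedsCleanup(const OpId& checkpoint) { return checkpoint == OpId::Max(); }
```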

One of the ways in which a create stream could fail is if the ALTER TABLEs
that were issued as part of the create stream time out. However, an ALTER
TABLE response that arrives late, after the rollback and cleanup have been
performed, could put an entry back in the cdc_state table for the same stream
with a valid checkpoint. This in turn could once again cause resources to be
used. This issue arises because the stream metadata is not refreshed from
master by UpdatePeersAndMetrics.

The solution is for UpdatePeersAndMetrics to refresh the stream metadata from
master for a stream that is in the INITIATED state. This results in
UpdatePeersAndMetrics realizing that the stream has been deleted and performing
the necessary cleanup again. Care is taken to limit the number of metadata
refresh calls to master per stream per round of UpdatePeersAndMetrics.
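The rate-limited refresh described above can be sketched roughly as below. All names (`MetadataRefresher`, `MaybeRefresh`, `NewRound`) are illustrative assumptions, not the real YugabyteDB implementation; the point is the shape of the logic: only INITIATED streams are re-checked, and at most once per stream per round.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch: UpdatePeersAndMetrics refreshes stream metadata from
// master only for streams still cached as INITIATED, at most once per stream
// per round, so late ALTER TABLE responses cannot silently resurrect a
// deleted stream's cdc_state entries.
enum class StreamState { INITIATED, ACTIVE, DELETED };

struct MetadataRefresher {
  std::map<std::string, StreamState> master_view;  // authoritative state
  std::set<std::string> refreshed_this_round;      // rate limit: 1 call/round
  int master_calls = 0;

  // Returns true if master says the stream was deleted, meaning cleanup
  // should run again for its cdc_state entries.
  bool MaybeRefresh(const std::string& stream_id, StreamState cached) {
    if (cached != StreamState::INITIATED) return false;  // only stale streams
    if (!refreshed_this_round.insert(stream_id).second) return false;  // limit
    ++master_calls;  // stands in for the RPC to master
    return master_view[stream_id] == StreamState::DELETED;
  }

  void NewRound() { refreshed_this_round.clear(); }
};
```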
Jira: DB-9727

Test Plan: All tests

Reviewers: stiwary, skumar

Reviewed By: stiwary

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D31882
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jan 24, 2024
asrinivasanyb added a commit that referenced this issue Jan 25, 2024
…ream failure due to a late alter table

Summary:
**Backport Description**

There was one merge conflict in cdc_service.cc

**Original Description**
Original commit: 054374b / D31882
When a create stream fails, a cleanup is performed.

The UpdatePeersAndMetrics background thread is involved in this cleanup.
As part of the rollback, the cdc_state table entries associated with
the stream have their checkpoint set to OpId::Max. This is the indication
to UpdatePeersAndMetrics to perform cleanup for the stream in question.

One of the ways in which a create stream could fail is if the ALTER TABLEs
that were issued as part of the create stream time out. However, an ALTER
TABLE response that arrives late, after the rollback and cleanup have been
performed, could put an entry back in the cdc_state table for the same stream
with a valid checkpoint. This in turn could once again cause resources to be
used. This issue arises because the stream metadata is not refreshed from
master by UpdatePeersAndMetrics.

The solution is for UpdatePeersAndMetrics to refresh the stream metadata from
master for a stream that is in the INITIATED state. This results in
UpdatePeersAndMetrics realizing that the stream has been deleted and performing
the necessary cleanup again. Care is taken to limit the number of metadata
refresh calls to master per stream per round of UpdatePeersAndMetrics.
Jira: DB-9727

Test Plan: All tests

Reviewers: stiwary, skumar

Reviewed By: stiwary

Subscribers: ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31921
dr0pdb added a commit that referenced this issue Apr 17, 2024
…e operation via Walsender

Summary:
##### Backport Description
There was a small merge conflict in the `TestPgReplicationSlot.java` file in the `getTServerFlags` function. This is because https://phorge.dev.yugabyte.com/D33201 has not yet been backported to 2024.1 and I decided not to wait for it.

All the other files were merged without conflicts.

##### Original Description(s)

Original commits:
fddfec6 / D32859
459e6b6 / D33539
8d56bc5 / D33567

###### [#20726] YSQL: Introduce support for streaming update operation via Walsender
Before this revision, we only supported streaming Insert & Delete operations as part of the PG-compatible logical replication support. These were added in
https://phorge.dev.yugabyte.com/D31997 and https://phorge.dev.yugabyte.com/D32868 respectively.

This revision extends the support to update operations as well. To do this, the following has been implemented:
1. Decoding update operations and streaming them
2. Ability to get the replica identities stored in the stream metadata from yb-master and use it while decoding update records. It is used to determine whether we want to handle omitted values differently vs NULL values. We only want to do that for YB specific replica identities.

---

A key part of this support requires the capability to distinguish between NULL and omitted values for YB specific replica identity values. For example: consider replica identity (record type) `CHANGE`:

```
create table t (a int primary key, b int, c text);

insert into t values (1,1, 'abc');

update t set b = 2 where a = 1;            -- Statement 1
update t set b = 3, c = NULL where a = 1;  -- Statement 2
```

In Statement 1, c is not being updated, while in Statement 2, it is being explicitly set to `NULL`. For Statement 1, we need to be able to tell the client that we have decided not to send the column value for `c` as it is unchanged. We cannot send `c` as NULL, as that would imply that the value was explicitly set to `NULL`. To do this, we will send omitted values as `'u'`, which indicates an unchanged toasted value. The PG Debezium connector (our fork) will be modified to treat this as a special case.

This requirement doesn't arise in replica identity values inherited from PG (DEFAULT, NOTHING, FULL) because you can deduce whether the column is omitted or NULL just through the replica identity value.

For example, consider the same example as above but with `FULL`. In this case, you'll also get the complete old tuple, i.e. none of the columns are omitted. So if a column is sent as NULL, it means that it truly was NULL.

Similar deductions apply to `DEFAULT` and `NOTHING` which have no before image except primary keys.
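The tri-state a consumer must handle under the `CHANGE` replica identity can be sketched as below. The `ColKind`/`WireMarker` names are hypothetical; the one-byte markers mirror the ones PG's pgoutput tuple format uses (`'n'` for NULL, `'u'` for unchanged/omitted, `'t'` for a text value).

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: under YB's CHANGE replica identity, a column in an
// UPDATE is one of three things: a concrete value, an explicit NULL, or
// omitted (unchanged). NULL and omitted must not be conflated.
enum class ColKind { VALUE, NULL_VALUE, OMITTED };

struct ColDatum {
  ColKind kind;
  std::string value;  // meaningful only when kind == VALUE
};

// Encode as a logical-replication-style tuple marker:
// 'u' = unchanged/omitted, 'n' = NULL, 't' = text value.
char WireMarker(const ColDatum& c) {
  switch (c.kind) {
    case ColKind::OMITTED:    return 'u';
    case ColKind::NULL_VALUE: return 'n';
    case ColKind::VALUE:      return 't';
  }
  return '?';  // unreachable; keeps compilers quiet
}
```

For Statement 1 above, column `c` would be sent with marker `'u'`; for Statement 2, with marker `'n'`.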

---

**Upgrade/Rollback safety:**
The logical replication support is disabled via the test flag `ysql_TEST_enable_replication_slot_consumption`.

###### [#21620] YSQL: Handle schema change (DDL) in the walsender for streaming of records
Before this revision, we didn't support schema changes in the logical replication protocol implementation.

This revision introduces a mechanism to handle them. The implementation works by keeping a per-table read_time_ht, which is the hybrid time at which the
Relation descriptor will be read from the PG catalog for that table.

The implementation is as follows:
1. At the start of streaming, the record_commit_ht stored in the cdc state table is used.
2. Upon receiving a DDL response for table t
   - Remember, in a hash table, that we received a DDL event for this table
3. Upon the first DML on table t
   - Set the yb_read_time to what we stored in step 2
   - Invalidate the relcache
   - Notify the reorder buffer, which then notifies the pgoutput plugin that the schema for table t has changed. This ensures that pgoutput sends the schema to the client with the DML
   - Mark in the hash table that we have handled the DDL event for table t
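The per-table bookkeeping in the steps above can be sketched as follows. `DdlTracker`, `OnDdl`, and `OnDml` are illustrative names, not the actual walsender code; the sketch only shows the hash-table protocol: a DDL records a read time, and the first subsequent DML consumes it and triggers one schema message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical sketch of the walsender's per-table DDL bookkeeping: a DDL
// response records the read time for the new schema; the first DML on that
// table afterwards consumes it, standing in for relcache invalidation and
// re-sending the schema to the client before the row change.
using TableId = std::string;
using HybridTime = uint64_t;

struct DdlTracker {
  std::unordered_map<TableId, HybridTime> pending_ddl;  // table -> read_time_ht
  int schema_messages_sent = 0;

  void OnDdl(const TableId& t, HybridTime ddl_ht) { pending_ddl[t] = ddl_ht; }

  // Returns the read time to use for this DML, emitting a schema message
  // exactly once per pending DDL event.
  HybridTime OnDml(const TableId& t, HybridTime default_ht) {
    auto it = pending_ddl.find(t);
    if (it == pending_ddl.end()) return default_ht;
    HybridTime ht = it->second;
    ++schema_messages_sent;  // pgoutput re-sends the relation/schema message
    pending_ddl.erase(it);   // mark the DDL event as handled
    return ht;
  }
};
```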

**Upgrade/Rollback safety:**
Feature is disabled under test flag: `ysql_TEST_enable_replication_slot_consumption`

###### [#21686] CDCSDK: Store replica identity of tables created after the creation of the stream

Before this revision, the replica identities of tables were stored at stream creation time and used throughout streaming. This didn't
include tables that were created and added to the stream after the creation of the stream.

This revision adds logic to store replica identities of such tables as part of our background addition logic.

Note: With the present implementation, the only replica identity value possible for these tables is CHANGE (default) unless the users are able to sneak in an `ALTER TABLE REPLICA IDENTITY` statement immediately after the table creation and before the background thread handling such tables for CDC runs. Filed #21718 to allow an alternate mechanism for the same.

Jira: DB-9727

Test Plan:
`./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot'`
`./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot#testWithDDLs'`

Reviewers: asrinivasan, skumar

Reviewed By: skumar

Subscribers: bogdan, yql, ycdcxcluster, ybase, hsunder

Differential Revision: https://phorge.dev.yugabyte.com/D34212