[CDCSDK] DBZ: Connector should resume snapshot if GetCheckpoint returns a key #19394

Closed
yugabyte-ci opened this issue Oct 3, 2023 · 1 comment · Fixed by yugabyte/debezium-connector-yugabytedb#273
Labels: area/cdcsdk, jira-originated, kind/bug, priority/medium

@yugabyte-ci
Contributor

yugabyte-ci commented Oct 3, 2023

Jira Link: DB-8188

PR yugabyte/debezium-connector-yugabytedb#272 introduced a regression in the connector's snapshot resume capability. It was discovered when one of the connector's unit tests, YugabyteDBSnapshotResumeTest, started failing.

Current logic is as follows:

  1. Get the checkpoint on a tablet using GetCheckpoint RPC
  2. If a snapshot_key is present, then take the snapshot
  3. Start snapshot process from YugabyteDBOffsetContext#snapshotStartLsn

In step 3 above, we assign a from_op_id that tells the service to take the snapshot from the beginning, regardless of whether GetCheckpoint returned a checkpoint to resume from.
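
The intended behavior can be sketched as follows. This is only an illustration of the decision, not the connector's actual code: the `Checkpoint` class, the `chooseSnapshotStart` helper, and the placeholder values used for `SNAPSHOT_START` are all hypothetical names introduced here.

```java
// Minimal sketch, assuming a simplified checkpoint type. All names are hypothetical.
final class SnapshotResumeSketch {

    // Hypothetical stand-in for the checkpoint returned by the GetCheckpoint RPC.
    static final class Checkpoint {
        final long term;
        final long index;
        final byte[] snapshotKey; // null or empty when there is nothing to resume from

        Checkpoint(long term, long index, byte[] snapshotKey) {
            this.term = term;
            this.index = index;
            this.snapshotKey = snapshotKey;
        }
    }

    // Placeholder meaning "start the snapshot from the beginning" (values are assumed).
    static final Checkpoint SNAPSHOT_START = new Checkpoint(-1, -1, new byte[0]);

    // Resume from the service's checkpoint when it carries a snapshot key,
    // otherwise fall back to the start-of-snapshot marker.
    static Checkpoint chooseSnapshotStart(Checkpoint fromService) {
        boolean hasKey = fromService != null
                && fromService.snapshotKey != null
                && fromService.snapshotKey.length > 0;
        return hasKey ? fromService : SNAPSHOT_START;
    }
}
```

The regression corresponds to always returning the start-of-snapshot marker, which discards the key the service handed back.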

@yugabyte-ci yugabyte-ci added area/cdcsdk CDC SDK jira-originated kind/bug This issue is a bug priority/medium Medium priority issue labels Oct 3, 2023
@vaibhav-yb
Contributor

Update:
Upon investigation, it was found that while the connector is in snapshot mode, it uses the getChangesForCDCSDK method of AsyncYBClient, which was passing null as the explicit checkpoint even when the connector had set the correct value.

This was not caught by the tests because the test that verifies this behavior was annotated with @PreviewOnly and was therefore skipped in our Jenkins pipelines.
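
To make the failure mode concrete, here is a rough sketch of an overload that drops its argument. The method names and signatures below are hypothetical and much simpler than the real `AsyncYBClient` API; they only show the shape of the bug and of the fix.

```java
// Minimal sketch of the overload-forwarding bug. All names and signatures are hypothetical.
final class OverloadForwardingSketch {

    // Records what the full overload received, so the effect is observable.
    static String lastExplicitCheckpoint;

    // Full overload that would build and send the request.
    static void getChanges(String tabletId, String fromOpId, String explicitCheckpoint) {
        lastExplicitCheckpoint = explicitCheckpoint;
    }

    // Buggy wrapper: ignores the caller's explicit checkpoint and forwards null.
    static void getChangesBuggy(String tabletId, String fromOpId, String explicitCheckpoint) {
        getChanges(tabletId, fromOpId, null);
    }

    // Fixed wrapper: forwards the caller's explicit checkpoint unchanged.
    static void getChangesFixed(String tabletId, String fromOpId, String explicitCheckpoint) {
        getChanges(tabletId, fromOpId, explicitCheckpoint);
    }
}
```

With the buggy wrapper the caller can set a correct value and still have nothing reach the service, which matches the behavior described above.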

vaibhav-yb added a commit that referenced this issue Oct 5, 2023
…checkpointing

Summary:
This diff modifies a method in `AsyncYBClient` which was earlier passing a `null` value to another overloaded method. The effect of this was that the `explicit_cdc_sdk_opid` was not even being passed to the service.

Another thing this diff modifies is that while setting the checkpoint during snapshot, service was using the `from_op_id` whereas we should be using the `explicit_cdc_sdk_opid` if the stream is in `EXPLICIT` checkpointing mode.
Jira: DB-8188

Test Plan: Run existing tests to verify regression.

Reviewers: skumar, asrinivasan, stiwary

Reviewed By: asrinivasan

Subscribers: yugaware, ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D29060
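
The second change in the summary above (which op id the service persists during snapshot) might look roughly like the following. This is an illustrative Java sketch, not the service code: the enum, the method name, and the fallback to `from_op_id` when no explicit op id was sent are assumptions made here.

```java
// Minimal sketch of choosing the checkpoint by checkpointing mode. Names are hypothetical.
final class CheckpointSelectionSketch {

    // EXPLICIT is named in the commit summary; IMPLICIT is assumed as the other mode.
    enum CheckpointMode { IMPLICIT, EXPLICIT }

    // In EXPLICIT mode prefer the explicit op id sent by the client;
    // otherwise (or if none was sent, an assumed fallback) use from_op_id.
    static String checkpointToPersist(CheckpointMode mode, String fromOpId, String explicitOpId) {
        if (mode == CheckpointMode.EXPLICIT && explicitOpId != null) {
            return explicitOpId;
        }
        return fromOpId;
    }
}
```
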
vaibhav-yb added a commit that referenced this issue Oct 5, 2023
…er in explicit checkpointing

Summary:
Original commit: 21ae05f / D29060

This diff modifies a method in `AsyncYBClient` which was earlier passing a null value to another overloaded method. The effect of this was that the `explicit_cdc_sdk_opid` was not even being passed to the service.

Another thing this diff modifies is that while setting the checkpoint during snapshot, service was using the `from_op_id` whereas we should be using the `explicit_cdc_sdk_opid` if the stream is in `EXPLICIT` checkpointing mode.

**2.18 only note:**
Skipping the yb-client version changes from the original diff while backporting.
Jira: DB-8188

Test Plan: Run existing tests to verify there's no regression

Reviewers: skumar, asrinivasan, stiwary

Reviewed By: asrinivasan

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D29095
vaibhav-yb added a commit to yugabyte/debezium-connector-yugabytedb that referenced this issue Oct 6, 2023
vaibhav-yb added a commit that referenced this issue Oct 9, 2023
…not null

Summary:
This diff modifies the way we populate the key in the `from_op_id` and `explicit_op_id` parameter of `GetChangesRequest` and now we will only set the key if it is not `null` - if it is `null` then we will not do anything to key.
Jira: DB-8188

Test Plan: Jenkins: java only

Reviewers: skumar, stiwary, asrinivasan

Reviewed By: skumar

Subscribers: ycdcxcluster, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D29124
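
The null guard described in the summary above can be sketched as below. The `OpIdBuilder` class is a hypothetical stand-in for a generated request builder, not the real `GetChangesRequest` API; it rejects `null` in its setter, which is how protobuf-generated Java builders typically behave.

```java
// Minimal sketch of "set the key only when it is not null". All names are hypothetical.
final class NullSafeKeySketch {

    // Hypothetical stand-in for a generated builder whose setter rejects null.
    static final class OpIdBuilder {
        private byte[] key;
        private boolean keySet;

        OpIdBuilder setKey(byte[] value) {
            if (value == null) {
                throw new NullPointerException("key");
            }
            this.key = value;
            this.keySet = true;
            return this;
        }

        boolean hasKey() {
            return keySet;
        }
    }

    // Leave the key field untouched when there is no key to send.
    static OpIdBuilder populate(byte[] key) {
        OpIdBuilder builder = new OpIdBuilder();
        if (key != null) {
            builder.setKey(key);
        }
        return builder;
    }
}
```
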
vaibhav-yb added a commit that referenced this issue Oct 9, 2023
…er in explicit checkpointing

Summary:
Original commit: 21ae05f / D29060

This diff modifies a method in `AsyncYBClient` which was earlier passing a `null` value to another overloaded method. The effect of this was that the `explicit_cdc_sdk_opid` was not even being passed to the service.

Another thing this diff modifies is that while setting the checkpoint during snapshot, service was using the `from_op_id` whereas we should be using the `explicit_cdc_sdk_opid` if the stream is in `EXPLICIT` checkpointing mode.

**2.20 only note:**
Skipping the `yb-client` version changes from the original diff while backporting.
Jira: DB-8188

Test Plan: Run existing tests to verify no regression.

Reviewers: skumar, stiwary, asrinivasan

Reviewed By: asrinivasan

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D29122
@vaibhav-yb vaibhav-yb added this to In progress in CDC Oct 9, 2023
vaibhav-yb added a commit that referenced this issue Oct 10, 2023
…eter in explicit checkpointing

Summary:
Original commit: 21ae05f / D29060

This diff modifies a method in `AsyncYBClient` which was earlier passing a `null` value to another overloaded method. The effect of this was that the `explicit_cdc_sdk_opid` was not even being passed to the service.

Another thing this diff modifies is that while setting the checkpoint during snapshot, service was using the `from_op_id` whereas we should be using the `explicit_cdc_sdk_opid` if the stream is in `EXPLICIT` checkpointing mode.

**2.19.3 only note:**
Skipping the `yb-client` version changes from the original diff while backporting.
Jira: DB-8188

Test Plan: Run existing tests to verify no regression.

Reviewers: skumar, stiwary, asrinivasan

Reviewed By: asrinivasan

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D29126
vaibhav-yb added a commit that referenced this issue Oct 12, 2023
…st` when it is not null

Summary:
Original commit: f3dddd3 / D29124

This diff modifies the way we populate the key in the `from_op_id` and `explicit_op_id` parameter of `GetChangesRequest` and now we will only set the key if it is not `null` - if it is `null` then we will not do anything to key.

**2.18 only note:**
Skipping the version changes in yb-client.
Jira: DB-8188

Test Plan: Run existing tests.

Reviewers: skumar, asrinivasan, stiwary

Reviewed By: skumar

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D29191
vaibhav-yb added a commit that referenced this issue Oct 12, 2023
…st` when it is not null

Summary:
Original commit: f3dddd3 / D29124

This diff modifies the way we populate the key in the `from_op_id` and `explicit_op_id` parameter of `GetChangesRequest` and now we will only set the key if it is not `null` - if it is `null` then we will not do anything to key.

**2.20 only note:**
Skipping the version changes in yb-client.
Jira: DB-8188

Test Plan: Run existing tests.

Reviewers: skumar, stiwary, asrinivasan

Reviewed By: skumar

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D29193
vaibhav-yb added a commit that referenced this issue Oct 17, 2023
…uest` when it is not null

Summary:
Original commit: f3dddd3 / D29124

This diff modifies the way we populate the key in the `from_op_id` and `explicit_op_id` parameter of `GetChangesRequest` and now we will only set the key if it is not `null` - if it is `null` then we will not do anything to key.

**2.19.3 only note:**
Skipping the version changes in yb-client.
Jira: DB-8188

Test Plan: Run existing tests.

Reviewers: skumar, stiwary, asrinivasan

Reviewed By: skumar

Subscribers: ycdcxcluster

Differential Revision: https://phorge.dev.yugabyte.com/D29192