[yugabyte/yugabyte-db#20136] Add tablets to snapshotCompletedTablets set in form of tableId.tabletId #300
Problem
Current behaviour:
Assume the connector restarted after the snapshot bootstrap call completed successfully.
After the GetCheckpoint call, if we find that the snapshot was already completed on a tablet, we add the tablet's entry to the snapshotCompletedTablets set in the form TabletId. During the snapshot consumption phase, we check for a tablet's entry in this set by looking for partition.getId(). But partition.getId() returns TableId.TabletId. Since the formats differ, the check fails to find the entry, and we keep polling tablets even though their snapshot is already complete.
Due to this, the following issues pop up:
tabletsWaitingForCallback set, where we wait for Kafka to send the snapshot complete marker as an acknowledgement for this tablet. Consequently, the connector is stuck in the snapshot phase and is never able to switch to the streaming phase.
SNAPSHOT MARKED DONE BY CLIENT to the transient state SNAPSHOT FULLY CONSUMED
Consider the below example:
Solution
After the GetCheckpoint call, add the entry to the set in the form TableId.TabletId. This ensures that during the consumption phase we do not poll tablets whose snapshot is already complete.
Note: This fix alone is not sufficient to switch to the streaming phase. Together with the fixes for #20134 and #20135, it ensures that the connector switches to the streaming phase.
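The format mismatch and the fix can be illustrated with a minimal, self-contained sketch. This is not the connector's actual code: the class SnapshotKeySketch, the helpers partitionId and isSnapshotCompleted, and the IDs "table1" and "tabletA" are all hypothetical. The sketch only assumes, as described above, that the lookup key has the form TableId.TabletId.

```java
import java.util.HashSet;
import java.util.Set;

public class SnapshotKeySketch {
    // Hypothetical helper: builds a key in the TableId.TabletId form that
    // partition.getId() is described as returning.
    static String partitionId(String tableId, String tabletId) {
        return tableId + "." + tabletId;
    }

    // Hypothetical stand-in for the check done during snapshot consumption.
    static boolean isSnapshotCompleted(Set<String> completed, String tableId, String tabletId) {
        return completed.contains(partitionId(tableId, tabletId));
    }

    public static void main(String[] args) {
        String tableId = "table1";   // made-up IDs for illustration
        String tabletId = "tabletA";

        // Old behaviour: the entry is stored as the bare tablet ID.
        Set<String> storedAsTabletId = new HashSet<>();
        storedAsTabletId.add(tabletId);

        // Fixed behaviour: the entry is stored as TableId.TabletId.
        Set<String> storedAsPartitionId = new HashSet<>();
        storedAsPartitionId.add(partitionId(tableId, tabletId));

        // The lookup misses with the old format, so the tablet would be polled again.
        System.out.println(isSnapshotCompleted(storedAsTabletId, tableId, tabletId));    // false
        // The lookup hits with the new format, so the tablet would be skipped.
        System.out.println(isSnapshotCompleted(storedAsPartitionId, tableId, tabletId)); // true
    }
}
```

Storing the entry in the same form that the lookup uses keeps both sides consistent without changing the lookup itself.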
Testing
Performed manual testing:
Relevant GitHub Issue
[CDCSDK] Improve logic for adding tablets to snapshotCompletedTablets set