
[BACKPORT 2.20][#22383] CDCSDK: Fix Tablet incorrectly declared not of interest for stream before configured interval is reached

Summary:
**Backport Description**
The merge was clean

**Original Description**
Original commit: ebe498f / D35037
This is related to the CDC Consistent Snapshot feature (GH #18508).

A tablet is inferred to be not of interest to a stream if the stream has not polled it even once
within a configurable time limit after the stream was created. The limit is set by the gflag
"cdcsdk_tablet_not_of_interest_timeout_secs", which defaults to 4 hours.
This support was introduced in https://phorge.dev.yugabyte.com/D30907.

The flag value is converted to microseconds before it is compared against how long the tablet has
gone unpolled by the stream. However, the converted value was incorrectly stored in an int32
variable. Even the default does not fit: 4 hours is 14,400 s, or 14,400,000,000 microseconds, well
above the int32 maximum of 2,147,483,647, so an int64 is required. This overflow caused tablets to
be declared not of interest before the configured interval was reached.
Jira: DB-11282

Test Plan: Jenkins: test regex: .*CDC.*

Reviewers: stiwary, skumar

Reviewed By: stiwary

Subscribers: ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35082
asrinivasanyb committed May 15, 2024
1 parent 7b00d0e commit a5a8446
Showing 1 changed file with 5 additions and 3 deletions.
src/yb/cdc/cdc_service.cc: 8 changes (5 additions, 3 deletions)

```diff
@@ -3456,7 +3456,8 @@ Status CDCServiceImpl::CheckTabletNotOfInterest(
 return Status::OK();
 }
 
-auto limit = GetAtomicFlag(&FLAGS_cdcsdk_tablet_not_of_interest_timeout_secs) * 1000 * 1000;
+int64_t limit_flag = GetAtomicFlag(&FLAGS_cdcsdk_tablet_not_of_interest_timeout_secs);
+auto limit = limit_flag * 1000 * 1000;
 if (deletion_check) {
 // Add a little bit more to the timeout limit to determine if the cdc_state table
 // entry for this producer_tablet can be deleted. This will help avoid race conditions.
@@ -3477,8 +3478,9 @@ Status CDCServiceImpl::CheckTabletNotOfInterest(
 }
 
 VLOG(1) << "Stream: " << producer_tablet.stream_id
-<< ", unpolled for too long " << (now - last_active_time) << "micros"
-<< " for tablet: " << producer_tablet.tablet_id
+<< ", unpolled for too long " << (now - last_active_time) << " mus"
+<< ", limit was " << limit << " mus"
+<< ", for tablet: " << producer_tablet.tablet_id
 << ", active time in CDCState table: " << last_active_time << ", current time: " << now;
 return STATUS_FORMAT(
 InternalError,
```