
[xCluster] Keep min_safe_time calculation periodically refreshed on the master #11202

Closed
rahuldesirazu opened this issue Jan 25, 2022 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments


rahuldesirazu commented Jan 25, 2022

Jira Link: DB-1297

Description

Given that we know each tablet's safe time, each target server needs to calculate the minimum safe time across all of its tablets and report it to the master on heartbeat. The master will keep a mapping of tserver -> min_safe_time, update this structure, and calculate a new global min to be sent back as part of the heartbeat response.
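
A minimal C++ sketch of that aggregation (types and names here are illustrative, not the actual YugabyteDB interfaces):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

using HybridTime = uint64_t;

// On each target tserver: minimum safe time across all local replicated tablets.
HybridTime ComputeTserverMinSafeTime(
    const std::map<std::string, HybridTime>& tablet_safe_times) {
  HybridTime min_ht = std::numeric_limits<HybridTime>::max();
  for (const auto& [tablet_id, safe_time] : tablet_safe_times) {
    min_ht = std::min(min_ht, safe_time);
  }
  return min_ht;  // Reported to the master in the next heartbeat.
}

// On the master: keep tserver -> min_safe_time and recompute the global min.
class SafeTimeTracker {
 public:
  void OnHeartbeat(const std::string& tserver_uuid, HybridTime tserver_min) {
    per_tserver_[tserver_uuid] = tserver_min;
  }

  HybridTime GlobalMin() const {
    HybridTime min_ht = std::numeric_limits<HybridTime>::max();
    for (const auto& [uuid, ht] : per_tserver_) {
      min_ht = std::min(min_ht, ht);
    }
    return min_ht;  // Sent back to tservers in the heartbeat response.
  }

 private:
  std::map<std::string, HybridTime> per_tserver_;
};
```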

@rahuldesirazu rahuldesirazu added the area/docdb YugabyteDB core features label Jan 25, 2022
@rahuldesirazu rahuldesirazu self-assigned this Jan 25, 2022
@omkar-yb omkar-yb added this to Backlog in YBase features Apr 7, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022
@yugabyte-ci yugabyte-ci assigned hari90 and unassigned rahuldesirazu Jul 6, 2022
hari90 added a commit that referenced this issue Sep 9, 2022
Summary:
Compute the xCluster min safe read time for each namespace/DB and propagate it to all tservers.
Each CDC producer on the source cluster sends the safe time (the hybrid time of the last replicated operation, or the leader safe time) to the consumers.
On the consumer cluster, the cdc_poller keeps track of the min safe time it received from the producer. Periodically the cdc_consumer collects the safe time from all pollers and writes it to the xcluster_safe_time table.
XClusterSafeTimeService (which currently runs on the master) runs a periodic task that reads the entries in the table and computes the min safe time per consumer namespace (DB in YSQL). This information is stored in a new catalog entity, XCLUSTER_SAFE_TIME, and propagated back to all tservers via the heartbeat.
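
A rough sketch of the per-namespace aggregation the periodic task performs, assuming a simple map of table rows and a tablet-to-namespace lookup (both names are hypothetical, not the service's real API):

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

using HybridTime = uint64_t;

// table_rows: entries read from xcluster_safe_time (producer tablet id -> safe time).
// tablet_to_namespace: maps each tablet to its consumer namespace (DB in YSQL).
std::map<std::string, HybridTime> ComputePerNamespaceSafeTime(
    const std::unordered_map<std::string, HybridTime>& table_rows,
    const std::unordered_map<std::string, std::string>& tablet_to_namespace) {
  std::map<std::string, HybridTime> namespace_min;
  for (const auto& [tablet_id, safe_time] : table_rows) {
    auto it = tablet_to_namespace.find(tablet_id);
    if (it == tablet_to_namespace.end()) continue;  // stale row; cleaned up separately
    const std::string& ns = it->second;
    auto [ns_it, inserted] = namespace_min.emplace(ns, safe_time);
    if (!inserted) ns_it->second = std::min(ns_it->second, safe_time);
  }
  return namespace_min;  // Persisted in the XCLUSTER_SAFE_TIME catalog entity.
}
```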

XClusterSafeTimeService split-brain protection:
During master failovers it is possible that the XClusterSafeTime task on the old node has not completed before the one on the new node starts, or that it still has network requests in flight. The task modifies two pieces of on-disk data, both of which remain protected in these cases:
  - XClusterSafeTime Sys CatalogEntity: the task captures the master leader term at the start of its work and uses it to commit the entity change. This ensures there has been no leader change between the start of the work and the commit of the new entity (see the sketch after this list).
  - Stale rows from the XClusterSafeTime table: removing them is an idempotent operation and can be run multiple times, even in parallel from multiple nodes. If the replication stream was destroyed and recreated, this would only delay the new safe time computation by one round. There are no correctness issues in removing rows from the table, as the Sys CatalogEntity stores the actual safe time, which is guaranteed never to move backwards.
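
A hypothetical sketch of the term-fencing idea from the first bullet: capture the master leader term when the task starts and only commit the catalog entity if the term is still the same at commit time (the `CatalogWriter` interface below is a stand-in, not the real sys catalog API):

```cpp
#include <cstdint>

// Placeholder for the sys catalog interface; names here are illustrative only.
class CatalogWriter {
 public:
  int64_t CurrentLeaderTerm() const { return current_term_; }

  // Simulates a conditional commit: succeeds only if no leader change (term
  // bump) happened since the term was captured.
  bool CommitIfTermMatches(int64_t captured_term) {
    return captured_term == current_term_;
  }

  void BumpTerm() { ++current_term_; }  // e.g. a master failover

 private:
  int64_t current_term_ = 1;
};

bool RunSafeTimeTask(CatalogWriter& catalog) {
  const int64_t term_at_start = catalog.CurrentLeaderTerm();
  // ... compute the new per-namespace safe times here ...
  // If a failover bumped the term after term_at_start, the commit is rejected,
  // so a stale task on the old master leader cannot clobber newer state.
  return catalog.CommitIfTermMatches(term_at_start);
}
```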

xcluster_safe_time table Schema:
universe_id string(HASH), tablet_id string(HASH), safe_time int64
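
For illustration, one row of that table as a plain struct (a sketch; the field names simply follow the schema above):

```cpp
#include <cstdint>
#include <string>

// One row of xcluster_safe_time, keyed by (universe_id, tablet_id).
struct XClusterSafeTimeRow {
  std::string universe_id;  // HASH key component: producer universe
  std::string tablet_id;    // HASH key component: producer tablet
  int64_t safe_time;        // min safe time reported by the poller, as a hybrid time
};
```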

Reduced the scope of `CDCConsumer::should_run_mutex_`. The background `RunThread` was holding it for the entire run, causing Shutdown to block on it. With the reduced scope, Shutdown can clear in-memory structures and call the client Shutdown, which makes `RunThread` fail and exit early.
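
A minimal sketch of the reduced lock scope (not the actual `CDCConsumer` code): the run loop only holds `should_run_mutex_` to check the flag, so Shutdown is never blocked for the duration of a whole run.

```cpp
#include <mutex>

class CdcConsumerSketch {
 public:
  void RunThread() {
    while (true) {
      {
        // Hold the mutex only long enough to read the flag, not for the
        // entire iteration of replication work.
        std::lock_guard<std::mutex> lock(should_run_mutex_);
        if (!should_run_) return;
      }
      DoOneRound();  // fails and exits early once the client has shut down
    }
  }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(should_run_mutex_);
      should_run_ = false;
    }
    // Because RunThread releases the lock quickly, Shutdown can clear
    // in-memory structures and shut the client down without blocking.
    ShutdownClient();
  }

 private:
  void DoOneRound() { /* poll, apply, update safe time ... */ }
  void ShutdownClient() { /* tear down the client connection ... */ }

  std::mutex should_run_mutex_;
  bool should_run_ = true;
};
```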

Dump ClusterConfig in yb-admin dump_masters_state

Test Plan:
xcluster_safe_time_service-test
xcluster_safe_time-itest

Manual test:
Setup the cluster with replication:
./bin/yugabyted destroy
./bin/yb-ctl destroy
./bin/yb-ctl wipe_restart --data_dir ~/yugabyte-data1 --ip_start 1 --tserver_flags "yb_system_namespace_readonly=false,vmodule=xcluster_safe_time_service=1"
./bin/yb-ctl wipe_restart --data_dir ~/yugabyte-data2 --ip_start 10 --tserver_flags "yb_system_namespace_readonly=false,vmodule=xcluster_safe_time_service=1"
./bin/ysqlsh -h 127.0.0.1 -c "create table tbl1(a int);"
./bin/ysqlsh -h 127.0.0.10 -c "create table tbl1(a int);"
#./build/latest/bin/yb-admin -master_addresses 127.0.0.1:7100 list_tables include_table_id | grep tbl1
ybadmin get_universe_config
./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 setup_universe_replication e2ff1315-9811-4211-b8b8-386f5083049a 127.0.0.1:7100 000033e8000030008000000000004000

Get the safe time and make sure it moves up:
 ./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 dump_masters_state console | grep XCluster
XCluster Safe Time: safe_time_map { key: "000033e8000030008000000000000000" value: 6805014956461838336 }
./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 dump_masters_state console | grep XCluster
XCluster Safe Time: safe_time_map { key: "000033e8000030008000000000000000" value: 6805014997570662400 }

Reviewers: slingam, rahuldesirazu

Reviewed By: rahuldesirazu

Subscribers: jenkins-bot, yugaware, ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D18579
@hari90 hari90 closed this as completed Sep 10, 2022
YBase features automation moved this from Backlog to Done Sep 10, 2022