
[xCluster] Keep min_safe_time calculation periodically refreshed on the master #11202

Closed
rahuldesirazu opened this issue Jan 25, 2022 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments


rahuldesirazu commented Jan 25, 2022

Jira Link: DB-1297

Description

Given that we know each tablet's safe time, each target server needs to calculate the minimum safe time across all of its tablets and report it to the master on heartbeat. The master will keep a mapping of tserver -> min_safe_time, update this structure, and calculate a new global min to be sent back as part of the heartbeat response.
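
A minimal C++ sketch of that aggregation (types and names here are illustrative, not the actual YugabyteDB interfaces):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

using HybridTime = uint64_t;

// On each target tserver: minimum safe time across all local replicated tablets.
HybridTime ComputeTserverMinSafeTime(
    const std::map<std::string, HybridTime>& tablet_safe_times) {
  HybridTime min_ht = std::numeric_limits<HybridTime>::max();
  for (const auto& [tablet_id, safe_time] : tablet_safe_times) {
    min_ht = std::min(min_ht, safe_time);
  }
  return min_ht;  // Reported to the master in the next heartbeat.
}

// On the master: keep tserver -> min_safe_time and recompute the global min.
class SafeTimeTracker {
 public:
  void OnHeartbeat(const std::string& tserver_uuid, HybridTime tserver_min) {
    per_tserver_[tserver_uuid] = tserver_min;
  }

  HybridTime GlobalMin() const {
    HybridTime min_ht = std::numeric_limits<HybridTime>::max();
    for (const auto& [uuid, ht] : per_tserver_) {
      min_ht = std::min(min_ht, ht);
    }
    return min_ht;  // Sent back to tservers in the heartbeat response.
  }

 private:
  std::map<std::string, HybridTime> per_tserver_;
};
```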

@rahuldesirazu rahuldesirazu added the area/docdb YugabyteDB core features label Jan 25, 2022
@rahuldesirazu rahuldesirazu self-assigned this Jan 25, 2022
@omkar-yb omkar-yb added this to Backlog in YBase features Apr 7, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022
@yugabyte-ci yugabyte-ci assigned hari90 and unassigned rahuldesirazu Jul 6, 2022
hari90 added a commit that referenced this issue Sep 9, 2022
Summary:
Compute the xCluster min safe read time for each namespace/DB and propagate it to all tservers.
Each CDC producer on the source cluster sends the safe time (the hybrid time of the last replicated operation, or the leader safe time) to the consumers.
On the consumer cluster, the cdc_poller keeps track of the min safe time it received from the producer. Periodically the cdc_consumer collects the safe time from all pollers and writes it to the xcluster_safe_time table.
XClusterSafeTimeService (which currently runs on the master) runs a periodic task that reads the entries in the table and computes the min safe time per consumer namespace (DB in YSQL). This information is stored in a new catalog entity, XCLUSTER_SAFE_TIME, and propagated back to all tservers via the heartbeat.
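
A rough sketch of the per-namespace aggregation the periodic task performs, assuming a simple map of table rows and a tablet-to-namespace lookup (both names are hypothetical, not the service's real API):

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>

using HybridTime = uint64_t;

// table_rows: entries read from xcluster_safe_time (producer tablet id -> safe time).
// tablet_to_namespace: maps each tablet to its consumer namespace (DB in YSQL).
std::map<std::string, HybridTime> ComputePerNamespaceSafeTime(
    const std::unordered_map<std::string, HybridTime>& table_rows,
    const std::unordered_map<std::string, std::string>& tablet_to_namespace) {
  std::map<std::string, HybridTime> namespace_min;
  for (const auto& [tablet_id, safe_time] : table_rows) {
    auto it = tablet_to_namespace.find(tablet_id);
    if (it == tablet_to_namespace.end()) continue;  // stale row; cleaned up separately
    const std::string& ns = it->second;
    auto [ns_it, inserted] = namespace_min.emplace(ns, safe_time);
    if (!inserted) ns_it->second = std::min(ns_it->second, safe_time);
  }
  return namespace_min;  // Persisted in the XCLUSTER_SAFE_TIME catalog entity.
}
```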

XClusterSafeTimeService split-brain protection:
During master failovers it is possible that the XClusterSafeTime task on the old node has not completed before the one on the new node starts, or that it still has network requests in flight. The task modifies two pieces of on-disk data, both of which remain protected in these cases:
  - XClusterSafeTime Sys CatalogEntity: the task captures the master leader term at the start of its work and uses it to commit the entity change. This ensures there has been no leader change between the start of the work and the commit of the new entity (see the sketch after this list).
  - Stale rows from the XClusterSafeTime table: removing them is an idempotent operation and can be run multiple times, even in parallel from multiple nodes. If the replication stream was destroyed and recreated, this would only delay the new safe time computation by one round. There are no correctness issues in removing rows from the table, as the Sys CatalogEntity stores the actual safe time, which is guaranteed never to move backwards.
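
A hypothetical sketch of the term-fencing idea from the first bullet: capture the master leader term when the task starts and only commit the catalog entity if the term is still the same at commit time (the `CatalogWriter` interface below is a stand-in, not the real sys catalog API):

```cpp
#include <cstdint>

// Placeholder for the sys catalog interface; names here are illustrative only.
class CatalogWriter {
 public:
  int64_t CurrentLeaderTerm() const { return current_term_; }

  // Simulates a conditional commit: succeeds only if no leader change (term
  // bump) happened since the term was captured.
  bool CommitIfTermMatches(int64_t captured_term) {
    return captured_term == current_term_;
  }

  void BumpTerm() { ++current_term_; }  // e.g. a master failover

 private:
  int64_t current_term_ = 1;
};

bool RunSafeTimeTask(CatalogWriter& catalog) {
  const int64_t term_at_start = catalog.CurrentLeaderTerm();
  // ... compute the new per-namespace safe times here ...
  // If a failover bumped the term after term_at_start, the commit is rejected,
  // so a stale task on the old master leader cannot clobber newer state.
  return catalog.CommitIfTermMatches(term_at_start);
}
```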

xcluster_safe_time table Schema:
universe_id string(HASH), tablet_id string(HASH), safe_time int64
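
For illustration, one row of that table as a plain struct (a sketch; the field names simply follow the schema above):

```cpp
#include <cstdint>
#include <string>

// One row of xcluster_safe_time, keyed by (universe_id, tablet_id).
struct XClusterSafeTimeRow {
  std::string universe_id;  // HASH key component: producer universe
  std::string tablet_id;    // HASH key component: producer tablet
  int64_t safe_time;        // min safe time reported by the poller, as a hybrid time
};
```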

Reduced the scope of `CDCConsumer::should_run_mutex_`. The background `RunThread` was holding it for the entire run, causing Shutdown to block on it. With the reduced scope, Shutdown can clear in-memory structures and call the client Shutdown, which makes `RunThread` fail and exit early.
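
A minimal sketch of the reduced lock scope (not the actual `CDCConsumer` code): the run loop only holds `should_run_mutex_` to check the flag, so Shutdown is never blocked for the duration of a whole run.

```cpp
#include <mutex>

class CdcConsumerSketch {
 public:
  void RunThread() {
    while (true) {
      {
        // Hold the mutex only long enough to read the flag, not for the
        // entire iteration of replication work.
        std::lock_guard<std::mutex> lock(should_run_mutex_);
        if (!should_run_) return;
      }
      DoOneRound();  // fails and exits early once the client has shut down
    }
  }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(should_run_mutex_);
      should_run_ = false;
    }
    // Because RunThread releases the lock quickly, Shutdown can clear
    // in-memory structures and shut the client down without blocking.
    ShutdownClient();
  }

 private:
  void DoOneRound() { /* poll, apply, update safe time ... */ }
  void ShutdownClient() { /* tear down the client connection ... */ }

  std::mutex should_run_mutex_;
  bool should_run_ = true;
};
```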

Dump ClusterConfig in yb-admin dump_masters_state

Test Plan:
xcluster_safe_time_service-test
xcluster_safe_time-itest

Manual test:
Setup the cluster with replication:
./bin/yugabyted destroy
./bin/yb-ctl destroy
./bin/yb-ctl wipe_restart --data_dir ~/yugabyte-data1 --ip_start 1 --tserver_flags "yb_system_namespace_readonly=false,vmodule=xcluster_safe_time_service=1"
./bin/yb-ctl wipe_restart --data_dir ~/yugabyte-data2 --ip_start 10 --tserver_flags "yb_system_namespace_readonly=false,vmodule=xcluster_safe_time_service=1"
./bin/ysqlsh -h 127.0.0.1 -c "create table tbl1(a int);"
./bin/ysqlsh -h 127.0.0.10 -c "create table tbl1(a int);"
#./build/latest/bin/yb-admin -master_addresses 127.0.0.1:7100 list_tables include_table_id | grep tbl1
ybadmin get_universe_config
./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 setup_universe_replication e2ff1315-9811-4211-b8b8-386f5083049a 127.0.0.1:7100 000033e8000030008000000000004000

Get the safe time and make sure it moves up:
 ./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 dump_masters_state console | grep XCluster
XCluster Safe Time: safe_time_map { key: "000033e8000030008000000000000000" value: 6805014956461838336 }
./build/latest/bin/yb-admin -master_addresses 127.0.0.10:7100 dump_masters_state console | grep XCluster
XCluster Safe Time: safe_time_map { key: "000033e8000030008000000000000000" value: 6805014997570662400 }

Reviewers: slingam, rahuldesirazu

Reviewed By: rahuldesirazu

Subscribers: jenkins-bot, yugaware, ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D18579
@hari90 hari90 closed this as completed Sep 10, 2022
YBase features automation moved this from Backlog to Done Sep 10, 2022