Summary:
As DDLs are processed, rows build up in the ddl_queue and replicated_ddls tables.
Without cleanup, these tables will grow without bound.
The diff introduces a periodic task (default interval 30 minutes, controlled by a gflag) that deletes entries older than a retention period (default seven days, controlled by a second gflag).
This code runs on both source and target universes when xCluster automatic mode is being used.
The task reuses essentially the same code as the existing periodic XReplParentTabletDeletion task.
The original design was:
* create a new background task similar to ScheduleXReplParentTabletDeletionTask
  * this runs on the master leader
  * essentially copy and paste the existing code that lives in xrepl_catalog_manager
  * new gflag for how often it runs
    * suggesting default of half an hour
* that task runs on both source and target universes
  * it exits if at the beginning of a run it sees master has lost leadership
  * it lists all the xCluster replication namespaces using automatic mode replication
  * for each of those, it sends SQL statements to a TServer via ExecutePgsqlStatements
    * exact statements depend on whether it's source or target
* statements that run on source (run first):
  * delete the rows in the ddl_queue table whose ddl_end_time is older than a new gflag (default one week)
  * no DML blocking override is present so if accidentally run on a target will do nothing
* statements that run on both source and target
  * delete rows in the replicated_ddls table satisfying:
    * their ddl_end_time is older than a gflag (default one week)
    * they do not have a matching row in the ddl_queue table (comparing primary keys)
  * these statements do have the DML blocking override specified so they will work on the target or source
* there are no other interlocks; note that statements on the target side are executed at xCluster safe time as usual
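The deletion rules above can be sketched as a small in-memory model. This is only an illustration, not the code in the diff: the real task sends SQL through ExecutePgsqlStatements, and the names `RowKey`, `DdlTables`, and `RunCleanupPass`, as well as the assumption that the primary key is a `(ddl_end_time, query_id)` pair, are hypothetical. The special replicated_ddls row and the DML blocking override are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical model: a row is reduced to its primary key, assumed here to be
// (ddl_end_time, query_id). The real tables carry more columns.
using RowKey = std::pair<int64_t, int64_t>;
using Table = std::set<RowKey>;

struct DdlTables {
  Table ddl_queue;
  Table replicated_ddls;
};

// Models one cleanup pass. A row is treated as expired when its ddl_end_time
// is strictly less than `cutoff` (current time minus the retention gflag).
void RunCleanupPass(DdlTables& tables, bool is_source, int64_t cutoff) {
  assert(cutoff >= 0);

  if (is_source) {
    // Source only, run first: drop expired ddl_queue rows. On a target these
    // rows disappear later, when the source-side deletes replicate.
    for (auto it = tables.ddl_queue.begin(); it != tables.ddl_queue.end();) {
      if (it->first < cutoff) {
        it = tables.ddl_queue.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Source and target: drop expired replicated_ddls rows, but only when no
  // ddl_queue row with the same primary key still exists.
  for (auto it = tables.replicated_ddls.begin();
       it != tables.replicated_ddls.end();) {
    const bool expired = it->first < cutoff;
    const bool still_queued = tables.ddl_queue.count(*it) > 0;
    if (expired && !still_queued) {
      it = tables.replicated_ddls.erase(it);
    } else {
      ++it;
    }
  }
}
```

In this model a target keeps an expired replicated_ddls row for as long as the matching ddl_queue row is still present, which is the property the design relies on.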
The consequence of these statements is that we GC ddl_queue rows on the source, which then propagate to the target. On both sides, we independently GC replicated_ddls rows (remember that table does not get replicated) but take care to only remove them when they are no longer needed to indicate that the corresponding ddl_queue row has been processed.
The only change needed to that design was to exclude the special row in the replicated_ddls table from cleanup.
UPDATE: after examining the SQL performance, we made some optimizations:
* we changed the primary key index for the DDL replication tables from hashed to ranged
* we made the query that finds which deletes we need to perform more efficient by:
  * using NOT EXISTS
  * using a plan hint to force a merge join, which is about 3X faster than what the planner does by default
* I verified the optimized query also performs faster with the old schema
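For illustration, one plausible shape of such a statement is sketched below as a string builder. This is not the query from the diff: the helper name `BuildReplicatedDdlsCleanupSql`, the schema name `yb_xcluster_ddl_replication`, the `query_id` column, the aliases `r` and `q`, and the exact hint text (written in pg_hint_plan style) are all assumptions. The DML blocking override and the exclusion of the special row are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper. Builds a DELETE that removes replicated_ddls rows older
// than `cutoff` that have no ddl_queue row with the same (assumed) primary key.
std::string BuildReplicatedDdlsCleanupSql(int64_t cutoff) {
  assert(cutoff >= 0);
  const std::string schema = "yb_xcluster_ddl_replication";  // assumed name
  return "/*+ MergeJoin(r q) */ "  // assumed hint form, pg_hint_plan style
         "DELETE FROM " + schema + ".replicated_ddls r "
         "WHERE r.ddl_end_time < " + std::to_string(cutoff) + " "
         "AND NOT EXISTS (SELECT 1 FROM " + schema + ".ddl_queue q "
         "WHERE q.ddl_end_time = r.ddl_end_time "
         "AND q.query_id = r.query_id)";
}
```

A merge join needs both inputs ordered on the join keys, which is presumably why it pairs well with range-sharded primary keys.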
**Upgrade/Rollback safety:**
The diff changes the primary key of the two DDL replication tables from hash to ranged. This should make some of the DDL replication queries faster.
If the user upgrades while running automatic mode, databases already using automatic mode keep the old schema, so they do not get the query speedup.
When the user next re-creates replication for those databases (this includes re-bootstrapping an existing setup), they will get the new schema.
Fixes #19193
Jira: DB-7985
Test Plan:
New unit tests:
```
ybd release --cxx-test xcluster_ddl_replication-test --gtest_filter '*.BasicDdlTableCleanup'
ybd release --cxx-test xcluster_ddl_replication-test --gtest_filter '*.DdlTableCleaningDuringPause'
```
Manual check that the retention gflag rejects values below one day:
```
~/code/yugabyte-db/bin/yb-ctl start --master_flags "xcluster_ddl_tables_retention_secs=10"
E0901 09:39:33.853299 20994 xcluster_manager.cc:83] Invalid value '10' for flag 'xcluster_ddl_tables_retention_secs': Must be greater than or equal to 1 * 24 * 60 * 60
ERROR: failed validation of new value '10' for flag 'xcluster_ddl_tables_retention_secs'
```
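The check behind that error can be sketched as follows. This is a minimal illustration rather than the validator in the diff: the names `IsValidRetentionSecs` and `CleanupCutoffSecs` are hypothetical, and only the one-day lower bound and the one-week default come from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound taken from the error message: 1 * 24 * 60 * 60 seconds.
constexpr int32_t kMinRetentionSecs = 1 * 24 * 60 * 60;
// Default of one week, as described in the summary.
constexpr int32_t kDefaultRetentionSecs = 7 * 24 * 60 * 60;
static_assert(kDefaultRetentionSecs >= kMinRetentionSecs,
              "default retention must pass validation");

// Hypothetical validator: accepts only retention values of at least one day.
bool IsValidRetentionSecs(int32_t value) {
  return value >= kMinRetentionSecs;
}

// Hypothetical helper: rows that ended before the returned time are eligible
// for cleanup. Expects an already validated retention value.
int64_t CleanupCutoffSecs(int64_t now_secs, int32_t retention_secs) {
  assert(IsValidRetentionSecs(retention_secs));
  return now_secs - static_cast<int64_t>(retention_secs);
}
```

With this rule, the value `10` from the manual check is rejected, while the one-week default is accepted.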
Reviewers: xCluster, jhe, hsunder
Reviewed By: hsunder
Subscribers: yql, hsunder, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D46231