2024.2.3.0-b77

@fourpointfour tagged this 04 Apr 07:07
Summary:
The online index backfill process waits for all backends to catch up to the same catalog version before it proceeds. However, when CDC is running, the walsender process can lag behind the latest catalog version, which can cause the index creation statement to time out with an error similar to:

```
ERROR:  timed out waiting for postgres backends to catch up
DETAIL:  1 backends on database 13260 are still behind catalog version 4.
HINT:  Run the following query on all tservers to find the lagging backends: SELECT * FROM pg_stat_activity WHERE catalog_version < 4 AND datid = 13260;
```

The fix is to exempt the walsender process from the YSQL backend check so that the index creation can proceed without waiting for the walsender to catch up to the latest catalog version.

If the walsender lags behind, it can take a while to catch up to the latest catalog version. However, it is safe to exempt the walsender from the backend check because it only reads the system catalog for event streaming and doesn't perform any writes. Additionally, the walreceiver and walwriter processes are not being exempted yet because we do not use those processes anyway, so exempting them would not change the current behavior.
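
The exemption described above can be illustrated with a small model. This is only a sketch of the idea, not the actual change in the YSQL backend manager: the `Backend` class, the `EXEMPT_BACKEND_TYPES` set, and the `count_lagging_backends` function are hypothetical names introduced for illustration, and the sketch assumes backends are identified by a type string such as `walsender`.

```python
from dataclasses import dataclass


# Hypothetical model of a backend; the field names are illustrative only
# and do not mirror the real implementation.
@dataclass(frozen=True)
class Backend:
    pid: int
    backend_type: str
    catalog_version: int


# Assumption: only walsender is exempt, matching the summary above.
# walreceiver and walwriter are intentionally not listed.
EXEMPT_BACKEND_TYPES = frozenset({"walsender"})


def count_lagging_backends(backends, required_version):
    """Count non-exempt backends still behind the required catalog version."""
    return sum(
        1
        for b in backends
        if b.backend_type not in EXEMPT_BACKEND_TYPES
        and b.catalog_version < required_version
    )


backends = [
    Backend(pid=101, backend_type="client backend", catalog_version=4),
    Backend(pid=102, backend_type="walsender", catalog_version=3),
]

# The lagging walsender is skipped, so the wait can finish.
lagging = count_lagging_backends(backends, required_version=4)
```

With the walsender skipped, the count drops to zero even though one process is still on an older catalog version, which is the condition that lets the index backfill proceed.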

**Backport description:**

The backport was applied cleanly and no merge conflict was encountered.

Original commit: 3201b1ead3ad587aa3c85c5977c5ddf0695003a0 / D42827
Jira: DB-15958

Test Plan:
Added a Java unit test:

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot#testReplicationConnectionConsumptionWithCreateIndex'
```

Reviewers: jason, skumar, sumukh.phalgaonkar

Reviewed By: jason

Subscribers: yql, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D42914