Summary:
When `--TEST_ysql_yb_enable_invalidation_messages=true`, the test
`TestPgRegressThirdPartyExtensionsPostgresqlAnonymizer` sometimes fails with the
following error:
```
-- C. Masked Data is Masked -- C. Masked Data is Masked
-- --
SELECT "IBAN" = md5('0') FROM test."COMPANY"; SELECT "IBAN" = md5('0') FROM test."COMPANY";
?column? | ERROR: The catalog snapshot used for this transaction has been invalidated:
---------- <
t <
(1 row) <
<
```
After debugging, I found that the test runs a child ysqlsh as follows:
```
\! ${YB_BUILD_ROOT}/postgres/bin/ysqlsh -f tmp/_pg_dump_A.sql yugabyte >/dev/null # YB: Use ysqlsh and yugabyte database
--
-- C. Masked Data is Masked
--
SELECT "IBAN" = md5('0') FROM test."COMPANY";
```
The script tmp/_pg_dump_A.sql contains a breaking DDL statement (REVOKE) that
increments the breaking catalog version. Once the child ysqlsh completes, it
is possible that, due to heartbeat delay, the latest breaking catalog version
has not yet propagated to this tserver before the session continues to execute
the next `SELECT` statement. When incremental catalog cache refresh is on,
we currently only compare the shared catalog version (from shared memory) with
the session's local catalog version. Then we send a local RPC to the local
tserver to retrieve the invalidation messages. After applying them we no longer
do the full catalog cache refresh. However, when the session resumes for the
next `SELECT` statement, we put the shared catalog version into the read RPC
request. If the shared catalog version is less than the latest master catalog
version and the latter has already propagated by the time the read RPC is
executed, we run into the above error.
In contrast, if `--TEST_ysql_yb_enable_invalidation_messages=false`, we always
do a full catalog cache refresh, which involves an RPC to read the latest
catalog version from master. The catalog cache refresh then proceeds with the
latest master catalog version, so the session does not see the above error in
this situation.
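To make the race easier to follow, here is a toy model of the two refresh paths described above. It is only a sketch: the class and function names (`Master`, `TServer`, `run_select`, `demo`) are made up for illustration and do not correspond to actual YugabyteDB code, and the heartbeat is forced to land between the refresh and the read.

```python
class CatalogSnapshotInvalidated(Exception):
    """Stand-in for the 'catalog snapshot ... has been invalidated' error."""


class Master:
    def __init__(self):
        self.version = 1           # current catalog version
        self.breaking_version = 1  # last version produced by a breaking DDL

    def run_breaking_ddl(self):
        # Models the REVOKE executed by the child ysqlsh.
        self.version += 1
        self.breaking_version = self.version


class TServer:
    def __init__(self, master):
        self.master = master
        self.shared_version = master.version          # value in shared memory
        self.breaking_version = master.breaking_version

    def heartbeat(self):
        # Models the heartbeat that propagates master's versions.
        self.shared_version = self.master.version
        self.breaking_version = self.master.breaking_version

    def read(self, request_version):
        # A read stamped with a version older than the breaking version fails.
        if request_version < self.breaking_version:
            raise CatalogSnapshotInvalidated(request_version)
        return "t"


def run_select(tserver, full_refresh):
    if full_refresh:
        # Full refresh: read the latest version directly from master.
        version = tserver.master.version
    else:
        # Incremental refresh: trust the (possibly stale) shared version.
        version = tserver.shared_version
    tserver.heartbeat()  # heartbeat lands between the refresh and the read
    return tserver.read(version)


def demo(full_refresh):
    master = Master()
    tserver = TServer(master)
    master.run_breaking_ddl()  # heartbeat has not been delivered yet
    try:
        return run_select(tserver, full_refresh)
    except CatalogSnapshotInvalidated:
        return "ERROR"
```

In this model `demo(full_refresh=False)` hits the error because the read is stamped with the stale shared version, while `demo(full_refresh=True)` succeeds because the version comes from master.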
In this diff, I have only added a unit test that can reproduce the bug
with a new YB test GUC `yb_test_delay_after_applying_inval_message_ms`.
When this GUC is > 0, we insert a delay after applying the invalidation
messages to allow the latest master breaking catalog version to propagate.
The test currently asserts that by the time the `SELECT` is executed the
newer breaking catalog version has already propagated and we see the
error. Once I make the code fix, the test can be updated to assert that
we do not see the error.
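As a rough illustration of why the delay makes the reproduction reliable, the sketch below uses a logical clock instead of real sleeps. The function `select_outcome` and its parameter `heartbeat_due_in_ms` are hypothetical and exist only for this example; `delay_ms` stands in for the value of the test GUC.

```python
def select_outcome(delay_ms, heartbeat_due_in_ms):
    """Return 't' or 'ERROR' for the SELECT that follows the child ysqlsh."""
    master_breaking_version = 2   # after the child's REVOKE
    tserver_breaking_version = 1  # heartbeat not yet delivered
    shared_version = 1            # stale value in shared memory

    # Incremental refresh: the session adopts the stale shared version.
    request_version = shared_version

    # Stand-in for the test GUC: if the delay covers the time until the
    # next heartbeat, the tserver learns the new breaking version before
    # the read RPC is executed.
    if delay_ms >= heartbeat_due_in_ms:
        tserver_breaking_version = master_breaking_version

    return "ERROR" if request_version < tserver_breaking_version else "t"
```

With no delay the outcome in a real cluster depends on heartbeat timing, which is why the original failure was intermittent; a delay longer than the heartbeat interval pushes the model (and, by assumption, the test) onto the error path every time.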
Note that from the user's perspective, the separately executed child ysqlsh
has completed synchronously, and the next `SELECT` should see its
effects as if those statements had been executed inline in the parent
ysqlsh's session. I also found that once the child ysqlsh's script
execution completes, control returns to the parent ysqlsh right away
without waiting for the child ysqlsh's PG backend to exit, even though that
can take some time.
Test Plan: ./yb_build.sh --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.WaitForSharedCatalogVersionToCatchup -n 20 --tp 1
Reviewers: kfranz, sanketh, mihnea
Reviewed By: kfranz
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D42555