Summary:
When running the `SqlCrossDBLoadWithDDL` test against a local dev VM cluster created with
```
./bin/yb-ctl create --rf 3 --master_flags 'vmodule="master_heartbeat_service=2",log_ysql_catalog_versions=true'
```
via the command
```
/usr/lib/jvm/java-17-openjdk-17.0.15.0.6-2.el8.x86_64/bin/java -jar $HOME/code/yb-stress-test/tools/sample-app/target/yb-stress-sample-apps.jar --workload SqlCrossDBLoadWithDDL --num_of_tables_in_db 1 --uuid be8ef4cb-ec72-4679-a488-5af1eb848d39 --uuid_marker be8ef4cb-ec72-4679-a488-5af1eb848d39 --num_writes -1 --num_reads -1 --num_threads_write 3 --num_threads_read 3 --num_unique_keys 2000000000000000 --num_value_columns 30 --default_postgres_database postgres --use_datatypes true --nodes 127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --username yugabyte --per_db_catalog_mode true --batch_size 3 --num_of_non_colocated_databases 1 --num_of_colocated_databases 0 --num_of_parallel_ddls 1 --run_time 3600
```
I saw PG logs indicating unexpected full catalog cache refreshes. For
example:
```
./node-1/disk-1/yb-data/tserver/logs/postgresql-2025-06-25_175257.log:2025-06-25 18:09:14.012 UTC [2769022] LOG: calling YBRefreshCache: 0 704 705 0 1 0 4 for database 16384
```
Based on the two catalog versions in that line, 704 and 705, I found the
following master-side log:
```
./node-3/disk-1/yb-data/master/logs/yb-master.INFO:I0625 18:09:13.974139 2756585 master_heartbeat_service.cc:379] vlog2: responding (to ts 8e823b900adb44fcba98c37aaab67510) db catalog versions: db_catalog_versions { db_oid: 1 current_version: 1 last_breaking_version: 1 } db_catalog_versions { db_oid: 4 current_version: 1 last_breaking_version: 1 } db_catalog_versions { db_oid: 5 current_version: 2 last_breaking_version: 1 } db_catalog_versions { db_oid: 13515 current_version: 1 last_breaking_version: 1 } db_catalog_versions { db_oid: 13516 current_version: 1 last_breaking_version: 1 } db_catalog_versions { db_oid: 16384 current_version: 705 last_breaking_version: 650 }) db inval messages: [{5, [{2, 624}]}, {16384, [{693, 216}, {694, 216}, {695, 216}, {696, 384}, {697, 264}, {698, 48}, {699, 72}, {700, 216}, {701, 216}, {702, 216}, {703, 216}, {704, 192}]}]
```
The inconsistency is between
```
{ db_oid: 16384 current_version: 705 last_breaking_version: 650 }
```
which is read from `pg_yb_catalog_version` and
```
{16384, [{693, 216}, {694, 216}, {695, 216}, {696, 384}, {697, 264}, {698, 48}, {699, 72}, {700, 216}, {701, 216}, {702, 216}, {703, 216}, {704, 192}]}
```
which is read from `pg_yb_invalidation_messages`. DDL statements update these
two tables transactionally, but we read them at different times: we read
`pg_yb_catalog_version` first and, only if a change is detected via
fingerprint, we then read `pg_yb_invalidation_messages`. So it is possible to
read a new version from `pg_yb_invalidation_messages` that did not yet exist in
`pg_yb_catalog_version` when we read it, but here we are seeing the opposite.
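The mismatch can be spotted mechanically in the master vlog2 output. The
following is a minimal sketch, not part of this change: it assumes the log line
format shown above, and the function names `parse_heartbeat_log` and
`find_lagging_databases` are hypothetical. A database is flagged when the newest
version in its invalidation messages is older than its `current_version`; a
flagged database is only a candidate for the stale read.

```python
import re


def parse_heartbeat_log(line):
    """Extract {db_oid: current_version} and {db_oid: [message versions]}."""
    versions = {
        int(m.group(1)): int(m.group(2))
        for m in re.finditer(r"db_oid: (\d+) current_version: (\d+)", line)
    }
    messages = {}
    marker = "db inval messages:"
    tail = line.split(marker, 1)[1] if marker in line else ""
    # Each per-database entry looks like {16384, [{693, 216}, {694, 216}]}.
    for m in re.finditer(r"\{(\d+), \[(.*?)\]\}", tail):
        db_oid = int(m.group(1))
        # Each inner pair is {version, size}; only the version is needed here.
        messages[db_oid] = [
            int(v) for v in re.findall(r"\{(\d+), \d+\}", m.group(2))
        ]
    return versions, messages


def find_lagging_databases(line):
    """Return {db_oid: (current_version, newest_message_version)} for lagging DBs."""
    versions, messages = parse_heartbeat_log(line)
    return {
        db: (versions[db], max(vs))
        for db, vs in messages.items()
        if vs and db in versions and max(vs) < versions[db]
    }


# Abbreviated form of the master log line quoted above.
SAMPLE = (
    "db_catalog_versions { db_oid: 5 current_version: 2 last_breaking_version: 1 } "
    "db_catalog_versions { db_oid: 16384 current_version: 705 last_breaking_version: 650 }) "
    "db inval messages: [{5, [{2, 624}]}, {16384, [{703, 216}, {704, 192}]}]"
)
print(find_lagging_databases(SAMPLE))  # {16384: (705, 704)}
```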
After debugging, I found the cause: when reading `pg_yb_catalog_version` we
ensure that we read up-to-date data via restarts, but when reading
`pg_yb_invalidation_messages` we used a simpler read that may return stale
data. In this case the up-to-date version is 705, but the stale read from
`pg_yb_invalidation_messages` returned messages only up to version 704. As a
result, 705 gets set in tserver shared memory, but when PG requests the
invalidation messages for version 705, it does not find them and has to do a
full catalog cache refresh.
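The failure sequence can be illustrated with a toy model. This is a sketch under
stated assumptions, not YugabyteDB code: `ToyTserver` and `pg_catch_up` are
hypothetical names, and the real tserver and PG logic is considerably more
involved. It only shows why a version published without its invalidation
message forces a full refresh.

```python
class ToyTserver:
    """Toy stand-in for the tserver state described above (hypothetical)."""

    def __init__(self):
        self.shared_memory_version = 0
        self.messages = {}  # catalog version -> invalidation message payload

    def apply_heartbeat(self, current_version, messages):
        # The version and the messages come from two separate master reads.
        self.shared_memory_version = current_version
        self.messages.update(messages)


def pg_catch_up(local_version, tserver):
    """Return ('incremental', payloads) if every needed message exists, else ('full', [])."""
    needed = range(local_version + 1, tserver.shared_memory_version + 1)
    if all(v in tserver.messages for v in needed):
        return "incremental", [tserver.messages[v] for v in needed]
    return "full", []


# Up-to-date table contents: messages for versions 693..705.
master_table = {v: f"msg-{v}" for v in range(693, 706)}
# Stale snapshot: the message for version 705 is not visible yet.
stale_snapshot = {v: m for v, m in master_table.items() if v <= 704}

buggy = ToyTserver()
buggy.apply_heartbeat(705, stale_snapshot)
print(pg_catch_up(704, buggy))  # ('full', [])

fixed = ToyTserver()
fixed.apply_heartbeat(705, master_table)
print(pg_catch_up(704, fixed))  # ('incremental', ['msg-705'])
```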
To fix this bug, I changed the read of `pg_yb_invalidation_messages` to work
the same way as the read of `pg_yb_catalog_version` (which should have been
done initially).
Also made some minor logging changes in tablet_server.cc.
Jira: DB-17376
Test Plan:
(1)
YB_EXTRA_MASTER_FLAGS="--vmodule=master_heartbeat_service=2" YB_EXTRA_TSERVER_FLAGS="--vmodule=heartbeater=1" ./yb_build.sh release --cxx-test pg_catalog_version-test
(2) Manual test
(2.1) Start an RF-3 local dev VM cluster:
```
./bin/yb-ctl create --rf 3 --master_flags 'vmodule="master_heartbeat_service=2",log_ysql_catalog_versions=true'
```
(2.2) Run the `SqlCrossDBLoadWithDDL` test:
```
/usr/lib/jvm/java-17-openjdk-17.0.15.0.6-2.el8.x86_64/bin/java -jar $HOME/code/yb-stress-test/tools/sample-app/target/yb-stress-sample-apps.jar --workload SqlCrossDBLoadWithDDL --num_of_tables_in_db 1 --uuid be8ef4cb-ec72-4679-a488-5af1eb848d39 --uuid_marker be8ef4cb-ec72-4679-a488-5af1eb848d39 --num_writes -1 --num_reads -1 --num_threads_write 3 --num_threads_read 3 --num_unique_keys 2000000000000000 --num_value_columns 30 --default_postgres_database postgres --use_datatypes true --nodes 127.0.0.1:5433,127.0.0.2:5433,127.0.0.3:5433 --username yugabyte --per_db_catalog_mode true --batch_size 3 --num_of_non_colocated_databases 1 --num_of_colocated_databases 0 --num_of_parallel_ddls 1 --run_time 3600
```
(2.3) Verified that no full catalog cache refresh appears in the PG logs.
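One way to check the PG logs is sketched below. It assumes the
`calling YBRefreshCache: ... for database <oid>` line format from the example
in the summary, and it treats every such line as a full refresh, which is an
assumption based on that example only. The helper names are hypothetical.

```python
import re

# Matches e.g. "LOG: calling YBRefreshCache: 0 704 705 0 1 0 4 for database 16384".
REFRESH_RE = re.compile(r"calling YBRefreshCache: .* for database (\d+)")


def count_refreshes(lines):
    """Return {db_oid: number of YBRefreshCache log lines} for the given lines."""
    counts = {}
    for line in lines:
        m = REFRESH_RE.search(line)
        if m:
            db_oid = int(m.group(1))
            counts[db_oid] = counts.get(db_oid, 0) + 1
    return counts


def count_refreshes_in_files(paths):
    """Aggregate count_refreshes over several PG log files."""
    totals = {}
    for path in paths:
        with open(path, errors="replace") as f:
            for db_oid, n in count_refreshes(f).items():
                totals[db_oid] = totals.get(db_oid, 0) + n
    return totals
```

For example, passing the files matched by
`./node-*/disk-1/yb-data/tserver/logs/postgresql-*.log` to
`count_refreshes_in_files` should return an empty dict when no such line was
logged during the run.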
Reviewers: hsunder, kfranz, sanketh, mihnea
Reviewed By: hsunder
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D45027