
2.25.2.0-b142

@myang2021 myang2021 tagged this 12 Mar 00:17
Summary:
This diff lets a PG backend perform an incremental catalog cache refresh using
invalidation messages.

When a PG backend detects that a newer catalog version has arrived in shared
memory, it currently does a full catalog cache refresh. I inserted two steps
before it invokes `YBRefreshCache`:
(1) PG calls the new function `YBCGetTserverCatalogMessageLists`, which performs
an RPC to the local tserver to retrieve the message lists that reflect the delta
between the PG backend's local catalog version and the shared memory catalog
version. For example, if the local catalog version is x and the shared memory
catalog version is y, then in the happy case `YBCGetTserverCatalogMessageLists`
will return the invalidation messages associated with catalog versions
`x + 1, x + 2, ..., x + k` (where k = y - x).
(2) PG calls `YbApplyInvalidationMessages`, which attempts to apply the
invalidation messages retrieved above. These messages are applied
transactionally (all or none). If all the messages can be successfully applied,
then we can skip `YBRefreshCache`. If any of the messages cannot be applied,
then the incremental catalog cache refresh optimization has failed and
`YBRefreshCache` is invoked just as before, as a fallback.
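
The two steps above can be sketched as follows. This is a minimal, standalone
illustration and not the code in this diff: `MessageList`, `CoversDelta`,
`RefreshCatalogCache`, and the callback types are hypothetical names, and the
callbacks only stand in for `YBCGetTserverCatalogMessageLists`,
`YbApplyInvalidationMessages`, and `YBRefreshCache`, whose real signatures are
not shown here. The gap check is also an assumption about what "the happy case"
requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the invalidation messages of one catalog version.
struct MessageList {
  uint64_t version;
  std::vector<std::string> messages;
};

enum class RefreshKind { kNone, kIncremental, kFull };

// True only if `lists` holds exactly versions local + 1 .. shared, in order.
bool CoversDelta(const std::vector<MessageList>& lists, uint64_t local,
                 uint64_t shared) {
  if (lists.size() != shared - local) return false;
  for (std::size_t i = 0; i < lists.size(); ++i) {
    if (lists[i].version != local + 1 + i) return false;
  }
  return true;
}

// Hypothetical callback types standing in for the tserver RPC and the
// all-or-none apply step.
using Fetch = std::function<std::optional<std::vector<MessageList>>(
    uint64_t local, uint64_t shared)>;
using Apply = std::function<bool(const std::vector<MessageList>&)>;

// Try the incremental path first; fall back to a full refresh on any failure.
RefreshKind RefreshCatalogCache(uint64_t& local, uint64_t shared,
                                const Fetch& fetch,
                                const Apply& apply_all_or_none,
                                const std::function<void()>& full_refresh) {
  if (shared <= local) return RefreshKind::kNone;  // Nothing newer to apply.
  std::optional<std::vector<MessageList>> lists = fetch(local, shared);
  if (lists && CoversDelta(*lists, local, shared) &&
      apply_all_or_none(*lists)) {
    local = shared;
    return RefreshKind::kIncremental;
  }
  full_refresh();  // Same behavior as before the optimization.
  local = shared;
  return RefreshKind::kFull;
}
```

In this sketch the returned `RefreshKind` is what a caller would use to bump
either the incremental or the full refresh counter.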

Two new metrics are added to count the number of full catalog cache refreshes
and incremental catalog cache refreshes.

A new unit test is added to verify that the incremental catalog cache refresh
does happen. I moved the existing data structure `YsqlMetric` and the two
functions `ParsePrometheusMetrics` and `ParseJsonMetrics` to `LibPqTestBase` so
that they can also be used in `PgCatalogVersionTest`.

**Upgrade/Rollback safety:**
The src/yb/tserver/pg_client.proto change is only used in PG -> tserver
communication, which is upgrade safe.

The src/yb/common/common.proto change does not modify any existing proto
message. New message and RPC should not be used when the upgrade has completed.
They are added to support the PG -> tserver communication. The existing
infrastructure (e.g., `get_ysql_db_oid_to_cat_version_info_map`) needs to have
an RPC API on `TabletServerIf`. The master tablet server is also a subclass of
`TabletServerIf`, which is why the API is needed for both tserver and master.
In the worst case the RPC will fail with an "RPC Not implemented" error if it
is made to an old master leader, or fail with "Unexpected call" if it is made to
a new master leader. This is ok and does not cause any correctness issues.

Test Plan:
./yb_build.sh --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.InvalMessageCatCacheRefreshTest
./yb_build.sh --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.InvalMessageQueueOverflowTest

Reviewers: kfranz, sanketh, mihnea

Reviewed By: kfranz

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D42274