
2.27.0.0-b398

myang2021 tagged this on 01 Aug 00:35
Summary:
Currently, the PG backend has a table cache that stores the DocDB table schemas of
tables accessed during the session. Whenever we perform an incremental catalog
cache refresh, we still clear the entire PG table cache. It would be beneficial
to avoid clearing the entire table cache and instead use the invalidation
messages to invalidate only the relevant table cache entries.

This diff performs this optimization by examining the invalidation messages and
invalidating only the table cache entries found to be relevant. For example, if
a tuple in the RELOID catalog cache (the cache for pg_class) is invalidated, we
determine the DocDB table id of that table and invalidate it in the PG table
cache. When the entire RELOID cache or the entire relation cache is invalidated,
we continue to invalidate the entire table cache.
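
A minimal sketch of the selective invalidation described above, assuming a
simplified message format. `InvalMessage`, `InvalKind`, `PgTableCache`, and
`MakeDocDbTableId` are hypothetical names, not YugabyteDB's API, and the string
key is not the real DocDB table id format. Real PostgreSQL catcache
invalidation messages carry a hash value rather than the relation OID, so the
actual diff presumably resolves the affected table another way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using Oid = uint32_t;

// Hypothetical, simplified invalidation message (assumption: it carries the
// relation OID directly, which real PostgreSQL catcache messages do not).
enum class InvalKind {
  kRelOidTuple,  // one pg_class tuple (RELOID catcache entry) was invalidated
  kRelOidReset,  // the whole RELOID catcache was invalidated
  kRelcacheAll,  // the whole relation cache was invalidated
  kOther         // messages this sketch treats as irrelevant to the table cache
};

struct InvalMessage {
  InvalKind kind;
  Oid db_oid = 0;
  Oid rel_oid = 0;  // meaningful only for kRelOidTuple
};

struct CachedSchema {
  uint32_t schema_version;
};

// Illustrative key only; NOT the real DocDB table id format.
std::string MakeDocDbTableId(Oid db_oid, Oid rel_oid) {
  return std::to_string(db_oid) + "." + std::to_string(rel_oid);
}

class PgTableCache {
 public:
  void Put(Oid db, Oid rel, CachedSchema s) { entries_[MakeDocDbTableId(db, rel)] = s; }
  bool Contains(Oid db, Oid rel) const { return entries_.count(MakeDocDbTableId(db, rel)) > 0; }
  std::size_t Size() const { return entries_.size(); }

  // Old behavior: every incremental refresh simply called entries_.clear().
  // Here only the entries named by the messages are dropped, with a fallback
  // to a full clear when a message invalidates a whole cache.
  void ApplyInvalMessages(const std::vector<InvalMessage>& msgs) {
    for (const auto& m : msgs) {
      switch (m.kind) {
        case InvalKind::kRelOidTuple:
          entries_.erase(MakeDocDbTableId(m.db_oid, m.rel_oid));
          break;
        case InvalKind::kRelOidReset:
        case InvalKind::kRelcacheAll:
          entries_.clear();
          return;  // nothing left to invalidate selectively
        case InvalKind::kOther:
          break;
      }
    }
  }

 private:
  std::unordered_map<std::string, CachedSchema> entries_;
};
```

With this shape, a DDL on one table evicts only that table's entry, and the
other cached schemas in the session stay usable.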

The change is fairly empirical because we do not know whether all the necessary
cases are covered. However, missing an invalidation of a table cache entry only
leads to a schema version mismatch error, and even with table locking we do not
completely avoid schema version mismatch errors. So as long as we cover the
common cases, missing some corner cases where a stale table cache entry is not
invalidated will not cause data corruption.
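
To illustrate why a missed invalidation surfaces as an error rather than
corruption, here is a sketch under the assumption that each request carries the
schema version the client built it against and the server rejects any
mismatch. `TabletSchema`, `CachedTableEntry`, `CheckSchemaVersion`, and
`RefreshEntry` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; not YugabyteDB's real types.
struct TabletSchema { uint32_t version; };      // server-side current schema
struct CachedTableEntry { uint32_t version; };  // client-side cached schema

enum class VersionCheck { kOk, kSchemaVersionMismatch };

// Server side: a request built against a different schema version is
// rejected instead of being applied with the wrong layout.
VersionCheck CheckSchemaVersion(const TabletSchema& server, const CachedTableEntry& client) {
  return client.version == server.version ? VersionCheck::kOk
                                          : VersionCheck::kSchemaVersionMismatch;
}

// Client side: reloading the entry (what an OpenTable would do) clears the error.
void RefreshEntry(CachedTableEntry& entry, const TabletSchema& server) {
  entry.version = server.version;
}
```

A stale entry is therefore detected at the server, and the cost of a missed
invalidation is an error plus a reload, not silently wrong data.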

I added a new unit test and compared the OpenTable and GetTableSchema RPC counts
with and without the diff. The OpenTable RPC count dropped from 739 to 244, and
the GetTableSchema RPC count dropped from 882 to 782.
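
Expressed as relative reductions, these counts work out to roughly 67% fewer
OpenTable RPCs and about 11% fewer GetTableSchema RPCs. This is plain arithmetic
on the reported numbers; `PercentReduction` is just an illustrative helper.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction, in percent, going from `before` to `after`.
double PercentReduction(int before, int after) {
  return static_cast<double>(before - after) / before * 100.0;
}

// RPC counts reported by the unit test in this diff.
constexpr int kOpenTableBefore = 739, kOpenTableAfter = 244;
constexpr int kGetTableSchemaBefore = 882, kGetTableSchemaAfter = 782;
```

`PercentReduction(kOpenTableBefore, kOpenTableAfter)` is about 66.98, and
`PercentReduction(kGetTableSchemaBefore, kGetTableSchemaAfter)` is about 11.34.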

Note that the tserver side also has a DocDB table cache. When a heartbeat
response shows that the catalog version has been incremented, the entire table
cache of the given DB is cleared. In addition, if a PG `OpenTable` RPC carries a
catalog version higher than the tserver's cached catalog version for that DB,
the entire table cache of that DB is also cleared.
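
The two tserver-side clearing rules above can be modeled as a per-DB cache
tagged with the last catalog version seen. This is a hypothetical sketch:
`TServerTableCache`, `OnHeartbeat`, and `OnOpenTable` are illustrative names,
not the real tserver code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

using Oid = uint32_t;

// Hypothetical model: one table cache per database, plus the newest catalog
// version the tserver has observed for that database.
class TServerTableCache {
 public:
  void Put(Oid db, const std::string& table_id) { tables_[db].insert(table_id); }
  std::size_t Size(Oid db) const {
    auto it = tables_.find(db);
    return it == tables_.end() ? 0 : it->second.size();
  }

  // A heartbeat response reports the latest catalog version for `db`.
  void OnHeartbeat(Oid db, uint64_t catalog_version) { MaybeClear(db, catalog_version); }

  // An OpenTable request arrives carrying the PG backend's catalog version.
  void OnOpenTable(Oid db, uint64_t request_catalog_version) {
    MaybeClear(db, request_catalog_version);
  }

 private:
  // The whole per-DB cache is dropped: the tserver does not interpret
  // invalidation messages, so it cannot tell which tables changed.
  void MaybeClear(Oid db, uint64_t version) {
    uint64_t& known = versions_[db];
    if (version > known) {
      known = version;
      tables_[db].clear();
    }
  }

  std::map<Oid, uint64_t> versions_;
  std::map<Oid, std::set<std::string>> tables_;
};
```

Clearing is scoped to one database, so a DDL in one DB leaves other databases'
cached tables untouched.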

This diff only optimizes the PG table cache, not the tserver table cache. The
tserver does not yet understand invalidation messages, so its entire cache is
still cleared. We expect active connections to retain most of their table cache
entries as valid and therefore issue fewer `OpenTable` RPCs. New connections
still need to load table cache entries from the tserver, so they behave as
before whenever a DDL statement has cleared the tserver table cache.
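
The difference between active and new connections can be modeled as a
two-level lookup: a per-connection PG table cache in front of the shared
tserver cache. This is a hypothetical sketch (`PgConnection`, `TServer`, and the
counters are illustrative), and it assumes that a tserver cache miss is filled
by fetching the schema from the master.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical shared tserver cache; counters stand in for RPCs.
struct TServer {
  std::set<std::string> cache;
  int master_fetches = 0;  // assumed: a tserver miss is filled from the master

  void OpenTable(const std::string& table_id) {
    if (cache.insert(table_id).second) ++master_fetches;  // inserted => was a miss
  }
  void ClearAll() { cache.clear(); }  // e.g. after a DDL bumps the catalog version
};

// Hypothetical per-connection PG table cache.
struct PgConnection {
  std::set<std::string> cache;
  int open_table_rpcs = 0;

  void Access(TServer& ts, const std::string& table_id) {
    if (cache.count(table_id)) return;  // hit: no OpenTable RPC
    ++open_table_rpcs;
    ts.OpenTable(table_id);
    cache.insert(table_id);
  }
};
```

An active connection whose entry survived selective invalidation keeps hitting
its own cache, while a new connection still pays an `OpenTable` RPC and, in this
model, a master fetch when the tserver cache was cleared.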
Jira: DB-17672

Test Plan:
(1) ./yb_build.sh release --cxx-test pg_catalog_version-test   --gtest_filter PgCatalogVersionTest.InvalMessageDeltaTableLoad -n 50
(2) Run tpcc on a local dev cluster (RF=3)

Before:

```
======================LATENCIES (INCLUDE RETRY ATTEMPTS)=====================
 Transaction |  Count   | Avg. Latency | P99 Latency | Connection Acq Latency
    NewOrder |     3887 |        30.23 |       52.28 |                   0.52
     Payment |     3643 |        19.38 |       33.20 |                   0.53
 OrderStatus |      381 |         5.49 |       13.59 |                   0.52
    Delivery |      351 |       110.68 |      228.23 |                   0.50
  StockLevel |      329 |         9.30 |       15.05 |                   0.51
        All  |     8591 |        27.02 |      155.87 |                   0.52

```
After:

```
======================LATENCIES (INCLUDE RETRY ATTEMPTS)=====================
 Transaction |  Count   | Avg. Latency | P99 Latency | Connection Acq Latency
    NewOrder |     3848 |        29.70 |       51.38 |                   0.51
     Payment |     3637 |        19.12 |       34.30 |                   0.50
 OrderStatus |      350 |         5.30 |        9.73 |                   0.50
    Delivery |      342 |        99.97 |      214.05 |                   0.51
  StockLevel |      333 |         9.03 |       15.40 |                   0.53
        All  |     8510 |        26.19 |      141.02 |                   0.51
```

(3) Run tpcc via the perf portal and compare with base 2.27.0.0-b310, both with
auto-analyze on. The results are roughly the same, with one green line showing
the diff performing 10.3% better:

```
Connection Acquisition NewOrder Latency (ms) 0.29 0.26 10.3%
```

There appears to be some small improvement in terms of latency.
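
To put the improvement in numbers, the relative change for a few rows can be
computed from the figures above (positive means lower latency with the diff).
This is arithmetic on the reported numbers only; `LatencyRow` and
`ImprovementPercent` are illustrative names. By this measure the overall
average latency improves by about 3.1%, the overall P99 by about 9.5%, and the
perf-portal connection acquisition line by about 10.3%, while some rows, such
as Payment P99 (33.20 to 34.30), move slightly the other way.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

struct LatencyRow {
  std::string metric;
  double before;
  double after;
};

// Relative change in percent; positive means lower latency after the diff.
double ImprovementPercent(const LatencyRow& r) {
  return (r.before - r.after) / r.before * 100.0;
}

// A few rows copied from the local tpcc run and the perf portal comparison.
const std::vector<LatencyRow> kRows = {
    {"All, avg", 27.02, 26.19},
    {"All, P99", 155.87, 141.02},
    {"Delivery, avg", 110.68, 99.97},
    {"Payment, P99", 33.20, 34.30},
    {"Connection Acquisition NewOrder Latency (ms)", 0.29, 0.26},
};
```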

Reviewers: kfranz, sanketh, mihnea

Reviewed By: sanketh

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D45266