Skip to content

Commit

Permalink
[#22139] YSQL: Fix PG crash when cluster has yb_debug_log_catcache_ev…
Browse files Browse the repository at this point in the history
…ents=1

Summary:
To reproduce the bug:

./bin/yb-ctl create --rf 1 --tserver_flags=ysql_pg_conf_csv=yb_debug_log_catcache_events=1

./bin/ysqlsh
```
ysqlsh: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5433?
FATAL:  cannot read pg_class without having selected a database
```

I found that when MyDatabaseId is not resolved yet, there is a debugging code
block that is executed when yb_debug_log_catcache_events=1 cannot work
properly because it requires a database already selected. The debugging
code relies on other PG code to work and it failed at:

```
static HeapTuple
ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic)
{
    HeapTuple   pg_class_tuple;
    Relation    pg_class_desc;
    SysScanDesc pg_class_scan;
    ScanKeyData key[1];
    Snapshot    snapshot;

    /*
     * If something goes wrong during backend startup, we might find ourselves
     * trying to read pg_class before we've selected a database.  That ain't
     * gonna work, so bail out with a useful error message.  If this happens,
     * it probably means a relcache entry that needs to be nailed isn't.
     */
    if (!OidIsValid(MyDatabaseId))
        elog(FATAL, "cannot read pg_class without having selected a database");
```

Fixed by adjusting the debug code so that when MyDatabaseId is invalid, we avoid
calling `YBDatumToString` which can trigger the PG FATAL.

After the fix, the above simple test now succeeds, and the PG logs show debugging
logs like:

```
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 10:
    Target rel: pg_authid (oid : 1260), index oid 2676
    Search keys: typid=19 value=<not logged>
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 11:
    Target rel: pg_authid (oid : 1260), index oid 2677
    Search keys: typid=26 value=<not logged>
2024-04-25 23:37:25.885 UTC [27592] LOG:  Catalog cache miss on cache with id 35:
    Target rel: pg_namespace (oid : 2615), index oid 2684
```
Jira: DB-11064

Test Plan: YB_EXTRA_TSERVER_FLAGS="--ysql_pg_conf_csv=yb_debug_log_catcache_events=1" ./yb_build.sh release --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.DBCatalogVersion

Reviewers: jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34500
  • Loading branch information
myang2021 committed Apr 26, 2024
1 parent 8fa7152 commit f2046c1
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions src/postgres/src/backend/utils/cache/catcache.c
Original file line number Diff line number Diff line change
Expand Up @@ -1822,6 +1822,8 @@ SearchCatCacheMiss(CatCache *cache,
* For safety, disable catcache logging within the scope of this
* function as YBDatumToString below may trigger additional cache
* lookups (to get the attribute type info).
* Also only call YBDatumToString when MyDatabaseId is valid to
* avoid PG FATAL.
*/
yb_debug_log_catcache_events = false;
for (int i = 0; i < nkeys; i++)
Expand All @@ -1833,8 +1835,10 @@ SearchCatCacheMiss(CatCache *cache,
Oid typid = OIDOID; // default.
if (attnum > 0)
typid = TupleDescAttr(cache->cc_tupdesc, attnum - 1)->atttypid;

appendStringInfoString(&buf, YBDatumToString(cur_skey[i].sk_argument, typid));
if (OidIsValid(MyDatabaseId))
appendStringInfoString(&buf, YBDatumToString(cur_skey[i].sk_argument, typid));
else
appendStringInfo(&buf, "typid=%u value=<not logged>", typid);
}
ereport(LOG,
(errmsg("Catalog cache miss on cache with id %d:\n"
Expand Down

0 comments on commit f2046c1

Please sign in to comment.