Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] PG backend crashes when create cluster has yb_debug_log_catcache_events=1 #22139

Closed
1 task done
myang2021 opened this issue Apr 25, 2024 · 0 comments
Closed
1 task done
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@myang2021
Copy link
Contributor

myang2021 commented Apr 25, 2024

Jira Link: DB-11064

Description

To reproduce the bug:

./bin/yb-ctl create --rf 1 --tserver_flags=ysql_pg_conf_csv=yb_debug_log_catcache_events=1

$ ysqlsh
ysqlsh: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5433?
FATAL:  cannot read pg_class without having selected a database

I found that when MyDatabaseId is not resolved yet, yb_debug_log_catcache_events=1 is not usable.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@myang2021 myang2021 added the area/ysql Yugabyte SQL (YSQL) label Apr 25, 2024
@myang2021 myang2021 self-assigned this Apr 25, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Apr 25, 2024
myang2021 added a commit that referenced this issue Apr 26, 2024
…ents=1

Summary:
To reproduce the bug:

./bin/yb-ctl create --rf 1 --tserver_flags=ysql_pg_conf_csv=yb_debug_log_catcache_events=1

./bin/ysqlsh
```
ysqlsh: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5433?
FATAL:  cannot read pg_class without having selected a database
```

I found that when MyDatabaseId is not resolved yet, there is a debugging code
block that is executed when yb_debug_log_catcache_events=1 cannot work
properly because it requires a database already selected. The debugging
code relies on other PG code to work and it failed at:

```
static HeapTuple
ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic)
{
    HeapTuple   pg_class_tuple;
    Relation    pg_class_desc;
    SysScanDesc pg_class_scan;
    ScanKeyData key[1];
    Snapshot    snapshot;

    /*
     * If something goes wrong during backend startup, we might find ourselves
     * trying to read pg_class before we've selected a database.  That ain't
     * gonna work, so bail out with a useful error message.  If this happens,
     * it probably means a relcache entry that needs to be nailed isn't.
     */
    if (!OidIsValid(MyDatabaseId))
        elog(FATAL, "cannot read pg_class without having selected a database");
```

Fixed by adjusting the debug code so that when MyDatabaseId is invalid, we avoid
calling `YBDatumToString` which can trigger the PG FATAL.

After the fix, the above simple test now succeeds, and the PG logs show debugging
logs like:

```
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 10:
    Target rel: pg_authid (oid : 1260), index oid 2676
    Search keys: typid=19 value=<not logged>
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 11:
    Target rel: pg_authid (oid : 1260), index oid 2677
    Search keys: typid=26 value=<not logged>
2024-04-25 23:37:25.885 UTC [27592] LOG:  Catalog cache miss on cache with id 35:
    Target rel: pg_namespace (oid : 2615), index oid 2684
```
Jira: DB-11064

Test Plan: YB_EXTRA_TSERVER_FLAGS="--ysql_pg_conf_csv=yb_debug_log_catcache_events=1" ./yb_build.sh release --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.DBCatalogVersion

Reviewers: jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34500
@myang2021 myang2021 changed the title [YSQL] PG backend crashes when create cluser has yb_debug_log_catcache_events=1 [YSQL] PG backend crashes when create cluster has yb_debug_log_catcache_events=1 Apr 26, 2024
svarnau pushed a commit that referenced this issue May 25, 2024
…ents=1

Summary:
To reproduce the bug:

./bin/yb-ctl create --rf 1 --tserver_flags=ysql_pg_conf_csv=yb_debug_log_catcache_events=1

./bin/ysqlsh
```
ysqlsh: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5433?
FATAL:  cannot read pg_class without having selected a database
```

I found that when MyDatabaseId is not resolved yet, there is a debugging code
block that is executed when yb_debug_log_catcache_events=1 cannot work
properly because it requires a database already selected. The debugging
code relies on other PG code to work and it failed at:

```
static HeapTuple
ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic)
{
    HeapTuple   pg_class_tuple;
    Relation    pg_class_desc;
    SysScanDesc pg_class_scan;
    ScanKeyData key[1];
    Snapshot    snapshot;

    /*
     * If something goes wrong during backend startup, we might find ourselves
     * trying to read pg_class before we've selected a database.  That ain't
     * gonna work, so bail out with a useful error message.  If this happens,
     * it probably means a relcache entry that needs to be nailed isn't.
     */
    if (!OidIsValid(MyDatabaseId))
        elog(FATAL, "cannot read pg_class without having selected a database");
```

Fixed by adjusting the debug code so that when MyDatabaseId is invalid, we avoid
calling `YBDatumToString` which can trigger the PG FATAL.

After the fix, the above simple test now succeeds, and the PG logs show debugging
logs like:

```
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 10:
    Target rel: pg_authid (oid : 1260), index oid 2676
    Search keys: typid=19 value=<not logged>
2024-04-25 23:37:25.762 UTC [27592] LOG:  Catalog cache miss on cache with id 11:
    Target rel: pg_authid (oid : 1260), index oid 2677
    Search keys: typid=26 value=<not logged>
2024-04-25 23:37:25.885 UTC [27592] LOG:  Catalog cache miss on cache with id 35:
    Target rel: pg_namespace (oid : 2615), index oid 2684
```
Jira: DB-11064

Test Plan: YB_EXTRA_TSERVER_FLAGS="--ysql_pg_conf_csv=yb_debug_log_catcache_events=1" ./yb_build.sh release --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.DBCatalogVersion

Reviewers: jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D34500
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

2 participants