-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] PG backend crashes when create cluster has yb_debug_log_catcache_events=1 #22139
Closed
1 task done
Labels
Comments
myang2021
added a commit
that referenced
this issue
Apr 26, 2024
…ents=1 Summary: To reproduce the bug: ./bin/yb-ctl create --rf 1 --tserver_flags=ysql_pg_conf_csv=yb_debug_log_catcache_events=1 ./bin/ysqlsh ``` ysqlsh: could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5433? FATAL: cannot read pg_class without having selected a database ``` I found that when MyDatabaseId is not resolved yet, there is a debugging code block that is executed when yb_debug_log_catcache_events=1 cannot work properly because it requires a database already selected. The debugging code relies on other PG code to work and it failed at: ``` static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic) { HeapTuple pg_class_tuple; Relation pg_class_desc; SysScanDesc pg_class_scan; ScanKeyData key[1]; Snapshot snapshot; /* * If something goes wrong during backend startup, we might find ourselves * trying to read pg_class before we've selected a database. That ain't * gonna work, so bail out with a useful error message. If this happens, * it probably means a relcache entry that needs to be nailed isn't. */ if (!OidIsValid(MyDatabaseId)) elog(FATAL, "cannot read pg_class without having selected a database"); ``` Fixed by adjusting the debug code so that when MyDatabaseId is invalid, we avoid calling `YBDatumToString` which can trigger the PG FATAL. After the fix, the above simple test now succeeds, and the PG logs show debugging logs like: ``` 2024-04-25 23:37:25.762 UTC [27592] LOG: Catalog cache miss on cache with id 10: Target rel: pg_authid (oid : 1260), index oid 2676 Search keys: typid=19 value=<not logged> 2024-04-25 23:37:25.762 UTC [27592] LOG: Catalog cache miss on cache with id 11: Target rel: pg_authid (oid : 1260), index oid 2677 Search keys: typid=26 value=<not logged> 2024-04-25 23:37:25.885 UTC [27592] LOG: Catalog cache miss on cache with id 35: Target rel: pg_namespace (oid : 2615), index oid 2684 ``` Jira: DB-11064 Test Plan: YB_EXTRA_TSERVER_FLAGS="--ysql_pg_conf_csv=yb_debug_log_catcache_events=1" ./yb_build.sh release --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.DBCatalogVersion Reviewers: jason Reviewed By: jason Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D34500
svarnau
pushed a commit
that referenced
this issue
May 25, 2024
…ents=1 Summary: To reproduce the bug: ./bin/yb-ctl create --rf 1 --tserver_flags=ysql_pg_conf_csv=yb_debug_log_catcache_events=1 ./bin/ysqlsh ``` ysqlsh: could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5433? FATAL: cannot read pg_class without having selected a database ``` I found that when MyDatabaseId is not resolved yet, there is a debugging code block that is executed when yb_debug_log_catcache_events=1 cannot work properly because it requires a database already selected. The debugging code relies on other PG code to work and it failed at: ``` static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic) { HeapTuple pg_class_tuple; Relation pg_class_desc; SysScanDesc pg_class_scan; ScanKeyData key[1]; Snapshot snapshot; /* * If something goes wrong during backend startup, we might find ourselves * trying to read pg_class before we've selected a database. That ain't * gonna work, so bail out with a useful error message. If this happens, * it probably means a relcache entry that needs to be nailed isn't. */ if (!OidIsValid(MyDatabaseId)) elog(FATAL, "cannot read pg_class without having selected a database"); ``` Fixed by adjusting the debug code so that when MyDatabaseId is invalid, we avoid calling `YBDatumToString` which can trigger the PG FATAL. After the fix, the above simple test now succeeds, and the PG logs show debugging logs like: ``` 2024-04-25 23:37:25.762 UTC [27592] LOG: Catalog cache miss on cache with id 10: Target rel: pg_authid (oid : 1260), index oid 2676 Search keys: typid=19 value=<not logged> 2024-04-25 23:37:25.762 UTC [27592] LOG: Catalog cache miss on cache with id 11: Target rel: pg_authid (oid : 1260), index oid 2677 Search keys: typid=26 value=<not logged> 2024-04-25 23:37:25.885 UTC [27592] LOG: Catalog cache miss on cache with id 35: Target rel: pg_namespace (oid : 2615), index oid 2684 ``` Jira: DB-11064 Test Plan: YB_EXTRA_TSERVER_FLAGS="--ysql_pg_conf_csv=yb_debug_log_catcache_events=1" ./yb_build.sh release --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.DBCatalogVersion Reviewers: jason Reviewed By: jason Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D34500
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Jira Link: DB-11064
Description
To reproduce the bug:
I found that when MyDatabaseId is not resolved yet, yb_debug_log_catcache_events=1 is not usable.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: