Use async API to drop database from delete_data_node
PG15 introduced a ProcSignalBarrier mechanism in the DROP DATABASE
implementation to force all backends to close their file handles for
dropped tables. The backend executing the DROP DATABASE command emits
a new process signal barrier and waits for all other backends to
accept it. However, the backend executing the delete_data_node
function cannot process that signal, because it is blocked waiting
for the DROP DATABASE query to return. The two backends therefore end
up waiting for each other, causing a deadlock.

Fix this by using the async API to execute the DROP DATABASE command
from delete_data_node instead of the blocking remote_connection_cmdf_ok
call.

Fixes #4838
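
The circular wait described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `Sim`, `remote_step`, `local_process_interrupts`, and `wait_for_drop` are hypothetical names invented for illustration and are not PostgreSQL or TimescaleDB APIs. The remote backend is modeled as emitting a barrier and finishing the drop only after the local backend has absorbed it; the local backend either services interrupts while it waits (the async style) or does not (the blocking style).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two backends; not real PostgreSQL state. */
typedef struct Sim
{
	bool barrier_emitted;  /* remote backend has emitted the barrier */
	bool barrier_absorbed; /* local backend has accepted the barrier */
	bool drop_done;        /* remote DROP DATABASE has completed */
} Sim;

/* One step of the (simulated) backend running DROP DATABASE:
 * emit the barrier first, then finish only once it was absorbed. */
static void
remote_step(Sim *s)
{
	if (!s->barrier_emitted)
		s->barrier_emitted = true;
	else if (s->barrier_absorbed)
		s->drop_done = true;
}

/* What the (simulated) local backend does when it gets a chance
 * to process pending interrupts. */
static void
local_process_interrupts(Sim *s)
{
	if (s->barrier_emitted)
		s->barrier_absorbed = true;
}

/* Wait for the drop to finish. With service_interrupts == false the
 * wait mimics a blocking call; with true it mimics an async wait loop
 * that handles interrupts between polls. max_steps bounds the
 * simulation so the deadlocked case terminates. */
static bool
wait_for_drop(Sim *s, bool service_interrupts, int max_steps)
{
	for (int i = 0; i < max_steps && !s->drop_done; i++)
	{
		remote_step(s);
		if (service_interrupts)
			local_process_interrupts(s);
	}
	return s->drop_done;
}
```

In this model, `wait_for_drop(&s, false, 1000)` on a zero-initialized `Sim` returns false because the barrier is never absorbed, while `wait_for_drop(&s, true, 1000)` returns true after two steps. The real fix depends on the async wait path in the extension letting the backend handle the barrier interrupt, as the commit message states.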
lkshminarayanan committed Nov 17, 2022
1 parent 1b65297 commit 839e42d
Showing 2 changed files with 15 additions and 3 deletions.
2 changes: 1 addition & 1 deletion .github/gh_matrix_builder.py
@@ -153,7 +153,7 @@ def macos_config(overrides):
"snapshot": "snapshot",
"tsdb_build_args": "-DASSERTIONS=ON -DREQUIRE_ALL_TESTS=ON -DEXPERIMENTAL=ON",
# below tests are tracked as part of #4838
-"installcheck_args": "SKIPS='003_connections_privs 001_simple_multinode 004_multinode_rdwr_1pc data_node_bootstrap dist_hypertable-15 bgw_custom cagg_dump dist_move_chunk' "
+"installcheck_args": "SKIPS='003_connections_privs 001_simple_multinode 004_multinode_rdwr_1pc dist_hypertable-15 bgw_custom cagg_dump dist_move_chunk' "
# below tests are tracked as part of #4835
"IGNORES='telemetry_stats dist_query dist_partial_agg plan_hashagg partialize_finalize dist_fetcher_type dist_remote_error jit-15 "
# below tests are tracked as part of #4837
16 changes: 14 additions & 2 deletions tsl/src/data_node.c
@@ -1776,9 +1776,21 @@ drop_data_node_database(const ForeignServer *server)
* has to rerun the command without drop_database=>true set. We
* don't force removal if there are other connections to the
* database out of caution. If the user wants to forcefully remove
-* the database, they can do it manually. */
-remote_connection_cmdf_ok(conn, "DROP DATABASE %s", quote_identifier(dbname));
+* the database, they can do it manually. From PG15, the backend
+* executing the DROP forces all other backends to close all smgr
+* fds using the ProcSignalBarrier mechanism. To allow this backend
+* to handle that interrupt, send the DROP request using the async
+* API. */
+char *cmd;
+AsyncRequest *req;

+cmd = psprintf("DROP DATABASE %s", quote_identifier(dbname));
+req = async_request_send(conn, cmd);
+Assert(NULL != req);
+async_request_wait_ok_result(req);
remote_connection_close(conn);
+pfree(req);
+pfree(cmd);
}
else
ereport(ERROR,