New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regresscheck tests hanging on PG15 #4838
Comments
The testcases The Simple steps to reproduce the issue in PG15 :
The 'DROP DATABASE' sent by the drop_data_node_database method is stuck causing this hang. The server logs keep emitting these lines when the query is stuck :
The query succeeds if |
The other test |
The hang is caused due to a new ProcSignalBarrier mechanism introduced in PG15. The process that is executing the The code changes in postgres that caused this issue are : postgres/postgres@4eb217631 and postgres/postgres@e2f65f42555. |
PG15 introduced a ProcSignalBarrier mechanism in drop database implementation to force all backends to close the file handles for dropped tables. The backend that is executing the drop database command will emit a new process signal barrier and wait for other backends to accept it. But the backend which is executing the delete_data_node function will not be able to process the above mentioned signal as it will be stuck waiting for the drop database query to return. Thus the two backends end up waiting for each other causing a deadlock. Fixed it by using the async API to execute the drop database command from delete_data_node instead of the blocking remote_connection_cmdf_ok call. Fixes timescale#4838
PG15 introduced a ProcSignalBarrier mechanism in drop database implementation to force all backends to close the file handles for dropped tables. The backend that is executing the drop database command will emit a new process signal barrier and wait for other backends to accept it. But the backend which is executing the delete_data_node function will not be able to process the above mentioned signal as it will be stuck waiting for the drop database query to return. Thus the two backends end up waiting for each other causing a deadlock. Fixed it by using the async API to execute the drop database command from delete_data_node instead of the blocking remote_connection_cmdf_ok call. Fixes timescale#4838
PG15 introduced a ProcSignalBarrier mechanism in drop database implementation to force all backends to close the file handles for dropped tables. The backend that is executing the drop database command will emit a new process signal barrier and wait for other backends to accept it. But the backend which is executing the delete_data_node function will not be able to process the above mentioned signal as it will be stuck waiting for the drop database query to return. Thus the two backends end up waiting for each other causing a deadlock. Fixed it by using the async API to execute the drop database command from delete_data_node instead of the blocking remote_connection_cmdf_ok call. Fixes timescale#4838
PG15 introduced a ProcSignalBarrier mechanism in drop database implementation to force all backends to close the file handles for dropped tables. The backend that is executing the drop database command will emit a new process signal barrier and wait for other backends to accept it. But the backend which is executing the delete_data_node function will not be able to process the above mentioned signal as it will be stuck waiting for the drop database query to return. Thus the two backends end up waiting for each other causing a deadlock. Fixed it by using the async API to execute the drop database command from delete_data_node instead of the blocking remote_connection_cmdf_ok call. Fixes timescale#4838
PG15 introduced a ProcSignalBarrier mechanism in drop database implementation to force all backends to close the file handles for dropped tables. The backend that is executing the drop database command will emit a new process signal barrier and wait for other backends to accept it. But the backend which is executing the delete_data_node function will not be able to process the above mentioned signal as it will be stuck waiting for the drop database query to return. Thus the two backends end up waiting for each other causing a deadlock. Fixed it by using the async API to execute the drop database command from delete_data_node instead of the blocking remote_connection_cmdf_ok call. Fixes #4838
PG15 introduced a ProcSignalBarrier mechanism in drop database implementation to force all backends to close the file handles for dropped tables. The backend that is executing the drop database command will emit a new process signal barrier and wait for other backends to accept it. But the backend which is executing the delete_data_node function will not be able to process the above mentioned signal as it will be stuck waiting for the drop database query to return. Thus the two backends end up waiting for each other causing a deadlock. Fixed it by using the async API to execute the drop database command from delete_data_node instead of the blocking remote_connection_cmdf_ok call. Fixes #4838
What type of bug is this?
Unexpected error
What subsystems and features are affected?
Data node
What happened?
TimescaleDB when compiled against PG15 causes many few tests to hang forever.
TimescaleDB version affected
2.9.0
PostgreSQL version used
15.0
What operating system did you use?
Mac OS X
What installation method did you use?
Source
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
make installcheck TESTS="data_node_bootstrap dist_hypertable-15 bgw_custom"
The text was updated successfully, but these errors were encountered: