Deadlock in zc-opencon thread on node shutdown #5980
Comments
I think I'm starting to understand what is going on here.
Lines 1936 to 1940 in aad3da4
Every avenue that opens a connection attempts to acquire a grant from this semaphore:
In all cases the acquired grant is passed to Lines 1658 to 1664 in aad3da4
In particular, once there are at most I originally assumed that Lines 1989 to 1995 in aad3da4
Lines 2024 to 2026 in aad3da4
(Note how the class declaration is also used to initialize a static instance of itself; I missed this during my initial read.) Instead, it appears that Bitcoin Core (as of 0.11.2, when we forked) manually "released" the semaphore Lines 1973 to 1978 in aad3da4
But |
I think this might have been caused by #5057 (which we merged after 5.0.0), specifically the backport of 4ad38a4 which switched |
This partially reverts commit 4ad38a4 to fix a deadlock introduced by that commit. Part of zcash#5980.
#5982 fixes the deadlock by partially reverting the commit to move |
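
To make the grant handling discussed in these comments easier to follow, here is a minimal, self-contained sketch of a semaphore grant as an RAII object. It is not the code from `net.cpp`: the `Semaphore`, `Grant`, and `Connection` types and the `available_after_connection_closes` function are hypothetical names invented for illustration. The sketch also assumes that a grant is returned to the semaphore when the object that owns it is destroyed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <utility>

// Hypothetical counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(int initial) : value_(initial) {}

    // Block until a slot is free, then take it.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return value_ > 0; });
        --value_;
    }

    // Return a slot and wake one waiter.
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++value_;
        }
        cv_.notify_one();
    }

    int available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int value_;
};

// Hypothetical RAII grant: acquires a slot on construction and gives it
// back on destruction, unless ownership has been moved elsewhere.
class Grant {
public:
    explicit Grant(Semaphore& sem) : sem_(&sem) { sem_->wait(); }
    Grant(Grant&& other) noexcept : sem_(std::exchange(other.sem_, nullptr)) {}
    Grant(const Grant&) = delete;
    Grant& operator=(const Grant&) = delete;
    Grant& operator=(Grant&&) = delete;
    ~Grant() {
        if (sem_) sem_->post();
    }

private:
    Semaphore* sem_;
};

// Hypothetical connection object that owns the grant for its lifetime.
struct Connection {
    explicit Connection(Grant g) : grant(std::move(g)) {}
    Grant grant;
};

// Opens one "connection" against a two-slot semaphore, closes it, and
// reports how many slots are free afterwards.
int available_after_connection_closes() {
    Semaphore sem(2);
    {
        Grant grant(sem);                   // takes one of the two slots
        Connection conn(std::move(grant));  // the connection now owns the slot
        assert(sem.available() == 1);
    }  // conn is destroyed here and its grant is posted back
    return sem.available();
}
```

With this shape, a thread that is blocked inside `Grant`'s constructor only wakes up when some other owner lets go of a grant, which is why the order in which owners are torn down at shutdown matters.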
I ran my `zcashd` mainnet node for an hour or two, syncing the last 200k blocks. I then used `zcash-cli stop` to stop the node, and it hung.

`debug.log` output after calling `zcash-cli stop`:

`gdb` info

The remaining threads are:
We can categorise these:

- `crossbeam` thread on the Rust side.
- `metrics-exporter-prometheus` thread on the Rust side.
- `zc-opencon` thread launched by the net backend.

Per above, the main thread is waiting for `WaitForShutdown()`, specifically on `threadGroup->join_all()`. This is the thread group used for most of the C++ threads. As for the other threads:

- `StopRPC()`, which is called from `Shutdown()`, which is called after `WaitForShutdown()`.

This leaves the `zc-opencon` thread, which is started here as part of the thread group:

zcash/src/net.cpp
Lines 1962 to 1964 in aad3da4

This thread is waiting here:

which is waiting on line 1502 here:

zcash/src/net.cpp
Lines 1494 to 1503 in aad3da4
So there is some kind of deadlock, or something similar, where the `zc-opencon` thread is waiting on a semaphore grant, and whatever could grant it has either already stopped, or can't run because it would do so after `WaitForShutdown()`.
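
The hang described above can be reproduced in miniature. The following is a rough sketch, not the `zcashd` code: `Semaphore` and `worker_exits_on_shutdown` are hypothetical names, and the worker stands in for a thread like `zc-opencon` that only notices the shutdown request after it obtains a grant. A timeout is used so that the stuck case can be observed without hanging the example itself.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(int initial) : value_(initial) {}

    // Block until a slot is free, then take it.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return value_ > 0; });
        --value_;
    }

    // Return a slot and wake one waiter.
    void post() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++value_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int value_;
};

// Returns true if the worker thread finished within `timeout` of the
// shutdown request, false if it was still blocked on the semaphore.
bool worker_exits_on_shutdown(bool post_on_shutdown,
                              std::chrono::milliseconds timeout) {
    Semaphore sem(0);  // every grant is currently held by someone else
    std::atomic<bool> shutdown{false};
    std::promise<void> done;
    std::future<void> done_future = done.get_future();

    std::thread worker([&] {
        while (true) {
            sem.wait();                  // blocks while no grant is free
            if (shutdown.load()) break;  // only checked after waking up
            sem.post();                  // not shutting down: give the slot back
        }
        done.set_value();
    });

    // Shutdown path: set the flag and, optionally, release a grant so the
    // blocked worker can wake up and see the flag.
    shutdown.store(true);
    if (post_on_shutdown) sem.post();

    bool exited =
        done_future.wait_for(timeout) == std::future_status::ready;

    // Cleanup for the stuck case, so that join() below cannot hang.
    if (!exited) sem.post();
    worker.join();
    return exited;
}
```

Calling `worker_exits_on_shutdown(true, ...)` models a shutdown path that releases the semaphore before joining its threads, and `worker_exits_on_shutdown(false, ...)` models one where nothing posts until after the join, which is the situation the `zc-opencon` thread appears to be in.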