Deadlock in zc-opencon thread on node shutdown #5980

Open
str4d opened this issue May 31, 2022 · 3 comments
Labels
A-networking Area: Networking code C-bug Category: This is a bug C-upstream-port Category: Changes that are ported from the Bitcoin Core codebase. I-race Problems and improvements related to race conditions.

Comments


str4d commented May 31, 2022

I ran my zcashd mainnet node for an hour or two, syncing the last 200k blocks. I then used zcash-cli stop to stop the node, and it hung.

debug.log output after calling zcash-cli stop:

2022-05-31T17:10:45.933030Z  INFO main: tor: Thread interrupt
2022-05-31T17:10:45.933403Z  INFO main: torcontrol thread exit
2022-05-31T17:10:45.951520Z  INFO main: scheduler thread interrupt
2022-05-31T17:10:45.951525Z  INFO main: msghand thread interrupt
2022-05-31T17:10:45.951542Z  INFO main: addcon thread interrupt
2022-05-31T17:10:45.951585Z  INFO main: metrics-ui thread interrupt
2022-05-31T17:10:45.951625Z  INFO main: net thread interrupt
2022-05-31T17:10:46.888470Z  INFO main: txnotify thread interrupt
[no more output, node is hung]

gdb info

The remaining threads are:

(gdb) info threads
  Id   Target Id                                             Frame 
* 1    Thread 0x7f0039483a40 (LWP 2270483) "zc-wait-to-stop" futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a114a9ac8)
    at ../sysdeps/nptl/futex-internal.h:183
  2    Thread 0x7f0038db3700 (LWP 2270484) "zcashd"          syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  3    Thread 0x7f0038afe700 (LWP 2270485) "zc-rayon-0"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a1108493c)
    at ../sysdeps/nptl/futex-internal.h:183
  4    Thread 0x7f00388fd700 (LWP 2270486) "zc-rayon-1"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a110849a8)
    at ../sysdeps/nptl/futex-internal.h:183
  5    Thread 0x7f00386fc700 (LWP 2270487) "zc-rayon-2"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084a1c)
    at ../sysdeps/nptl/futex-internal.h:183
  6    Thread 0x7f00384fb700 (LWP 2270488) "zc-rayon-3"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084a8c)
    at ../sysdeps/nptl/futex-internal.h:183
  7    Thread 0x7f00382fa700 (LWP 2270489) "zc-rayon-4"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084afc)
    at ../sysdeps/nptl/futex-internal.h:183
  8    Thread 0x7f0023fff700 (LWP 2270490) "zc-rayon-5"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084b6c)
    at ../sysdeps/nptl/futex-internal.h:183
  9    Thread 0x7f0023dfe700 (LWP 2270491) "zc-rayon-6"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084bd8)
    at ../sysdeps/nptl/futex-internal.h:183
  10   Thread 0x7f0023bfd700 (LWP 2270492) "zc-rayon-7"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084c4c)
    at ../sysdeps/nptl/futex-internal.h:183
  11   Thread 0x7f00239fc700 (LWP 2270493) "zc-rayon-8"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084cb8)
    at ../sysdeps/nptl/futex-internal.h:183
  12   Thread 0x7f00237fb700 (LWP 2270494) "zc-rayon-9"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084d2c)
    at ../sysdeps/nptl/futex-internal.h:183
  13   Thread 0x7f00235fa700 (LWP 2270495) "zc-rayon-10"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084d9c)
    at ../sysdeps/nptl/futex-internal.h:183
  14   Thread 0x7f00233f9700 (LWP 2270496) "zc-rayon-11"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084e08)
    at ../sysdeps/nptl/futex-internal.h:183
  15   Thread 0x7f00231f8700 (LWP 2270497) "zc-rayon-12"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084e7c)
    at ../sysdeps/nptl/futex-internal.h:183
  16   Thread 0x7f0022ff7700 (LWP 2270498) "zc-rayon-13"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084ee8)
    at ../sysdeps/nptl/futex-internal.h:183
  17   Thread 0x7f0022df6700 (LWP 2270499) "zc-rayon-14"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084f5c)
    at ../sysdeps/nptl/futex-internal.h:183
  18   Thread 0x7f0022bf5700 (LWP 2270500) "zc-rayon-15"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11084fc8)
    at ../sysdeps/nptl/futex-internal.h:183
  19   Thread 0x7f00229f4700 (LWP 2270501) "zc-rayon-16"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11085038)
    at ../sysdeps/nptl/futex-internal.h:183
  20   Thread 0x7f00227f3700 (LWP 2270502) "zc-rayon-17"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a110850a8)
    at ../sysdeps/nptl/futex-internal.h:183
  21   Thread 0x7f00225f2700 (LWP 2270503) "zc-rayon-18"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11085118)
    at ../sysdeps/nptl/futex-internal.h:183
  22   Thread 0x7f00223f1700 (LWP 2270504) "zc-rayon-19"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a1108518c)
    at ../sysdeps/nptl/futex-internal.h:183
  23   Thread 0x7f00221f0700 (LWP 2270505) "zc-rayon-20"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a110851fc)
    at ../sysdeps/nptl/futex-internal.h:183
  24   Thread 0x7f0021fef700 (LWP 2270506) "zc-rayon-21"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11085268)
    at ../sysdeps/nptl/futex-internal.h:183
  25   Thread 0x7f0021dee700 (LWP 2270507) "zc-rayon-22"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a110852dc)
    at ../sysdeps/nptl/futex-internal.h:183
  26   Thread 0x7f0021bed700 (LWP 2270508) "zc-rayon-23"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a1108534c)
    at ../sysdeps/nptl/futex-internal.h:183
  27   Thread 0x7f00219ec700 (LWP 2270509) "zc-rayon-24"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a110853b8)
    at ../sysdeps/nptl/futex-internal.h:183
  28   Thread 0x7f00217eb700 (LWP 2270510) "zc-rayon-25"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a11085428)
    at ../sysdeps/nptl/futex-internal.h:183
  29   Thread 0x7f00215ea700 (LWP 2270511) "zc-rayon-26"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a1108549c)
    at ../sysdeps/nptl/futex-internal.h:183
  30   Thread 0x7f00213e9700 (LWP 2270512) "zc-rayon-27"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a1108550c)
    at ../sysdeps/nptl/futex-internal.h:183
  31   Thread 0x7f00211e8700 (LWP 2270513) "zc-rayon-28"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a1108557c)
    at ../sysdeps/nptl/futex-internal.h:183
  32   Thread 0x7f0020fe7700 (LWP 2270514) "zc-rayon-29"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a110855ec)
    at ../sysdeps/nptl/futex-internal.h:183
  33   Thread 0x7f0020de6700 (LWP 2270515) "zc-rayon-30"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a1108565c)
    at ../sysdeps/nptl/futex-internal.h:183
  34   Thread 0x7f0020be5700 (LWP 2270516) "zc-rayon-31"     futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a110856cc)
    at ../sysdeps/nptl/futex-internal.h:183
  35   Thread 0x7effac7f8700 (LWP 2270533) "metrics-exporte" 0x00007f00395a546e in epoll_wait (epfd=5, events=0x562a11092380, maxevents=1024, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  36   Thread 0x7eff5ad3d700 (LWP 2270589) "zc-asyncrpc-1"   futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a114d4960)
    at ../sysdeps/nptl/futex-internal.h:183
  37   Thread 0x7eff4bfb1700 (LWP 2271578) "zcashd"          futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a113e11bc)
    at ../sysdeps/nptl/futex-internal.h:183
  38   Thread 0x7eff3b7fe700 (LWP 2289582) "zc-opencon"      futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562aac3e6d28)
    at ../sysdeps/nptl/futex-internal.h:183

We can categorise these:

  • 1: The main thread waiting for shutdown.
(gdb) bt
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562a114a9ac8) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x562a114a9a78, cond=0x562a114a9aa0) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x562a114a9aa0, mutex=0x562a114a9a78) at pthread_cond_wait.c:647
#3  0x0000562a0e5e92bb in boost::posix::pthread_cond_wait (c=<optimized out>, m=<optimized out>)
    at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/share/../include/boost/thread/pthread/pthread_helpers.hpp:112
#4  boost::condition_variable::wait (this=0x562a114a9a78, m=...)
    at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/share/../include/boost/thread/pthread/condition_variable.hpp:79
#5  0x0000562a0f02bca6 in boost::thread::join_noexcept() ()
#6  0x0000562a0e5e70c7 in boost::thread::join (this=0x562a1116a490)
    at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/share/../include/boost/thread/detail/thread.hpp:762
#7  boost::thread_group::join_all (this=0x7ffcfb6c3240)
    at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/share/../include/boost/thread/detail/thread_group.hpp:119
#8  0x0000562a0e5e5be7 in WaitForShutdown (threadGroup=0x7ffcfb6c3240) at bitcoind.cpp:60
#9  AppInit (argc=<optimized out>, argv=<optimized out>) at bitcoind.cpp:210
#10 0x0000562a0e5e6d29 in main (argc=290101960, argv=0x80) at bitcoind.cpp:227
  • 2: An internal crossbeam thread on the Rust side.
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f0038db3700 (LWP 2270484))]
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
38      ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000562a0eead0ca in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:25
#2  std::sys_common::thread_parker::futex::Parker::park () at library/std/src/sys_common/thread_parker/futex.rs:50
#3  std::thread::park () at library/std/src/thread/mod.rs:901
#4  0x0000562a0ea79b65 in crossbeam_channel::context::Context::wait_until ()
#5  0x0000562a0ea78ca6 in crossbeam_channel::context::Context::with::{{closure}} ()
#6  0x0000562a0ec11a9c in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#7  0x0000562a0ea4b4fc in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#8  0x0000562a0eed05a5 in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854
#9  <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once ()
    at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/alloc/src/boxed.rs:1854
#10 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:108
#11 0x00007f0039802609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f00395a5133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  • 3-34: Rayon threadpool threads on the Rust side.
  • 35: metrics-exporter-prometheus thread on the Rust side.
  • 36: The worker thread for async RPC methods (we only launch one).
  • 37: A LevelDB background thread.
  • 38: The zc-opencon thread launched by the net backend.

As shown above, the main thread is blocked in WaitForShutdown(), specifically in threadGroup->join_all(). This is the thread group used for most of the C++ threads. As for the other threads:

  • The Rust-side threads will stop when the process ends, so they aren't the issue.
  • The async RPC worker thread is stopped by StopRPC(), which is called from Shutdown(), which is called after WaitForShutdown().
  • The LevelDB background thread isn't managed by us, and probably stops when the process stops.

This leaves the zc-opencon thread, which is started here as part of the thread group:

zcash/src/net.cpp (lines 1962 to 1964 in aad3da4)

// Initiate outbound connections unless connect=0
if (!mapArgs.count("-connect") || mapMultiArgs["-connect"].size() != 1 || mapMultiArgs["-connect"][0] != "0")
    threadGroup.create_thread(boost::bind(&TraceThread<void (*)()>, "opencon", &ThreadOpenConnections));

This thread is waiting here:

(gdb) thread 38
[Switching to thread 38 (Thread 0x7eff3b7fe700 (LWP 2289582))]
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562aac3e6d28) at ../sysdeps/nptl/futex-internal.h:183
183     ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x562aac3e6d28) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x562aac3e6d30, cond=0x562aac3e6d00) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x562aac3e6d00, mutex=0x562aac3e6d30) at pthread_cond_wait.c:647
#3  0x0000562a0f1ad98f in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) ()
#4  0x0000562a0e721b83 in std::__1::condition_variable::wait<CSemaphore::wait()::{lambda()#1}>(std::__1::unique_lock<std::__1::mutex>&, CSemaphore::wait()::{lambda()#1}) (this=0x562aac3e6d00, __lk=..., __pred=...) at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/native/bin/../include/c++/v1/__mutex_base:404
#5  CSemaphore::wait (this=0x562aac3e6d00) at ./sync.h:215
#6  CSemaphoreGrant::Acquire (this=0x7eff3b7fd990) at ./sync.h:250
#7  CSemaphoreGrant::CSemaphoreGrant (this=0x7eff3b7fd990, sema=..., fTry=false) at ./sync.h:285
#8  ThreadOpenConnections () at net.cpp:1502
#9  0x0000562a0e627242 in TraceThread<void (*)()> (name=<optimized out>, func=0x562a0e7219f0 <ThreadOpenConnections()>) at ./util.h:201
#10 0x0000562a0e63f045 in boost::_bi::list2<boost::_bi::value<char const*>, boost::_bi::value<void (*)()> >::operator()<void (*)(char const*, void (*)()), boost::_bi::list0> (this=0xffffffffffffff28, f=<error reading variable>, a=...)
    at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/share/../include/boost/bind/bind.hpp:298
#11 boost::_bi::bind_t<void, void (*)(char const*, void (*)()), boost::_bi::list2<boost::_bi::value<char const*>, boost::_bi::value<void (*)()> > >::operator() (
    this=0xffffffffffffff20) at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/share/../include/boost/bind/bind.hpp:1273
#12 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(char const*, void (*)()), boost::_bi::list2<boost::_bi::value<char const*>, boost::_bi::value<void (*)()> > > >::run (this=0xfffffffffffffe00) at /home/str4d/dev/zcash/zcash/depends/x86_64-pc-linux-gnu/share/../include/boost/thread/detail/thread.hpp:120
#13 0x0000562a0f02b88d in boost::(anonymous namespace)::thread_proxy(void*) ()
#14 0x00007f0039802609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f00395a5133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

which is waiting on line 1502 here:

zcash/src/net.cpp (lines 1494 to 1503 in aad3da4)

// Initiate network connections
int64_t nStart = GetTime();
while (true)
{
    ProcessOneShot();
    MilliSleep(500);
    CSemaphoreGrant grant(*semOutbound);
    boost::this_thread::interruption_point();

So there is a deadlock, or something similar: the zc-opencon thread is waiting for a semaphore grant, and whatever could release one has either already stopped, or cannot run because it would only do so after WaitForShutdown() returns.

str4d added the C-bug (Category: This is a bug) and I-race (Problems and improvements related to race conditions) labels on May 31, 2022

str4d commented May 31, 2022

I think I'm starting to understand what is going on here.

semOutbound is used to control how many outbound network connections are allowed to be active:

zcash/src/net.cpp (lines 1936 to 1940 in aad3da4)

if (semOutbound == NULL) {
    // initialize semaphore
    int nMaxOutbound = std::min(MAX_OUTBOUND_CONNECTIONS, nMaxConnections);
    semOutbound = new CSemaphore(nMaxOutbound);
}
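For reference, a counting semaphore of this kind can be sketched with std::mutex and std::condition_variable. This is a simplified illustration loosely modeled on the std-based CSemaphore in sync.h, not a copy of it; the values 8 and 125 for MAX_OUTBOUND_CONNECTIONS and nMaxConnections are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Simplified counting semaphore (illustrative, not zcashd's exact code).
class CSemaphore
{
public:
    explicit CSemaphore(int init) : value(init) {}

    // Blocks until a slot is free, then takes it. Only post() can end
    // this wait; there is no interruption mechanism.
    void wait()
    {
        std::unique_lock<std::mutex> lock(mtx);
        cond.wait(lock, [&] { return value >= 1; });
        value--;
    }

    // Takes a slot if one is free; never blocks.
    bool try_wait()
    {
        std::lock_guard<std::mutex> lock(mtx);
        if (value < 1)
            return false;
        value--;
        return true;
    }

    // Returns a slot and wakes one blocked waiter, if any.
    void post()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            value++;
        }
        cond.notify_one();
    }

private:
    std::mutex mtx;
    std::condition_variable cond;
    int value;
};

// Assumed values, for illustration only.
const int MAX_OUTBOUND_CONNECTIONS = 8;
const int nMaxConnections = 125;

// Same sizing rule as the initialization quoted above.
int OutboundSlots()
{
    return std::min(MAX_OUTBOUND_CONNECTIONS, nMaxConnections);
}
```

With these assumed values the semaphore starts with 8 slots; once all 8 are taken, wait() blocks until someone calls post().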

Every code path that opens a connection attempts to acquire a grant from this semaphore:

  • ProcessOneShot():

    zcash/src/net.cpp (lines 1466 to 1468 in aad3da4)

    CSemaphoreGrant grant(*semOutbound, true);
    if (grant) {
        if (!OpenNetworkConnection(addr, &grant, strDest.c_str(), true))
    • This sets fTry = true, so it does not block.
  • ThreadOpenConnections():

    zcash/src/net.cpp (lines 1494 to 1503 in aad3da4)

    // Initiate network connections
    int64_t nStart = GetTime();
    while (true)
    {
        ProcessOneShot();
        MilliSleep(500);
        CSemaphoreGrant grant(*semOutbound);
        boost::this_thread::interruption_point();

    zcash/src/net.cpp (lines 1567 to 1568 in aad3da4)

    if (addrConnect.IsValid())
        OpenNetworkConnection(addrConnect, &grant);
  • ThreadOpenAddedConnections():
    • These only apply if the -addnode option or the addnode RPC method is used.
    • if (HaveNameProxy()):

      zcash/src/net.cpp (lines 1587 to 1593 in aad3da4)

      for (const std::string& strAddNode : lAddresses) {
          CAddress addr;
          CSemaphoreGrant grant(*semOutbound);
          OpenNetworkConnection(addr, &grant, strAddNode.c_str());
          MilliSleep(500);
      }
      MilliSleep(120000); // Retry every 2 minutes
    • else:

      zcash/src/net.cpp (lines 1633 to 1639 in aad3da4)

      for (vector<CService>& vserv : lservAddressesToAdd)
      {
          CSemaphoreGrant grant(*semOutbound);
          OpenNetworkConnection(CAddress(vserv[i % vserv.size()]), &grant);
          MilliSleep(500);
      }
      MilliSleep(120000); // Retry every 2 minutes
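The blocking and non-blocking paths above differ only in the fTry flag. Here is a minimal sketch of an RAII grant in the spirit of CSemaphoreGrant, assuming a simplified Semaphore type (both are illustrative stand-ins, not the real classes): the grant holds at most one slot and returns it when destroyed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore (simplified stand-in for CSemaphore).
class Semaphore
{
public:
    explicit Semaphore(int init) : value(init) {}
    void wait()
    {
        std::unique_lock<std::mutex> lock(mtx);
        cond.wait(lock, [&] { return value >= 1; });
        value--;
    }
    bool try_wait()
    {
        std::lock_guard<std::mutex> lock(mtx);
        if (value < 1)
            return false;
        value--;
        return true;
    }
    void post()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            value++;
        }
        cond.notify_one();
    }

private:
    std::mutex mtx;
    std::condition_variable cond;
    int value;
};

// RAII grant: holds at most one slot and returns it on destruction.
class Grant
{
public:
    // fTry = false: block until a slot is free (ThreadOpenConnections() path).
    // fTry = true:  take a slot only if one is free (ProcessOneShot() path).
    explicit Grant(Semaphore& s, bool fTry = false) : sem(s)
    {
        if (fTry) {
            fHaveGrant = sem.try_wait();
        } else {
            sem.wait();
            fHaveGrant = true;
        }
    }
    Grant(const Grant&) = delete;
    Grant& operator=(const Grant&) = delete;
    ~Grant()
    {
        if (fHaveGrant)
            sem.post();
    }
    explicit operator bool() const { return fHaveGrant; }

private:
    Semaphore& sem;
    bool fHaveGrant = false;
};
```

With one slot, a blocking Grant succeeds immediately; a second Grant constructed with fTry = true reports failure instead of blocking, and the slot becomes available again once the first grant goes out of scope.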

In all cases the acquired grant is passed to OpenNetworkConnection(), which then moves it into the CNode if an outbound connection is successfully opened:

zcash/src/net.cpp (lines 1658 to 1664 in aad3da4)

CNode* pnode = ConnectNode(addrConnect, pszDest);
boost::this_thread::interruption_point();
if (!pnode)
    return false;
if (grantOutbound)
    grantOutbound->MoveTo(pnode->grantOutbound);
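The effect of MoveTo() is that the outbound slot stays occupied for as long as the peer's node object lives. A sketch of that ownership transfer, assuming simplified stand-ins (Semaphore, Grant, and a hypothetical Node playing the role of CNode; none of these are the real classes):

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>

// Minimal counting semaphore (simplified stand-in for CSemaphore).
class Semaphore
{
public:
    explicit Semaphore(int init) : value(init) {}
    void wait()
    {
        std::unique_lock<std::mutex> lock(mtx);
        cond.wait(lock, [&] { return value >= 1; });
        value--;
    }
    bool try_wait()
    {
        std::lock_guard<std::mutex> lock(mtx);
        if (value < 1)
            return false;
        value--;
        return true;
    }
    void post()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            value++;
        }
        cond.notify_one();
    }

private:
    std::mutex mtx;
    std::condition_variable cond;
    int value;
};

// Simplified stand-in for CSemaphoreGrant (blocking acquire only).
class Grant
{
public:
    Grant() = default;  // empty grant, e.g. a node's slot before it is filled
    explicit Grant(Semaphore& s) : sem(&s)
    {
        sem->wait();
        fHaveGrant = true;
    }
    Grant(const Grant&) = delete;
    Grant& operator=(const Grant&) = delete;
    ~Grant() { Release(); }

    void Release()
    {
        if (fHaveGrant) {
            sem->post();
            fHaveGrant = false;
        }
    }

    // Transfers this grant's slot to `to`, leaving this grant empty.
    void MoveTo(Grant& to)
    {
        to.Release();
        to.sem = sem;
        to.fHaveGrant = fHaveGrant;
        fHaveGrant = false;
    }

private:
    Semaphore* sem = nullptr;
    bool fHaveGrant = false;
};

// Hypothetical stand-in for CNode: owning the grant ties the outbound
// slot's lifetime to the connection's lifetime.
struct Node
{
    Grant grantOutbound;
};
```

In this sketch, the slot is returned to the semaphore only when the Node is destroyed, which is why a full set of live outbound peers keeps every later blocking acquire waiting.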

In particular, once MAX_OUTBOUND_CONNECTIONS outbound connections are open, ThreadOpenConnections() will (intentionally) block until one of them closes, at which point its CNode destructor runs and releases the grant. Crucially, this wait does not respond to Boost thread interrupts, which is what WaitForShutdown() relies on! (The C++ standard library threading primitives used here have no interruption mechanism at all, let alone one that listens to Boost.)
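To illustrate why the wait cannot be interrupted: std::condition_variable::wait with a predicate only returns once the predicate holds, so nothing short of a post() ends it. As a hypothetical alternative only (neither zcashd nor Bitcoin Core is claimed to do this), an interruption flag can be folded into the wait predicate, with an interrupt() method that sets it and wakes every waiter. InterruptibleSemaphore and WaitInWorkerThenInterrupt() are illustrative names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical interruptible semaphore; not zcashd's CSemaphore.
class InterruptibleSemaphore
{
public:
    explicit InterruptibleSemaphore(int init) : value(init) {}

    // Returns true if a slot was taken, false if interrupt() was called.
    bool wait()
    {
        std::unique_lock<std::mutex> lock(mtx);
        cond.wait(lock, [&] { return interrupted || value >= 1; });
        if (interrupted)
            return false;
        value--;
        return true;
    }

    void post()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            value++;
        }
        cond.notify_one();
    }

    // Wakes every waiter; current and future wait() calls return false.
    void interrupt()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            interrupted = true;
        }
        cond.notify_all();
    }

private:
    std::mutex mtx;
    std::condition_variable cond;
    int value;
    bool interrupted = false;
};

// Usage sketch: a worker parked in wait() with no free slots returns
// false once interrupt() is called, instead of blocking forever.
bool WaitInWorkerThenInterrupt()
{
    InterruptibleSemaphore sem(0);
    bool got = true;
    std::thread worker([&] { got = sem.wait(); });
    sem.interrupt();
    worker.join();
    return got;
}
```

The design choice here is that interruption is part of the shared state guarded by the same mutex as the count, so a waiter can never miss the wake-up regardless of whether interrupt() runs before or after it starts waiting.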

I originally assumed that CNetCleanup was used to close all open sockets, which would run the CNode destructors for all active peers. That would unblock the grant request in ThreadOpenConnections(), which would then immediately hit the boost::this_thread::interruption_point() and end the zc-opencon thread. But CNetCleanup's destructor only runs during static destruction when the process exits, and thus cannot help:

zcash/src/net.cpp (lines 1989 to 1995 in aad3da4)

static class CNetCleanup
{
public:
    CNetCleanup() {}
    ~CNetCleanup()
    {

zcash/src/net.cpp (lines 2024 to 2026 in aad3da4)

    }
}
instance_of_cnetcleanup;

(Note how the class declaration is also used to initialize a static instance of itself; I missed this during my initial read.)
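The same declare-and-instantiate pattern in isolation, with hypothetical names (CExampleCleanup, g_cleanup_constructed): the constructor runs during static initialization, while the destructor only runs during static destruction at process exit, long after WaitForShutdown() would need it.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical flag so the example can observe the constructor running.
static bool g_cleanup_constructed = false;

// One statement both declares the class and defines a file-scope static
// instance of it, the same shape as CNetCleanup / instance_of_cnetcleanup.
static class CExampleCleanup
{
public:
    CExampleCleanup() { g_cleanup_constructed = true; }  // runs during static initialization
    ~CExampleCleanup() { std::puts("~CExampleCleanup: runs only at process exit"); }
} instance_of_cexamplecleanup;
```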

Instead, it appears that Bitcoin Core (as of 0.11.2, when we forked) manually "released" the semaphore MAX_OUTBOUND_CONNECTIONS times during StopNode():

zcash/src/net.cpp (lines 1973 to 1978 in aad3da4)

bool StopNode()
{
    LogPrintf("StopNode()\n");
    if (semOutbound)
        for (int i=0; i<MAX_OUTBOUND_CONNECTIONS; i++)
            semOutbound->post();

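But StopNode() is called from Shutdown(), which only runs after WaitForShutdown() returns. WaitForShutdown() in turn never returns, because join_all() is waiting on zc-opencon, which is waiting on the semaphore that StopNode() would post: a deadlock.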
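The ordering problem can be reproduced in miniature. Every name in this sketch is hypothetical: a simplified Semaphore, an fStop flag standing in for a Boost interrupt request, and a StopAndJoin() helper. It shows one possible ordering that avoids the hang, not necessarily the fix zcashd adopted. The worker loops "acquire a slot, then check for interruption" like ThreadOpenConnections(); with no free slots it is parked in wait(), so setting fStop alone cannot wake it, but posting the semaphore before joining (the same post() loop StopNode() performs) lets it reach its check and exit.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal counting semaphore (simplified stand-in for CSemaphore).
class Semaphore
{
public:
    explicit Semaphore(int init) : value(init) {}
    void wait()
    {
        std::unique_lock<std::mutex> lock(mtx);
        cond.wait(lock, [&] { return value >= 1; });
        value--;
    }
    bool try_wait()
    {
        std::lock_guard<std::mutex> lock(mtx);
        if (value < 1)
            return false;
        value--;
        return true;
    }
    void post()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            value++;
        }
        cond.notify_one();
    }

private:
    std::mutex mtx;
    std::condition_variable cond;
    int value;
};

const int kMaxOutbound = 8;      // assumed stand-in for MAX_OUTBOUND_CONNECTIONS
Semaphore semOutbound(0);        // 0 free slots: every outbound slot is in use
std::atomic<bool> fStop{false};  // stands in for a Boost interrupt request

// Stand-in for ThreadOpenConnections(): acquire a slot, then check for
// interruption. fStop is not observed while blocked in wait().
void OpenConnectionsLoop()
{
    while (true) {
        semOutbound.wait();
        if (fStop)
            return;  // plays the role of boost::this_thread::interruption_point()
        // ... a real loop would open a connection and hand it the slot ...
    }
}

// Hypothetical shutdown ordering: request the stop, release the semaphore
// (as StopNode() does), and only then join. Joining before posting would
// hang, which is the shape of this bug.
void StopAndJoin(std::thread& worker)
{
    fStop = true;
    for (int i = 0; i < kMaxOutbound; i++)
        semOutbound.post();
    worker.join();
}
```

Because fStop is set before any post(), the worker consumes exactly one of the posted slots before exiting.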


str4d commented May 31, 2022

I think this might have been caused by #5057 (which we merged after 5.0.0), specifically the backport of 4ad38a4, which switched sync.{cpp, h} from Boost threading primitives to std threading primitives. I haven't yet confirmed whether boost::condition_variable respects Boost thread interruption, but it seems likely; std::condition_variable definitely doesn't. If this is the cause, the next step is to look at upstream's codebase as of the PR that was backported, and see which other upstream PRs it was implicitly relying on.

str4d added a commit to str4d/zcash that referenced this issue May 31, 2022
This partially reverts commit 4ad38a4 to fix a deadlock introduced by that commit.

Part of zcash#5980.

str4d commented May 31, 2022

#5982 fixes the deadlock by partially reverting that commit, moving CSemaphore back to Boost threading primitives. We will need to defer re-applying that part of the backport until we've figured out why upstream Bitcoin Core didn't encounter this bug (i.e. which other PRs we need to backport first).

str4d added this to the Release 5.1.0 milestone on Jun 1, 2022
nuttycom modified the milestones: Release 5.2.0, Release 5.3.0 on Jul 18, 2022
daira modified the milestones: Release 5.3.0, Release 5.4.0 on Sep 19, 2022
str4d removed this from the Release 5.4.0 milestone on Jan 9, 2023
str4d added the C-upstream-port (Category: Changes that are ported from the Bitcoin Core codebase) and A-networking (Area: Networking code) labels on Jan 9, 2023