Deadlock in Berkeley DB between msghand and txnotify threads #4502

Open
solardiz opened this issue May 6, 2020 · 2 comments
Labels
A-wallet-database (Area: Wallet database and serialization), C-bug (Category: This is a bug), thread safety

Comments

@solardiz
Contributor

solardiz commented May 6, 2020

Describe the issue

As observed in a fork of Zcash last synced to Zcash 2.1.0-1 upstream, a node running with an unusually large wallet.dat (2+ GB in this case) would deadlock every few days (or every few thousand blocks) with cs_main taken and never released.

Can you reliably reproduce the issue?

Unfortunately, it kept recurring, but we've since worked around it and don't want to see it again.

If so, please list the steps to reproduce below:

Something like this will likely trigger it again:

  1. Start a node, generate a t-address.
  2. Receive a payment to the address in every block, send some payments from the address daily.
  3. Let wallet.dat grow to 2+ GB. Observe the node start to lock up every few days.

Expected behaviour

The node should continue working without locking up.

Actual behaviour + errors

The node locks up. When the metrics screen is enabled, it stops being updated. There's no relevant error in debug.log, and nothing new gets written there while the node is locked up.

The version of Zcash you were using:

Resistance Core 2.1.0-2 (last synced to Zcash 2.1.0-1 upstream).

Machine specs:

  • OS name + version: Ubuntu 18.04.3 LTS
  • CPU: Amazon AWS instance with 8 vCPUs
  • RAM: 32 GB
  • Disk size: 100 GB
  • Disk Type (HD/SDD): unknown, possibly rate-limited by AWS
  • Linux kernel version (uname -a): Linux 4.15.0-1065-aws x86_64
  • Compiler version (gcc --version): gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  • Linker version (ld -v): GNU ld (GNU Binutils for Ubuntu) 2.30
  • Assembler version (as --version): GNU assembler (GNU Binutils for Ubuntu) 2.30

Any extra information that might be useful in the debugging process.

My analysis of the problem is as follows:

The msghand and txnotify threads both proceeded through a chain of calls eventually ending up in Berkeley DB. There they deadlocked against each other, presumably (I didn't analyze this in detail) because BDB took two mutexes in a different order in each thread. msghand held cs_main; txnotify did not.
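
To illustrate the suspected inversion with ordinary mutexes (a hypothetical sketch; inside BDB these are internal page and allocation locks, not std::mutex, and the names below are made up, loosely mapped from the two backtraces further down):

#include <mutex>

std::mutex page_lock;   // stands in for a BDB btree page lock (__bam_search path)
std::mutex alloc_lock;  // stands in for a BDB page-allocation lock (__db_new path)

void msghand_path() {
    std::lock_guard<std::mutex> a(page_lock);   // acquired first
    std::lock_guard<std::mutex> b(alloc_lock);  // blocks if txnotify holds it
}

void txnotify_path() {
    std::lock_guard<std::mutex> b(alloc_lock);  // acquired first
    std::lock_guard<std::mutex> a(page_lock);   // blocks if msghand holds it
}

If the two threads interleave between their first and second acquisitions, neither can proceed: the classic ABBA deadlock.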

A workaround, which seems to have worked, is to have txnotify take cs_main before proceeding into those calls, which prevents msghand from also entering them at the same time:

+++ b/src/txmempool.cpp
@@ -753,6 +753,7 @@ void CTxMemPool::NotifyRecentlyAdded()
     // wallet transaction's block information.
     for (auto tx : txs) {
         try {
+            LOCK(cs_main);
             SyncWithWallets(tx, NULL);
         } catch (const boost::thread_interrupted&) {
             throw;

Another workaround (untested) could be to introduce locks in Zcash's BDB wrapper functions, which would cover all potential code paths leading to the deadlock (not just the specific scenario above) and would possibly be cheaper performance-wise (not keeping one global lock for long), but would result in a more invasive patch.
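
A rough sketch of that second idea (hypothetical: the real wrapper is CDB::Write in wallet/db.h, and g_bdb_write_mutex is a made-up name, but the Db::put signature matches the backtraces below):

#include <db_cxx.h>  // Berkeley DB C++ API (Db, DbTxn, Dbt)
#include <mutex>

static std::mutex g_bdb_write_mutex;

// Serialize entry into Db::put() across threads, so msghand and txnotify
// can never be inside BDB's write path at the same time.
static int SerializedPut(Db& db, DbTxn* txn, Dbt& key, Dbt& value, unsigned int flags)
{
    std::lock_guard<std::mutex> guard(g_bdb_write_mutex);  // held only for the put
    return db.put(txn, &key, &value, flags);
}

Unlike holding cs_main across all of SyncWithWallets, the lock here is scoped to the single put, so other users of cs_main are not blocked in the meantime.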

One similar (but not exactly the same) bug within BDB is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1349779

Reviewing the upstream BDB 6.2.x fixes listed at https://download.oracle.com/otndocs/products/berkeleydb/html/changelog_6_2.html, the closest match I see is "Fixed a bug that may cause self-deadlock during database compaction. [#23725]", which appears to be the same as the Red Hat bug above, but not exactly the same as what we have (Red Hat's backtraces include database compaction; ours don't). So upgrading BDB to the latest 6.2.x isn't very promising.

One thing I'm confused about is why Zcash even uses BDB, given that Bitcoin Core seems to have migrated to LevelDB before Zcash forked from it. Moreover, Zcash brings its own choice of BDB version, different from what Bitcoin had used. Was this a deliberate preference over both LevelDB and Bitcoin's older BDB?

Do you have a backup of the ~/.zcash directory and/or a VM snapshot?

I have gdb backtraces, here:

* 35   Thread 0x7f79737fe700 (LWP 7162) "resistance-msgh" 0x00007f79b047a9f3 in futex_wait_cancelable (private=<optimized out>, 
    expected=0, futex_word=0x56465c7d7fe0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

#2  __pthread_cond_wait (cond=0x56465c7d7fb8, mutex=0x56465c7d7f90) at pthread_cond_wait.c:655
#3  0x000056465af99b46 in __db_hybrid_mutex_suspend ()
#4  0x000056465af98bd4 in __db_tas_mutex_lock ()
#5  0x000056465aff8a91 in __lock_get_internal ()
#6  0x000056465aff8f2d in __lock_get ()
#7  0x000056465af3f3db in __db_lget ()
#8  0x000056465afb448d in __bam_search ()
#9  0x000056465afa079c in __bamc_search ()
#10 0x000056465afa4e23 in __bamc_put ()
#11 0x000056465af328e8 in __dbc_iput ()
#12 0x000056465af34763 in __dbc_put ()
#13 0x000056465af2dc75 in __db_put ()
#14 0x000056465af3d5bd in __db_put_pp ()
#15 0x000056465af0e9f3 in Db::put(DbTxn*, Dbt*, Dbt*, unsigned int) ()
#16 0x000056465ad37706 in CDB::Write<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, uint256>, CWalletTx> (fOverwrite=true, value=..., key={...}, this=0x7f79737fbcc0) at ./wallet/db.h:167
#17 CWalletDB::WriteTx (this=this@entry=0x7f79737fbcc0, hash=..., wtx=...) at wallet/walletdb.cpp:61
#18 0x000056465ad22157 in CWallet::SetBestChainINTERNAL<CWalletDB> (this=this@entry=0x56466775f6a0, walletdb=..., loc=...)
    at ./wallet/wallet.h:807
#19 0x000056465ad05f76 in CWallet::SetBestChain (this=0x56466775f6a0, loc=...) at wallet/wallet.cpp:642
...
#35 FlushStateToDisk (state=..., mode=mode@entry=FLUSH_STATE_PERIODIC) at main.cpp:3124
#36 0x000056465ab12c62 in ActivateBestChain (state=..., chainparams=..., pblock=pblock@entry=0x7f79737fcd90) at main.cpp:3532
#37 0x000056465ab13676 in ProcessNewBlock (state=..., chainparams=..., pfrom=pfrom@entry=0x7f795c02f790, 
    pblock=pblock@entry=0x7f79737fcd90, fForceProcessing=<optimized out>, dbp=dbp@entry=0x0) at main.cpp:4264
#38 0x000056465ab19216 in ProcessMessage (pfrom=pfrom@entry=0x7f795c02f790, strCommand="block", vRecv=..., 
    nTimeReceived=nTimeReceived@entry=1586241648619050) at main.cpp:6143
#39 0x000056465ab1b488 in ProcessMessages (pfrom=0x7f795c02f790) at main.cpp:6503
* 30   Thread 0x7f797ab72700 (LWP 7156) "resistance-txno" 0x00007f79b047a9f3 in futex_wait_cancelable (private=<optimized out>, 
    expected=0, futex_word=0x56465c7e3de0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

#2  __pthread_cond_wait (cond=0x56465c7e3db8, mutex=0x56465c7e3d90) at pthread_cond_wait.c:655
#3  0x000056465af99b46 in __db_hybrid_mutex_suspend ()
#4  0x000056465af98bd4 in __db_tas_mutex_lock ()
#5  0x000056465aff8a91 in __lock_get_internal ()
#6  0x000056465aff8f2d in __lock_get ()
#7  0x000056465af3f3db in __db_lget ()
#8  0x000056465af3f65c in __db_new ()
#9  0x000056465af445fa in __db_poff ()
#10 0x000056465afa8269 in __bam_ovput ()
#11 0x000056465afa999f in __bam_iitem ()
#12 0x000056465afa4c44 in __bamc_put ()
#13 0x000056465af328e8 in __dbc_iput ()
#14 0x000056465af34763 in __dbc_put ()
#15 0x000056465af2dc75 in __db_put ()
#16 0x000056465af3d5bd in __db_put_pp ()
#17 0x000056465af0e9f3 in Db::put(DbTxn*, Dbt*, Dbt*, unsigned int) ()
#18 0x000056465ad37706 in CDB::Write<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, uint256>, CWalletTx> (fOverwrite=true, value=..., key={...}, this=0x7f797ab71470) at ./wallet/db.h:167
#19 CWalletDB::WriteTx (this=this@entry=0x7f797ab71470, hash=..., wtx=...) at wallet/walletdb.cpp:61
#20 0x000056465acf4888 in CWalletTx::WriteToDisk (this=this@entry=0x7f7974750100, pwalletdb=pwalletdb@entry=0x7f797ab71470)
    at wallet/wallet.cpp:2420
#21 0x000056465ad0914b in CWallet::AddToWallet (this=this@entry=0x56466775f6a0, wtxIn=...,
    fFromLoadWallet=fFromLoadWallet@entry=false, pwalletdb=pwalletdb@entry=0x7f797ab71470) at wallet/wallet.cpp:1599
#22 0x000056465ad09aaf in CWallet::AddToWalletIfInvolvingMe (this=this@entry=0x56466775f6a0, tx=..., pblock=pblock@entry=0x0,
    fUpdate=fUpdate@entry=true) at wallet/wallet.cpp:1705
#23 0x000056465ad09c11 in CWallet::SyncTransaction (this=0x56466775f6a0, tx=..., pblock=0x0) at wallet/wallet.cpp:1714
...
#39 SyncWithWallets (tx=..., pblock=pblock@entry=0x0) at validationinterface.cpp:58
#40 0x000056465ac43e6d in CTxMemPool::NotifyRecentlyAdded (this=this@entry=0x56465b853220 <mempool>) at txmempool.cpp:756
#41 0x000056465aaa5a8a in ThreadNotifyRecentlyAdded () at init.cpp:679
#42 0x000056465aaca0a6 in TraceThread<void (*)()> (name=<optimized out>, func=0x56465aaa5a4d <ThreadNotifyRecentlyAdded()>)
    at ./util.h:244
solardiz added the C-bug (Category: This is a bug) label May 6, 2020
@daira
Contributor

daira commented May 7, 2020

Thanks for the report.

One thing I'm confused about is why Zcash even uses BDB, given that Bitcoin Core seems to have migrated to LevelDB before Zcash forked from it. Moreover, Zcash brings its own choice of BDB version, different from what Bitcoin had used. Was this a deliberate preference over both LevelDB and Bitcoin's older BDB?

Bitcoin didn't migrate to LevelDB for the wallet (and still hasn't).

There are security-relevant bugs in the version of BDB that Bitcoin uses, and that's the reason why we upgraded: #1255. This has caused us nothing but problems because the version of BDB that was current at the time (6.2.23) is quite buggy, but that is not something we could have predicted.

daira added the thread safety and A-wallet-database (Area: Wallet database and serialization) labels May 7, 2020
@solardiz
Contributor Author

solardiz commented May 7, 2020

Thank you for explaining this, @daira. Makes sense.

One other thought I had: should zcashd perhaps sometimes explicitly invoke BDB compaction, to prevent the wallet from growing this large unnecessarily? Maybe on node startup when a command-line option is given, with that option advertised in a warning message printed when the wallet size is above a threshold and it hasn't been compacted for a while? I don't know much about BDB compaction, so I'm just thinking out loud.

That 2+ GB wallet.dat started growing even quicker after that point (from 2.0 to 2.4 GB in just a few days), but starting anew with a re-imported private key and a rescanned blockchain reduced it to 100 MB and seems to have slowed its further growth. My guess is that invoking BDB compaction could achieve the same result, but perhaps more quickly than a blockchain rescan. (Given the Red Hat bug above, it probably can't be done on a live node without also triggering deadlocks. But it is probably safe to do on startup; see the sketch below.)
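
For what it's worth, the BDB C++ API exposes compaction as Db::compact. A minimal startup-time sketch (hypothetical: CompactWalletDb is a made-up name, no such option exists today, and the right transaction and flag choices would need checking):

#include <db_cxx.h>  // Berkeley DB C++ API

// Hypothetical: compact the whole wallet database at startup, before any
// other thread touches BDB, avoiding the live-node deadlock risk above.
static void CompactWalletDb(Db& db)
{
    DB_COMPACT stats = {};
    // NULL start/stop keys compact the entire database;
    // DB_FREE_SPACE returns freed pages to the filesystem.
    int ret = db.compact(nullptr /* txn */, nullptr, nullptr,
                         &stats, DB_FREE_SPACE, nullptr);
    if (ret == 0)
        LogPrintf("wallet compaction truncated %u pages\n",
                  (unsigned)stats.compact_pages_truncated);
}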
