Replace SigCache implementation with CuckooCache #5066

str4d · 2021-04-20T09:46:54Z

Cherry-picked from the following upstream PRs:

str4d · 2021-04-20T09:47:39Z

zkbot · 2021-04-20T09:47:44Z

⌛ Trying commit 21841d215df633340a0bb999321c473104f2627d with merge bb6e5fad4c57864874b83e996e419f9efa82f681...

daira · 2021-05-04T13:31:11Z

I'm a bit confused as to why it's necessary to do this signature caching at all.

We have a mempool that is limited in size/cost according to ZIP 401. We also have the ability to look up transactions by their txid (or, after NU5, a transaction including authorizing data by its authorizing digest). So, why not store a flag along with every in-memory signature saying whether or not it has been validated yet? We shouldn't need a separate cache.

I see the argument for following Bitcoin in order to be able to more easily port their changes, but there seems to be a lot of complexity here (programming lock-free data structures is notoriously difficult and error-prone) that I don't think we need in the medium/long term.

str4d · 2021-05-07T23:30:43Z

why not store a flag along with every in-memory signature saying whether or not it has been validated yet?

That is the approach taken for CBlockIndex objects (their "validity level"), as we have a global store for those. CTransaction objects are not global, so we cannot rely on the tx instance we see inside VerifySignature being the exact same one that was obtained from the mempool (in fact, right now that is absolutely not the case, since we still haven't backported upstream's CompactBlock network logic). The only place we could reliably store that information is CTransaction objects in the mempool, which requires locking to read and is not designed for this purpose.

Meanwhile, the sigcache is a global that stores a flag for every validated signature, and is already integrated into the transparent signature verification stack.

there seems to be a lot of complexity here (programming lock-free data structures is notoriously difficult and error-prone) that I don't think we need in the medium/long term.

CuckooCache was added to Bitcoin Core 5.5 years ago, and has been used in production with significantly higher signature throughput than we have in our chain. But putting that aside, I only see two kinds of locking-related errors that could occur with a cache:

Deadlocks: these IMHO are more likely to happen if we try to reuse the mempool for caching.
False negatives: if we get a cache miss due to a locking error, then we just validate that signature twice. No big deal.

In particular, we will not see false positives (which would be a consensus-critical error if they occurred, causing an unvalidated signature to be assumed valid), because the cache entries are still SHA-256 hashes as before. All that changes in this PR is that the performance of looking up those cache entries improves.

daira · 2021-05-09T07:34:51Z

The several-years usage in Bitcoin is why I'm not vetoing this PR!

I still think it's objectively a bad design. We need a transaction cache, not a signature cache. If transactions are not duplicated in memory then signatures won't be, because all signatures in Zcash are dependent on a digest of a particular transaction.

The potential errors that worry me about lock-free data structures in general are memory-safety errors, not just incorrect operation. Bitcoin uses locking and threads particularly badly, and we inherit that, but locks need not be a performance issue in a good concurrency design.

str4d · 2021-05-09T12:44:14Z

We need a transaction cache, not a signature cache.

I think the time to adjust the cache style would be if/when we add batch verification of transparent signatures. Until then, I do not see the benefit of spending our engineering time deviating from upstream here.

The potential errors that worry me about lock-free data structures in general are memory-safety errors, not just incorrect operation.

This would be a good candidate to switch to a Rust primitive! And certainly after this PR it should be a pretty modular component to swap out.

str4d · 2022-05-13T22:07:21Z

Rebased to fix merge conflicts.

softminus · 2022-06-07T20:54:06Z

This would be a good candidate to switch to a Rust primitive! And certainly after this PR it should be a pretty modular component to swap out.

I'm not sure how heavily it's been tested/reviewed but I did find a version of the CuckooCache written in Rust https://github.com/JeremyRubin/cuckoocache

SQUASHME: Change cuckoocache to only work for powers of two, to avoid mod operator SQUASHME: Update Documentation and simplify logarithm logic SQUASHME: OSX Build Errors SQUASHME: minor Feedback from sipa + bluematt SQUASHME: DOCONLY: Clarify a few comments. (cherry picked from commit bitcoin/bitcoin@c9e69fb)

SQUASHME: Update Tests for other SQUASHMEs (cherry picked from commit bitcoin/bitcoin@67dac4e)

(cherry picked from commit bitcoin/bitcoin@7482781)

In Olaoluwa Osuntokun's recent protocol proposal they were using a mod in an inner loop. I wanted to suggest a normative protocol change to use the trick we use here, but to find an explanation of it I had to dig up the PR on github. After I posted about it several other developers commented that it was very interesting and they were unaware of it. I think ideally the code should be self documenting and help educate other contributors about non-obvious techniques that we use. So I've written a description of the technique with citations for future reference. (cherry picked from commit bitcoin/bitcoin@dd869c6)

…integer conversion (cherry picked from commit bitcoin/bitcoin@9142dfe)

(cherry picked from commit bitcoin/bitcoin@7aad3b6)

(cherry picked from commit bitcoin/bitcoin@98fbd1c)

(cherry picked from commit bitcoin/bitcoin@3f098cc)

This moves the SignatureCacheHasher to the sigcache header, out of the anonymous namespace, so that the tests can import it. (cherry picked from commit bitcoin/bitcoin@f9c8807)

Identifiers beginning with an underscore followed immediately by an uppercase letter are reserved. (cherry picked from commit bitcoin/bitcoin@bc70ab5) Zcash: We merged the other half of this in zcash/zcash@36463d4.

str4d · 2022-07-08T21:02:03Z

Rebased to fix trivial merge conflict with #6043, and bring in new CI components.

nuttycom

utACK. I verified the port of each commit.

src/cuckoocache.h

str4d · 2022-07-12T11:28:54Z

src/script/sigcache.cpp

+    Get(const uint256& entry, const bool erase)
    {
        boost::shared_lock<boost::shared_mutex> lock(cs_sigcache);
-        return setValid.count(entry);
+        return setValid.contains(entry, erase);
    }

-    void Erase(const uint256& entry)
+    void Set(uint256& entry)
    {
        boost::unique_lock<boost::shared_mutex> lock(cs_sigcache);
-        setValid.erase(entry);
+        setValid.insert(entry);
    }


To finish addressing @daira's std::memory_order_relaxed concern: CSignatureCache::Get and CSignatureCache::Set are the only locations where CuckooCache is accessed, and they use a boost::unique_lock to ensure that writes and reads are properly synchronized.

JeremyRubin · 2022-07-12T22:22:42Z

cool to see this pulled in!

@daira is a bit off in understanding this as a signature cache -- it's a generic cache that can be used for signatures.

You can throw at it anything that you can deterministically hash. So transactions + witnesses should also work, as long as when accepted to the mempool and included in a block the hash would be the same.

originally in bitcoin we just cached the signatures, but technically it can do an arbitrary entries, see https://github.com/bitcoin/bitcoin/blob/a7f3479ba3fda4c9fb29bd7080165744c02ee921/src/validation.cpp#L1650

this caches the entire transaction witness so is, as you say, an entire transaction cache. https://github.com/bitcoin/bitcoin/blob/a7f3479ba3fda4c9fb29bd7080165744c02ee921/src/validation.cpp#L1706

note:

There is one interesting bit in here that IIRC is "mildly incorrect", which is that for cache entries which generate duplicate indexes (which is somewhat rare) they will not properly move to the next location. E.g., if the hashes are [1,2,3,1,5,6,7], then
the element will start in position 1, then get bumped to 2, 3, and then back to 1. When it moves back to 1, it will then go to 2, 3, etc. This isn't a big deal because for reasonably sized tables it's pretty rare that this happens, and when it does happen, you still have a reasonable number of hash values for the cuckoo cache to work properly.

however, in theory, you could patch this by having a dynamic offset (e.g., counting total number of inserts, random, etc) that shifts the starting index where you scan for the element from (e.g., there are 8 keys, so doing for(i = 0; i< 8; ++i) {if (current == loc[i+offset & 7]) { // move to next slot }}), or if you do something like rejection sampling and rehashing for neglible performance impact (e.g., if the hash contains duplicates, hash the hash again and see if the projection yields duplicate indexes).

I did out the math somewhere and it's a negligible difference, but just an FYI. It can't be adversarially taken advantage of if your hashes are salted. It therefore wasn't really worth fixing in Bitcoin, but I wanted to make sure the knowledge passed along if you're pulling it in.

JeremyRubin · 2022-07-12T22:25:18Z

also one note on safety:

the users of cuckoocache in Bitcoin use stricter memory safety properties than the underlying cache actually needs, but that's OK. In theory it's safe to forgoe the read lock entirely since the checkqueue should already be synchronized via the release of the work items to the processors, and the lock taken in block validation is sufficient to exclude any writers.

but as @daira notes, memory is scary :)

str4d · 2022-07-12T22:55:37Z

Thank you for the notes, and for the upstream PR!

You can throw at it anything that you can deterministically hash. So transactions + witnesses should also work, as long as when accepted to the mempool and included in a block the hash would be the same.

Yep; we're about to start using it to cache validity of proofs and signatures for the shielded components at a "bundle" level (i.e. caching the validity of the Sapling part of a tx as a unit, and the Orchard part, etc) since that's the simplest way to integrate into the current stack. We could also start caching at a transaction level, but that's likely a further refactor away (and is made a bit more complex by us supporting both v5 transactions with non-malleable txids and v4 transactions without them).

str4d added I-performance Problems and improvements with respect to performance C-upstream-port Category: Changes that are ported from the Bitcoin Core codebase. labels Apr 20, 2021

str4d added this to the Core Sprint 2021-14 milestone Apr 20, 2021

mdr0id closed this Apr 20, 2021

mdr0id reopened this Apr 21, 2021

mdr0id added safe-to-build Used to send PR to prod CI environment and removed safe-to-build Used to send PR to prod CI environment labels Apr 21, 2021

r3ld3v modified the milestones: Core Sprint 2021-14, Core Sprint 2021-16 Apr 26, 2021

mdr0id added safe-to-build Used to send PR to prod CI environment and removed safe-to-build Used to send PR to prod CI environment labels May 3, 2021

str4d added A-chain-sync Area: Chain synchronization / Initial Block Download E-help-wanted Call for participation: Help is requested to fix this issue. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 25, 2021

str4d dismissed a stale review via 97875ef September 29, 2021 13:01

str4d force-pushed the cuckoocache branch 2 times, most recently from b11195a to 0577526 Compare September 29, 2021 21:26

str4d dismissed a stale review via ef5aa2b May 13, 2022 22:06

str4d force-pushed the cuckoocache branch from 0577526 to ef5aa2b Compare May 13, 2022 22:06

JeremyRubin and others added 10 commits July 8, 2022 21:01

Add unit tests for the CuckooCache

090ea70

SQUASHME: Update Tests for other SQUASHMEs (cherry picked from commit bitcoin/bitcoin@67dac4e)

Allow non-power-of-2 signature cache sizes

d0a12fd

(cherry picked from commit bitcoin/bitcoin@7482781)

Use explicit casting in cuckoocache's compute_hashes(...) to clarify …

43693df

…integer conversion (cherry picked from commit bitcoin/bitcoin@9142dfe)

doc: Doxygen-friendly CuckooCache comments

d012fbf

(cherry picked from commit bitcoin/bitcoin@7aad3b6)

Use correct C++11 header for std::swap()

6c08613

(cherry picked from commit bitcoin/bitcoin@98fbd1c)

Decrease testcase sizes in cuckoocache tests

1fee91e

(cherry picked from commit bitcoin/bitcoin@3f098cc)

Deduplicate SignatureCacheHasher

8e053da

This moves the SignatureCacheHasher to the sigcache header, out of the anonymous namespace, so that the tests can import it. (cherry picked from commit bitcoin/bitcoin@f9c8807)

Fix header guards using reserved identifiers

7498c60

Identifiers beginning with an underscore followed immediately by an uppercase letter are reserved. (cherry picked from commit bitcoin/bitcoin@bc70ab5) Zcash: We merged the other half of this in zcash/zcash@36463d4.

str4d force-pushed the cuckoocache branch from ef5aa2b to 7498c60 Compare July 8, 2022 21:01

str4d added safe-to-build Used to send PR to prod CI environment and removed safe-to-build Used to send PR to prod CI environment labels Jul 8, 2022

nuttycom approved these changes Jul 8, 2022

View reviewed changes

daira reviewed Jul 8, 2022

View reviewed changes

src/cuckoocache.h Show resolved Hide resolved

str4d commented Jul 12, 2022

View reviewed changes

str4d mentioned this pull request Jul 12, 2022

Backport more recent changes to CuckooCache #6072

Open

nuttycom merged commit cc9449f into zcash:master Jul 12, 2022

str4d deleted the cuckoocache branch July 12, 2022 21:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace SigCache implementation with CuckooCache #5066

Replace SigCache implementation with CuckooCache #5066

str4d commented Apr 20, 2021 •

edited

str4d commented Apr 20, 2021

zkbot commented Apr 20, 2021

daira commented May 4, 2021

str4d commented May 7, 2021

daira commented May 9, 2021 •

edited

str4d commented May 9, 2021

str4d commented May 13, 2022

softminus commented Jun 7, 2022

str4d commented Jul 8, 2022

nuttycom left a comment

str4d Jul 12, 2022

JeremyRubin commented Jul 12, 2022

JeremyRubin commented Jul 12, 2022

str4d commented Jul 12, 2022

Replace SigCache implementation with CuckooCache #5066

Replace SigCache implementation with CuckooCache #5066

Conversation

str4d commented Apr 20, 2021 • edited

str4d commented Apr 20, 2021

zkbot commented Apr 20, 2021

daira commented May 4, 2021

str4d commented May 7, 2021

daira commented May 9, 2021 • edited

str4d commented May 9, 2021

str4d commented May 13, 2022

softminus commented Jun 7, 2022

str4d commented Jul 8, 2022

nuttycom left a comment

Choose a reason for hiding this comment

str4d Jul 12, 2022

Choose a reason for hiding this comment

JeremyRubin commented Jul 12, 2022

note:

JeremyRubin commented Jul 12, 2022

str4d commented Jul 12, 2022

str4d commented Apr 20, 2021 •

edited

daira commented May 9, 2021 •

edited