Replace SigCache implementation with CuckooCache #5066
@zkbot try
⌛ Trying commit 21841d215df633340a0bb999321c473104f2627d with merge bb6e5fad4c57864874b83e996e419f9efa82f681...
I'm a bit confused as to why it's necessary to do this signature caching at all. We have a mempool that is limited in size/cost according to ZIP 401. We also have the ability to look up transactions by their txid (or, after NU5, a transaction including authorizing data by its authorizing digest). So, why not store a flag along with every in-memory signature saying whether or not it has been validated yet? We shouldn't need a separate cache. I see the argument for following Bitcoin in order to be able to more easily port their changes, but there seems to be a lot of complexity here (programming lock-free data structures is notoriously difficult and error-prone) that I don't think we need in the medium/long term.
That is the approach taken for … Meanwhile, the sigcache is a global that stores a flag for every validated signature, and is already integrated into the transparent signature verification stack.
In particular, we will not see false positives (which would be a consensus-critical error if they occurred, causing an unvalidated signature to be assumed valid), because the cache entries are still SHA-256 hashes as before. All that changes in this PR is that the performance of looking up those cache entries improves.
The several years of use in Bitcoin are why I'm not vetoing this PR! I still think it's objectively a bad design. We need a transaction cache, not a signature cache. If transactions are not duplicated in memory, then signatures won't be, because all signatures in Zcash are dependent on a digest of a particular transaction. The potential errors that worry me about lock-free data structures in general are memory-safety errors, not just incorrect operation. Bitcoin uses locking and threads particularly badly, and we inherit that, but locks need not be a performance issue in a good concurrency design.
I think the time to adjust the cache style would be if/when we add batch verification of transparent signatures. Until then, I do not see the benefit of spending our engineering time deviating from upstream here.
This would be a good candidate to switch to a Rust primitive! And certainly after this PR it should be a pretty modular component to swap out.
Force-pushed from b11195a to 0577526.
Rebased to fix merge conflicts.
I'm not sure how heavily it's been tested/reviewed, but I did find a version of the CuckooCache written in Rust: https://github.com/JeremyRubin/cuckoocache
SQUASHME: Change cuckoocache to only work for powers of two, to avoid mod operator
SQUASHME: Update Documentation and simplify logarithm logic
SQUASHME: OSX Build Errors
SQUASHME: minor Feedback from sipa + bluematt
SQUASHME: DOCONLY: Clarify a few comments.
(cherry picked from commit bitcoin/bitcoin@c9e69fb)
SQUASHME: Update Tests for other SQUASHMEs (cherry picked from commit bitcoin/bitcoin@67dac4e)
(cherry picked from commit bitcoin/bitcoin@7482781)
In Olaoluwa Osuntokun's recent protocol proposal, they were using a mod in an inner loop. I wanted to suggest a normative protocol change to use the trick we use here, but to find an explanation of it I had to dig up the PR on GitHub. After I posted about it, several other developers commented that it was very interesting and that they were unaware of it. I think ideally the code should be self-documenting and help educate other contributors about non-obvious techniques that we use. So I've written a description of the technique with citations for future reference. (cherry picked from commit bitcoin/bitcoin@dd869c6)
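Two ways of mapping a hash onto a table index without a modulo are relevant to these commits: masking, which requires a power-of-two table size (as in the first squashed commit above), and a multiply-and-shift reduction, which as far as we know is the technique this commit documents. The sketch below shows both side by side. It is an illustration, not the upstream code: the helper names `reduce_pow2` and `reduce_mulshift` are hypothetical, and both assume `h` is a uniformly distributed 32-bit hash value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: valid only when n is a power of two (n == 2^k),
// because h & (n - 1) keeps the low k bits of h, which equals h % n then.
inline uint32_t reduce_pow2(uint32_t h, uint32_t n) {
    return h & (n - 1);
}

// Hypothetical helper: maps h onto [0, n) for any n > 0 without a division.
// Reading h as the fraction h / 2^32 in [0, 1), the product h * n / 2^32
// lies in [0, n); shifting right by 32 takes its floor. The result is
// (nearly) uniform when h is, but it is generally NOT equal to h % n.
inline uint32_t reduce_mulshift(uint32_t h, uint32_t n) {
    return static_cast<uint32_t>((static_cast<uint64_t>(h) * n) >> 32);
}
```

The multiply-and-shift version draws on the high bits of the hash rather than the low bits, so it relies on the whole hash being well mixed, which is the case for SHA-256-derived cache keys.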
…integer conversion (cherry picked from commit bitcoin/bitcoin@9142dfe)
(cherry picked from commit bitcoin/bitcoin@7aad3b6)
(cherry picked from commit bitcoin/bitcoin@98fbd1c)
(cherry picked from commit bitcoin/bitcoin@3f098cc)
This moves the SignatureCacheHasher to the sigcache header, out of the anonymous namespace, so that the tests can import it. (cherry picked from commit bitcoin/bitcoin@f9c8807)
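For context, the hasher this commit moves supplies the cuckoo cache's eight hash functions for a single cache key. Because each key is already a salted SHA-256 digest, one natural design, assumed here, is to read hash function *i* as the *i*-th 32-bit word of the digest. The sketch below uses a hypothetical `Key256` type in place of `uint256` and a hypothetical `DigestSliceHasher` name; it illustrates the idea rather than reproducing the upstream class.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for uint256: a 32-byte digest.
using Key256 = std::array<uint8_t, 32>;

// Hypothetical hasher: hash function number hash_select (0..7) is the
// corresponding 4-byte slice of the key, reinterpreted as a uint32_t.
// This is only a good hash because the key bytes are already uniformly
// distributed (a salted SHA-256 output), so no further mixing is needed.
struct DigestSliceHasher {
    template <uint8_t hash_select>
    uint32_t operator()(const Key256& key) const {
        static_assert(hash_select < 8, "only 8 hashes fit in 32 bytes");
        uint32_t u;
        // memcpy avoids alignment and strict-aliasing problems.
        std::memcpy(&u, key.data() + 4 * hash_select, sizeof(u));
        return u;
    }
};
```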
Identifiers beginning with an underscore followed immediately by an uppercase letter are reserved. (cherry picked from commit bitcoin/bitcoin@bc70ab5) Zcash: We merged the other half of this in zcash/zcash@36463d4.
utACK. I verified the port of each commit.
```diff
+Get(const uint256& entry, const bool erase)
 {
     boost::shared_lock<boost::shared_mutex> lock(cs_sigcache);
-    return setValid.count(entry);
+    return setValid.contains(entry, erase);
 }
 
-void Erase(const uint256& entry)
+void Set(uint256& entry)
 {
     boost::unique_lock<boost::shared_mutex> lock(cs_sigcache);
-    setValid.erase(entry);
+    setValid.insert(entry);
 }
```
To finish addressing @daira's std::memory_order_relaxed concern: CSignatureCache::Get and CSignatureCache::Set are the only locations where CuckooCache is accessed, and they lock cs_sigcache (a boost::shared_lock in Get and a boost::unique_lock in Set) to ensure that writes and reads are properly synchronized.
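The locking scheme described here can be illustrated with standard C++17 primitives. The sketch below is hypothetical: `std::shared_mutex` stands in for `boost::shared_mutex`, a `std::unordered_map` with an atomic per-entry flag stands in for `CuckooCache`, and the class name `ToySigCache` and its `Collect()` pass are inventions for illustration (the real cache reclaims flagged slots itself). Readers take a shared lock and only ever flip an atomic "collectible" flag with `std::memory_order_relaxed`; anything that changes the container takes the exclusive lock. Relaxed ordering suffices in this model because releasing a shared lock synchronizes with the next exclusive lock on the same mutex, so a writer observes every flag set by earlier readers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for CSignatureCache (not the upstream class).
class ToySigCache {
public:
    // Many threads may run Get concurrently under the shared lock. find() is
    // treated as const for data-race purposes; the only write a reader does
    // is an atomic store to an existing entry's flag.
    bool Get(uint64_t entry, bool erase) {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = valid_.find(entry);
        if (it == valid_.end()) return false;
        if (erase) it->second.store(true, std::memory_order_relaxed);
        return true;  // still a hit until the entry is actually reclaimed
    }

    // Inserting changes the container itself, so it needs exclusive access.
    void Set(uint64_t entry) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        auto result = valid_.try_emplace(entry, false);
        // Re-validating an entry clears any pending "collectible" mark.
        if (!result.second) result.first->second.store(false, std::memory_order_relaxed);
    }

    // Removes entries that readers marked as collectible; returns the count.
    std::size_t Collect() {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        std::size_t removed = 0;
        for (auto it = valid_.begin(); it != valid_.end();) {
            if (it->second.load(std::memory_order_relaxed)) {
                it = valid_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

private:
    std::shared_mutex mutex_;
    // Value is the "collectible" flag; node-based map, so atomics never move.
    std::unordered_map<uint64_t, std::atomic<bool>> valid_;
};
```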
Cool to see this pulled in! @daira is a bit off in understanding this as a signature cache: it's a generic cache that can be used for signatures. You can put anything into it that you can deterministically hash. So transactions + witnesses should also work, as long as the hash is the same when the transaction is accepted to the mempool and when it is included in a block. Originally in Bitcoin we just cached the signatures, but technically it can hold arbitrary entries; see https://github.com/bitcoin/bitcoin/blob/a7f3479ba3fda4c9fb29bd7080165744c02ee921/src/validation.cpp#L1650. This caches the entire transaction witness, so it is, as you say, an entire transaction cache. See also https://github.com/bitcoin/bitcoin/blob/a7f3479ba3fda4c9fb29bd7080165744c02ee921/src/validation.cpp#L1706
Note: there is one interesting bit in here that IIRC is "mildly incorrect": cache entries that generate duplicate indexes (which is somewhat rare) will not properly move to the next location. E.g., if the hashes are [1,2,3,1,5,6,7], then … However, in theory, you could patch this by having a dynamic offset (e.g., counting the total number of inserts, a random value, etc.) that shifts the index from which you start scanning for the element (e.g., there are 8 keys, so doing for (i = 0; i < 8; ++i) { if (current == loc[(i + offset) & 7]) { /* move to next slot */ } }), or by doing something like rejection sampling and rehashing, for negligible performance impact (e.g., if the hash contains duplicates, hash the hash again and check whether the new projection yields duplicate indexes). I worked out the math somewhere and it's a negligible difference, but just an FYI. It can't be exploited adversarially if your hashes are salted, so it wasn't really worth fixing in Bitcoin, but I wanted to make sure the knowledge was passed along if you're pulling it in.
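The duplicate-index issue and the offset-based patch described above can be sketched as follows. This is a hypothetical illustration, not the upstream insert loop: `Locs` holds one entry's 8 candidate slots, and the two functions (names invented here) pick the candidate to try after `current`, either always scanning from index 0 or scanning from a varying `offset` as in the pseudocode in the comment.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical: the 8 candidate slots computed for one entry. Distinct hash
// functions can occasionally map the same entry to the same slot.
using Locs = std::array<uint32_t, 8>;

// Scan from index 0: with duplicates, the first matching index always wins,
// so the candidate chosen after a duplicated slot is always the same one.
uint32_t next_slot_naive(const Locs& loc, uint32_t current) {
    for (unsigned i = 0; i < 8; ++i) {
        if (loc[i] == current) return loc[(i + 1) & 7];
    }
    return loc[0];  // current is not a candidate: start from the first slot
}

// Scan from a varying offset (e.g. a running insert count), so repeated
// evictions do not always resolve a duplicated slot to the same index.
uint32_t next_slot_offset(const Locs& loc, uint32_t current, unsigned offset) {
    for (unsigned i = 0; i < 8; ++i) {
        const unsigned j = (i + offset) & 7;  // parenthesized for clarity
        if (loc[j] == current) return loc[(j + 1) & 7];
    }
    return loc[0];
}
```

With candidates {1, 2, 3, 1, 5, 6, 7, 8}, the naive scan always moves an entry in slot 1 to slot 2, while an offset of 1 or 3 finds the second copy of slot 1 first and moves on to slot 5 instead.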
Also, one note on safety: the users of cuckoocache in Bitcoin enforce stricter memory-safety properties than the underlying cache actually needs, but that's OK. In theory it's safe to forgo the read lock entirely, since the checkqueue should already be synchronized via the release of the work items to the processors, and the lock taken in block validation is sufficient to exclude any writers. But as @daira notes, memory is scary :)
Thank you for the notes, and for the upstream PR!
Yep; we're about to start using it to cache the validity of proofs and signatures for the shielded components at a "bundle" level (i.e., caching the validity of the Sapling part of a transaction as a unit, the Orchard part as a unit, etc.), since that's the simplest way to integrate it into the current stack. We could also start caching at the transaction level, but that's likely a further refactor away (and is made a bit more complex by our supporting both v5 transactions, which have non-malleable txids, and v4 transactions, which don't).
Cherry-picked from the following upstream PRs: