-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Computation of hSig requires 1 byte more than a BLAKE2b input block #36
Comments
Why not just hash joinSplitPubKey before hSig? We may want to hash something else someday anyway, if we change the format and behavior of the transaction. |
That's true, that we might want to hash something else. I was thinking we would change hSigtag in that case, but hashing \ 0. Leave it as-is. I'm leaning toward 2 now; anyone else have a preference? [Edit: added option 0.] |
The numbering in your comment is off (markdown doesn't work as you expect). [Edit by Daira: fixed] I want option \ 2 (hashing the public key before it goes into hSig calculation) and I think it would be weird to truncate randomSeed to 31 bytes. |
There's another problem with using BLAKE2b: hSig is 256 bits (and the PRF definition depends on that), so we would have to truncate the output of BLAKE2b from 512 to 256 bits. The collision resistance of BLAKE2b truncated to 256 bits doesn't follow from that of BLAKE2b, and \ 3. Switch back to [full] SHA-256, and don't worry about the fact that there are multiple input blocks. (It shouldn't matter.) [Edit: the use of BLAKE2b for the KDF doesn't have this problem –even though we are truncating its output to 256 bits there also– because in that case we're using it as a randomness extractor, not for collision resistance.] |
The collision resistance of truncation is well-studied, I think. Let me see if I can get references. |
BLAKE2b truncated to 256-bit output is certainly as safe as BLAKE2s against collisions, and I think it is probably safer than option \3. Merkle-Damgard-chaining of SHA256, because there is a long history of nasty surprises around Merkle-Damgard chaining, and although I believe we're avoiding all those nasty surprises with option \3., I still don't like messing around that close to the corpses of our forerunners. \4. Use BLAKE2s. The implementation from RFC 7693 (https://tools.ietf.org/html/rfc7693) is 175 lines of C code: https://github.com/mjosaarinen/blake2_mjosref/blob/master/blake2s.c Just copy that file into Zcash source base so it isn't an extra dependency. |
@zookozcash: For SHA-256, we're using the full hash on a fixed-length input, only for collision resistance, and with the first block domain-separated from any of the other uses of the compression function in our protocol. I believe that's an unproblematic usage regardless of any deficiencies of Merkle-Dåmgard. (If it isn't, a lot of protocols are in trouble!) I don't think using BLAKE2s would solve anything here; remember that we are assuming collision-resistance of SHA-256 with MD-chaining already in order for the commitment to be secure. (In the case of the KDF, there was a substantial justification for switching to BLAKE2b in order to simplify the analysis, but that wouldn't be the case for using BLAKE2s for hSig.) |
BTW, I agree that heuristically there's no reason to believe that BLAKE2b truncated to 256 bits is any less secure than BLAKE2s. It would nevertheless be an extra security assumption that is not needed if we use SHA-256. The benefit of not importing another implementation of BLAKE2s is reducing the amount of crypto code where bugs may lurk, which is a very important consideration IMHO. |
The analysis is simpler by relying on "BLAKE2b truncated to n bits of output has n/2-bit collision-resistance", than by relying on "we didn't make a mistake in the input sizes, windowing, length-extensions, or domain-separation in our use of SHA-256". That BLAKE/BLAKE2 has the right properties when its output is truncated has been thoroughly studied by a lot symmetric crypto pros, so it very much simplifies the analysis we have to do if we can rely on that. It was carefully analyzed as part of the SHA-3 process, for starters, as well as subsequently. (I'm hoping Samuel Neves will comment here and give the best reference on specifically analysis of partial collision attacks.) That we're using SHA-256 and/or the SHA-256-compression-function safely is not analyzed by anyone other than us, so it is a lot more dangerous. |
@daira is correct that there is no formal reduction from collision resistance on the full output to collision resistance on parts of it. In particular, near collisions can make the truncated version of a hash function weak, while the full output remains collision-resistant. That being said, the standard for hash function security today is stronger than the usual collision and (second) preimage resistance. You want a hash to be indifferentiable from a random oracle, and an efficient near collision algorithm on truncated BLAKE2b certainly qualifies as a distinguisher---enough to break the security claims of the function. As far as I'm aware, the best near collision results on BLAKE reach 4 rounds at practical complexity, and 5 rounds at an impractical one. For BLAKE2b it has been shown that you can find near-collisions on 208 bits in 2^61 time, and full collisions for BLAKE2s, in the chosen-IV setting (the IV in BLAKE2 plays a larger part in security compared to the original BLAKE). In the regular real-world setting, BLAKE2 seems actually stronger than BLAKE with regards to near collisions,which reach only up to 3 rounds. I guess the overall point here is that the near-collision resistance of BLAKE(2) does not appear to be very far off (in number of attacked rounds) from its collision resistance. |
@zookozcash wrote:
I guess I just disagree on this. "BLAKE2b truncated to n bits of output" is effectively a separate hash from BLAKE2b. I want to minimize the number of hashes we use. Domain separation is quite a straightforward thing to reason about: the domains of the uses of a given hash function in a given protocol are either separated or they aren't. (In Zcash they aren't, because the uses of the compression function in the commitment tree are not separated from other uses. That complicates the analysis, but it does so independently of this decision — using SHA-256 here would not make that complication any worse. [Edit: https://github.com/zcash/zcash/issues/792#issuecomment-204619370 would be a way of fixing this, which we may or may not use.]) Length extension is not an issue here because the input is fixed-length. I appreciate the effort put into convincing me of the security of BLAKE2b truncated to 256 bits. However, when I made the decision between SHA-256 and BLAKE2b for the hSig hash, it only just fell in favour of BLAKE2b, and the fact that the hSig hash input happened to fit in one block was a factor. Now I see that it doesn't actually fit in one block (without additional complexity as in option \ 2), and also that we would need to make an extra assumption about collision resistance of the truncated hash. Those two things tip the balance back in favour of SHA-256 being the best option to minimize protocol complexity and assumptions (because we already use it, with similar assumptions, for the note commitment). |
For the record, I disagree. I think this is an example of the dangers of "provable security" that people like Koblitz, Menezes, and DJB have warned about. Our options here are between: a) a more complicated protocol which rests on the collision-resistance of SHA-256-compression function, or b) a simpler protocol which rests on the collision-resistance of BLAKE2b-truncated-to-256-bits. I assert, and I think Daira would agree, that the danger of a bug in the "more complicated protocol" part (or in some unforseen interaction between that part and something else) is orders of magnitude greater than the danger of a failure of the "collision-resistance of BLAKE2b-truncated-to-256-bits" part. In fact, the collision-resistance of BLAKE2b-truncated-to-256-bits is better than the collision-resistance of SHA-256-compression-function, according to the best modern cryptanalysis [*]. Therefore option "a" above comes with a substantially increased danger (the more complicated protocol) plus a mildly increased danger (the worse primitive), compared to option "b", but in return for that trade-off, it offers the possibility that its security proof (a.k.a. "security reduction") can be grounded in a property which is more familiar as the grounding for these sorts of reductions. If I understand the criticisms of the "provable security" approach that have been leveled by Koblitz, Menezes, and DJB (which is doubtful), this is the sort of decision that they would warn against. [edited to add] I believe we can make this secure either way, and I'm content to let Daira make this decision, provided that we have intensive scrutiny of the "more complicated protocol" part, if she chooses option "a". Also I'm very curious if this comment causes her to change her mind again. [*] There is no reason to think that BLAKE2b-truncated-to-256-bits is less safe than SHA-256-compression function against collisions, and indeed there is substantial reason to think that it is more safe, because the best known attacks on SHA-256 leave less of SHA-256 untouched than the best known attacks on BLAKE2b. (This is called "security margin".) And, as Samuel mentioned in #36 (comment), those "best known attacks" are not only collision attacks, in which case they would not directly apply to the question of the relative strengths of these two options "a" and "b" above. Instead they are distinguisher attacks (of various kinds) that do apply directly to this question. Here is an example: "Second-Order Differential Collisions for Reduced SHA-256" Note especially the discussion about distinguishers in the Introduction, which echoes sneves's comment above, and this quote: "the attacks give a clear indication that if we compare the security of SHA-256 to the security of the third round SHA-3 candidates [BLAKE, Grøstl, JH, Keccak, and Skein], in the this setting, then SHA-256 has one of the lowest security margins." This paper is comparing SHA-256 against BLAKE here, and saying that BLAKE has a greater security margin than SHA-256 when using "best known distinguisher attack" as the metric. As sneves pointed out above, BLAKE2 appears to be even stronger than BLAKE against near-collisions. I have no idea if this is the current best attack on SHA-256. Perhaps sneves knows. This is just one that I happened upon with a little searching. |
I don't know much about BLAKE2 but the front page of the website says "BLAKE2b (or just BLAKE2) is optimized for 64-bit platforms—including NEON-enabled ARMs—and produces digests of any size between 1 and 64 bytes." Can we not just choose the output length to be 32 bytes? |
I think the part of #36 (comment) that Daira should reconsider is "'BLAKE2b truncated to n bits of output' is effectively a separate hash from BLAKE2b.". As #36 (comment) and #36 (comment) show, that is not true in the modern tradition of hash function analysis. |
(But I guess it is true in the current tradition of provable security analysis…) |
References for the things I alluded to in #36 (comment): Another Look at “Provable Security” Error-prone cryptographic designs (slides) (See the diagram captioned “the big picture” on page 13.) The fundamental goal of “provable security” (slides) response to "The fundamental goal of “provable security”" (blog post) (Historical note: the "Crypto for 2020" conference in Tenerife in January 2013 was organized by DJB and Tanja Lange, who invited Matt, Daira, and me. It was there that Matt, Daira, and I met in real life for the first time, which eventually led to the formation of the Zcash project. It is a happy memory for me.) The impact of security proofs: two troublesome case studies (slides) |
re: #36 (comment) Good point, Taylor. BLAKE2 offers an API to specify the output length in bytes, and producing different lengths isn't done simply by truncation. I should have thought of that earlier. I'm curious if Daira feels the same way about "BLAKE2b-set-to-produce-256-bit-outputs" as she does about "BLAKE2b-truncated-to-256-bit-outputs". |
One thing that bothers me about using the SHA-256 compression function instead of full-SHA-256 is that allowing attacks like "(semi-) free-start" (meaning the attacker gets to influence or control the IV) further reduces SHA-256's known security margin. Again, I'm not sure what the state of the art of this sort of thing is, but here's an example: http://eprint.iacr.org/2015/350 The point is that relying on SHA-256-compression-function complicates security analysis — for me — by making me worry about whether the data we're feeding into the "IV" slot is attacker-controlled, or if there are other weaknesses in our use of the SHA-256-compression-function that would have been overlooked by cryptographers who are more focused on full-SHA-256. So I'm a bit uncomfortable with using SHA-256-compression-function, but I think it can be done safely — just that it requires, in my opinion, more analysis to ensure that it is safe, compared to using full-SHA-256 or BLAKE2. |
There's a lot to respond to here (perhaps disproportionate to the importance of the decision, since I'm almost certain there would be zero difference in the security of the resulting protocol). I will do so, but for now, note that the implementation of BLAKE2b in libsodium unfortunately doesn't support setting the digest length — that API was removed: https://github.com/jedisct1/libsodium/blob/master/src/libsodium/crypto_generichash/blake2/ref/blake2b-ref.c#L88 . I am wary of any decision on the spec that would add implementation complexity by trying to work around that or adding another crypto dependency. This decision is really about assessing the overall impact of different kinds of complexity, which is why I think we're arguing about it at such length. |
By the way, there was never any proposal to use the SHA-256 compression function alone for this usage. What is required is a collision-resistant hash on an input that is at least 1024 bits, so SHA256Compress doesn't fit. |
I thought there was a randomness requirement for hSig but according to this comment #25 (comment) there actually is not. |
Yes, I originally thought there was a pseudorandomness requirement, which was partly why I was concerned about the use of an iterated hash to compute hSig. Now I'm pretty certain that there is no such requirement. |
In reply to #36 (comment): Wait, what is the proposal "\3" in #36 (comment)? Isn't it to use two invocations of the SHA-256-compression-function? Where is proposal \3 written? Let's see, it would be section §5.1 of https://github.com/zcash/zips/blob/674c5614f2607eca3129dcaee6a9bbf5bd5c4475/protocol/protocol.pdf, right? That's the most recent revision of the protocol spec before you switched to BLAKE2b. Ah, this version has full-SHA-256 instead of two invocations of SHA-256-compression-function. Good. |
@zookozcash No, option \ 3 was to use full SHA-256. It turns out I was mistaken in saying that the libsodium implementation of BLAKE2b doesn't support setting the output length. It simplified the API by removing the call to set the output length explicitly, but it is still a parameter to |
I'd worked it out independently, but thanks for documenting this explicitly, @jedisct1 :-) |
I have decided to use BLAKE2b with the output length set to 256 bits (for both the hSig hash and the KDF, with different personalisation). I've also decided to reject option \ 1. That leaves a remaining choice between options \ 0 (that is, stay with the two-block input to BLAKE2b) and \ 2. I need to do a little more checking of how hSig is used (and might be used in future) in order to choose between those. Opinions welcome. |
Fixed on master (option \ 2 was taken). |
joinSplitPubKey
is 33 bytes, so the input to BLAKE2b is 32+32+32+33 = 129 bytes. A BLAKE2b input block is 128 bytes. I believe it's secure, because we only need collision resistance — but it's untidy: if I were wrong about the hSig only needing collision resistance, this might unnecessarily complicate the security proof. So I intend to reducerandomSeed
to 31 bytes, if no-one can think of an objection.The text was updated successfully, but these errors were encountered: