Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Computation of hSig requires 1 byte more than a BLAKE2b input block #36

Closed
daira opened this issue Mar 31, 2016 · 27 comments
Closed

Computation of hSig requires 1 byte more than a BLAKE2b input block #36

daira opened this issue Mar 31, 2016 · 27 comments
Assignees

Comments

@daira
Copy link
Collaborator

daira commented Mar 31, 2016

joinSplitPubKey is 33 bytes, so the input to BLAKE2b is 32+32+32+33 = 129 bytes. A BLAKE2b input block is 128 bytes. I believe it's secure, because we only need collision resistance — but it's untidy: if I were wrong about the hSig only needing collision resistance, this might unnecessarily complicate the security proof. So I intend to reduce randomSeed to 31 bytes, if no-one can think of an objection.

@daira daira self-assigned this Mar 31, 2016
@ebfull
Copy link
Contributor

ebfull commented Mar 31, 2016

Why not just hash joinSplitPubKey before hSig? We may want to hash something else someday anyway, if we change the format and behavior of the transaction.

@daira
Copy link
Collaborator Author

daira commented Mar 31, 2016

That's true, that we might want to hash something else. I was thinking we would change hSigtag in that case, but hashing joinSplitPubKey would work just as well. So the options are:

\ 0. Leave it as-is.
\ 1. Reduce randomSeed to 31 bytes.
\ 2. Let joinSplitPubKeyHash = BLAKE2b("ZcashECDSAPubKey", joinSplitPubKey) and use that instead of joinSplitPubKey to compute hSig.

I'm leaning toward 2 now; anyone else have a preference?

[Edit: added option 0.]

@ebfull
Copy link
Contributor

ebfull commented Mar 31, 2016

The numbering in your comment is off (markdown doesn't work as you expect). [Edit by Daira: fixed] I want option \ 2 (hashing the public key before it goes into hSig calculation) and I think it would be weird to truncate randomSeed to 31 bytes.

@daira
Copy link
Collaborator Author

daira commented Mar 31, 2016

There's another problem with using BLAKE2b: hSig is 256 bits (and the PRF definition depends on that), so we would have to truncate the output of BLAKE2b from 512 to 256 bits. The collision resistance of BLAKE2b truncated to 256 bits doesn't follow from that of BLAKE2b, and is probably not may not be well-studied. BLAKE2s is not in libsodium. So, I suggest:

\ 3. Switch back to [full] SHA-256, and don't worry about the fact that there are multiple input blocks. (It shouldn't matter.)

[Edit: the use of BLAKE2b for the KDF doesn't have this problem –even though we are truncating its output to 256 bits there also– because in that case we're using it as a randomness extractor, not for collision resistance.]

@zookozcash
Copy link

The collision resistance of truncation is well-studied, I think. Let me see if I can get references.

@zookozcash
Copy link

BLAKE2b truncated to 256-bit output is certainly as safe as BLAKE2s against collisions, and I think it is probably safer than option \3. Merkle-Damgard-chaining of SHA256, because there is a long history of nasty surprises around Merkle-Damgard chaining, and although I believe we're avoiding all those nasty surprises with option \3., I still don't like messing around that close to the corpses of our forerunners.

\4. Use BLAKE2s. The implementation from RFC 7693 (https://tools.ietf.org/html/rfc7693) is 175 lines of C code: https://github.com/mjosaarinen/blake2_mjosref/blob/master/blake2s.c Just copy that file into Zcash source base so it isn't an extra dependency.

@daira
Copy link
Collaborator Author

daira commented Mar 31, 2016

@zookozcash: For SHA-256, we're using the full hash on a fixed-length input, only for collision resistance, and with the first block domain-separated from any of the other uses of the compression function in our protocol. I believe that's an unproblematic usage regardless of any deficiencies of Merkle-Dåmgard. (If it isn't, a lot of protocols are in trouble!)

I don't think using BLAKE2s would solve anything here; remember that we are assuming collision-resistance of SHA-256 with MD-chaining already in order for the commitment to be secure. (In the case of the KDF, there was a substantial justification for switching to BLAKE2b in order to simplify the analysis, but that wouldn't be the case for using BLAKE2s for hSig.)

@daira
Copy link
Collaborator Author

daira commented Mar 31, 2016

BTW, I agree that heuristically there's no reason to believe that BLAKE2b truncated to 256 bits is any less secure than BLAKE2s. It would nevertheless be an extra security assumption that is not needed if we use SHA-256. The benefit of not importing another implementation of BLAKE2s is reducing the amount of crypto code where bugs may lurk, which is a very important consideration IMHO.

@zookozcash
Copy link

The analysis is simpler by relying on "BLAKE2b truncated to n bits of output has n/2-bit collision-resistance", than by relying on "we didn't make a mistake in the input sizes, windowing, length-extensions, or domain-separation in our use of SHA-256". That BLAKE/BLAKE2 has the right properties when its output is truncated has been thoroughly studied by a lot symmetric crypto pros, so it very much simplifies the analysis we have to do if we can rely on that.

It was carefully analyzed as part of the SHA-3 process, for starters, as well as subsequently. (I'm hoping Samuel Neves will comment here and give the best reference on specifically analysis of partial collision attacks.)

That we're using SHA-256 and/or the SHA-256-compression-function safely is not analyzed by anyone other than us, so it is a lot more dangerous.

@sneves
Copy link

sneves commented Mar 31, 2016

@daira is correct that there is no formal reduction from collision resistance on the full output to collision resistance on parts of it. In particular, near collisions can make the truncated version of a hash function weak, while the full output remains collision-resistant.

That being said, the standard for hash function security today is stronger than the usual collision and (second) preimage resistance. You want a hash to be indifferentiable from a random oracle, and an efficient near collision algorithm on truncated BLAKE2b certainly qualifies as a distinguisher---enough to break the security claims of the function.

As far as I'm aware, the best near collision results on BLAKE reach 4 rounds at practical complexity, and 5 rounds at an impractical one. For BLAKE2b it has been shown that you can find near-collisions on 208 bits in 2^61 time, and full collisions for BLAKE2s, in the chosen-IV setting (the IV in BLAKE2 plays a larger part in security compared to the original BLAKE). In the regular real-world setting, BLAKE2 seems actually stronger than BLAKE with regards to near collisions,which reach only up to 3 rounds. I guess the overall point here is that the near-collision resistance of BLAKE(2) does not appear to be very far off (in number of attacked rounds) from its collision resistance.

@daira
Copy link
Collaborator Author

daira commented Apr 1, 2016

@zookozcash wrote:

The analysis is simpler by relying on "BLAKE2b truncated to n bits of output has n/2-bit collision-resistance", than by relying on "we didn't make a mistake in the input sizes, windowing, length-extensions, or domain-separation in our use of SHA-256".

I guess I just disagree on this. "BLAKE2b truncated to n bits of output" is effectively a separate hash from BLAKE2b. I want to minimize the number of hashes we use. Domain separation is quite a straightforward thing to reason about: the domains of the uses of a given hash function in a given protocol are either separated or they aren't. (In Zcash they aren't, because the uses of the compression function in the commitment tree are not separated from other uses. That complicates the analysis, but it does so independently of this decision — using SHA-256 here would not make that complication any worse. [Edit: https://github.com/zcash/zcash/issues/792#issuecomment-204619370 would be a way of fixing this, which we may or may not use.]) Length extension is not an issue here because the input is fixed-length.

I appreciate the effort put into convincing me of the security of BLAKE2b truncated to 256 bits. However, when I made the decision between SHA-256 and BLAKE2b for the hSig hash, it only just fell in favour of BLAKE2b, and the fact that the hSig hash input happened to fit in one block was a factor. Now I see that it doesn't actually fit in one block (without additional complexity as in option \ 2), and also that we would need to make an extra assumption about collision resistance of the truncated hash. Those two things tip the balance back in favour of SHA-256 being the best option to minimize protocol complexity and assumptions (because we already use it, with similar assumptions, for the note commitment).

@zookozcash
Copy link

For the record, I disagree.

I think this is an example of the dangers of "provable security" that people like Koblitz, Menezes, and DJB have warned about.

Our options here are between:

a) a more complicated protocol which rests on the collision-resistance of SHA-256-compression function, or

b) a simpler protocol which rests on the collision-resistance of BLAKE2b-truncated-to-256-bits.

I assert, and I think Daira would agree, that the danger of a bug in the "more complicated protocol" part (or in some unforseen interaction between that part and something else) is orders of magnitude greater than the danger of a failure of the "collision-resistance of BLAKE2b-truncated-to-256-bits" part.

In fact, the collision-resistance of BLAKE2b-truncated-to-256-bits is better than the collision-resistance of SHA-256-compression-function, according to the best modern cryptanalysis [*].

Therefore option "a" above comes with a substantially increased danger (the more complicated protocol) plus a mildly increased danger (the worse primitive), compared to option "b", but in return for that trade-off, it offers the possibility that its security proof (a.k.a. "security reduction") can be grounded in a property which is more familiar as the grounding for these sorts of reductions.

If I understand the criticisms of the "provable security" approach that have been leveled by Koblitz, Menezes, and DJB (which is doubtful), this is the sort of decision that they would warn against.

[edited to add]
That said, there is real value in the positive trade-off that option "a" is offering us: that the security proof rests on a more widely-appreciated property, even if that property is slightly less certain of holding up.

I believe we can make this secure either way, and I'm content to let Daira make this decision, provided that we have intensive scrutiny of the "more complicated protocol" part, if she chooses option "a". Also I'm very curious if this comment causes her to change her mind again.

[*] There is no reason to think that BLAKE2b-truncated-to-256-bits is less safe than SHA-256-compression function against collisions, and indeed there is substantial reason to think that it is more safe, because the best known attacks on SHA-256 leave less of SHA-256 untouched than the best known attacks on BLAKE2b. (This is called "security margin".)

And, as Samuel mentioned in #36 (comment), those "best known attacks" are not only collision attacks, in which case they would not directly apply to the question of the relative strengths of these two options "a" and "b" above. Instead they are distinguisher attacks (of various kinds) that do apply directly to this question.

Here is an example:

"Second-Order Differential Collisions for Reduced SHA-256"
Biryukov, Lamberger, Mendel, Nicolić
2011
http://www.iacr.org/archive/asiacrypt2011/70730269/70730269.pdf

Note especially the discussion about distinguishers in the Introduction, which echoes sneves's comment above, and this quote: "the attacks give a clear indication that if we compare the security of SHA-256 to the security of the third round SHA-3 candidates [BLAKE, Grøstl, JH, Keccak, and Skein], in the this setting, then SHA-256 has one of the lowest security margins."

This paper is comparing SHA-256 against BLAKE here, and saying that BLAKE has a greater security margin than SHA-256 when using "best known distinguisher attack" as the metric. As sneves pointed out above, BLAKE2 appears to be even stronger than BLAKE against near-collisions.

I have no idea if this is the current best attack on SHA-256. Perhaps sneves knows. This is just one that I happened upon with a little searching.

@defuse
Copy link
Contributor

defuse commented Apr 1, 2016

I don't know much about BLAKE2 but the front page of the website says "BLAKE2b (or just BLAKE2) is optimized for 64-bit platforms—including NEON-enabled ARMs—and produces digests of any size between 1 and 64 bytes." Can we not just choose the output length to be 32 bytes?

@zookozcash
Copy link

I think the part of #36 (comment) that Daira should reconsider is "'BLAKE2b truncated to n bits of output' is effectively a separate hash from BLAKE2b.". As #36 (comment) and #36 (comment) show, that is not true in the modern tradition of hash function analysis.

@zookozcash
Copy link

(But I guess it is true in the current tradition of provable security analysis…)

@zookozcash
Copy link

References for the things I alluded to in #36 (comment):

Another Look at “Provable Security”
Koblitz, Menezes
2004
https://eprint.iacr.org/2004/152

Error-prone cryptographic designs (slides)
Daniel J. Bernstein
2015
https://cr.yp.to/talks/2015.01.07/slides-djb-20150107-a4.pdf

(See the diagram captioned “the big picture” on page 13.)

The fundamental goal of “provable security” (slides)
Daniel J. Bernstein
2013
http://cr.yp.to/talks/2013.01.23/slides.pdf

response to "The fundamental goal of “provable security”" (blog post)
Matthew D. Green
2013
http://blog.cryptographyengineering.com/2013/01/in-defense-of-provable-security.html

(Historical note: the "Crypto for 2020" conference in Tenerife in January 2013 was organized by DJB and Tanja Lange, who invited Matt, Daira, and me. It was there that Matt, Daira, and I met in real life for the first time, which eventually led to the formation of the Zcash project. It is a happy memory for me.)

The impact of security proofs: two troublesome case studies (slides)
Daniel J. Bernstein
2014
https://cr.yp.to/talks/2014.01.10/slides-djb-20140110-a4.pdf

@zookozcash
Copy link

re: #36 (comment)

Good point, Taylor. BLAKE2 offers an API to specify the output length in bytes, and producing different lengths isn't done simply by truncation. I should have thought of that earlier. I'm curious if Daira feels the same way about "BLAKE2b-set-to-produce-256-bit-outputs" as she does about "BLAKE2b-truncated-to-256-bit-outputs".

@zookozcash
Copy link

One thing that bothers me about using the SHA-256 compression function instead of full-SHA-256 is that allowing attacks like "(semi-) free-start" (meaning the attacker gets to influence or control the IV) further reduces SHA-256's known security margin. Again, I'm not sure what the state of the art of this sort of thing is, but here's an example: http://eprint.iacr.org/2015/350

The point is that relying on SHA-256-compression-function complicates security analysis — for me — by making me worry about whether the data we're feeding into the "IV" slot is attacker-controlled, or if there are other weaknesses in our use of the SHA-256-compression-function that would have been overlooked by cryptographers who are more focused on full-SHA-256.

So I'm a bit uncomfortable with using SHA-256-compression-function, but I think it can be done safely — just that it requires, in my opinion, more analysis to ensure that it is safe, compared to using full-SHA-256 or BLAKE2.

@daira
Copy link
Collaborator Author

daira commented Apr 2, 2016

There's a lot to respond to here (perhaps disproportionate to the importance of the decision, since I'm almost certain there would be zero difference in the security of the resulting protocol). I will do so, but for now, note that the implementation of BLAKE2b in libsodium unfortunately doesn't support setting the digest length — that API was removed: https://github.com/jedisct1/libsodium/blob/master/src/libsodium/crypto_generichash/blake2/ref/blake2b-ref.c#L88 . I am wary of any decision on the spec that would add implementation complexity by trying to work around that or adding another crypto dependency. This decision is really about assessing the overall impact of different kinds of complexity, which is why I think we're arguing about it at such length.

@daira
Copy link
Collaborator Author

daira commented Apr 2, 2016

By the way, there was never any proposal to use the SHA-256 compression function alone for this usage. What is required is a collision-resistant hash on an input that is at least 1024 bits, so SHA256Compress doesn't fit.

@defuse
Copy link
Contributor

defuse commented Apr 2, 2016

I thought there was a randomness requirement for hSig but according to this comment #25 (comment) there actually is not.

@daira
Copy link
Collaborator Author

daira commented Apr 3, 2016

Yes, I originally thought there was a pseudorandomness requirement, which was partly why I was concerned about the use of an iterated hash to compute hSig. Now I'm pretty certain that there is no such requirement.

@zookozcash
Copy link

In reply to #36 (comment):

Wait, what is the proposal "\3" in #36 (comment)? Isn't it to use two invocations of the SHA-256-compression-function? Where is proposal \3 written? Let's see, it would be section §5.1 of https://github.com/zcash/zips/blob/674c5614f2607eca3129dcaee6a9bbf5bd5c4475/protocol/protocol.pdf, right? That's the most recent revision of the protocol spec before you switched to BLAKE2b.

Ah, this version has full-SHA-256 instead of two invocations of SHA-256-compression-function. Good.

@daira
Copy link
Collaborator Author

daira commented Apr 3, 2016

@zookozcash No, option \ 3 was to use full SHA-256.

It turns out I was mistaken in saying that the libsodium implementation of BLAKE2b doesn't support setting the output length. It simplified the API by removing the call to set the output length explicitly, but it is still a parameter to blake2_init here. This resolves the concern about having to add another dependency to use that mode. (To answer @zookozcash's earlier question, yes, it does make a difference to me, because that mode is asserted to be collision-resistant by the BLAKE2 spec, rather than as something we have to assert separately.)

@daira
Copy link
Collaborator Author

daira commented Apr 3, 2016

I'd worked it out independently, but thanks for documenting this explicitly, @jedisct1 :-)

@daira
Copy link
Collaborator Author

daira commented Apr 3, 2016

I have decided to use BLAKE2b with the output length set to 256 bits (for both the hSig hash and the KDF, with different personalisation). I've also decided to reject option \ 1. That leaves a remaining choice between options \ 0 (that is, stay with the two-block input to BLAKE2b) and \ 2. I need to do a little more checking of how hSig is used (and might be used in future) in order to choose between those. Opinions welcome.

@daira
Copy link
Collaborator Author

daira commented Apr 8, 2016

Fixed on master (option \ 2 was taken).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants