New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p/conn/secret_connection.go allows MITM #3010

Open
HarryR opened this Issue Dec 12, 2018 · 17 comments

Comments

Projects
None yet
6 participants
@HarryR
Copy link

HarryR commented Dec 12, 2018

The MakeSecretConnection function has a bug which allows a third party to MITM the connection and decrypt/modify data in a way which both sides think the long-term key of either side is the same.

This is due to a lack of validation that the ephemeral or long-term keys are not of low-order.

  1. In shareEphPub the ephemeral keys are transmitted to each other.
  2. Then in computeDHSecret a shared secret is derived from the two ephemeral keys using curve25519.ScalarMult
  3. Using that secret the send and receive keys, and a challenge, are derived using deriveSecretAndChallenge
  4. The challenge and long-term public key is then signed and transmitted to the other party using

The problem stems from computeDHSecret not validating the ephemeral public key, if the connection is intercepted and an ephemeral key consisting of all zeros is injected then the secret from computeDHSecret will be the same for both parties for every handshake with any key. That is - the two parties, because of the MITM injecting a null ephemeral key in place of the remotes key, will then both derive the same secret which is also known to the MITM actor.

The derived recvSecret, sendSecret and challenge is now also known by the MITM actor, and will be the same for every connection handshake that is intercepted. The authSigMsg by the long-term keys of either side will also be the same for every intercepted connection, and because the send/recv secrets are known the MITM actor can replay the auth signature message to the other party to impersonate any public key that it's previously observed.

So:

  • Full active intercept with decrypt/re-encrypt
  • Impersonation of any peer using signature re-play

This means SecretConnection isn't secure if connections are intercepted / redirected.

And any peers identity can be impersonated at the connection-level (as in, it'll show that peer in the remote peers list), unless the long-term public key of the peer is validated elsewhere (e.g. expecting a signature from the remote peers nodeInfo.ID() cannot be forged, only re-played during the handshake).

Affected functions:

Example code:

package main

import (
	"fmt"
	crand "crypto/rand"
	"golang.org/x/crypto/curve25519"
	"golang.org/x/crypto/nacl/box"
)


func genEphKeys() (ephPub, ephPriv *[32]byte) {
	var err error
	ephPub, ephPriv, err = box.GenerateKey(crand.Reader)
	if err != nil {
		panic("Could not generate ephemeral keypairs")
	}
	return
}

func computeDHSecret(remPubKey, locPrivKey *[32]byte) (shrKey *[32]byte) {
	shrKey = new([32]byte)
	curve25519.ScalarMult(shrKey, locPrivKey, remPubKey)
	return
}

func main() {
	//remPubKey, _ := genEphKeys()
	var remPubKey [32]byte
	_, locPrivKey := genEphKeys()

	shrKey := computeDHSecret(&remPubKey, locPrivKey)

	fmt.Println("remPubKey =", remPubKey)
	fmt.Println("locPrivKey =", locPrivKey)
	fmt.Println("shrKey =", shrKey)
}

Suggested fix:

Make a reader specifically for public keys, perform low-order point validation on the keys after reading.

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Dec 13, 2018

Handling of all-zero X25519 public keys is a somewhat contentious issue, and something of a red herring given the attack described here. I've generally been weakly in favor of detecting and rejecting them, but explicit handling of them is unnecessary in a correctly-designed protocol.

We can start by looking at what Noise does:

https://noiseprotocol.org/noise.html

Section 12.1:

Executes the Curve25519 DH function (aka "X25519" in [7]). Invalid public key values will produce an output of all zeros.

Alternatively, implementations are allowed to detect inputs that produce an all-zeros output and signal an error instead. This behavior is discouraged because it adds complexity and implementation variance, and does not improve security. This behavior is allowed because it might match the behavior of some software.

The Noise protocol spec effectively says that implementations MAY detect inputs that produce all-zero outputs, but SHOULD ignore them and output all zeroes in this case.

Backing up a bit and looking at the reported issue, we can see the real problem here:

The problem stems from computeDHSecret not validating the ephemeral public key, if the connection is intercepted and an ephemeral key consisting of all zeros is injected

In this case, an active attacker is modifying one of the handshake messages. The real issue here is handshake malleability: a well designed protocol MUST detect and reject this regardless of how the messages were modified. Exactly how the attacker is modifying the message is irrelevant: any modifications to any handshake parameters MUST be rejected. The injection of an all-zero public key is something of a red herring: if an attacker can successfully modify the ephemeral keys without being detected, they could just as easily substitute one for which they know the discrete log, rather than one which is all zeroes.

TLS and Noise both solve this problem with transcript hashing (Noise calls it a "handshake hash" - see section 2.2): both sides compute a digest of all messages sent and received during a handshake. If both sides do not compute the same transcript hash, this MUST abort the connection in some way. In TLS or Noise this manifests as a MAC verification failure.

My recommendation here would be to adopt one of the Noise handshake patterns outright, or barring that, adding a "handshake hash" similar to the one Noise uses to the existing handshake.

As for the all-zero public key issue, I could go either way on that. If the protocol were designed correctly, this is an attack which can only be performed by one of the legitimate connection endpoints (i.e. self-MitM, a threat I don't care about). There are good arguments on both sides for detecting it or ignoring it, although as I said earlier I'm weakly in favor of detecting it, but I feel that way because I think it might provide value detecting buggy implementations, not for security purposes.

@HarryR

This comment has been minimized.

Copy link
Author

HarryR commented Dec 13, 2018

I am suggesting a low-effort fix which doesn't break network-layer compatibility with existing clients.

if an attacker can successfully modify the ephemeral keys without being detected, they could just as easily substitute one for which they know the discrete log, rather than one which is all zeroes.

With SecretConnection you could use an arbitrary ephemeral key, but you can't re-play the challenge signature of either side (which you can do, using a low-order point as the interceptor/malicious-actor's ephemeral key, hence this ticket).

If you are expecting a specific peer and after the handshake the peers long-term ID is different then you can detect interception and disconnect the connection. But... I can't see where in the code verifies that the long-term key that it ended up connecting to was the one it expected (e.g. if you are connecting to a peer with a known long-term key, but get another one instead, you have no authenticity).

For necessary point validation checks, see: https://github.com/jedisct1/libsodium/blob/cfb0f94704841f943a5a11d9e335da409c55d58a/src/libsodium/crypto_core/ed25519/core_ed25519.c#L7

This bug is specifically to do with not validating the ephemeral keys with the current protocol. Validating the public keys fixes the bug.


Re: Noise and validating points being unnecessary, this pushes a hash of the entire conversation from both sides into the challenge signature, rather than just something derived from the DH kex between ephemeral keys. It binds the long-term keys to the contents of the handshake regardless of whether or not the ephemeral points are valid.

However, it doesn't protect against a malicious actor forcing you to derive a predictable session key, which would allow one party to intercept many connections after the handshake, without any communication with the malicious actor.

There are a handful of other bugs throughout Tendermint and related projects which stem from not validating public keys, I highly suggest you do it, just switching to Noise will not solve those problem, and IMO doesn't really fully fix this one either.

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Dec 13, 2018

I'd agree that point validation provides a band-aid for this specific attack, and as I said earlier is probably a good idea in general. However:

For necessary point validation checks, see:
https://github.com/jedisct1/libsodium/blob/cfb0f94704841f943a5a11d9e335da409c55d58a/src/libsodium/crypto_core/ed25519/core_ed25519.c#L7

Note that these checks are for the twisted Edwards form of Curve25519 (i.e. "edwards25519"), and most of them are irrelevant or orthogonal to this discussion (the has_small_order one in particular is actually relevant).

The context of this discussion is Diffie-Hellman over the Montgomery form of Curve25519, a.k.a. "X25519". The two curve forms have different wire formats: Ed25519 uses a compressed point, whereas X25519 uses a Montgomery-u coordinate (a.k.a. "x") alone, and in doing so strategically eliminates all other classes of invalid points besides the low order ones we're discussing here (see below).

The point validation you're suggesting can be found on the original Curve25519 ECDH web site:

https://cr.yp.to/ecdh.html

How do I validate Curve25519 public keys?
Don't. The Curve25519 function was carefully designed to allow all 32-byte strings as Diffie-Hellman public keys. Relevant lower-level facts: the number of points of this elliptic curve over the base field is 8 times the prime 2^252 + 27742317777372353535851937790883648493; the number of points of the twist is 4 times the prime 2^253 - 55484635554744707071703875581767296995. This is discussed in more detail in the curve25519 paper.
There are some unusual non-Diffie-Hellman elliptic-curve protocols that need to ensure ``contributory'' behavior. In those protocols, you should reject the 32-byte strings that, in little-endian form, represent 0, 1, 325606250916557431795983626356110631294008115727848805560023387167927233504
(which has order 8), 39382357235489614581723060781553021112529911719440698176882885853963445705823 (which also has order 8), 2^255 - 19 - 1, 2^255 - 19, 2^255 - 19 + 1, 2^255 - 19 + 325606250916557431795983626356110631294008115727848805560023387167927233504, 2^255 - 19 + 39382357235489614581723060781553021112529911719440698176882885853963445705823, 2(2^255 - 19) - 1, 2(2^255 - 19), and 2(2^255 - 19) + 1. But these exclusions are unnecessary for Diffie-Hellman.

The context being discussed here is Diffie-Hellman.

So is djb wrong when he says "Don't"? Again, that's debatable. The "May the Forth" paper suggests it's probably a good idea to defend against microarchitectural attacks, for example:

Rejecting Known Bad Points.
To protect against small subgroup attacks against Curve25519 and related curves that have
a small set of low-order elements, an implementation can simply
check if the received public key is in the set. Bernstein [12] provides
a full list of these points for Curve25519, but suggests that rejecting
these points is only necessary for protocols that wish to ensure
“contributory” behavior. Langley and Hamburg [53] have a similar
suggestion. We argue that rejecting these points would also give
better side-channel protection. While this protection may seem
unnecessary when used with constant-time code, as Kaufmann
et al. [50] demonstrate, constant-time code is fragile and may fail
to provide adequate protection.

All that said: the core problem here is handshake malleability, not low order points. Malleating these keys MUST result in a MAC verify fail for the protocol to be secure.

@HarryR

This comment has been minimized.

Copy link
Author

HarryR commented Dec 13, 2018

Bernstein [12] provides a full list of these points for Curve25519, but suggests that rejecting these points is only necessary for protocols that wish to ensure “contributory” behavior.

For contributory behaviour (e.g. Diffie-Hellman KEX) it is necessary to verify the point is not of low order. Checks I'd use are:

  • Multiply the point by the cofactor (8) and check the point isn't infinity to verify it's not a point of low-order.
  • Then multiply original point by the prime-ordered sub-group (L) to verify it is infinity (to check it's on the main group)
  • Verify curve equation holds for point
  • Verify the output after DH KEX isn't all zeros as a sanity measure, although technically unnecessary?

The sodium implementation has hard-coded low-order points, but needs to do a scalar multiply by L to verify it's on the prime-ordered group.

I'm unsure about the canonical test, it does some bit fiddling and the differences for ed25519 vs curve25519 aren't immediately obvious.

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Dec 13, 2018

For contributory behaviour (e.g. Diffie-Hellman KEX) it is necessary to verify the point is not of low order.

While there are good reasons for doing so which I have already covered, the Curve25519 web site explicitly calls out Diffie-Hellman as a use case for which the low order check is unnecessary.

https://cr.yp.to/ecdh.html

Here is the relevant text about contributory behavior:

There are some unusual non-Diffie-Hellman (edit: emphasis mine) elliptic-curve protocols that need to ensure "contributory'' behavior. In those protocols, you should reject [...] But these exclusions are unnecessary for Diffie-Hellman.

@HarryR

This comment has been minimized.

Copy link
Author

HarryR commented Dec 13, 2018

Righto. I will leave this for you to figure out.

However, I would argue that the statement is wrong, because it lead to the bug described above.

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Dec 13, 2018

The same bug does not exist in TLS or Noise, regardless of whether they check for low-order points, and that's because the root cause is a malleable, attacker-controlled ephemeral key.

Beyond that, I think "low order points" are a complete red herring here: this attack is specific to the point at Montgomery-u 0, which so happens to be in the set of low order points, and is possible because any number multiplied by 0 is 0.

To my knowledge you cannot perform a similar attack with any of the other blacklisted points.

@HarryR

This comment has been minimized.

Copy link
Author

HarryR commented Dec 13, 2018

Beyond that, I think "low order points" are a complete red herring here: this attack is specific to the point at Montgomery-u 0, which so happens to be in the set of low order points, and is possible because any number multiplied by 0 is 0.

The same attack applies to all points of low-order, for example

func main() {
        _, locPrivKey := genEphKeys()

        /* 325606250916557431795983626356110631294008115727848805560023387167927233504
           (order 8) */
        /*
        remPubKey := [32]byte{ 0xe0, 0xeb, 0x7a, 0x7c, 0x3b, 0x41, 0xb8, 0xae, 0x16, 0x56, 0xe3,
          0xfa, 0xf1, 0x9f, 0xc4, 0x6a, 0xda, 0x09, 0x8d, 0xeb, 0x9c, 0x32,
          0xb1, 0xfd, 0x86, 0x62, 0x05, 0x16, 0x5f, 0x49, 0xb8, 0x00 }
        */

        /* 39382357235489614581723060781553021112529911719440698176882885853963445705823
           (order 8) */
        /*
        remPubKey := [32]byte{ 0x5f, 0x9c, 0x95, 0xbc, 0xa3, 0x50, 0x8c, 0x24, 0xb1, 0xd0, 0xb1,
          0x55, 0x9c, 0x83, 0xef, 0x5b, 0x04, 0x44, 0x5c, 0xc4, 0x58, 0x1c,
          0x8e, 0x86, 0xd8, 0x22, 0x4e, 0xdd, 0xd0, 0x9f, 0x11, 0x57 }
        */

        /* p-1 (order 2) */
        remPubKey := [32]byte{ 0xec, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
          0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
          0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }

        shrKey := computeDHSecret(&remPubKey, locPrivKey)

        fmt.Println("remPubKey =", remPubKey)
        fmt.Println("locPrivKey =", locPrivKey)
        fmt.Println("shrKey =", shrKey)
}
@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Dec 13, 2018

Touché. I'm not able to execute that code right now, but is shrKey zero?

@HarryR

This comment has been minimized.

Copy link
Author

HarryR commented Dec 13, 2018

Yup it is all zeros.

With other curves there should be n potential points as the result when one of the points is of order n, but I think x25519 may be weird in that sense and just return all zeros instead, is worth double-checking.

A sufficient check, instead of key validation, may be validating that the result of the scalarmult is non-zero.

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Dec 13, 2018

Yeah, so the Noise protocol spec discusses that case here:

https://noiseprotocol.org/noise.html

Alternatively, implementations are allowed to detect inputs that produce an all-zeros output and signal an error instead (Section 12.1)

I think it would probably be best to detect and return an error in this case, but note that Noise is also designed to be robust against it:

Executes the Curve25519 DH function (aka "X25519" in [7]). Invalid public key values will produce an output of all zeros.

@HarryR

This comment has been minimized.

Copy link
Author

HarryR commented Dec 13, 2018

So the all-zeros check can be used as a band-aid for this bug, which is sufficient to prevent all security problems described in the original ticket.

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Dec 13, 2018

I think the path forward here is:

  • Implement a check for the blacklisted low order points, ala the X25519 has_small_order() function in libsodium
  • Circle back on how to generally solve handshake malleability within Secret Connection

@Liamsi Liamsi referenced this issue Dec 17, 2018

Merged

secret connection: check for low order points #3040

1 of 4 tasks complete

@ebuchman ebuchman added this to the v0.29.0 milestone Jan 13, 2019

@Liamsi Liamsi self-assigned this Jan 21, 2019

melekes added a commit that referenced this issue Jan 29, 2019

secret connection: check for low order points (#3040)
> Implement a check for the blacklisted low order points, ala the X25519 has_small_order() function in libsodium

(#3010 (comment))
resolves first half of #3010
@ebuchman

This comment has been minimized.

Copy link
Contributor

ebuchman commented Feb 7, 2019

So we're currently checking a black list, but to really catch everything we should either (or both?)

  • multiply by cofactor and check if result is infinity
  • check if the computed shared secret is all zeros

See #3040 (comment) and #3040 (comment)

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Feb 7, 2019

@ebuchman the blacklist stops the bleeding. To really fix the core issue, the handshake needs to ensure that both sides saw the same conversation.

Note that well-designed protocols (e.g. Noise) tolerate these points and can handle a shared secret of zero, because if an attacker tried to perform this same attack on Noise, it would result in a MAC verification failure.

Really implementing the blacklist is a best practice, and coincidentally stops this attack, but the core problem is the handshake is malleable.

@jaekwon

This comment has been minimized.

Copy link
Contributor

jaekwon commented Feb 16, 2019

but the core problem is the handshake is malleable.

I don't agree yet. As @HarryR mentioned, you can't spoof the challenge assuming that we check that the persistent pubkey is valid. After this step, there's no problem with malleability in the STS protocol which SecretConnection intends to implement. And, a MITM doesn't make sense in the context where we don't check pubkeys anyways.

I tried searching for code where we authenticate a peer against its NetAddress.ID and couldn't find it. I don't see a reason to switch to Noise, but a need to ensure that the node's ID is authenticated e.g. after dialing from the address book.

Given that we got this wrong once, I see the benefit of using Noise, but let's also fix SecretConnection STS as it provides deniability.

@tarcieri

This comment has been minimized.

Copy link

tarcieri commented Feb 16, 2019

@jaekwon malleability is something of a classical problem in authenticated key exchanges, and one it's taken TLS several attempts to get correct. Transcript hashing would address this particular problem without the need to blacklist low order points.

Additionally, it's the foundation on which safe protocol downgrade mechanisms are possible. See section 6: http://www.cs.ox.ac.uk/files/10583/downgrade-taxonomy-18.pdf

It's true the low order point blacklist mitigates this particular issue, and offhand aside from supporting downgrades to the protocol in its current form I can't name a specific protocol parameter an attacker can malleate which would break the protocol. But allowing a transcript discrepancy between peers in the key exchange for an encrypted connection is wiggle room for attackers and a classical source of vulnerabilities.

@Liamsi Liamsi referenced this issue Feb 17, 2019

Merged

Fix p2p id check #3321

2 of 4 tasks complete

melekes added a commit that referenced this issue Feb 18, 2019

p2p: check secret conn id matches dialed id (#3321)
ref: [#3010 (comment)](#3010 (comment))

> I tried searching for code where we authenticate a peer against its NetAddress.ID and couldn't find it. I don't see a reason to switch to Noise, but a need to ensure that the node's ID is authenticated e.g. after dialing from the address book.

* p2p: check secret conn id matches dialed id

* Fix all p2p tests & make code compile

* add simple test for dialing with wrong ID

* update changelog

* address review comments

* yet another place where to use IDAddressString and fix
testSetupMultiplexTransport

@Liamsi Liamsi referenced this issue Feb 22, 2019

Open

WIP: secret connection check all zeroes #3347

0 of 4 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment