Skip to content
This repository has been archived by the owner on Jun 3, 2020. It is now read-only.

Avoiding duplicate signatures #318

Closed
mdyring opened this issue Jul 27, 2019 · 12 comments
Closed

Avoiding duplicate signatures #318

mdyring opened this issue Jul 27, 2019 · 12 comments
Labels
security Security-critical issues

Comments

@mdyring
Copy link
Contributor

mdyring commented Jul 27, 2019

Related to #314, we're testing multiple (5) validators connecting to a single KMS process.

This turns up in the validator logs, so it seems the multiple PreVote/PreCommits are broadcast as expected:

gaiad[1486]: E[2019-07-27|09:20:43.397] Error attempting to add vote                 module=consensus err="Existing vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A A5AC3E828136 @ 2019-07-27T09:20:43.057849935Z}; New vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A 3F7B5DCF5942 @ 2019-07-27T09:20:43.061941252Z}: Non-deterministic signature"
gaiad[1486]: E[2019-07-27|09:20:43.405] Error attempting to add vote                 module=consensus err="Existing vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A A5AC3E828136 @ 2019-07-27T09:20:43.057849935Z}; New vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A 3BFFA669DF1C @ 2019-07-27T09:20:43.09168945Z}: Non-deterministic signature"
gaiad[1486]: E[2019-07-27|09:20:43.406] Error attempting to add vote                 module=consensus err="Existing vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A A5AC3E828136 @ 2019-07-27T09:20:43.057849935Z}; New vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A 26B42A52E8DA @ 2019-07-27T09:20:43.071819843Z}: Non-deterministic signature"
gaiad[1486]: E[2019-07-27|09:20:43.423] Error attempting to add vote                 module=consensus err="Existing vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A A5AC3E828136 @ 2019-07-27T09:20:43.057849935Z}; New vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A 9580C1CF6F62 @ 2019-07-27T09:20:43.13180719Z}: Non-deterministic signature"
gaiad[1486]: E[2019-07-27|09:20:43.511] Error attempting to add vote                 module=consensus err="Existing vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A A5AC3E828136 @ 2019-07-27T09:20:43.057849935Z}; New vote: Vote{8:32363E5BAA6B 67906/00/2(Precommit) 2031A4FCA22A 26B42A52E8DA @ 2019-07-27T09:20:43.071819843Z}: Non-deterministic signature"

According to this log output they are not duplicate when diving into the tendermint source:
https://github.com/tendermint/tendermint/blob/b73cfe878682e0b73d9a21ec1d8dc456ddd90215/types/vote_set.go#L181

Log from the KMS side for above block:

tmkms[19431]: 09:20:38 [info] [gaia-13004@35.156.81.246:26659] signed PreVote:2031A4FC h/r/s:67906/0/6 (0 ms)
tmkms[19431]: 09:20:38 [info] [gaia-13004@35.158.126.240:26659] signed PreVote:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@18.196.63.108:26659] signed PreVote:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@18.196.17.223:26659] signed PreVote:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@54.93.169.244:26659] signed PreVote:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@54.93.169.244:26659] signed PreCommit:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@18.196.63.108:26659] signed PreCommit:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]
tmkms[19431]: 09:20:38 [info] [gaia-13004@18.196.17.223:26659] signed PreCommit:2031A4FC h/r/s:67906/0/6 (0 ms) [dup]

I am not sure if this is as expected - perhaps it would be better to not provide a signature for duplicate block ids?

@mdyring
Copy link
Contributor Author

mdyring commented Jul 27, 2019

Just for additional info, this is how things look when proposing (5 vals, one if which makes the race):

tmkms[19431]: 12:00:24 [warn] [gaia-13004:35.156.81.246:26659] attempt to double sign at height/round/step: 69564/0/3
tmkms[19431]: 12:00:24 [info] [gaia-13004@18.196.17.223:26659] signed Proposal:B38091FB h/r/s:69564/0/3 (0 ms)
tmkms[19431]: 12:00:24 [warn] [gaia-13004:35.158.126.240:26659] attempt to double sign at height/round/step: 69564/0/3
tmkms[19431]: 12:00:24 [warn] [gaia-13004:18.196.63.108:26659] attempt to double sign at height/round/step: 69564/0/3
tmkms[19431]: 12:00:24 [info] [gaia-13004@18.196.17.223:26659] signed PreVote:B38091FB h/r/s:69564/0/6 (0 ms)
tmkms[19431]: 12:00:24 [info] [gaia-13004@35.156.81.246:26659] signed PreVote:B38091FB h/r/s:69564/0/6 (0 ms) [dup]
tmkms[19431]: 12:00:24 [error] [gaia-13004@tcp://54.93.169.244:26659] attempted double sign: step regression: round regression at height:69564 round:0 last step:6 new step:3
tmkms[19431]: 12:00:24 [info] [gaia-13004@18.196.63.108:26659] signed PreVote:B38091FB h/r/s:69564/0/6 (0 ms) [dup]
tmkms[19431]: 12:00:24 [info] [gaia-13004@35.158.126.240:26659] signed PreVote:B38091FB h/r/s:69564/0/6 (0 ms) [dup]
tmkms[19431]: 12:00:24 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:B38091FB h/r/s:69564/0/6 (0 ms) [dup]
tmkms[19431]: 12:00:24 [info] [gaia-13004@18.196.17.223:26659] signed PreCommit:B38091FB h/r/s:69564/0/6 (0 ms) [dup]
tmkms[19431]: 12:00:24 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:B38091FB h/r/s:69564/0/6 (0 ms) [dup]
tmkms[19431]: 12:00:24 [info] [gaia-13004@18.196.63.108:26659] signed PreCommit:B38091FB h/r/s:69564/0/6 (0 ms) [dup]
tmkms[19431]: 12:00:25 [info] [gaia-13004@54.93.169.244:26659] signed PreCommit:B38091FB h/r/s:69564/0/6 (0 ms) [dup]

@tarcieri
Copy link
Contributor

Those are definitely duplicates (the block IDs and h/r/s confirm it) and the "Non-deterministic signature" messages occur somewhat routinely during Tendermint consensus.

If you aren't getting permanently jailed you haven't experienced a double signing event.

@tarcieri tarcieri added the security Security-critical issues label Jul 27, 2019
@tarcieri
Copy link
Contributor

Note: I will have someone from Tendermint take a look at this to double check before closing this issue

@tarcieri
Copy link
Contributor

Also note that if I were simply to change the logic to disallow signing blocks with the same h/r/s and block ID, it wouldn't sign precommits after signing a prevote, which would break the KMS entirely.

@zmanian
Copy link
Contributor

zmanian commented Jul 28, 2019

So Tendermint signatures are currently non deterministic because the message includes a timestamp.

This error message will be seen when Tendermint sees different signatures for the same block id at the same height.. In a setup where multiple full nodes are talking to the same signing oracle, this expected behavior and should be the most fault tolerant behavior.

Thanks for testing this.

@mdyring
Copy link
Contributor Author

mdyring commented Jul 28, 2019

Thanks for clarifying @zmanian.

We're planning to run with this setup on cosmoshub-3, and I think others might want to do the same down the line, but I think the log output of Tendermint with errors severity will be a distraction.

Ideally I think this setup should not log consensus errors, so thinking about two options:

  1. KMS side: store what action was signed (prevote/precommit/proposal) and only provide a single signature per <action, height,round,step>.
  2. Simply request tendermint team to log these different signatures with info or debug severity instead of as an error.

Not sure if 1) is even possible, can KMS return an error/empty signature to Tendermint yet?

@tarcieri
Copy link
Contributor

tarcieri commented Jul 28, 2019

I think 1 is bad from an availability perspective, and allowing it has no disadvantages other than surfacing to the user what is happening in a non-terrifying way.

The real fix for 2 is to remove the timestamp from signed blocks in which case the signatures will be fully deterministic. I think this is going to happen soon?

can KMS return an error/empty signature to Tendermint yet?

It already returns errors in event a validator requests signing a different block ID at the same height/round/step.

@mdyring
Copy link
Contributor Author

mdyring commented Jul 28, 2019

Thanks @tarcieri, see #323 related to last point.

I'll nudge the Tendermint devs to decrease severity level for this.

@mdyring
Copy link
Contributor Author

mdyring commented Jul 29, 2019

I think 1 is bad from an availability perspective, and allowing it has no disadvantages other than surfacing to the user what is happening in a non-terrifying way.

On second thought:

I agree availability might be slightly better when signing everything as there is a better chance of getting the signature propagated throughout the network.

However I think signing "aggressively" with the YubiHSM and Ledger is undesirable due to the limited capacity of these devices. On a YubiHSM, signing for a single chain with two validators will be getting close to capacity.

@tarcieri
Copy link
Contributor

Regarding doing anything about it prior to the release: even if we were to do something, doing it now would be a big invasive change right before a release, where all of the other things I've been doing have been tweaking loglines. I don't think it's worth delaying the release over, and I think it's a risky thing to change at the last minute.

When I did tweak the loglines though, I made all double signing attempts errors, because I do consider them errors.

I think having several validator instances haphazardly hammering the KMS in a completely uncoordinated manner to produce duplicate signatures is not a good steady state of affairs and also makes the KMS the one vital lynchpin preventing double signing.

It would be better to use some external coordination service to elect one validator as active, and handle failover when that validator goes inactive. This is the approach commonly used by e.g. Certificate Transparency log signers. This also provides a belt-and-suspenders approach to preventing double signing.

Until there's good solutions for that, however, I'd agree this is the best we can do. That said we only plan on using this configuration for failover and don't plan on running it in perpetuity.

@mdyring
Copy link
Contributor Author

mdyring commented Jul 29, 2019

Agree on wrapping the 0.6.0 release without major changes and thanks for your patience with all these last minute issues.

Since we are already using YubiHSM and thus relying on the tmkms to prevent double signing in software, I am not (very) worried about adding multiple validators. It seems the main risk would be a race condition bypassing the double signing prevention.

Once we are confident this is working well by stress-testing with 5 validators, we'll probably run a setup with two validators and a single tmkms instance to allow for regular maintenance on the validators without missing blocks. Would love to see a specialised solution for this, but I think tmkms can serve this specific purpose well.

@zmanian
Copy link
Contributor

zmanian commented Jul 30, 2019

Yeah this is something we can definitely do better on.

But it requires some careful thinking on how.

Everything gets way better once we remove timestamps from votes cause then we can just have a warm cache for signatures

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
security Security-critical issues
Projects
None yet
Development

No branches or pull requests

3 participants