Avoiding duplicate signatures #318
Comments
Just for additional info, this is how things look when proposing (5 vals, one of which makes the race):
Those are definitely duplicates (the block IDs and h/r/s confirm it) and the "Non-deterministic signature" messages occur somewhat routinely during Tendermint consensus. If you aren't getting permanently jailed, you haven't experienced a double signing event.
Note: I will have someone from Tendermint take a look at this to double-check before closing this issue.
Also note that if I were simply to change the logic to disallow signing blocks with the same h/r/s and block ID, it wouldn't sign precommits after signing a prevote, which would break the KMS entirely.
So Tendermint signatures are currently non-deterministic because the signed message includes a timestamp. This error message appears when Tendermint sees different signatures for the same block ID at the same height. In a setup where multiple full nodes are talking to the same signing oracle, this is expected behavior and should be the most fault-tolerant behavior. Thanks for testing this.
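To illustrate why a timestamp in the signed message produces different signatures for the same block ID, here is a minimal sketch. Everything in it is an assumption made for illustration: the field names, the JSON encoding, the demo key, and the use of HMAC-SHA256 as a stand-in for a deterministic signature scheme. It is not Tendermint's actual vote encoding or tmkms's signing code.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def sign_bytes(height, round_, step, block_id, timestamp):
    """Serialize a vote into bytes to sign (illustrative encoding, not Tendermint's)."""
    vote = {
        "height": height,
        "round": round_,
        "step": step,
        "block_id": block_id,
        "timestamp": timestamp,
    }
    return json.dumps(vote, sort_keys=True).encode()


def sign(message):
    """Stand-in for a deterministic signer: same message in, same signature out."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


# Two validators ask for the same prevote a moment apart: only the timestamp differs.
sig_a = sign(sign_bytes(100, 0, "prevote", "ABCD", "2019-11-01T12:00:00.100Z"))
sig_b = sign(sign_bytes(100, 0, "prevote", "ABCD", "2019-11-01T12:00:00.250Z"))

# The same request repeated with an identical timestamp.
sig_c = sign(sign_bytes(100, 0, "prevote", "ABCD", "2019-11-01T12:00:00.100Z"))
```

Here `sig_a` and `sig_b` differ even though height, round, step, and block ID match, while `sig_a` and `sig_c` are identical. The signer itself is deterministic; the variation comes entirely from the timestamp in the message.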
Thanks for clarifying @zmanian. We're planning to run with this setup on cosmoshub-3, and I think others might want to do the same down the line, but I think Tendermint's log output at error severity will be a distraction. Ideally this setup should not log consensus errors, so I am thinking about two options:
Not sure if 1) is even possible. Can KMS return an error/empty signature to Tendermint yet?
I think 1 is bad from an availability perspective, and allowing it has no disadvantages other than surfacing to the user what is happening in a non-terrifying way. The real fix for 2 is to remove the timestamp from signed blocks, in which case the signatures will be fully deterministic. I think this is going to happen soon?
It already returns errors in the event that a validator requests signing a different block ID at the same height/round/step.
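For readers following along, the rule described here can be sketched roughly as below: a repeated request for the same block ID at the same height/round/step is let through, while a different block ID at that same h/r/s is refused. The class and method names are hypothetical and this is not the tmkms implementation, only an illustration of the check under those assumptions.

```python
class ConflictingBlockError(Exception):
    """Raised when a different block ID is requested at an already-signed h/r/s."""


class SignGuard:
    """Hypothetical guard tracking which block ID was signed at each (height, round, step)."""

    def __init__(self):
        self._signed = {}  # (height, round, step) -> block_id

    def check(self, height, round_, step, block_id):
        key = (height, round_, step)
        if key not in self._signed:
            # First request at this h/r/s: record it and allow signing.
            self._signed[key] = block_id
            return "new"
        if self._signed[key] != block_id:
            # Same h/r/s but a different block: this would be a double sign.
            raise ConflictingBlockError(
                f"already signed {self._signed[key]} at {key}, refusing {block_id}"
            )
        # Same h/r/s and same block ID: a duplicate request from another validator.
        return "duplicate"


guard = SignGuard()
first = guard.check(100, 0, "prevote", "ABCD")       # first validator wins the race
repeat = guard.check(100, 0, "prevote", "ABCD")      # another validator, same vote
later = guard.check(100, 0, "precommit", "ABCD")     # next step for the same block
```

In this sketch the duplicate is only labelled, not rejected, which matches the behavior discussed in this thread: duplicates are signed again, and only a conflicting block ID is an error.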
On second thought: I agree availability might be slightly better when signing everything as there is a better chance of getting the signature propagated throughout the network. However I think signing "aggressively" with the YubiHSM and Ledger is undesirable due to the limited capacity of these devices. On a YubiHSM, signing for a single chain with two validators will be getting close to capacity.
Regarding doing anything about it prior to the release: even if we were to do something, doing it now would be a big invasive change right before a release, where all of the other things I've been doing have been tweaking loglines. I don't think it's worth delaying the release over, and I think it's a risky thing to change at the last minute. When I did tweak the loglines though, I made all double signing attempts errors, because I do consider them errors. I think having several validator instances haphazardly hammering the KMS in a completely uncoordinated manner to produce duplicate signatures is not a good steady state of affairs, and it also makes the KMS the one vital lynchpin preventing double signing. It would be better to use some external coordination service to elect one validator as active, and handle failover when that validator goes inactive. This is the approach commonly used by e.g. Certificate Transparency log signers. This also provides a belt-and-suspenders approach to preventing double signing. Until there are good solutions for that, however, I'd agree this is the best we can do. That said, we only plan on using this configuration for failover and don't plan on running it in perpetuity.
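As a rough illustration of the "elect one validator as active, fail over when it goes inactive" idea, here is a minimal lease-based sketch. It is purely hypothetical: the `ActiveLease` class, the validator IDs, and the in-memory state are assumptions, and in practice the lease would have to live in an external coordination service rather than in one process. The clock is passed in explicitly to keep the example deterministic.

```python
class ActiveLease:
    """Hypothetical lease: at most one validator is 'active' until its lease expires."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, validator_id, now):
        """Acquire or renew the lease; returns True if validator_id is now active."""
        lease_free = self.holder is None or now >= self.expires_at
        if lease_free or self.holder == validator_id:
            self.holder = validator_id
            self.expires_at = now + self.ttl
            return True
        return False


lease = ActiveLease(ttl_seconds=10)

a_first = lease.try_acquire("val-a", now=0)     # val-a becomes active
b_early = lease.try_acquire("val-b", now=5)     # val-b is refused, lease still held
a_renew = lease.try_acquire("val-a", now=8)     # val-a renews, lease now runs to t=18
b_wait = lease.try_acquire("val-b", now=15)     # still refused
b_takeover = lease.try_acquire("val-b", now=18) # val-a stopped renewing, val-b takes over
```

Only the current lease holder would forward signing requests, so the KMS check becomes a second line of defense rather than the only one.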
Agree on wrapping the 0.6.0 release without major changes and thanks for your patience with all these last minute issues. Since we are already using YubiHSM and thus relying on the tmkms to prevent double signing in software, I am not (very) worried about adding multiple validators. It seems the main risk would be a race condition bypassing the double signing prevention. Once we are confident this is working well by stress-testing with 5 validators, we'll probably run a setup with two validators and a single tmkms instance to allow for regular maintenance on the validators without missing blocks. Would love to see a specialised solution for this, but I think tmkms can serve this specific purpose well.
Yeah, this is something we can definitely do better on, but it requires some careful thinking on how. Everything gets way better once we remove timestamps from votes, because then we can just have a warm cache for signatures.
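A warm signature cache of the kind mentioned here could look roughly like the sketch below, assuming votes no longer carry a timestamp so that (height, round, step, block ID) fully determines the signed message. The `CachingSigner` class, the message encoding, and the HMAC stand-in for the signing device are all hypothetical.

```python
import hashlib
import hmac


def device_sign(message):
    """Stand-in for an HSM signing call (HMAC-SHA256 with a demo key, illustration only)."""
    return hmac.new(b"demo-key", message, hashlib.sha256).hexdigest()


class CachingSigner:
    """Hypothetical cache: each distinct vote hits the signing device only once."""

    def __init__(self, sign_fn):
        self._sign_fn = sign_fn
        self._cache = {}
        self.device_calls = 0

    def sign(self, height, round_, step, block_id):
        key = (height, round_, step, block_id)
        if key not in self._cache:
            # Cache miss: build the (timestamp-free) message and ask the device.
            self.device_calls += 1
            message = f"{height}/{round_}/{step}/{block_id}".encode()
            self._cache[key] = self._sign_fn(message)
        return self._cache[key]


signer = CachingSigner(device_sign)

# Five validators request the same prevote; the device signs once.
signatures = [signer.sign(100, 0, "prevote", "ABCD") for _ in range(5)]
```

After this runs, all five entries in `signatures` are identical and `signer.device_calls` is 1, which would also ease the device capacity concern raised earlier in the thread.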
Related to #314, we're testing multiple (5) validators connecting to a single KMS process.
This turns up in the validator logs, so it seems the multiple PreVote/PreCommits are broadcast as expected:
According to this log output, and after diving into the Tendermint source, they are not duplicates:
https://github.com/tendermint/tendermint/blob/b73cfe878682e0b73d9a21ec1d8dc456ddd90215/types/vote_set.go#L181
Log from the KMS side for the above block:
I am not sure if this is as expected - perhaps it would be better to not provide a signature for duplicate block ids?