consensus: implement double sign risk reduction adr-051 #5147
👋 Thanks for creating a PR! Before we can merge this PR, please make sure that all the following items have been
Thank you for your contribution to Tendermint! 🚀
Codecov Report
@@ Coverage Diff @@
## master #5147 +/- ##
==========================================
- Coverage 62.48% 62.34% -0.14%
==========================================
Files 258 258
Lines 27129 27152 +23
==========================================
- Hits 16951 16929 -22
- Misses 8699 8735 +36
- Partials 1479 1488 +9
e2a26b4
to
786890e
Compare
👍
consensus/state.go
Outdated
for _, s := range lastCommit.Signatures {
	if s.BlockIDFlag == types.BlockIDFlagCommit && bytes.Equal(s.ValidatorAddress, valAddr.Address()) {
		cs.Logger.Error("Error checking double sign risk reduction logic", "err", ErrDoubleSignRiskReduction)
		return ErrDoubleSignRiskReduction
I can't find a place where we panic...
When an error is returned from func (cs *State) OnStart() error, the panic occurs in SwitchToConsensus.
Log example:
I[2020-07-24|08:04:15.056] Started node module=main nodeInfo="{ProtocolVersion:{P2P:9 Block:12 App:1} DefaultNodeID:fe0092406535cb943f5e15e3c6c33718089e1d3a ListenAddr:tcp://0.0.0.0:26656 Network:chain-hsa2Bm Version:0.33.6 Channels:40202122233038606100 Moniker:8CBB29C542D2D308 Other:{TxIndex:on RPCAddress:tcp://0.0.0.0:26657}}"
I[2020-07-24|08:04:15.367] Executed block module=state height=37 validTxs=0 invalidTxs=0
I[2020-07-24|08:04:15.367] Committed state module=state height=37 txs=0 appHash=0000000000000000
panic: Failed to start consensus state: double sign detected or restarted in DoubleSignCheckHeight
conS:
ConsensusState
conR:
ConsensusReactor
goroutine 45 [running]:
github.com/tendermint/tendermint/consensus.(*Reactor).SwitchToConsensus(0xc00025f880, 0xc, 0x1, 0xc0002a8440, 0x6, 0xc0002a8450, 0xc, 0x25, 0xc00258ebc0, 0x20, ...)
github.com/tendermint/tendermint/consensus/reactor.go:127 +0x2cb
github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc000b44700, 0xc0024af800)
github.com/tendermint/tendermint/blockchain/v0/reactor.go:319 +0x11be
created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
github.com/tendermint/tendermint/blockchain/v0/reactor.go:110 +0x89
it would be nice if you could add a small section about this feature in the docs, possibly here: https://docs.tendermint.com/master/tendermint-core/validators.html
👍
I'm late to the game here, so don't let me block this, but I was asked for an opinion. Personally I feel like this misses too many corner cases, and that we probably don't want to weaken the incentives for operators to manage their keys and nodes properly in the first place. Ideally, a private key should only be used by a single node. From what I understand, the primary use-case here is validator failover. Alternative solutions might be to e.g. allow multiple private keys per validator (in prioritized order) or use some sort of distributed lock service to make sure only a single validator replica is attempting to vote for a given height. I haven't given this much thought though, just a couple of ideas.
Keys are the kind of thing that are worth getting right the first time, so I think it's OK if we take a step back just to double-check this approach. In this case, we probably should have looped in the folks tagged above at the ADR step of the process rather than at this step. But ah well - we're learning and getting better at that. Going to tag @tarcieri on this one too, since I bet he might have an opinion on it 😉
I'm a fan of a belt-and-suspenders approach here. There's been a lot of discussion of various ways to better mitigate double signing on the KMS side, but given slashing risk, it's nice to also improve checks on the validator side too. The only thing I might additionally suggest here, as a potential optional enhancement (perhaps a configurable one), would be a sort of "warm up" phase. Concretely, if you had two validators, one of them could sign only the odd numbered blocks, and another the even numbered ones (much like the similar trick for database sharding by primary key). Or for 3 validators, it'd be if their "validator index" matched (block_height % 3), (block_height % 3) + 1, (block_height % 3) + 2 respectively. This would give some time for validators to signal to each other that they're online, and also a way of assigning a priority so if two (or more) validators do happen to go active at the same time, the lowest priority ones (perhaps the ones with the highest "validator index") can observe the highest priority one is signing, at that point deliberately stop signing, and the highest priority one can witness it before it begins signing every block. All of that seems like a potential next step for this ADR/PR, which otherwise seems like a decent start.
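The "warm up" idea above can be sketched roughly as follows. This is only one reading of the suggestion, under the assumption that each replica sharing the key has a 0-based index and signs a height only when height modulo the number of replicas equals that index. The function name shouldSignDuringWarmup and its parameters are hypothetical and do not exist in Tendermint.

```go
package main

import "fmt"

// shouldSignDuringWarmup reports whether the replica with the given 0-based
// index takes the given height while numReplicas replicas share one key.
// Hypothetical helper for illustration; not part of Tendermint.
func shouldSignDuringWarmup(height int64, index, numReplicas int) bool {
	// Reject inputs that cannot describe a valid replica slot.
	if numReplicas <= 0 || index < 0 || index >= numReplicas || height < 0 {
		return false
	}
	// Each height maps to exactly one replica, so two replicas that follow
	// this rule never sign the same height.
	return height%int64(numReplicas) == int64(index)
}

func main() {
	// Show which of three replicas would sign each of a few heights.
	for h := int64(10); h < 16; h++ {
		for idx := 0; idx < 3; idx++ {
			if shouldSignDuringWarmup(h, idx, 3) {
				fmt.Printf("height %d -> replica %d\n", h, idx)
			}
		}
	}
}
```

With two replicas this reduces to the odd/even split described in the comment. How replicas would learn their index, and when the warm-up phase ends, is left open here just as it is in the discussion.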
Thanks for the opinion. We also agree that allowing multiple validators should be the ultimate goal, but achieving it with very high certainty of no double-signing in any edge case is quite a challenge for us. Because the result of double-signing is catastrophic, we hope a future solution will give us 100% logical certainty that double-signing will not occur in any edge case. We will continue to participate in the next-step discussion!
I think
@@ -366,6 +367,25 @@ func (cs *State) OnStart() error {
		return err
	}

	// Double Signing Risk Reduction
Is it actually sufficient to only do this in OnStart()? If a node has e.g. fast sync disabled, or for whatever other reason isn't caught up with the chain (for example because block replay took a long time), then a different validator can submit votes for heights that this node doesn't know about yet. You should be able to reproduce this simply by starting a new node with a duplicate key and fast sync disabled.
I believe we will have to run this check again before casting our first vote, after consensus has caught up with the chain head, or something similar.
I agree, that would be a more secure way.
What do you think about adding the check logic again in gossipDataRoutine (when height and round match) for that? I could also set DoubleSignCheckHeight to zero as a flag so that the check runs only once there.
I'm afraid I'm not very familiar with the intricacies of the consensus module, but I'd say the check should run right before we try to cast our first vote, probably in State somewhere, not Reactor.
Hi, Erik. This is Hyung from B-Harvest. We are quite confident that the proposed additional check will address the concern raised about covering a different route of risk.
Because this work has gone stale, could you elaborate on your concerns in more detail, with reference to the codebase? It would help us understand your opinion.
We will wait several days before we move on with our proposed approach.
Thank you for your suggestion!
Yeah, it won't work without fast-sync / state-sync. We can create a separate issue for improving it. I doubt there's a clear point in the code where "right before we try to cast our first vote" happens, so it might be difficult to pinpoint.
Sure, I'll create a separate issue for improving it after this fast-sync version is merged.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@@ -366,6 +367,25 @@ func (cs *State) OnStart() error {
		return err
	}

	// Double Signing Risk Reduction
can we extract this into a function please?
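For illustration, an extracted check could look something like the sketch below. It is not the PR's actual code. CommitSig, BlockIDFlag and the checkDoubleSigningRisk signature are simplified stand-ins invented here so the example compiles on its own, and the slice of recent commits (ordered oldest to newest) is assumed to have been loaded from the block store by the caller.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Simplified stand-ins for the Tendermint types referenced in the diff.
type BlockIDFlag byte

const (
	BlockIDFlagAbsent BlockIDFlag = iota + 1
	BlockIDFlagCommit
	BlockIDFlagNil
)

type CommitSig struct {
	BlockIDFlag      BlockIDFlag
	ValidatorAddress []byte
}

// Message text taken from the panic log quoted in this thread.
var ErrDoubleSignRiskReduction = errors.New("double sign detected or restarted in DoubleSignCheckHeight")

// checkDoubleSigningRisk returns ErrDoubleSignRiskReduction if ownAddr signed
// any of the last checkHeight commits. recentCommits is ordered oldest to
// newest; checkHeight <= 0 disables the check. Hypothetical helper.
func checkDoubleSigningRisk(recentCommits [][]CommitSig, ownAddr []byte, checkHeight int64) error {
	if checkHeight <= 0 {
		return nil
	}
	// Only look at the newest checkHeight commits.
	start := int64(len(recentCommits)) - checkHeight
	if start < 0 {
		start = 0
	}
	for _, sigs := range recentCommits[start:] {
		for _, s := range sigs {
			if s.BlockIDFlag == BlockIDFlagCommit && bytes.Equal(s.ValidatorAddress, ownAddr) {
				return ErrDoubleSignRiskReduction
			}
		}
	}
	return nil
}

func main() {
	me := []byte{0x01}
	other := []byte{0x02}
	commits := [][]CommitSig{
		{{BlockIDFlagCommit, other}},
		{{BlockIDFlagCommit, me}, {BlockIDFlagCommit, other}},
	}
	fmt.Println(checkDoubleSigningRisk(commits, me, 2))
	fmt.Println(checkDoubleSigningRisk(commits, me, 0))
}
```

Keeping the scan in a plain function with explicit inputs makes it testable without starting a consensus State; OnStart() would then only gather the commits, call it, and log the error.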
@dongsam can you also rebase the PR against the latest master?
the former one
13b04b6
to
d0c0bd8
Compare
Hmm... github still says "This branch is out-of-date with the base branch"
…o ADR-51-double_sign_risk_reduction
Implementation spec of Double Signing Risk Reduction ADR-51 by B-Harvest
DoubleSignCheckHeight config variable added to ConsensusConfig: "how many blocks to look back to check for the existence of the node's consensus votes before joining consensus"
consensus.double_sign_check_height added to config.toml and to tendermint node as a flag for setting DoubleSignCheckHeight
consensus.double_sign_check_height set to 0 (it could be adjustable in this PR; disabled when 0)
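As a minimal sketch of how such a setting behaves, assuming a cut-down stand-in for ConsensusConfig that carries only this one field: the ValidateBasic rule and the DoubleSignCheckEnabled helper below are illustrative assumptions, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// ConsensusConfig is a cut-down stand-in holding only the field under
// discussion; the real struct has many more settings.
type ConsensusConfig struct {
	// DoubleSignCheckHeight is how many blocks to look back for this
	// node's own consensus votes before joining consensus. 0 disables it.
	DoubleSignCheckHeight int64
}

// ValidateBasic rejects values that cannot be a block count.
// Assumed validation rule for this sketch.
func (c ConsensusConfig) ValidateBasic() error {
	if c.DoubleSignCheckHeight < 0 {
		return errors.New("double_sign_check_height can't be negative")
	}
	return nil
}

// DoubleSignCheckEnabled is a hypothetical convenience helper.
func (c ConsensusConfig) DoubleSignCheckEnabled() bool {
	return c.DoubleSignCheckHeight > 0
}

func main() {
	for _, h := range []int64{0, 10, -1} {
		c := ConsensusConfig{DoubleSignCheckHeight: h}
		fmt.Printf("height=%d enabled=%v err=%v\n", h, c.DoubleSignCheckEnabled(), c.ValidateBasic())
	}
}
```

Treating 0 as "off" matches the description above and keeps existing configurations unchanged unless an operator opts in.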