Replace Commit Timeout by splitting into a Quorum and Remainder Commit #7945
Comments
I disagree with the summary. As far as I understand Therefore, removing the timeout will remove liveness guarantees. |
I don't quite understand this explanation about affecting liveness.
I was under the impression that |
Just looking at the code now, I can't even see when/where |
If in practice the first 2/3 precommits agree on the block it does not prevent liveness. If they disagree it should. My point about liveness was about faulty executions. I guess they never happen on the hub.
In the arXiv paper the algorithm counts all precommits. @cason could you let us know what you think? |
I think that The timeout that is required for ensuring liveness is In summary, it is common to confuse these two timeouts with similar names, but I think that only the |
I didn't fully get the point here either. If a validator receives In addition, the |
This timeout is scheduled in the |
Well, this flag is meant to skip the timeout when all the In other words, since the point of this timeout is to retrieve as many |
Regarding the proposal, I am not sure if and how you can rewrite a block, in order to include additional This would allow, for instance, two validators having different sets of |
You wouldn't be rewriting a block. The additional precommits would be added in the next block (h + 2) by the next proposer. It just means that nodes need to keep votes around for two heights instead of one. |
@cason Have you read the HotStuff paper? |
Hi Alex, I've read the HotStuff paper and I'm quite positive they refer to the
The main idea of responsiveness is that once the system starts to behave synchronously, you decide within a fixed number of rounds. You cannot ensure that a decision happens (liveness) while the system is asynchronous, because messages may not arrive or may be arbitrarily delayed. When the system becomes synchronous you should be able to decide, but in the case of Byzantine protocols this is not that simple.
First, the proposer of the first round in the synchronous period can be Byzantine, which can prevent progress in any protocol until it is replaced by a correct proposer. Second, and this is specific to Tendermint, two correct processes may start a round in the synchronous period having different views of the consensus state. More specifically, a correct process might not know that a value was locked by other correct processes. In this case, even in a synchronous period and despite being a correct process, it can propose a value that other correct processes will reject because they are locked on another value.
This problem is solved in different ways by Tendermint and HotStuff. In one of the several versions of HotStuff, an additional communication step is added to ensure that if a correct process locked a value, all correct processes will learn about that before the end of the round. This additional round has this sole purpose: it is not required for safety or for liveness, but only for responsiveness. But it is an additional communication step that is executed in all rounds, so it also adds some latency to the regular (failure-free) operation of HotStuff. Tendermint deals with the same problem by adopting the |
@cason Thanks for posting this explanation. To help me understand, do you mean that the waiting that Tendermint does in the precommit step is in place of the extra communication step in the hotstuff algorithm? Hotstuff does the extra consensus step to make sure that if a value is locked in a round, everyone will see it with this extra step. Tendermint does not ensure that the locked value will be seen by everyone in a round, but it does ensure that if a process missed the locked value, it will still make progress to the next round upon seeing the quorum of votes from other processes. Are they analogous because they both ensure that processes will make progress to the next round and not lock an incorrect value? |
Tendermint achieves this property of everyone being able to learn about a value locked by some correct processes through gossip, as the votes that lead to a lock are eventually received by all processes. This does not happen with HotStuff, as Byzantine processes may send votes that enable a lock to only some processes, instead of all of them. The votes of Byzantine processes usually don't have an impact on BFT protocols, as their voting power is limited to Votes from Byzantine processes become an issue in a scenario where correct processes are partitioned among two distinct values. In this scenario, Byzantine processes may use their votes to lead some, few ( The above scenario is known as the hidden lock attack, and might, in the worst case, put the system in a bivalent state that prevents progress for good. That is, Byzantine processes, by playing very well with their votes and relying on a substantial amount of asynchrony, can make the system jump between locked values so that it never decides a value. HotStuff prevents the hidden lock attack by adding a communication step in which correct processes inform others about the values they locked in a round. Tendermint deals with the hidden lock attack by adding a new variable, The difference between the approaches is that in Tendermint the |
Hi @cason, thanks for your explanation. |
Yes, you cannot ensure a decision in fewer than f rounds, because they can be coordinated by Byzantine processes. But once a round started after In Tendermint, the possibility of having hidden locks may prevent this from happening, but due to the |
Protocol Change Proposal
Summary
TimeoutCommit is a local consensus parameter that sets how long a node waits to receive additional precommit votes after it has received the 2/3+ required to commit the block. The reason for this timeout is that more precommit votes decrease the likelihood that a fork was produced at this height, because more voting power would have had to double sign to generate the fork. For example, seeing 2/3+ of the votes means that 1/3+ of the voting power could have double signed to generate a fork, whilst seeing 90% of the votes means that it would take at least 57% (67% - (100% - 90%)) of the voting power double signing. I propose an alternative method that removes TimeoutCommit, speeding up consensus, whilst still allowing for greater confidence in network integrity. The cost of this proposal is greater technical complexity.
Proposal
Proposers in Tendermint currently append the precommit votes they saw in the previous height to the block they propose at the current height. This canonicalizes the commit once the proposed block is agreed upon in consensus. Rather than having a single LastCommit, I propose separating it into two values: a QuorumCommit and a RemainderCommit.
The QuorumCommit constitutes the necessary 2/3+ precommits for height h - 1, whilst the RemainderCommit constitutes the remaining precommits for height h - 2. Thus validators have the duration of an entire height in which to collect votes for the RemainderCommit. Note that it is also feasible to extend this to 3, 10 or even 100 heights in the past, but this adds further complexity which I don't think is necessary.
Persistence of Commits
Once a block gets committed, the node should add the RemainderCommit to the QuorumCommit and store the result at the height of the block it is associated with, to make it easier to retrieve the SignedHeader for light client verification (which is also used for block sync and state sync).
Verification
Nodes using sequential verification (i.e. for block sync) will receive the SignedHeader and only need to check the signatures of the quorum commit to verify the Header (and thus the rest of the Block). Nodes using skipping verification (i.e. light clients) will figure out the overlay and may require signatures from both commits to verify the SignedHeader.
Signature Aggregation
It is likely we will adopt some form of signature aggregation in the near future. In this case the signatures in the quorum commit will be aggregated and the signatures in the remainder commit will be aggregated. More thought may be needed on the best structure to support both consensus and light client verification.
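Leaving aggregation aside, one possible shape for such a structure can be sketched in a few lines. This is only an illustration, written in Python for brevity rather than in Tendermint's Go: the `Vote` and `SplitCommit` types, their fields, and the `has_quorum` and `merged` helpers are all hypothetical names, and real precommits carry signatures, block IDs and timestamps that are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vote:
    """Simplified precommit: only the signer and its voting power (hypothetical)."""
    validator: str
    power: int


@dataclass
class SplitCommit:
    """Hypothetical container for the two commits of one height."""
    height: int
    quorum: list[Vote] = field(default_factory=list)     # QuorumCommit
    remainder: list[Vote] = field(default_factory=list)  # RemainderCommit

    def has_quorum(self, total_power: int) -> bool:
        # Sequential verification only needs the quorum part: strictly more
        # than 2/3 of the total voting power, checked without floating point.
        return 3 * sum(v.power for v in self.quorum) > 2 * total_power

    def merged(self) -> list[Vote]:
        # Persistence step: quorum votes first, then late votes, keeping the
        # first vote seen for each validator.
        seen: set[str] = set()
        out: list[Vote] = []
        for vote in self.quorum + self.remainder:
            if vote.validator not in seen:
                seen.add(vote.validator)
                out.append(vote)
        return out


commit = SplitCommit(
    height=10,
    quorum=[Vote("a", 40), Vote("b", 30)],
    remainder=[Vote("c", 20), Vote("a", 40)],  # "a" is a duplicate and is dropped
)
print(commit.has_quorum(total_power=100))
print([v.validator for v in commit.merged()])
```

Keeping the two lists separate until the block is stored means a sequential verifier can stop after `has_quorum`, while a skipping verifier can work from the merged list.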
Just a final clarification: this is just an idea. None of it has been formally verified in any capacity and there might be holes in my thinking. I welcome anyone to challenge or build on top of this :)
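For anyone who wants to check the fork arithmetic from the Summary, here is a small sketch. It assumes that a conflicting commit also needs 2/3 of the voting power and that every validator whose precommit was not observed could sign the conflicting block without double signing. The function name `min_double_sign_power` is made up for this example.

```python
from fractions import Fraction

QUORUM = Fraction(2, 3)  # voting power a conflicting commit would also need


def min_double_sign_power(observed: Fraction) -> Fraction:
    """Lower bound on the voting power that must double sign to fork a block.

    `observed` is the fraction of voting power whose precommits were seen for
    this block. The unseen fraction (1 - observed) can sign a conflicting
    block honestly, so the rest of that block's quorum has to come from
    validators that already signed this one.
    """
    if not 0 <= observed <= 1:
        raise ValueError("observed must be a fraction between 0 and 1")
    return max(Fraction(0), QUORUM - (1 - observed))


for seen in (Fraction(2, 3), Fraction(9, 10), Fraction(1)):
    bound = min_double_sign_power(seen)
    print(f"seen {float(seen):.0%} -> at least {float(bound):.0%} must double sign")
```

With 2/3 observed the bound is 1/3, and with 90% observed it is 17/30, roughly 57%, which matches the figures in the Summary.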