Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do we break validBlock property if a process can leave round upon receiving 2f+1 Precommit nil messages? #1690

Closed
milosevic opened this issue Jun 5, 2018 · 4 comments
Assignees
Labels
C:consensus Component: Consensus T:bug Type Bug (Confirmed)
Projects
Milestone

Comments

@milosevic
Copy link
Contributor

milosevic commented Jun 5, 2018

Let assume that we are in the period after GST. A correct process p sent Precommit id(v) message at time t in round r. By validBlock property, every correct process that enters round r+1 should receive the same Polka p received before leaving round r.

The process p received valid Proposal and 2f+1 matching Prevote messages at time t. At least f+1 of those Prevote messages is from correct processes, and let denote this set of correct processes with C. By majority intersection property, any set of 2f+1 Prevote messages in round r contains at least single message from the set C. Let denote with time t1, the earliest point in time a process from C sent it's Prevote message in round r.

Note that a correct process can't send Precommit message before receiving 2f+1 Prevote messages. Therefore, no correct process can send Precommit message in round r before time t1.
If a correct process q send Precommit nil message, this implies that he waited timeoutPrevote before. Therefore, no correct process can send Precommit nil before time t1 + timeoutPrevote. Let assume that some correct process q received 2f+1 Precommit nil messages at time t1 + timeoutPrevote. According to current version of algorithm, q will start round r+1 at time t1 + timeoutPrevote.

If a correct process has started timeoutPrevote at time t1, this implies it receives 2f+1 Prevote messages. Therefore, every correct process will start timeoutPrevote the latest at time t1 + Delta (as we are in period after GST). So the latest point in time p can send Precommit v is t1 + Delta + timeoutPrevote.

Note that if a faulty process sends Prevote v to p just before timeout expires, p might send Precommit v at time t1 + Delta + timeoutPrevote, and process q at that point in time already left round r.

The potential solution for this issue is introducing timeoutPrecommit in case a process receives 2f+1 Precommit nil before it starts round r+1.

@milosevic milosevic added T:bug Type Bug (Confirmed) C:consensus Component: Consensus labels Jun 5, 2018
@jaekwon
Copy link
Contributor

jaekwon commented Jun 8, 2018

Awesome, thank you. Here’s a stab at the proof in terms of the updated protocol.

Say that the earliest a correct processor sends a precommit is time T_a. The last time a correct process precommits is T_a + Delta due to gossiping of prevotes (after GST, Delta is finite and defined by network conditions and number of validators etc). Delta is thus also the maximum bound between when the first correct process and last correct process can pre-commit.

There exists some round R after GST where Delta (prevotes) + Delta (polka) < timeout_precommit where precommit_timeout is sufficiently long for the polka to be broadcast from the victim correct validator to all correct validators before round r+1. And if the proposer at r+1 is correct then we’re going to terminate at r+1. So there exists some r after GST where all conditions are satisfied and we terminate.

@jaekwon
Copy link
Contributor

jaekwon commented Jun 8, 2018

Here's the corresponding update to the consensus algo:

diff --git a/consensus/state.go b/consensus/state.go
index 338a14a9..2ae49b8d 100644
--- a/consensus/state.go
+++ b/consensus/state.go
@@ -1524,7 +1524,9 @@ func (cs *ConsensusState) addVote(vote *types.Vote, peerID p2p.ID) (added bool,
                blockID, ok := precommits.TwoThirdsMajority()
                if ok {
                        if len(blockID.Hash) == 0 {
-                               cs.enterNewRound(height, vote.Round+1)
+                               cs.enterNewRound(height, vote.Round)
+                               cs.enterPrecommit(height, vote.Round)
+                               cs.enterPrecommitWait(height, vote.Round)
                        } else {
                                cs.enterNewRound(height, vote.Round)
                                cs.enterPrecommit(height, vote.Round)

@xla xla added this to the launch milestone Aug 2, 2018
@jaekwon
Copy link
Contributor

jaekwon commented Sep 1, 2018

Just noting that this is related to but distinct from #1745 which is already merged in. The above issue is about prevotes/precommits of +2/3 any, where as this issue is about precommits of +2/3 nil, two separate issues.

@ebuchman ebuchman modified the milestones: v1.0, launch Sep 13, 2018
@milosevic milosevic added this to Planned in current iteration Sep 25, 2018
@milosevic milosevic self-assigned this Sep 25, 2018
@milosevic milosevic moved this from Planned to Ongoing in current iteration Sep 26, 2018
@milosevic milosevic moved this from Ongoing to Review Needed in current iteration Sep 26, 2018
@xla xla removed this from Review Needed in current iteration Sep 26, 2018
@xla xla added this to Queued in Launch via automation Sep 26, 2018
@xla xla moved this from Queued to Ongoing in Launch Sep 26, 2018
@milosevic milosevic moved this from Ongoing to Done in Launch Oct 4, 2018
@ebuchman
Copy link
Contributor

ebuchman commented Oct 5, 2018

Done in #2493

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:consensus Component: Consensus T:bug Type Bug (Confirmed)
Projects
No open projects
Launch
  
Done
Development

No branches or pull requests

4 participants