Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consensus: spec and impl for BFT time #1146

Closed
ebuchman opened this issue Jan 24, 2018 · 24 comments
Closed

consensus: spec and impl for BFT time #1146

ebuchman opened this issue Jan 24, 2018 · 24 comments
Assignees
Labels
Milestone

Comments

@ebuchman
Copy link
Contributor

@ebuchman ebuchman commented Jan 24, 2018

Now that timestamps are included in votes, we can use them to construct a BFT time to be included in the next header

Need to write a spec for how to do this and then implement it.

@melekes

This comment has been minimized.

Copy link
Contributor

@melekes melekes commented Mar 7, 2018

@milosevic can you link a spec here?

@milosevic

This comment has been minimized.

@zmanian

This comment has been minimized.

Copy link
Contributor

@zmanian zmanian commented Mar 15, 2018

I have concerns about BFT time.

  1. BFT time as per the spec doesn't appear to weighted by voting power. I believe contributions to BFT time should be weighted by voting power. You basically should get a number of entries in the least proportional to voting power.

  2. I think there should be max variance/ std deviation for submitted times and if we cannot generate a consensus with 2/3rd of the voting power and low enough variance we should fail the round. If the variance exceeds the limits, outlier values should be removed from the vote and then we retests. Remove(Max(median-min),(max-median) but we also need to check if we still have 2/3rd voting power.

  3. The BFT time spec does not define Byzatine behavior in regards to time in votes. I think if the delta between a confirmed time and your vote time is greater than some value it should be slashable.

@jlandrews

This comment has been minimized.

Copy link
Contributor

@jlandrews jlandrews commented Mar 16, 2018

I'm not caught up on all the backstory, but what's the argument against having the proposer set a timestamp, and the validators simply accept or reject it? Using some rule such as, reject if before previous, or ahead of validator local time by more than x.

Having the votes contain a timestamp and then taking the median seems more complicated. Also #1319 could be done much more efficiently if each validator is signing the same canonical piece of data for their prevotes/precommits, as we would only need one copy of the vote, a bitfield to specify validators signing, and the aggregated signature. With distinct timestamps in votes we will need to include each of them fully in the last commit set, while everything else can be aggregated or included as a single copy.

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented Mar 16, 2018

Great point @jlandrews . Timestamps in votes definitely make them more complicated, for instance we have to deal with the possibility of two distinct votes for the same Height/Round/block but with different timestamps, which has already led to more complexity in the privValidator.

That said, it does seem really valuable to have timestamps in the votes for debugging, though maybe they don't need to be signed.

A concern with letting the proposer just set the time, or with other forms of restrictions on time, is that we don't have any mechanism to resync clocks, so if the clocks drift apart, we might halt forever.

The whole thing warrants a larger discussion. Perhaps we can convene a call.

@milosevic

This comment has been minimized.

Copy link
Contributor

@milosevic milosevic commented Mar 19, 2018

Thanks for great points @zmanian @jlandrews. Regarding @zmanian comments: fully agree regarding 1). The current version is not precise enough about median computation, and it sounds more like assuming uniform distribution of votes like in classical BFT. I believe it can be easily fixed by phrasing it in terms of voting power, as in Tendermint we can effectively see vote messages as aggregators of individual votes, where each voting power unit is one vote. So if you treat a vote msg (v1, ts1) with voting power 23 as 23 votes for value (v1,ts1) then median will also gives us the right value. Regarding 2) and 3), I believe it deserves a bit longer discussion. Having time as part of consensus mechanism can lead to termination issues, so we need to be very careful with this. It would be also great understanding the tradeoffs and design in the context of using BLS signatures in Tendermint (#1319), and we should probably not jump into the implementation before at least proper discussion and specification of the potential solution. I will propose a fix in the context of the current spec to precisely refer to voting power, and @ebuchman could maybe organise a call regarding broader discussion.

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented Mar 19, 2018

Thanks @milosevic.

Let's aim for a call later this week. We can co-ordinate on slack and write the results up here after.

@zmanian

This comment has been minimized.

Copy link
Contributor

@zmanian zmanian commented Mar 20, 2018

I think proposer selected timestamp is fine if the rest of the quorum will refused to prevote a block where the delta between validator system time and proposer time > x

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented Apr 27, 2018

Note re #320 we should also ensure that the genesis.json has a set time

@Chjango

This comment has been minimized.

Copy link

@Chjango Chjango commented May 11, 2018

Notes from @chjj for @jlandrews:

I'm not sure I'm qualified to understand the full scope of what you guys need here, since it sounds like you want a truly consensus-based time mechanism, but if you want to go a more decentralized route, something similar to the way bitcoin calculates network adjusted time might be an option.

On the bitcoin p2p layer, time samples are gathered from every peer a node connects to and stored in a set. Every so often, the median time offset is pulled from that set and chosen as the network adjusted time offset. Whenever the system time is retrieved, the offset is applied against it. The system time, plus the adjusted offset, is used to ensure block timestamps are no more than 2 hours in the future. To verify block timestamps are not too far in the past, a median timestamp is pulled from a set of the previous 11 blocks and used as the oldest possible timestamp.

This may not be what you're looking for since it isn't a strictly "consensus" time mechanism: time is instead calculated as an emergent property of the network (and as such, it needs that 2 hour fudge window to guarantee the majority of nodes don't fall out of consensus).

There's trade-offs for each method. Bitcoin's network adjusted time is definitely not exact, and there may be potential "time sample attacks" on the p2p layer (though, no one has ever successfully executed an attack of this nature on bitcoin), but I think it's probably more decentralized than leaving it up to the validators.

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented May 14, 2018

A number of discussions since have decided:

  • dont sign the timestamp in votes (will allow for BLS compression)
  • set rules for validity of timestamp included by proposal in block

I like the idea of incorporating something from the network into the system time before proposing it, but we don't currently have timestamped messages at the peer layer.

@ebuchman ebuchman mentioned this issue May 14, 2018
15 of 19 tasks complete
@jaekwon

This comment has been minimized.

Copy link
Contributor

@jaekwon jaekwon commented May 14, 2018

I'm working on this now, I'll make a PR soon in the coming day or so. It will allow blocks to have "temporary" invalidity.

This might change, but the behavior is probably something like this... A node will not by itself prevote for (nor propose a newly constructed) a block where the block's timestamp does not satisfy some criterion. But if it sees a POL for it, or a commit for it, it will play along (and log some error message in the logs). If it seems a POL for it, the block is considered valid, and the node may even propose that block.

The criterion is a function that becomes more liberal over time... The result depends on when it's called (or we can implement the function to be a pure function, same diff) and what the round is. It accepts blocks between "min_acceptable_time" and "max_acceptable_time" where "max_acceptable_time" keeps increasing, but faster than the clock. "min_acceptable_time" can also keep going back in time, but it will never be lower or equal to the previous committed block's time. Proposers must be careful to propose a block time that is greater than the previous block time.

I think the "min_acceptable_time" can maybe just be equal to the previous block time + iota on round 2 or something like that.

@jaekwon

This comment has been minimized.

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented May 21, 2018

@jaekwon can you help clarify a few things please:

  • what is the relation between wiggle and timeout_propose?
  • is the proposer expected to set her internal clock to last_time_stamp + iota if she is behind ?
@milosevic

This comment has been minimized.

Copy link
Contributor

@milosevic milosevic commented May 22, 2018

Note that we don't use lastBlockTime in the computation of the MaxValidTime, so we could have MaxValidTime < MinValidTime. Imagine the following scenario with 4 processes:
last block time is 4, iota is 1, wiggle is 2
(p1, 5), (p2, 5), (p3, 2), p4 is faulty

For process p3, MinValidTime is 5, and MaxValidTime is 4. We probably want MaxValidTime > MinValidTime always. Probably the fix is to have
MaxValidTime = max(now, lastBlockTime).Add(wiggle).Add(wiggle*wiggle_r*round).

@jaekwon In your example with 100 validators, should we consider a process with a faulty clock as a correct one? As a more general question, do we assume some clock drift between (some number, for example f+1) of correct processes as a correctness requirement?In my view, that shouldn't be the case. BFT time should be eventual property and termination of Tendermint shouldn't depend on clock drifts of correct processes, i.e, we should terminate even if clocks of all correct processes are screwed. One approach would be trying to enforce strict time only in the first round and then for higher rounds requiring only that time for the block h + 1 is in the range (lastBlockTime + iota, lastBlockTime + k* iota), where k and iota are configuration parameters. This will ensure that eventually time is close to clock time as we will during good period probably be accepting first round Proposal as correct processes will synchronise their clocks during good period.

One concern I am having with the current proposal is how long it will take for a process to accept proposal if his clock is drifted forward significantly (with wiggle_r mechanism)? This directly affects termination of the consensus. I would prefer to minimise stuff that affect termination.

@ebuchman Why do you think there is a relation between wiggle and timeout_propose? These seems to be quite different mechanisms. My intuition for wiggle is acceptable clock drift between processes, which if validators use NTP (which should probably be recommended practice) shouldn't be pretty high, I guess order of hundreds of ms.

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented May 22, 2018

BFT time should be eventual property and termination of Tendermint shouldn't depend on clock drifts of correct processes, i.e, we should terminate even if clocks of all correct processes are screwed.

Agree here.

I'm a bit concerned again about this proposer-based BFT time. It seems it might be more ideal to try and have a background protocol to do BFT time with signed vote-like msgs just at a lower frequency - could be every minute or every few minutes.

Do we really need to try and get such high precision timestamps if it might slow down the chain or create significant dependence on synced clocks ?

Why do you think there is a relation between wiggle and timeout_propose?

Because we admit that it might take timeoutPropose to receive a block, so we should always be willing to accept a block from now-timeoutPropose. But if wiggle < timeoutPropose, then we'd reject blocks even after waiting timoutPropose for them. So seems they may be related somewhat

@milosevic

This comment has been minimized.

Copy link
Contributor

@milosevic milosevic commented May 22, 2018

I'm a bit concerned again about this proposer-based BFT time. It seems it might be more ideal to try and have a background protocol to do BFT time with signed vote-like msgs just at a lower frequency - could be every minute or every few minutes.

Agree. We could have very simple rule based on lastBlockTime in most of the cases just to enforce monotonicity and then periodically compute lastBlockTime based on signed time field from precommits (note that this will require having time field inside the Vote but not set most of the time).

@jaekwon

This comment has been minimized.

Copy link
Contributor

@jaekwon jaekwon commented May 23, 2018

Implementation is complete: #1615

Compiling responses to the points raised above...

@jaekwon jaekwon mentioned this issue May 23, 2018
@jaekwon

This comment has been minimized.

Copy link
Contributor

@jaekwon jaekwon commented May 23, 2018

what is the relation between wiggle and timeout_propose

Since you mention it, I think MinValidTime should incorporate timeout_propose (or rather round_start_time-wiggle). Otherwise, they're independent.

is the proposer expected to set her internal clock to last_time_stamp + iota if she is behind ?

No.

Note that we don't use lastBlockTime in the computation of the MaxValidTime, so we could have MaxValidTime < MinValidTime.

Good point, but I think that's OK. That just means that no block can satisfy the subjective time validity property, which can happen if your clock is far behind what it should be.

In your example with 100 validators, should we consider a process with a faulty clock as a correct one? As a more general question, do we assume some clock drift between (some number, for example f+1) of correct processes as a correctness requirement?

I think we should have that as a correctness requirement, but keep track of both -- complete BFT correctness, and time-faulty correctness. And there could be many kinds of time-faults, depending on the model... and in our case we choose the hardware clock and its properties as the basis for the model.

I agree that we should implement an "emergent" "background" (to borrow Zarko and Ethan's words) BFT protocol that will periodically sync clocks.

But I think we should also have a different BFT between these sync periods... during which there are higher requirements on what is considered "correct", namely, that each correct validators' clocks are within "wiggle" of each other. So they're clustered together within a span of "wiggle". This shouldn't be difficult as long as wiggle is large enough, and we do the "background" BFT syncing often enough, and assume in our model that the machine clocks are sufficiently accurate such that drift happens within "wiggle".

What I've implemented is this second "foreground" BFT system. I think we should launch with this system (though maybe we can simplify it slightly if we determined to do so), and also initially require that Cosmos Hub validators need to have their clocks synced with real global time using existing tools and services... which doesn't seem like the worst thing to require of our validators.

Then, after launch, we can work on "background" BFT sync, which would automatically ensure that everyone's clocks are within wiggle.

BFT time should be eventual property and termination of Tendermint shouldn't depend on clock drifts of correct processes, i.e, we should terminate even if clocks of all correct processes are screwed.

Before I mentioned two BFT systems... the first is the "background" BFT process that syncs everyone's clocks to be within "wiggle", and the second is the "foreground" BFT process tht assumes everyone's clocks are going to be within "wiggle".

The "foreground" BFT time algo assumes that clocks are within some reasonasble bounds of each other (e.g. "wiggle"). Liveness is not affected as long as +2/3 are within some finite bounds of each other and the drift is sufficiently slow.

The benefit of having this "foreground" BFT time system is that the blockchain's time more accurately reflects the real time. I think this makes for a better blockchain, and may be critical in applications that involve, say, betting.

One approach would be trying to enforce strict time only in the first round and then for higher rounds requiring only that time for the block h + 1 is in the range (lastBlockTime + iota, lastBlockTime + k* iota), where k and iota are configuration parameters. This will ensure that eventually time is close to clock time as we will during good period probably be accepting first round Proposal as correct processes will synchronise their clocks during good period.

That sounds like more or less what I'm suggesting, except instead of k*iota I'm calling it wiggle+(wiggle*wiggle_r*round). The only difference is the variable component wiggle*wiggle_r*round, which is makes it even more permissive even when more than 1/3 are not time-correct -- i.e. the drift was so bad that the tolerance bounds for the "foreground" BFT failed. Without the wiggle*wiggle_r*round component, consensus can practically hang much more seriously in the case of BFT-time failure.

One concern I am having with the current proposal is how long it will take for a process to accept proposal if his clock is drifted forward significantly (with wiggle_r mechanism)? This directly affects termination of the consensus. I would prefer to minimise stuff that affect termination.

We should have some limitation on how much of a forward/future block time a validator is willing to accept, for a "foreground" BFT time system as this one. What should that limitation be?

I'm a bit concerned again about this proposer-based BFT time. It seems it might be more ideal to try and have a background protocol to do BFT time with signed vote-like msgs just at a lower frequency - could be every minute or every few minutes.

That's what I'm suggesting, except I'm suggesting that we do the background thing after launch.

Agree. We could have very simple rule based on lastBlockTime in most of the cases just to enforce monotonicity and then periodically compute lastBlockTime based on signed time field from precommits (note that this will require having time field inside the Vote but not set most of the time).

That's what my PR does, with better protection against the block time from jumping forward too much.

We can remove the round=0 exceptional logic for MinValidTime() to simplify the algorithm, that would simplify it a little more.

@jaekwon

This comment has been minimized.

Copy link
Contributor

@jaekwon jaekwon commented May 24, 2018

Updated the PR to remove round=0 exception logic.

4ec86ee

@milosevic

This comment has been minimized.

Copy link
Contributor

@milosevic milosevic commented May 24, 2018

I like this algorithm, I think it's good stuff if we do assume that correct processes do not drift a lot from each other. I was just wondering if we want to add such additional requirement as that's indeed additional liveness requirement. The other option is not depending at all on clock drifts, so even if clocks of all correct processes are broken, we would still terminate as they are still able to measure local timeouts (this is the only requirement we can't avoid). So the question I am having is whether due to some (potential) applications that have higher requirement on BFT time precision (with respect to wall time), we want to always add requirement on Tendermint liveness. Should this choice be discussed now or we proceed with reviewing Jae's code and postpone this discussion?

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented May 24, 2018

I think ideally if we're going to impose this extra requirement on Tendermint liveness it should be optional per network - not everyone using Tendermint will need or want this, so it would be unfortunate to subject them to it since it could actually compromise their liveness. Maybe it goes in ConsensusParams ?

@milosevic

This comment has been minimized.

Copy link
Contributor

@milosevic milosevic commented May 24, 2018

This is also what I had in mind. Don't know how easy is to implement it like that

@ebuchman

This comment has been minimized.

Copy link
Contributor Author

@ebuchman ebuchman commented Jul 19, 2018

Replaced by #2013

@ebuchman ebuchman closed this Jul 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.