-
Notifications
You must be signed in to change notification settings - Fork 59
Update specification of light client algorithm to align with the code #61
Conversation
milosevic
commented
Nov 15, 2019
- Fix timing issues by introducing Delta parameter
bb6b481
to
3fbdc58
Compare
- Fix timing issues by introducing Delta parameter
3fbdc58
to
a4b68ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass. Added some comments. In general I think we could simplify more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comment mainly to help with readability. For the purpose of this document I think having (bool, err) as return in CheckSupport() makes the pseudocode harder to read.
Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
e17582c
to
2306108
Compare
4d60a43
to
069906a
Compare
…ent_algo # Conflicts: # spec/consensus/light-client.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of good changes! Left a few comments.
60303ef
to
74d6c04
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few concerns from reviewing the latest implementation update (informalsystems/tendermint-rs#100 (review))
spec/consensus/light-client.md
Outdated
Store.Add(h2) | ||
return nil | ||
} | ||
if err != ErrTooMuchChange return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this here, doesn't it mean we never start bisecting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are in line 359 with err != nil
and, if err != ErrTooMuchChange
, the only possible values for err
(looking at CheckSupport()
) are:
ErrHeaderNotWithinTrustedPeriod
ErrInvalidAdjacentHeaders
They are both indicators that current bisection has failed, further recursive calls should not be done and therefore the call stack will unwind propagating this error to the final caller (VerifyHeader()
).
I mentioned this in an earlier comment, it is better to write 359 and others like:
if fatalCheckSupportError(err) return err
Also, potential future error types added to CheckSupport()
would be easier to implement without changing all the callers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could use sth else than an error to indicate that there was too much change in the validator set (going from h1 to h2). Returning a bool for instance. Or simply renaming the method name to indicate the only relevant case directly: ValidatorsChangedTooMuch
(or EnoughIntersection
? I'm sure there are better names).
FWIW, linking the related discussion regarding the rust implementation: https://github.com/interchainio/tendermint-rs/pull/100/files#r360502272
spec/consensus/light-client.md
Outdated
|
||
_Verification Condition:_ We may need a Tendermint invariant stating that if _h2.Header.height = h1.Header.height + 1_ then _signers(h2.Commit) \subseteq h1.Header.NextV_. | ||
h2 := Commit(height) | ||
if !verify(h2) { return ErrInvalidHeader(h2) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we have a flow where we CheckSupport before we verify ?
spec/consensus/light-client.md
Outdated
if CheckSupport(h1,h2,trustlevel) { | ||
return true | ||
if isWithinTrustedPeriod(h2) { | ||
Store.add(h2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this redundant? CanTrust
already has stored h2
here. At least in the bisection case because:
func CanTrustBisection(h1,h2,trustThreshold) error {
assume h1.Header.Height < h2.header.Height
err = CheckSupport(h1,h2,trustThreshold)
if err == nil {
Store.Add(h2)
return nil
}
if err != ErrTooMuchChange return err
pivot := (h1.Header.height + h2.Header.height) / 2
hp := Commit(pivot)
if !verify(hp) return ErrInvalidHeader(hp)
err = CanTrustBisection(h1,hp,trustThreshold)
if err == nil {
Store.Add(hp)
err2 = CanTrustBisection(hp,h2,trustThreshold)
if err2 == nil {
Store.Add(h2)
return nil
}
return err2
}
return err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bisection needs to store headers in between (the pivot headers). But maybe we leave it to the caller to actually deal with the last header h2
?
I think the name of this method (CanTrust
/ CanTrustBisection
) should actually indicate that it is updating the store and not just answering the question "Can we trust this header based on h1".
spec/consensus/light-client.md
Outdated
func Bisection(h1,h2,trustlevel) bool{ | ||
if CheckSupport(h1,h2,trustlevel) { | ||
return true | ||
if isWithinTrustedPeriod(h2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we do this twice? When we fetch the commit / signed header above and then again here? Shouldn't we assume that we are still within the trusted period here?
h2 := Commit(height)
if !verify(h2) { return ErrInvalidHeader(h2) }
if !isWithinTrustedPeriod(h2) { return ErrHeaderNotWithinTrustedPeriod(h2) }
// ...
if isWithinTrustedPeriod(h2) { /*...*/ }
If isWithinTrustedPeriod
was false, we would have already returned with an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because of #57 but I agree it doesn't really make sense, especially if time is going to be passed in
spec/consensus/light-client.md
Outdated
err = CanTrust(trusted_h, untrusted_h, trustThreshold) // or CanTrustBisection((trusted_h, untrusted_h, trustThreshold) | ||
if err != nil { return err } | ||
|
||
if isWithinTrustedPeriod(untrusted_h) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this re-checking is necessary, I think it needs an explanation (see #61 (comment)).
spec/consensus/light-client.md
Outdated
if err != nil { return err } | ||
|
||
if isWithinTrustedPeriod(untrusted_h) { | ||
Store.add(untrusted_h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems redundant, at least in the CanTrustBisection
case untrusted_h
was already stored.
I think I may have a simpler structure - please see informalsystems/tendermint-rs#114 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work
spec/consensus/light-client.md
Outdated
while `Header.Time` corresponds to the [BFT time](bft-time.md). In this note, we assume that clocks of correct processes | ||
are synchronized (for example using NTP), and therefore there is bounded clock drift (CLOCK_DRIFT) between local clocks and | ||
BFT time. More precisely, for every correct process p and every header (correctly generated by the Tendermint consensus) | ||
time (BFT time) the following inequality holds: `Header.Time < now + CLOCK_DRIFT`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to include the lower bound here? Or at least mention why we don't care about it ? It might also be helpful to clarify that we mostly need this to hold for the light client's local clock, not just the validators ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what would be the lower bound? Not sure if we can say anything more precise than genesis time, but not sure how this is useful. My understanding is that upper bound ensures that we don't consider headers that are outside the assumption that lite client clock is in sync with blockchain time; as time progresses, not sure what we can say about the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but it's currently written from the perspective of validators. For the validators, we want a lower bound (I think?). For the light client, we only need the upper bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me when this inequality should hold. I guess when the header is generated? Also, there are two moving parts in the definition, i.e., "Header.Time" and "now", so it is not clear who to blame when the inequality is violated. I guess our assumption is that "Header.Time" is always correct (by definition; it serves as a time reference for the system), and that violation of the inequality means that the Lite Client is faulty. IOW it is in the responsibility of the lite client to keep its clock synchronized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inequality holds from the moment header is generated. I was assuming that header is always correct, i.e., that header is coming from the main chain that is not forked. We need to make this assumption more clear. As we also assume that lite client processes are synchronised with respect to BFT time, header that does not satisfy this inequality must come from a faulty full node. We can think about trying to weaken this assumption to eventually hold. I think that it would be useful trying to more precisely understand attack vectors in case local lite client clock drifts more than clockDrift. Ideally, safety should not be violated in case this assumption does not hold, but only termination should temporary be violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We rely on bfttime for the failure model and the checks heavily. So if the lite clients clock is off too far, everything might be lost. The question is, whether within the Bisection we should/can check whether the lite client's clock is synchronized (and what timing assumptions this would impose), or whether safety has to rely on the synchronization of the lite client's clock.
spec/consensus/light-client.md
Outdated
The function _Bisection_ checks whether to trust header _h2_ based on the trusted header _h1_. It does so by calling | ||
the function _CheckSupport_ in the process of | ||
bisection/recursion. _CheckSupport_ implements the trusted period method and, for two adjacent headers (in term of heights), it checks uninterrupted sequence of proof. | ||
now = System.Time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want the function to have access to the system clock ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of this function is to illustrate how to correctly use VerifyBisection
function. In case we don't check if we are still within trusted period of initial trusted state after VerifyBisection
is executed, we can't give precise guarantees to the user. I don't know how to avoid this. As most of complexity is happening within VerifyBisection
this shouldn't be a big problem from the testing perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge this and iterate from there? We can also merge #73 into this or into master after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we should merge this!
Move light specs to their own dir, add readme and diagram
22af52f
to
026fdde
Compare