Skip to content
This repository has been archived by the owner on Feb 23, 2022. It is now read-only.

Update specification of light client algorithm to align with the code #61

Merged
merged 23 commits into from
Jan 23, 2020

Conversation

milosevic
Copy link
Contributor

  • Fix timing issues by introducing Delta parameter

@milosevic milosevic force-pushed the zm_lite_client_algo branch 6 times, most recently from bb6b481 to 3fbdc58 Compare November 15, 2019 11:45
- Fix timing issues by introducing Delta parameter
Copy link
Contributor

@ancazamfir ancazamfir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass. Added some comments. In general I think we could simplify more.

spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ancazamfir ancazamfir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comment mainly to help with readability. For the purpose of this document I think having (bool, err) as return in CheckSupport() makes the pseudocode harder to read.

spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
Copy link
Contributor

@ancazamfir ancazamfir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of good changes! Left a few comments.

spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few concerns from reviewing the latest implementation update (informalsystems/tendermint-rs#100 (review))

Store.Add(h2)
return nil
}
if err != ErrTooMuchChange return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this here, doesn't it mean we never start bisecting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in line 359 with err != nil and, if err != ErrTooMuchChange, the only possible values for err (looking at CheckSupport()) are:

  • ErrHeaderNotWithinTrustedPeriod
  • ErrInvalidAdjacentHeaders

They are both indicators that current bisection has failed, further recursive calls should not be done and therefore the call stack will unwind propagating this error to the final caller (VerifyHeader()).

I mentioned this in an earlier comment, it is better to write 359 and others like:
if fatalCheckSupportError(err) return err
Also, potential future error types added to CheckSupport() would be easier to implement without changing all the callers.

Copy link
Contributor

@liamsi liamsi Dec 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could use sth else than an error to indicate that there was too much change in the validator set (going from h1 to h2). Returning a bool for instance. Or simply renaming the method name to indicate the only relevant case directly: ValidatorsChangedTooMuch (or EnoughIntersection? I'm sure there are better names).

FWIW, linking the related discussion regarding the rust implementation: https://github.com/interchainio/tendermint-rs/pull/100/files#r360502272


_Verification Condition:_ We may need a Tendermint invariant stating that if _h2.Header.height = h1.Header.height + 1_ then _signers(h2.Commit) \subseteq h1.Header.NextV_.
h2 := Commit(height)
if !verify(h2) { return ErrInvalidHeader(h2) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have a flow where we CheckSupport before we verify ?

if CheckSupport(h1,h2,trustlevel) {
return true
if isWithinTrustedPeriod(h2) {
Store.add(h2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this redundant? CanTrust already has stored h2 here. At least in the bisection case because:

func CanTrustBisection(h1,h2,trustThreshold) error {
  assume h1.Header.Height < h2.header.Height

  err = CheckSupport(h1,h2,trustThreshold)
  if err == nil {
    Store.Add(h2)
    return nil
  }
  if err != ErrTooMuchChange return err

  pivot := (h1.Header.height + h2.Header.height) / 2
  hp := Commit(pivot)
  if !verify(hp) return ErrInvalidHeader(hp)

  err = CanTrustBisection(h1,hp,trustThreshold)
  if err == nil {
      Store.Add(hp)
      err2 = CanTrustBisection(hp,h2,trustThreshold)
      if err2 == nil {
        Store.Add(h2)
        return nil
      }
      return err2
  }
  return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bisection needs to store headers in between (the pivot headers). But maybe we leave it to the caller to actually deal with the last header h2?

I think the name of this method (CanTrust / CanTrustBisection) should actually indicate that it is updating the store and not just answering the question "Can we trust this header based on h1".

func Bisection(h1,h2,trustlevel) bool{
if CheckSupport(h1,h2,trustlevel) {
return true
if isWithinTrustedPeriod(h2) {
Copy link
Contributor

@liamsi liamsi Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this twice? When we fetch the commit / signed header above and then again here? Shouldn't we assume that we are still within the trusted period here?

  h2 := Commit(height)
  if !verify(h2) { return ErrInvalidHeader(h2) }
  if !isWithinTrustedPeriod(h2) { return ErrHeaderNotWithinTrustedPeriod(h2) }
  // ...
  if isWithinTrustedPeriod(h2) { /*...*/ }

If isWithinTrustedPeriod was false, we would have already returned with an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because of #57 but I agree it doesn't really make sense, especially if time is going to be passed in

err = CanTrust(trusted_h, untrusted_h, trustThreshold) // or CanTrustBisection((trusted_h, untrusted_h, trustThreshold)
if err != nil { return err }

if isWithinTrustedPeriod(untrusted_h) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this re-checking is necessary, I think it needs an explanation (see #61 (comment)).

if err != nil { return err }

if isWithinTrustedPeriod(untrusted_h) {
Store.add(untrusted_h)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant, at least in the CanTrustBisection case untrusted_h was already stored.

@ebuchman
Copy link
Contributor

I think I may have a simpler structure - please see informalsystems/tendermint-rs#114

Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work

spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
while `Header.Time` corresponds to the [BFT time](bft-time.md). In this note, we assume that clocks of correct processes
are synchronized (for example using NTP), and therefore there is bounded clock drift (CLOCK_DRIFT) between local clocks and
BFT time. More precisely, for every correct process p and every header (correctly generated by the Tendermint consensus)
time (BFT time) the following inequality holds: `Header.Time < now + CLOCK_DRIFT`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to include the lower bound here? Or at least mention why we don't care about it ? It might also be helpful to clarify that we mostly need this to hold for the light client's local clock, not just the validators ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what would be the lower bound? Not sure if we can say anything more precise than genesis time, but not sure how this is useful. My understanding is that upper bound ensures that we don't consider headers that are outside the assumption that lite client clock is in sync with blockchain time; as time progresses, not sure what we can say about the past.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but it's currently written from the perspective of validators. For the validators, we want a lower bound (I think?). For the light client, we only need the upper bound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me when this inequality should hold. I guess when the header is generated? Also, there are two moving parts in the definition, i.e., "Header.Time" and "now", so it is not clear who to blame when the inequality is violated. I guess our assumption is that "Header.Time" is always correct (by definition; it serves as a time reference for the system), and that violation of the inequality means that the Lite Client is faulty. IOW it is in the responsibility of the lite client to keep its clock synchronized.

Copy link
Contributor Author

@milosevic milosevic Jan 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inequality holds from the moment header is generated. I was assuming that header is always correct, i.e., that header is coming from the main chain that is not forked. We need to make this assumption more clear. As we also assume that lite client processes are synchronised with respect to BFT time, header that does not satisfy this inequality must come from a faulty full node. We can think about trying to weaken this assumption to eventually hold. I think that it would be useful trying to more precisely understand attack vectors in case local lite client clock drifts more than clockDrift. Ideally, safety should not be violated in case this assumption does not hold, but only termination should temporary be violated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We rely on bfttime for the failure model and the checks heavily. So if the lite clients clock is off too far, everything might be lost. The question is, whether within the Bisection we should/can check whether the lite client's clock is synchronized (and what timing assumptions this would impose), or whether safety has to rely on the synchronization of the lite client's clock.

The function _Bisection_ checks whether to trust header _h2_ based on the trusted header _h1_. It does so by calling
the function _CheckSupport_ in the process of
bisection/recursion. _CheckSupport_ implements the trusted period method and, for two adjacent headers (in term of heights), it checks uninterrupted sequence of proof.
now = System.Time()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want the function to have access to the system clock ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of this function is to illustrate how to correctly use VerifyBisection function. In case we don't check if we are still within trusted period of initial trusted state after VerifyBisection is executed, we can't give precise guarantees to the user. I don't know how to avoid this. As most of complexity is happening within VerifyBisection this shouldn't be a big problem from the testing perspective.

spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
spec/consensus/light-client.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge this and iterate from there? We can also merge #73 into this or into master after

Copy link
Contributor

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should merge this!

Move light specs to their own dir, add readme and diagram
@milosevic milosevic changed the title Add non-recursive specification of Bisection algorithm Update specification of light client algorithm to align with the code Jan 23, 2020
@milosevic milosevic merged commit 033a0cb into master Jan 23, 2020
@melekes melekes deleted the zm_lite_client_algo branch February 17, 2020 10:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants