Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abci++: vote extensions are needed in several cases where we currently don't have them #8272

Closed
thanethomson opened this issue Apr 7, 2022 · 3 comments
Assignees
Labels
C:abci Component: Application Blockchain Interface
Projects

Comments

@thanethomson
Copy link
Contributor

There are currently 3 cases where we need vote extensions but don't have them, and if we don't have them we panic:

  1. A fresh node joining an existing network that block sync's and then switches over to consensus
  2. An existing node that goes down and comes back again at a later stage
  3. The whole network goes down and comes back up again simultaneously

See this E2E test run for examples of 1 and 2.

The core reason for this is that we currently don't store precommits' extensions anywhere. When we switch over from block sync to consensus we call CommitToVoteSet which effectively simulates the precommits. This worked for ABCI, but won't for ABCI++, since we don't store the vote extensions anywhere at present, causing panics in nodes attempting to switch from block sync to consensus.

Some possible solutions we've considered thus far:

  1. Create a new type, ExtendedCommit, which contains a commit and its associated vote extensions and extension signatures. Upon each commit, store the last_seen_commit (of type ExtendedCommit, with vote extensions and their signatures) in the block store.
  2. Either:
    1. Extend the StatusResponse for the block sync protocol to also supply the node's last_seen_commit, including vote extensions.
    2. Devise a workaround such that, if a node needs to propose as it switches from block sync to consensus (for which it needs vote extensions and their signatures), signal internally to the node that they've just switched over such that they produce an invalid proposal. This proposal will be rejected by the network and the next proposer will be chosen.
@thanethomson thanethomson added the C:abci Component: Application Blockchain Interface label Apr 7, 2022
@thanethomson thanethomson added this to To do in ABCI++ via automation Apr 7, 2022
@thanethomson thanethomson self-assigned this Apr 7, 2022
@cmwaters
Copy link
Contributor

cmwaters commented Apr 7, 2022

For 2 and 3, shouldn't the consensus WAL persist the messages so we could recover in the event that a node crashed.

Is there perhaps a need to do something like the seen commit where before we flush the WAL at the end of the height, we always persist the vote extensions to disk for the next height?

As to 1, I think we agreed that it's fine that a node coming out of blocksync is unable to immediately be the proposer in the following height (unless they can quickly receive all the vote extensions from the previous height)

@thanethomson
Copy link
Contributor Author

For 2 and 3, shouldn't the consensus WAL persist the messages so we could recover in the event that a node crashed.

After in-depth discussions, the WAL won't suffice in cases where someone's fallen behind while they've been offline. The WAL just doesn't store enough data for such cases, and we wouldn't want it to. It might help in case (3).

Is there perhaps a need to do something like the seen commit where before we flush the WAL at the end of the height, we always persist the vote extensions to disk for the next height?

Not sure about the timing there, but it seems to me as though storing the seen commit upon commit would probably suffice for our purposes.

The rabbit hole went a bit deeper in discussions after I logged this issue, as there's actually a 4th case where we don't have vote extensions when we need them:

  1. A node is happily running consensus and then its network connection dies. When it comes back online it's still in consensus but has no way of accessing the vote extensions that were sent while it was offline. All other nodes that send it votes while catching up will lack the vote extensions and their corresponding signatures because other nodes reconstruct their votes from commits.

From the last discussions, we've opted to introduce a panic at this point such that the node restarts in block sync mode and then switches to consensus. In future we'll be looking into a way of making the transitions between block sync and consensus and back again smoother.

@sergio-mena
Copy link
Contributor

Closed by #8433

ABCI++ automation moved this from In progress to Done Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:abci Component: Application Blockchain Interface
Projects
No open projects
Development

No branches or pull requests

3 participants