Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make rollback command usable to fix missed hard upgrades #9171

Closed

Conversation

tsutsu
Copy link
Contributor

@tsutsu tsutsu commented Aug 5, 2022

Fixes #9174.

This PR addresses this edge case, by making the tendermint rollback command also remove the blockstore data for the block from which the invalid app-state was derived. This fixes cases where it's not the application's execution of the block that's incorrect, but the block itself that's invalid. (Presumably this condition only ever pertains in a hard-fork scenario — i.e. when a block that would be valid under node-software version N, becomes invalid under node-software version N+1; and such a block is produced by a validator who hasn't upgraded to version N+1.)

This code has been smoke-tested in a real-world use-case: I wrote this code "in anger", as a hotpatch to fix an evmos node that was experiencing exactly the problem described in #9174, so that the node could resume sync rather than needing to re-start sync from genesis. It worked!

Coincidentally, adding this logic also enables tendermint rollback to be used repeatedly, to walk the state back by more than one block. Previously, once tendermint rollback had been used once, the "state height + 1 == blockstore height" branch of state.Rollback would always be followed, early-returning success without doing anything. This code-path now just removes the blockstore data for the "pending" block before proceeding to purge the state + blockstore of the "latest" block.

PR checklist

  • Tests written/updated, or no tests needed
  • CHANGELOG_PENDING.md updated, or no changelog entry needed
  • Updated relevant documentation (docs/) and code comments, or no
    documentation updates needed

@tsutsu tsutsu requested a review from ebuchman as a code owner August 5, 2022 00:23
@tsutsu tsutsu requested a review from a team August 5, 2022 00:23
@cmwaters
Copy link
Contributor

cmwaters commented Aug 8, 2022

Thanks for opening this PR. I think the hard-fork scenario is a valid case. I will try to review this once I get a few free cycles.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale for use by stalebot label Aug 19, 2022
@cmwaters cmwaters removed the stale for use by stalebot label Aug 19, 2022
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale for use by stalebot label Aug 30, 2022
@thanethomson thanethomson added S:wip Work in progress (prevents stalebot from automatically closing) and removed stale for use by stalebot labels Aug 31, 2022
@thanethomson thanethomson removed the S:wip Work in progress (prevents stalebot from automatically closing) label Sep 20, 2022
@thanethomson
Copy link
Contributor

@cmwaters did #9261 supersede this PR?

@cmwaters
Copy link
Contributor

cmwaters commented Oct 4, 2022

Yeah. I'm going to close this PR in favour of #9261.

@cmwaters cmwaters closed this Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot rollback from bad blocks received from validators who miss manual upgrades
3 participants