Make rollback command usable to fix missed hard upgrades #9171
Fixes #9174.
This PR addresses this edge case by making the `tendermint rollback` command also remove the blockstore data for the block from which the invalid app-state was derived. This fixes cases where it's not the application's execution of the block that's incorrect, but the block itself that's invalid. (Presumably this condition only ever arises in a hard-fork scenario, i.e. when a block that would be valid under node-software version N becomes invalid under node-software version N+1, and such a block is produced by a validator who hasn't upgraded to version N+1.)

This code has been smoke-tested in a real-world use case: I wrote it "in anger", as a hotpatch to fix an evmos node that was experiencing exactly the problem described in #9174, so that the node could resume sync rather than needing to restart sync from genesis. It worked!

Coincidentally, adding this logic also enables `tendermint rollback` to be used repeatedly, to walk the state back by more than one block. Previously, once `tendermint rollback` had been used once, the "state height + 1 == blockstore height" branch of `state.Rollback` would always be followed, early-returning success without doing anything. This code path now just removes the blockstore data for the "pending" block before proceeding to purge the state + blockstore of the "latest" block.

PR checklist
- `CHANGELOG_PENDING.md` updated, or no changelog entry needed
- Updated relevant documentation (`docs/`) and code comments, or no documentation updates needed
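
For reviewers who want the intended behaviour at a glance, here is a minimal toy model of the rollback logic described above. It is a sketch only, not the code in this PR: `toyNode`, `blockStoreHeight`, and `rollback` are hypothetical names, the block store is modelled as a plain map, and the "nothing to roll back" check at height 0 is an assumption of the sketch. It shows the two cases: when the blockstore is one block ahead of the state, the "pending" block is removed first; in both cases the "latest" block is then purged and the state height steps back by one.

```go
package main

import (
	"errors"
	"fmt"
)

// toyNode is a hypothetical stand-in for a node's stored data.
// It is NOT Tendermint's real state store or block store API.
type toyNode struct {
	stateHeight int64            // height of the latest stored state
	blocks      map[int64]string // toy block store, keyed by height
}

// blockStoreHeight returns the highest block height present in the toy store.
func (n *toyNode) blockStoreHeight() int64 {
	var h int64
	for k := range n.blocks {
		if k > h {
			h = k
		}
	}
	return h
}

// rollback walks the toy node back by one block and returns the new height.
func rollback(n *toyNode) (int64, error) {
	if n.stateHeight <= 0 {
		// Assumption of this sketch: nothing can be rolled back at height 0.
		return 0, errors.New("no state to roll back")
	}
	bs := n.blockStoreHeight()
	switch {
	case bs == n.stateHeight+1:
		// A "pending" block was stored without a matching state.
		// Remove it instead of returning early.
		delete(n.blocks, bs)
	case bs != n.stateHeight:
		return 0, fmt.Errorf("state height %d and blockstore height %d are inconsistent",
			n.stateHeight, bs)
	}
	// Purge the "latest" block and step the state back by one.
	delete(n.blocks, n.stateHeight)
	n.stateHeight--
	return n.stateHeight, nil
}

func main() {
	n := &toyNode{stateHeight: 10, blocks: map[int64]string{}}
	for h := int64(1); h <= 10; h++ {
		n.blocks[h] = fmt.Sprintf("block-%d", h)
	}
	// Because state and blockstore heights stay equal after each call,
	// rollback can be applied repeatedly.
	for i := 0; i < 2; i++ {
		h, err := rollback(n)
		if err != nil {
			fmt.Println("rollback failed:", err)
			return
		}
		fmt.Println("rolled back to height", h, "blockstore height", n.blockStoreHeight())
	}
}
```

In this model, two consecutive calls starting from height 10 leave both the state and the blockstore at height 8, which is the repeated-use property described above.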