Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow snapshot state-sync if local snapshots exist #8679

Closed
ValentinTrinque opened this issue Jul 4, 2023 · 4 comments · Fixed by #8697
Closed

Disallow snapshot state-sync if local snapshots exist #8679

ValentinTrinque opened this issue Jul 4, 2023 · 4 comments · Fixed by #8697

Comments

@ValentinTrinque
Copy link
Contributor

To improve the snapshot initialization workflow, we first need to make the loading mode (state-sync or local store) in configuration much more strict. It should be either the state-sync, either from local storage. If both are configured then we should throw an error.

This work prepares the next step that will make the snaphsot engine load the snapshots right upon initialization.

@wwestgarth
Copy link
Contributor

wwestgarth commented Jul 5, 2023

Thinking more, I don't think we want to be that strict by looking directly at the config options because I think it will be a step back in usability. Currently you can start a node with StartHeight as -1 and statesync on, and you'll join the network. If a protocol upgrade happens you then need to do nothing at all. If we force them to swicth statesync off then when we hit the protocol upgrade it will fail, unless they take action. And validators hate taking action.

The way Tendermint handles the statesync option is:

  • if its set to true and there is no block data locally, performs statesync
  • if its set to true and there IS block data locally, it skips the statesync

So from cores point of view if we have a start-height set and statesync is on three things can happen:

  • there isn't block data, there are also no vega snapshots, tendermint will start statesync and we say OK
  • there is block data, and there are vega snapshots, tendermint will NOT statesync and core will load from the latest snapshot
  • there ISN't block data, but there ARE vega snapshots, tendermint will try to statesync.

In the last situation currently we will throw away all existing snapshots and let tendermint statesync. I believe this is the wrong thing to do and we should instead fail when Tendermint tries to statesync. It means the core node has got into an inconsistent state where they have unsafe-reset-all'd to get rid of the Tendermint data but we still have core snapshots.

And thats the root cause in all of this: we don't remove all snapshots when you do vega unsafe_reset_all. If we did (and we should) then we should never really get into situation three.

So I think we need two things here:

  • Add snapshot removal to vega unsafe_reset_all
  • If we are in a weird state where only half has been reset fail i.e in func (e *Engine) ApplySnapshotChunk() if we have local snapshots, then abort the statesync.

With the above I think we'll then be able to do the moving of the local snapshot load to a more logical place.

@ValentinTrinque
Copy link
Contributor Author

It makes a lot of sense. Let's do this instead.

@ValentinTrinque
Copy link
Contributor Author

I will repurpose this issue.

@ValentinTrinque ValentinTrinque changed the title Disallow specifying both state-sync and local storage loading in snapshot configuration Align our snapshot engine initialization on tendermint behavior Jul 5, 2023
@ValentinTrinque ValentinTrinque changed the title Align our snapshot engine initialization on tendermint behavior Align our snapshot management on tendermint behavior Jul 5, 2023
@ValentinTrinque
Copy link
Contributor Author

ValentinTrinque commented Jul 5, 2023

This check must account for the configured StartHeight. To simplify this check, we could make the StartHeight strictly positive and follow the workflow:

if localSnapshots { // so ignoring state-sync
    if startHeight == 0 {
        // Loading from latest local snapshot,
    } else {
	// Try reloading snapshot from specified height.
	// Error -> No snapshot for version XXX
    }
} else {
    if startHeight == 0 {
         // Replay the chain or state-sync if enabled. Up to tendermint to decide.
    } else {
	// Wait for state-sync to offer expected snapshot for height
    }
}

This will be implemented in #8680

@ValentinTrinque ValentinTrinque changed the title Align our snapshot management on tendermint behavior Abort snapshot state-sync if local snapshots exist Jul 5, 2023
@ValentinTrinque ValentinTrinque changed the title Abort snapshot state-sync if local snapshots exist Disallow snapshot state-sync if local snapshots exist Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants