Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] fix db checkpoint async bug #2982

Closed

Conversation

yontyon
Copy link
Contributor

@yontyon yontyon commented Apr 3, 2023

  • Problem Overview
    The current implementation of the db checkpoint feature has a synchronization bug:
    While we take the db checkpoint in the background, we don't align anything with the checkpoint sequence number, i.e. the block number, bft metadata, pending reserved pages, and more.
    Once we put more client requests in a different resolution than 150 we start to see a wide set of issues:
    For example:
    1. The recovered replica won't start from a stable checkpoint, instead, it starts from the point where the db checkpoint was taken.
    2. In a case where the checkpoint was taken in the middle of another execution phase, we won't have the pending reserved pages to recover correctly.
    3. We trim the block at the point where the db checkpoint was taken, but we don't update the bft metadata accordingly.
      Below is an example of part of these issues:
      On replica 0, block 302 was created on sequence number 304
0|03-04-2023 10:48:24.125|INFO |skvbctest.replica|post-execution-thread|||304||executeWriteCommand|L:482|ConditionalWrite message handled; writesCounter=302 currBlock=302 | [SQ:1215]

However, on recovery, the recovered replica has the same block was created on sequence number 305:

5|03-04-2023 10:48:34.293|INFO |skvbctest.replica|post-execution-thread|||305||executeWriteCommand|L:482|ConditionalWrite message handled; writesCounter=1 currBlock=302 | [SQ:3]

This PR proposes a fix, in which, we pin the bft sequence number before starting the async part, and align everything accordingly.

This PR doesn't handle the case of explicitly creating db checkpoint by the operator, as it assumes to be used for clients only (which cares only about the blockchain)

  • Testing Done
    CI + Changing an existing test to verify the changes

@yontyon yontyon requested a review from a team as a code owner April 3, 2023 10:45
@@ -109,7 +109,19 @@ class ReplicaBlockchain : public IBlocksDeleter,
std::optional<categorization::Updates> getBlockUpdates(BlockId block_id) const override final {
return reader_->getBlockUpdates(block_id);
}

// find the first block which has the given sequence number in its metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can multiple blocks have the same sequence number in their metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but they will all be placed sequentially one after another, so as long as we stop on the first one it's ok

@@ -934,7 +934,7 @@ async def test_restore_from_snapshot_of_other(self, bft_network, tracker):

crashed_replica = list(bft_network.random_set_of_replicas(1, {initial_prim}))[0]
bft_network.stop_replica(crashed_replica)
await skvbc.send_n_kvs_sequentially(DB_CHECKPOINT_WIN_SIZE) # run till the next checkpoint
await skvbc.send_n_kvs_sequentially(DB_CHECKPOINT_WIN_SIZE + 100) # run till the next checkpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe int(DB_CHECKPOINT_WIN_SIZE * 1.5) so that if we change this constant the test remains valid?

@WildFireFlum
Copy link
Contributor

WildFireFlum commented Apr 3, 2023

This PR proposes a fix, in which, we pin the bft sequence number before starting the async part, and align everything accordingly.
Can you detail what data you persist differently from before and why?

@yontyon yontyon force-pushed the trim-db-checkpoint-to-latest-checkpoint branch from c1719ad to b7e4591 Compare April 13, 2023 11:40
auto new_snap_shot = RecoverySnapshot{&native_client_->rawDB()};
chkpnt_snap_shots_[last_reachable_id] = new_snap_shot.get();
chkpnt_snap_shots_[block_id_at_chkpnt] = new_snap_shot.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect; the newly created snapshot represents the current state of the blockchain i.e. the state of the latest block id, while the change assumes that it represents a block id at the checkpoint, which is a block in the past.

@yontyon yontyon closed this Apr 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants