[WIP] fix db checkpoint async bug #2982
Conversation
@@ -109,7 +109,19 @@ class ReplicaBlockchain : public IBlocksDeleter,
   std::optional<categorization::Updates> getBlockUpdates(BlockId block_id) const override final {
     return reader_->getBlockUpdates(block_id);
   }

+  // find the first block which has the given sequence number in its metadata
Can multiple blocks have the same sequence number in their metadata?
Yes, but they will all be placed sequentially, one after another, so as long as we stop on the first one it's fine.
@@ -934,7 +934,7 @@ async def test_restore_from_snapshot_of_other(self, bft_network, tracker):

   crashed_replica = list(bft_network.random_set_of_replicas(1, {initial_prim}))[0]
   bft_network.stop_replica(crashed_replica)
-  await skvbc.send_n_kvs_sequentially(DB_CHECKPOINT_WIN_SIZE)  # run till the next checkpoint
+  await skvbc.send_n_kvs_sequentially(DB_CHECKPOINT_WIN_SIZE + 100)  # run till the next checkpoint
maybe int(DB_CHECKPOINT_WIN_SIZE * 1.5) so that if we change this constant the test remains valid?
c1719ad to b7e4591
   auto new_snap_shot = RecoverySnapshot{&native_client_->rawDB()};
-  chkpnt_snap_shots_[last_reachable_id] = new_snap_shot.get();
+  chkpnt_snap_shots_[block_id_at_chkpnt] = new_snap_shot.get();
This is incorrect; the newly created snapshot represents the current state of the blockchain, i.e. the state at the latest block id, while the change assumes that it represents the block id at the checkpoint, which is a block in the past.
The current implementation of the db checkpoint feature has a synchronization bug:
While we take the db checkpoint in the background, we don't align anything with the checkpoint sequence number, i.e. the block number, bft metadata, pending reserved pages, and more.
Once clients issue requests at a resolution other than 150, we start to see a wide range of issues.
For example:
On replica 0, block 302 was created at sequence number 304.
However, after recovery, the recovered replica reports the same block as created at sequence number 305.
This PR proposes a fix: we pin the bft sequence number before starting the async part and align everything accordingly.
This PR doesn't handle the case of the operator explicitly creating a db checkpoint, since the feature is assumed to be used by clients only (who care only about the blockchain).
Testing: CI, plus changing an existing test to verify the changes.