raftstore: apply res may be dropped silently #13160

@5kbpers

Bug Report

What version of TiKV are you using?

4.0 and later versions.

What happened?

In #8487, we changed ApplyRes to be sent with try_send:

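fn notify(&self, apply_res: Vec<ApplyRes<EK::Snapshot>>) {
    for r in apply_res {
        // The return value of try_send is ignored: if the PeerFsm mailbox
        // is full, this ApplyRes is dropped and nothing here retries or reports it.
        self.router.try_send(
            r.region_id,
            PeerMsg::ApplyRes {
                res: ApplyTaskRes::Apply(r),
            },
        );
    }
}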
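
The failure mode is easy to reproduce with any bounded mailbox. The following is a minimal sketch, not TiKV code: it assumes std::sync::mpsc::sync_channel as a stand-in for a PeerFsm mailbox, with an arbitrary capacity of 2 and plain u64 values in place of ApplyRes. The first (hypothetical) function ignores the result of try_send the way notify does, so everything sent after the mailbox fills up disappears; the second inspects the error and shows that the drop is detectable.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Sends every result with try_send and ignores the outcome, mirroring
/// the pattern in notify. Returns how many messages were actually delivered.
fn notify_ignoring_errors(results: &[u64], capacity: usize) -> usize {
    // Hypothetical mailbox: a bounded channel that nobody drains meanwhile.
    let (tx, rx) = sync_channel::<u64>(capacity);
    for &r in results {
        // The Result is discarded, so a full mailbox drops r silently.
        let _ = tx.try_send(r);
    }
    drop(tx);
    rx.iter().count()
}

/// Same as above, but inspects the error and counts dropped messages.
/// Returns (delivered, dropped).
fn notify_counting_drops(results: &[u64], capacity: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<u64>(capacity);
    let mut dropped = 0;
    for &r in results {
        match tx.try_send(r) {
            Ok(()) => {}
            // Full: the mailbox is at capacity; Disconnected: receiver is gone.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => dropped += 1,
        }
    }
    drop(tx);
    (rx.iter().count(), dropped)
}

fn main() {
    let apply_results = [1, 2, 3, 4, 5];
    let delivered = notify_ignoring_errors(&apply_results, 2);
    println!("delivered {} of {}", delivered, apply_results.len());
    let (delivered, dropped) = notify_counting_drops(&apply_results, 2);
    println!("delivered {}, dropped {}", delivered, dropped);
}
```

With five results, a capacity of 2 and no consumer draining the mailbox, both functions deliver only 2 messages, but only the second can tell that 3 were dropped and, for example, log them or fall back to a blocking send.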

In extreme cases, the mailbox of a PeerFsm may be full, and some ApplyRes messages are then dropped.
As a result, some regions that were supposed to be created by a split are never actually created in on_ready_split_region:

fn on_ready_split_region(
    &mut self,
    derived: metapb::Region,
    regions: Vec<metapb::Region>,
    new_split_regions: HashMap<u64, apply::NewSplitPeer>,
) {

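Because only some of the ApplyRes messages are lost, the range of the original region can still be changed by later admin commands, which can leave the range of the uncreated region uncovered by any region in memory. Since the overlap check below only scans meta.region_ranges, a region that actually overlaps the uncreated one can still be created: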

let mut is_overlapped = false;
let mut regions_to_destroy = vec![];
for (key, id) in meta.region_ranges.range((
    Excluded(data_key(msg.get_start_key())),
    Unbounded::<Vec<u8>>,
)) {
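
The check relies on region_ranges being keyed by each region's end key: scanning from just past the new region's start key visits the regions whose end key is greater than that start key, and the first of them overlaps the new region if it starts before the new region's end key. The sketch below is a simplified, hypothetical model (integer keys instead of encoded data keys, region ids chosen arbitrarily, and it only returns a flag instead of also collecting regions_to_destroy). It assumes a later admin command has already narrowed the original region's in-memory range to [1, 50), and shows that region D [50, 75) from the example below passes the check because region C [50, 100) was never inserted.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound::{Excluded, Unbounded};

/// Hypothetical stand-in for metapb::Region: the key range [start, end).
struct Region {
    id: u64,
    start: u64,
    end: u64,
}

/// Hypothetical stand-in for StoreMeta: region_ranges maps end key -> region id.
struct Meta {
    regions: HashMap<u64, Region>,
    region_ranges: BTreeMap<u64, u64>,
}

impl Meta {
    fn new() -> Self {
        Meta { regions: HashMap::new(), region_ranges: BTreeMap::new() }
    }

    fn insert(&mut self, r: Region) {
        self.region_ranges.insert(r.end, r.id);
        self.regions.insert(r.id, r);
    }

    /// Returns true if r overlaps a region known to this metadata.
    /// Assumes the known regions are disjoint, so the first region whose
    /// end key is greater than r.start is the only candidate.
    fn is_overlapped(&self, r: &Region) -> bool {
        match self.region_ranges.range((Excluded(r.start), Unbounded)).next() {
            Some((_, id)) => self.regions[id].start < r.end,
            None => false,
        }
    }
}

fn main() {
    // In-memory view after the split's ApplyRes was lost: C was never inserted.
    let mut meta = Meta::new();
    meta.insert(Region { id: 1, start: 1, end: 50 });
    let d = Region { id: 4, start: 50, end: 75 };
    println!("D overlaps (C missing): {}", meta.is_overlapped(&d)); // false

    // Had C [50, 100) been created, D would have been rejected.
    meta.insert(Region { id: 3, start: 50, end: 100 });
    println!("D overlaps (C present): {}", meta.is_overlapped(&d)); // true
}
```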

Here is an example:

  • Region A [1, 100) was split into region B [1, 50) and region C [50, 100). The peer states of B and C were written to KVDB, but the corresponding ApplyRes was lost, so region C was not created.
  • Region C was then split (by its peers on other stores) into region D [50, 75) and region E [75, 100).
  • Region D could be created on this store by the first message it received; it then received a snapshot and wrote its peer state to KVDB.
  • When the store restarted, it panicked repeatedly in clear_stale_data, since the peer states loaded from KVDB now contained the overlapping regions C [50, 100) and D [50, 75):
    fn clear_stale_data(&self, meta: &StoreMeta) -> Result<()> {
        let t = TiInstant::now();
        let mut ranges = Vec::new();
        let mut last_start_key = keys::data_key(b"");
        for region_id in meta.region_ranges.values() {
            let region = &meta.regions[region_id];
            let start_key = keys::enc_start_key(region);
            ranges.push((last_start_key, start_key));
            last_start_key = keys::enc_end_key(region);
        }
        ranges.push((last_start_key, keys::DATA_MAX_KEY.to_vec()));
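
The loop above treats the gaps between consecutive regions, ordered by end key, as stale ranges. The sketch below is a hypothetical model with integer keys, where 0 and u64::MAX stand in for the minimum data key and DATA_MAX_KEY and the region ids are arbitrary. It replays the loop on the regions this store would load from KVDB in the example: B [1, 50), C [50, 100) and D [50, 75). Because C and D overlap, one computed range is inverted (its start is greater than its end). We assume this invalid range is what makes the cleanup fail on every restart; that is consistent with the repeated panic but not confirmed by the excerpt above.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical simplified region over integer keys: [start, end).
struct Region {
    start: u64,
    end: u64,
}

/// Mirrors the loop in clear_stale_data: walk regions ordered by end key
/// and collect the gaps between them as (start, end) ranges to clean up.
fn stale_ranges(
    regions: &HashMap<u64, Region>,
    region_ranges: &BTreeMap<u64, u64>,
) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    let mut last_start_key = 0; // stand-in for data_key(b"")
    for region_id in region_ranges.values() {
        let region = &regions[region_id];
        ranges.push((last_start_key, region.start));
        last_start_key = region.end;
    }
    ranges.push((last_start_key, u64::MAX)); // stand-in for DATA_MAX_KEY
    ranges
}

fn main() {
    // Peer states found in KVDB after the restart:
    // B and C from the split, D from the snapshot.
    let mut regions = HashMap::new();
    regions.insert(1, Region { start: 1, end: 50 });
    regions.insert(3, Region { start: 50, end: 100 });
    regions.insert(4, Region { start: 50, end: 75 });
    let mut region_ranges = BTreeMap::new();
    for (id, r) in &regions {
        region_ranges.insert(r.end, *id);
    }

    let ranges = stale_ranges(&regions, &region_ranges);
    println!("{:?}", ranges);
    // With disjoint regions every gap has start <= end, so an inverted
    // range can only appear when two loaded regions overlap.
    let inverted: Vec<_> = ranges.iter().filter(|(s, e)| s > e).collect();
    println!("inverted: {:?}", inverted);
}
```

Here the regions are visited in the order B (end 50), D (end 75), C (end 100), so after D the loop pushes the range from D's end key 75 to C's start key 50.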

Labels: affects-4.0, affects-5.0, affects-5.1, affects-5.2, affects-5.3, affects-5.4, affects-6.0, affects-6.1, affects-6.2, severity/major, type/bug