A 12GB region is not split automatically, causing TiSpark to fail due to an oversized coprocessor response. #15863
Labels
affects-7.5
severity/moderate
type/bug
Type: Issue - Confirmed a bug
user_report
The issue is reported by a real TiKV user from their environment.
Bug Report
What version of TiKV are you using?
v6.5
What operating system and CPU are you using?
Steps to reproduce
Import many SST files to a region to make it larger than region-max-size.
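For reference, the split thresholds involved live in the TiKV coprocessor config section. A minimal sketch (the values below are the documented defaults, not the affected cluster's actual settings):

```toml
# Illustrative TiKV config fragment; values are defaults/examples,
# not the settings from the affected cluster.
[coprocessor]
# A region growing past this size should be picked up by the split checker.
region-max-size = "144MB"
# Target size of each region produced by the split.
region-split-size = "96MB"
```

Once the imported SSTs push the region well past `region-max-size` (12GB here), the split checker is expected to propose a batch split back down toward `region-split-size`.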
What did you expect?
The region can be split correctly
What happened?
The region is not split.
Logs:
tikv-2023-10-21T04-19-59.819.log:{"level":"INFO","caller":"peer.rs:5773","message":"on split","time":"2023/10/21 04:06:55.797 +00:00","source":"split checker","split_keys":"10 keys range from ? to ?","peer_id":69583985,"region_id":69583975}
tikv-2023-10-21T04-19-59.819.log:{"level":"INFO","caller":"pd.rs:1098","message":"try to batch split region","time":"2023/10/21 04:06:55.798 +00:00","task":"batch_split","region":"id: 69583975 start_key: ? end_key: ? region_epoch { conf_ver: 302 version: 5278 } peers { id: 69583977 store_id: 29 } peers { id: 69583979 store_id: 68611552 } peers { id: 69583985 store_id: 36 } peers { id: 69583986 store_id: 12007 role: Learner }","new_region_ids":"[new_region_id: 77356595 new_peer_ids: 77356596 new_peer_ids: 77356597 new_peer_ids: 77356598 new_peer_ids: 77356599, new_region_id: 77356600 new_peer_ids: 77356601 new_peer_ids: 77356602 new_peer_ids: 77356603 new_peer_ids: 77356604, new_region_id: 77356605 new_peer_ids: 77356606 new_peer_ids: 77356607 new_peer_ids: 77356608 new_peer_ids: 77356609, new_region_id: 77356610 new_peer_ids: 77356611 new_peer_ids: 77356612 new_peer_ids: 77356613 new_peer_ids: 77356614, new_region_id: 77356615 new_peer_ids: 77356616 new_peer_ids: 77356617 new_peer_ids: 77356618 new_peer_ids: 77356619, new_region_id: 77356620 new_peer_ids: 77356621 new_peer_ids: 77356622 new_peer_ids: 77356623 new_peer_ids: 77356624, new_region_id: 77356625 new_peer_ids: 77356626 new_peer_ids: 77356627 new_peer_ids: 77356628 new_peer_ids: 77356629, new_region_id: 77356630 new_peer_ids: 77356631 new_peer_ids: 77356632 new_peer_ids: 77356633 new_peer_ids: 77356634, new_region_id: 77356635 new_peer_ids: 77356636 new_peer_ids: 77356637 new_peer_ids: 77356638 new_peer_ids: 77356639, new_region_id: 77356640 new_peer_ids: 77356641 new_peer_ids: 77356642 new_peer_ids: 77356643 new_peer_ids: 77356644]","region_id":69583975}
tikv-2023-10-23T17-57-02.378.log:{"level":"INFO","caller":"peer.rs:4770","message":"propose conf change peer","time":"2023/10/21 04:34:35.597 +00:00","kind":"Simple","changes":"[change_type: AddLearnerNode peer { id: 77887276 store_id: 27 role: Learner }]","peer_id":69583985,"region_id":69583975}
As we can see, the split check runs, but somehow the batch split does not go through, probably because the propose failed for some reason. Right now we don't log propose failures, so the exact reason is unknown, but in general a propose failure is a normal scenario.
What's not normal in this case is that the split check should run again later to make sure the region gets split, yet the region remained unsplit.
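As a possible manual mitigation until the retry behavior is fixed, one could inspect the region and ask PD to schedule a split explicitly via pd-ctl (a sketch, assuming pd-ctl is available; the region id is taken from the logs above and `<pd-address>` is a placeholder for the actual PD endpoint):

```shell
# Inspect the oversized region (id from the logs above).
pd-ctl -u http://<pd-address>:2379 region 69583975

# Ask PD to schedule a split explicitly. --policy=scan walks the keys to
# pick split points; --policy=approximate uses size estimates instead.
pd-ctl -u http://<pd-address>:2379 operator add split-region 69583975 --policy=scan
```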