A 12GB region is not split automatically, causing TiSpark failures due to a too-large coprocessor response. #15863

Closed
tonyxuqqi opened this issue Oct 27, 2023 · 2 comments · Fixed by #15900
Assignees
Labels
affects-7.5 severity/moderate type/bug Type: Issue - Confirmed a bug user_report The issue is reported by real TiKV user from their environment.

Comments

@tonyxuqqi
Contributor

tonyxuqqi commented Oct 27, 2023

Bug Report

What version of TiKV are you using?

v6.5

What operating system and CPU are you using?

Steps to reproduce

Import many SST files to a region to make it larger than region-max-size.

What did you expect?

The region can be split correctly

What happened?

The region is not split.

Logs:
tikv-2023-10-21T04-19-59.819.log:{"level":"INFO","caller":"peer.rs:5773","message":"on split","time":"2023/10/21 04:06:55.797 +00:00","source":"split checker","split_keys":"10 keys range from ? to ?","peer_id":69583985,"region_id":69583975}
tikv-2023-10-21T04-19-59.819.log:{"level":"INFO","caller":"pd.rs:1098","message":"try to batch split region","time":"2023/10/21 04:06:55.798 +00:00","task":"batch_split","region":"id: 69583975 start_key: ? end_key: ? region_epoch { conf_ver: 302 version: 5278 } peers { id: 69583977 store_id: 29 } peers { id: 69583979 store_id: 68611552 } peers { id: 69583985 store_id: 36 } peers { id: 69583986 store_id: 12007 role: Learner }","new_region_ids":"[new_region_id: 77356595 new_peer_ids: 77356596 new_peer_ids: 77356597 new_peer_ids: 77356598 new_peer_ids: 77356599, new_region_id: 77356600 new_peer_ids: 77356601 new_peer_ids: 77356602 new_peer_ids: 77356603 new_peer_ids: 77356604, new_region_id: 77356605 new_peer_ids: 77356606 new_peer_ids: 77356607 new_peer_ids: 77356608 new_peer_ids: 77356609, new_region_id: 77356610 new_peer_ids: 77356611 new_peer_ids: 77356612 new_peer_ids: 77356613 new_peer_ids: 77356614, new_region_id: 77356615 new_peer_ids: 77356616 new_peer_ids: 77356617 new_peer_ids: 77356618 new_peer_ids: 77356619, new_region_id: 77356620 new_peer_ids: 77356621 new_peer_ids: 77356622 new_peer_ids: 77356623 new_peer_ids: 77356624, new_region_id: 77356625 new_peer_ids: 77356626 new_peer_ids: 77356627 new_peer_ids: 77356628 new_peer_ids: 77356629, new_region_id: 77356630 new_peer_ids: 77356631 new_peer_ids: 77356632 new_peer_ids: 77356633 new_peer_ids: 77356634, new_region_id: 77356635 new_peer_ids: 77356636 new_peer_ids: 77356637 new_peer_ids: 77356638 new_peer_ids: 77356639, new_region_id: 77356640 new_peer_ids: 77356641 new_peer_ids: 77356642 new_peer_ids: 77356643 new_peer_ids: 77356644]","region_id":69583975}
tikv-2023-10-23T17-57-02.378.log:{"level":"INFO","caller":"peer.rs:4770","message":"propose conf change peer","time":"2023/10/21 04:34:35.597 +00:00","kind":"Simple","changes":"[change_type: AddLearnerNode peer { id: 77887276 store_id: 27 role: Learner }]","peer_id":69583985,"region_id":69583975}

As we can see, the split check runs, but somehow the batch split does not go through, probably because the proposal failed for some reason. Right now we don't log propose failures, so the exact reason is unknown, but a propose failure is generally a normal scenario.
What's not normal in this case is that the split check should have run again later to make sure the region gets split.

@tonyxuqqi tonyxuqqi added type/bug Type: Issue - Confirmed a bug severity/moderate user_report The issue is reported by real TiKV user from their environment. labels Oct 27, 2023
@tonyxuqqi tonyxuqqi self-assigned this Oct 27, 2023
@tonyxuqqi
Contributor Author

The reason is that we reset self.fsm.peer.size_diff_hint after a split check is successfully scheduled. However, that split check may fail, or the split generated from the split check may fail.
If there are no more updates to that region, the region won't be split automatically.
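
A minimal Rust sketch of the pattern described above (the struct and method names are simplified stand-ins, not the actual TiKV raftstore code):

```rust
// Illustrative sketch of the problematic pattern (assumed names, not the actual
// TiKV raftstore code). `size_diff_hint` accumulates the estimated size change
// since the last split check; clearing it as soon as the check is merely
// *scheduled* loses the trigger if the check or the resulting split later fails.

struct Peer {
    size_diff_hint: u64,
    approximate_size: Option<u64>,
}

struct SplitCheckTicker {
    region_split_check_diff: u64,
}

impl SplitCheckTicker {
    fn on_split_region_check_tick(&self, peer: &mut Peer) {
        // Skip the check if nothing significant has changed since the last run.
        if peer.approximate_size.is_some() && peer.size_diff_hint < self.region_split_check_diff {
            return;
        }

        // Schedule the split-check task; the check and the batch-split proposal it
        // produces both run asynchronously and may still fail.
        let scheduled = self.schedule_split_check();

        // BUG (as described in this issue): the hint is cleared once scheduling
        // succeeds. If the split never actually happens and the region receives no
        // further writes, size_diff_hint stays below the threshold forever and the
        // oversized region is never re-checked.
        if scheduled {
            peer.size_diff_hint = 0;
        }
    }

    fn schedule_split_check(&self) -> bool {
        // Stand-in for handing a task to the split-check worker.
        true
    }
}

fn main() {
    let ticker = SplitCheckTicker { region_split_check_diff: 8 << 20 };
    let mut peer = Peer { size_diff_hint: 12 << 30, approximate_size: Some(12 << 30) };
    ticker.on_split_region_check_tick(&mut peer);
    // The hint is gone even though no split actually happened.
    assert_eq!(peer.size_diff_hint, 0);
}
```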

@bufferflies
Contributor

bufferflies commented Nov 2, 2023

The reason is that we reset self.fsm.peer.size_diff_hint after a split check is successfully scheduled. However, that split check may fail, or the split generated from the split check may fail. If there are no more updates to that region, the region won't be split automatically.
So we can reset this flag (size_diff_hint) only after the updated info has been reported back.
The condition may also need to change: the region size should be considered as well.
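
As a rough sketch of that direction (illustrative only; the field and config names are assumptions, and this is not the merged fix from #15900): keep the hint until the updated size information actually comes back, and always re-check a region whose last known size already exceeds region-max-size.

```rust
// Illustrative sketch of the suggested direction (assumed names; not the merged
// fix in #15900): keep size_diff_hint until the split-check result is reported
// back, and also re-check whenever the last known size already exceeds the
// region-max-size threshold.

struct Peer {
    size_diff_hint: u64,
    approximate_size: Option<u64>,
}

struct Cfg {
    region_split_check_diff: u64,
    region_max_size: u64,
}

fn should_run_split_check(peer: &Peer, cfg: &Cfg) -> bool {
    // Run the check if enough size change has accumulated since the last result...
    if peer.size_diff_hint >= cfg.region_split_check_diff {
        return true;
    }
    // ...or if the region is already known to be oversized. This covers the case
    // where a previous check was scheduled but never produced a split.
    matches!(peer.approximate_size, Some(size) if size >= cfg.region_max_size)
}

// Reset the hint only when the split-check result (the updated approximate size)
// actually arrives, instead of when the check is merely scheduled.
fn on_split_check_result(peer: &mut Peer, new_approximate_size: u64) {
    peer.approximate_size = Some(new_approximate_size);
    peer.size_diff_hint = 0;
}

fn main() {
    let cfg = Cfg { region_split_check_diff: 8 << 20, region_max_size: 144 << 20 };
    let mut peer = Peer { size_diff_hint: 0, approximate_size: Some(12 << 30) };
    // Even with a zero diff hint, the oversized region still triggers a re-check.
    assert!(should_run_split_check(&peer, &cfg));
    on_split_check_result(&mut peer, 96 << 20);
    assert!(!should_run_split_check(&peer, &cfg));
}
```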

ti-chi-bot bot pushed a commit that referenced this issue Nov 7, 2023
close #15863

Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
ti-chi-bot bot added a commit that referenced this issue Nov 8, 2023
close #15863

Signed-off-by: Qi Xu <tonyxuqqi@outlook.com>
Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>

Co-authored-by: Qi Xu <tonyxuqqi@outlook.com>
Co-authored-by: tonyxuqqi <tonyxuqi@outlook.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>