Seems region leaders can be lost after one TiKV instance fails #14547
I think the root cause is that when a TiKV node fails, other nodes can still dispatch Raft messages from it. #14574 adds a test case for this. Here is its log:
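(The log itself is not reproduced here.) To make the failure mode concrete, below is a minimal hypothetical sketch in Rust. `HibernateTracker`, its fields, and its methods are invented for illustration and are not TiKV's actual types or the actual fix; the sketch only shows the kind of check the fix title ("peers shouldn't hibernate incorrectly when one node fails") points at: activity attributed to a store that is known to be down must not keep a region eligible for hibernation.

```rust
// Hypothetical sketch, not TiKV code: how counting Raft messages dispatched
// "from" an already-failed store can make hibernation decisions go wrong.
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Hypothetical per-region bookkeeping for the hibernate decision.
struct HibernateTracker {
    /// Store id -> last time a Raft message attributed to that store was seen.
    last_seen: HashMap<u64, Instant>,
    /// Stores the cluster currently believes are down.
    failed_stores: HashSet<u64>,
    /// How long a voter may stay silent before the region refuses to hibernate.
    idle_timeout: Duration,
}

impl HibernateTracker {
    /// Record an incoming Raft message. The bug described above is that messages
    /// can still be dispatched on behalf of a failed node; without the
    /// `failed_stores` check, that store keeps looking "active".
    fn on_raft_message(&mut self, from_store: u64, now: Instant) {
        if self.failed_stores.contains(&from_store) {
            // Ignore activity attributed to a store that is known to be down.
            return;
        }
        self.last_seen.insert(from_store, now);
    }

    /// Only hibernate when every voter is up and was heard from recently.
    fn can_hibernate(&self, voters: &[u64], now: Instant) -> bool {
        voters.iter().all(|store| {
            !self.failed_stores.contains(store)
                && self
                    .last_seen
                    .get(store)
                    .map_or(false, |t| now.duration_since(*t) < self.idle_timeout)
        })
    }
}

fn main() {
    let now = Instant::now();
    let mut tracker = HibernateTracker {
        last_seen: HashMap::new(),
        failed_stores: HashSet::from([3]), // pretend store 3 just failed
        idle_timeout: Duration::from_secs(60),
    };
    // Messages still arriving "from" store 3 must not count as activity.
    tracker.on_raft_message(3, now);
    tracker.on_raft_message(1, now);
    tracker.on_raft_message(2, now);
    // The region must stay awake (and able to elect a leader) while a voter is down.
    assert!(!tracker.can_hibernate(&[1, 2, 3], now));
    println!("region stays awake while store 3 is down");
}
```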
hicqu added the type/bug (Type: Issue - Confirmed a bug), severity/moderate, affects-6.5, and affects-7.0 labels on Apr 14, 2023.
ti-chi-bot added a commit that referenced this issue on Apr 21, 2023:
ref #14547 raft: peers shouldn't hibernate incorrectly when one node fails Signed-off-by: qupeng <qupeng@pingcap.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue on Apr 21, 2023:
ref tikv#14547 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
lidezhu pushed a commit to lidezhu/tikv that referenced this issue on Apr 27, 2023:
…#14574) ref tikv#14547 raft: peers shouldn't hibernate incorrectly when one node fails Signed-off-by: qupeng <qupeng@pingcap.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lidezhu <lidezhu@pingcap.com>
This was referenced Oct 30, 2023
This was referenced Oct 31, 2023
Bug Report
What version of TiKV are you using?
TiKV
Release Version: 7.1.0-alpha
Edition: Community
Git Commit Hash: abb672b
Git Commit Branch: heads/refs/tags/v7.1.0-alpha
UTC Build Time: 2023-04-08 14:33:07
Rust Version: rustc 1.67.0-nightly (96ddd32c4 2022-11-14)
Enable Features: pprof-fp jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure
Profile: dist_release
What operating system and CPU are you using?
Steps to reproduce
Set pre-split = 8, so that we get 256K Regions.
What did you expect?
Changefeed checkpoint lag may increase, but should decrease after a while.
What happened?
Changefeed lag keeps increasing for about 1 hour.
This may contain several issues. Let's focus on this one: it seems some region leaders are lost in about 10 minutes.