One AZ is isolated from the other AZs' networks, and the service is unavailable for 4 minutes #12966
Comments
/type bug
This issue is under investigation. I'll post here when there is an update. /assign @cosven
This bug can affect all older versions. However, since it will only be fixed in the latest LTS version (v6.1) and later versions, I am adding and removing some labels accordingly. /remove-label may-affects-6.2
…ailed (#13254)

close #12966, ref #12966

When a TiKV instance is isolated from the other TiKV instances, some requests are blocked in raftstore and the corresponding latches are not released. Subsequent requests that require those latches receive a ServerIsBusy error and keep retrying. In this case, however, the peers on the isolated TiKV are no longer leaders, so the client is supposed to receive a NotLeader error immediately.

This commit introduces a fail-fast mode to the scheduler. When a request fails to acquire any latch, the scheduler checks whether the peer is still the leader. If it is still the leader, the request is scheduled as usual; otherwise it fails fast.

Signed-off-by: cosven <yinshaowen241@gmail.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
…ailed (#13254) (#13318)

close #12966, ref #12966, ref #13254

When a TiKV instance is isolated from the other TiKV instances, some requests are blocked in raftstore and the corresponding latches are not released. Subsequent requests that require those latches receive a ServerIsBusy error and keep retrying. In this case, however, the peers on the isolated TiKV are no longer leaders, so the client is supposed to receive a NotLeader error immediately.

This commit introduces a fail-fast mode to the scheduler. When a request fails to acquire any latch, the scheduler checks whether the peer is still the leader. If it is still the leader, the request is scheduled as usual; otherwise it fails fast.

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Signed-off-by: cosven <yinshaowen241@gmail.com>
Co-authored-by: cosven <cosven@users.noreply.github.com>
Co-authored-by: cosven <yinshaowen241@gmail.com>
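For readers who want to see the fail-fast idea from the commit message in code, here is a minimal sketch. It is not TiKV's actual scheduler: `Scheduler`, `Outcome`, `try_acquire`, and the `is_leader` flag are hypothetical names made up for illustration, and the leadership check is reduced to a boolean instead of a query to raftstore. It only models the decision described above: if the latches cannot be acquired, check leadership, and either wait as usual or fail immediately.

```rust
use std::collections::HashSet;

/// Hypothetical result of trying to schedule a request (not a TiKV type).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// All latches were acquired; the request can run.
    Scheduled,
    /// A latch is held by another request and this peer is still the
    /// leader, so the request waits as usual.
    Queued,
    /// A latch is held and this peer is no longer the leader: fail fast
    /// so the client can retry against the new leader.
    NotLeader,
}

/// Hypothetical, heavily simplified scheduler state.
pub struct Scheduler {
    /// Keys whose latches are currently held by in-flight requests.
    held: HashSet<String>,
    /// Assumption: stands in for asking raftstore whether the peer leads.
    pub is_leader: bool,
}

impl Scheduler {
    pub fn new(is_leader: bool) -> Self {
        Scheduler { held: HashSet::new(), is_leader }
    }

    /// Acquire latches on all keys, or none of them if any key is held.
    fn try_acquire(&mut self, keys: &[&str]) -> bool {
        if keys.iter().any(|k| self.held.contains(*k)) {
            return false;
        }
        for k in keys {
            self.held.insert(k.to_string());
        }
        true
    }

    /// Fail-fast scheduling: leadership is checked only when latch
    /// acquisition fails.
    pub fn schedule(&mut self, keys: &[&str]) -> Outcome {
        if self.try_acquire(keys) {
            Outcome::Scheduled
        } else if self.is_leader {
            Outcome::Queued
        } else {
            Outcome::NotLeader
        }
    }
}
```

In this sketch, a request on a key whose latch is stuck behind a blocked request gets `Outcome::Queued` while the peer is still the leader, and `Outcome::NotLeader` once `is_leader` is false, which is the case the isolated TiKV is in.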
Bug Report
What version of TiKV are you using?
./tikv-server -V
TiKV
Release Version: 6.1.0
Edition: Community
Git Commit Hash: 080d086
Git Commit Branch: heads/refs/tags/v6.1.0
UTC Build Time: 2022-06-10 11:22:39
Rust Version: rustc 1.60.0-nightly (1e12aef3f 2022-02-13)
Enable Features: jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure
Profile: dist_release
What operating system and CPU are you using?
8 cores, 32 GB
Steps to reproduce
Inject a fault that isolates one AZ from the networks of the other AZs.
What did you expect?
The service remains available.
What happened?
The service is unavailable for 4 minutes.
For more information, see https://pingcap.feishu.cn/wiki/wikcnzK33Ck6q1BQldXOnMUMvPc