One AZ is isolated from the networks of the other AZs, and the service is unavailable for 4 minutes #12966

Closed
Lily2025 opened this issue Jul 6, 2022 · 6 comments · Fixed by #13254
Assignees
Labels
affects-6.1, severity/major, type/bug

Comments

@Lily2025

Lily2025 commented Jul 6, 2022

Bug Report

What version of TiKV are you using?

./tikv-server -V
TiKV
Release Version: 6.1.0
Edition: Community
Git Commit Hash: 080d086
Git Commit Branch: heads/refs/tags/v6.1.0
UTC Build Time: 2022-06-10 11:22:39
Rust Version: rustc 1.60.0-nightly (1e12aef3f 2022-02-13)
Enable Features: jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure
Profile: dist_release

What operating system and CPU are you using?

8 cores, 32 GB

Steps to reproduce

Inject a network fault that isolates one AZ from the networks of the other AZs.

What did you expect?

The service remains available.

What happened?

The service is unavailable for 4 minutes.

For more information, see https://pingcap.feishu.cn/wiki/wikcnzK33Ck6q1BQldXOnMUMvPc

@Lily2025
Author

Lily2025 commented Jul 6, 2022

/type bug
/severity major
/assign cosven

@ti-chi-bot
Member

@Lily2025: GitHub didn't allow me to assign the following users: cosven.

Note that only tikv members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/type bug
/severity major
/assign cosven

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cosven
Member

cosven commented Jul 6, 2022

This issue is under investigation. I'll post here when there is any progress.

/assign @cosven

@cosven
Member

cosven commented Aug 16, 2022

This bug can affect all older versions. However, since it will only be fixed in the latest LTS version (v6.1) and later versions, I am adding and removing some labels.

/remove-label may-affects-6.2
/remove-label may-affects-6.0
/remove-label may-affects-5.4
/remove-label may-affects-5.3
/remove-label may-affects-5.2
/remove-label may-affects-5.1
/remove-label may-affects-5.0
/remove-label may-affects-4.0
/remove-label affects-6.2
/remove-label affects-6.0
/remove-label affects-5.4
/remove-label affects-5.3
/remove-label affects-5.2
/remove-label affects-5.1
/remove-label affects-5.0
/remove-label affects-4.0

@VelocityLight added the affects-6.1, affects-5.4, affects-5.3, affects-5.2, affects-5.1, and affects-5.0 labels and removed the may-affects-6.1, may-affects-5.4, may-affects-5.3, may-affects-5.2, may-affects-5.1, and may-affects-5.0 labels on Aug 16, 2022
@ti-chi-bot
Member

@cosven: These labels are not set on the issue: may-affects-5.4, may-affects-5.3, may-affects-5.2, may-affects-5.1, may-affects-5.0.

In response to this:

This bug can affect all older versions. However, since it will only be fixed in the latest LTS version (v6.1) and later versions, I am adding and removing some labels.

/affects-6.1
/remove-label may-affects-6.0
/remove-label may-affects-5.4
/remove-label may-affects-5.3
/remove-label may-affects-5.2
/remove-label may-affects-5.1
/remove-label may-affects-5.0
/remove-label may-affects-4.0
/remove-label affects-6.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot removed the affects-6.2, affects-5.4, affects-5.3, affects-5.2, and affects-5.1 labels on Aug 16, 2022
@ti-chi-bot
Member

@cosven: These labels are not set on the issue: may-affects-6.2, may-affects-6.0, may-affects-5.4, may-affects-5.3, may-affects-5.2, may-affects-5.1, may-affects-5.0, may-affects-4.0, affects-6.2, affects-6.0, affects-4.0.

@ti-chi-bot removed the affects-5.0 label on Aug 16, 2022
ti-chi-bot added a commit that referenced this issue Aug 19, 2022
…ailed (#13254)

close #12966, ref #12966

When a TiKV instance is isolated from the other TiKV instances, some requests are
blocked in raftstore and the corresponding latches are not released.
Subsequent requests that require those latches receive a ServerIsBusy error
and keep retrying. However, in such a case, the peers on that TiKV are no longer
leaders, so the client is supposed to receive a NotLeader error immediately.

This commit introduces a fail-fast mode to the scheduler. When a request
fails to acquire any latch, the scheduler checks whether the peer is still the leader.
If it is still the leader, the request is scheduled as usual; otherwise it fails fast.

Signed-off-by: cosven <yinshaowen241@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
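
The fail-fast rule described in the commit message above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code from #13254: the names `Latches`, `Schedule`, and `schedule` are hypothetical, the latch table is all-or-nothing instead of queueing waiters, and leadership is passed in as a closure rather than read from raftstore.

```rust
use std::collections::HashSet;

/// Outcome of trying to schedule one command (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum Schedule {
    /// Every latch was acquired; the command can run now.
    Run,
    /// A latch is held by another command, but this peer is still the leader:
    /// keep waiting for the latch as usual.
    Wait,
    /// A latch is held and this peer is no longer the leader: fail fast so the
    /// client can retry against the new leader instead of spinning on ServerIsBusy.
    NotLeader,
}

/// Toy latch table: one latch per key, each held by at most one command.
/// Simplification: a command takes all of its latches or none of them.
#[derive(Default)]
pub struct Latches {
    held: HashSet<String>,
}

impl Latches {
    /// Returns true and takes every latch if none of `keys` is currently held.
    pub fn try_acquire(&mut self, keys: &[&str]) -> bool {
        if keys.iter().any(|k| self.held.contains(*k)) {
            return false;
        }
        for k in keys {
            self.held.insert((*k).to_string());
        }
        true
    }

    /// Releases the latches for `keys` once a command has finished.
    pub fn release(&mut self, keys: &[&str]) {
        for k in keys {
            self.held.remove(*k);
        }
    }
}

/// Leadership is consulted only when latch acquisition fails, mirroring the
/// order of checks given in the commit message.
pub fn schedule(
    latches: &mut Latches,
    keys: &[&str],
    is_leader: impl Fn() -> bool,
) -> Schedule {
    if latches.try_acquire(keys) {
        return Schedule::Run;
    }
    if is_leader() {
        Schedule::Wait
    } else {
        Schedule::NotLeader
    }
}
```

In this sketch, a command that finds its latch held gets `Schedule::Wait` while the peer is still the leader, and `Schedule::NotLeader` once leadership has been lost, which corresponds to the immediate NotLeader error the commit message says the client should receive.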
fengou1 pushed a commit to fengou1/tikv that referenced this issue Aug 30, 2022
…ailed (tikv#13254)

close tikv#12966, ref tikv#12966

Signed-off-by: cosven <yinshaowen241@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Signed-off-by: fengou1 <feng.ou@pingcap.com>
ti-chi-bot pushed a commit that referenced this issue Oct 12, 2022
…ailed (#13254) (#13318)

close #12966, ref #12966, ref #13254

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Signed-off-by: cosven <yinshaowen241@gmail.com>

Co-authored-by: cosven <cosven@users.noreply.github.com>
Co-authored-by: cosven <yinshaowen241@gmail.com>
4 participants