Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raftstore: return error when read index is dropped due to invaild leader #5316

Merged
merged 8 commits into from Aug 28, 2019

Conversation

@5kbpers
Copy link
Contributor

5kbpers commented Aug 22, 2019

Signed-off-by: 5kbpers tangminghua@pingcap.com

What have you changed?

When the leader id is INVAILD_ID in a replica, the read index request will be dropped by the raft group, see https://github.com/pingcap/raft-rs/blob/0632bcfc1ce4cd60812959758b7c8645f8d42458/src/raft.rs#L1968-L1979.
This PR adds a condition for returning an error directly in that situation, to fix endless waiting for the response of read index.

What is the type of the changes?

  • Bugfix (a change which fixes an issue)

How is the PR tested?

Integration test

Does this PR affect documentation (docs) or should it be mentioned in the release notes?

No.

Does this PR affect tidb-ansible?

No.

Signed-off-by: 5kbpers <tangminghua@pingcap.com>
@5kbpers 5kbpers requested review from overvenus and nolouch Aug 22, 2019
Signed-off-by: 5kbpers <tangminghua@pingcap.com>
@@ -1884,6 +1884,9 @@ impl Peer {
if self.is_merging() {
return Err(box_err!("can not read index due to merge"));
}
if !self.is_leader() && self.leader_id() == INVALID_ID {

This comment has been minimized.

Copy link
@zhangjinpeng1987

zhangjinpeng1987 Aug 23, 2019

Member

How did you find this bug? Better to cover it in schrodinger tests.

This comment has been minimized.

Copy link
@zhangjinpeng1987

zhangjinpeng1987 Aug 23, 2019

Member

if the peer is not leader you can return error directly, why it must be equal to 0?

This comment has been minimized.

Copy link
@5kbpers

5kbpers Aug 23, 2019

Author Contributor

if the peer is not leader you can return error directly, why it must be equal to 0?

A replica will set leader_id to INVALID_ID when election timeout (which shows no leader).

This comment has been minimized.

Copy link
@overvenus

overvenus Aug 23, 2019

Contributor

Could you add a reference of how raft behaves on readindex if there is no leader?

@BusyJay

This comment has been minimized.

Copy link
Contributor

BusyJay commented Aug 23, 2019

Better add more check to

let last_pending_read_count = self.raft_group.raft.pending_read_count();
let last_ready_read_count = self.raft_group.raft.ready_read_count();
let id = Uuid::new_v4();
self.raft_group.read_index(id.as_bytes().to_vec());
let pending_read_count = self.raft_group.raft.pending_read_count();
let ready_read_count = self.raft_group.raft.ready_read_count();
if pending_read_count == last_pending_read_count
&& ready_read_count == last_ready_read_count
&& self.is_leader()
{
// The message gets dropped silently, can't be handled anymore.
apply::notify_stale_req(self.term(), cb);
return false;
}

instead.

Signed-off-by: 5kbpers <tangminghua@pingcap.com>
@nolouch

This comment has been minimized.

Copy link
Contributor

nolouch commented Aug 26, 2019

PTAL @BusyJay

// cause a long time waiting for a read response. Then we should return an error directly
// in this situation.
if !self.is_leader() && self.leader_id() == INVALID_ID {
cmd_resp::bind_error(

This comment has been minimized.

Copy link
@Connor1996

Connor1996 Aug 27, 2019

Member

maybe update related metrics?

This comment has been minimized.

Copy link
@5kbpers

5kbpers Aug 28, 2019

Author Contributor

maybe update related metrics?

Should it be a type of unsafe_read_index?

This comment has been minimized.

Copy link
@Connor1996

Connor1996 Aug 28, 2019

Member

not for sure, add a new one? @overvenus What's your opinion?

This comment has been minimized.

Copy link
@overvenus

overvenus Aug 28, 2019

Contributor

What about no_leader or follower_read_no_leader? You can add it to RaftInvalidProposeMetrics.

This comment has been minimized.

Copy link
@5kbpers

5kbpers Aug 28, 2019

Author Contributor

OK, I will add a read_index_no_leader there.

nolouch and others added 3 commits Aug 28, 2019
Signed-off-by: 5kbpers <tangminghua@pingcap.com>
@5kbpers 5kbpers dismissed stale reviews from nolouch and overvenus via a9bdcd6 Aug 28, 2019
Signed-off-by: 5kbpers <tangminghua@pingcap.com>
Copy link
Member

Connor1996 left a comment

LGTM

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Aug 28, 2019

Your auto merge job has been accepted, waiting for #5342

@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Aug 28, 2019

/run-all-tests

@sre-bot sre-bot merged commit b9a806d into tikv:master Aug 28, 2019
6 checks passed
6 checks passed
DCO All commits are signed off!
Details
idc-jenkins-ci-tikv/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-cop-push-down-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details
@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Aug 28, 2019

cherry pick to release-3.0 failed

YangKeao added a commit to YangKeao/tikv that referenced this pull request Sep 5, 2019
sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
…der (tikv#5316)

Signed-off-by: 5kbpers <tangminghua@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.