Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve election stability and reproducibility #8699

Merged
merged 3 commits into from May 29, 2023

Conversation

sergepetrenko
Copy link
Collaborator

@sergepetrenko sergepetrenko commented May 24, 2023

Fix a couple of bugs regarding election stability:

  1. Make candidates write and broadcast term and vote for self atomically. This reduces possible split-vote scenarios where other candidates vote for themselves unnecessarily.
  2. Do the same for box.ctl.promote()
  3. Fix a bug in split-vote detector.

Closes #8497
Closes #8698

@sergepetrenko sergepetrenko requested a review from a team as a code owner May 24, 2023 13:57
@coveralls
Copy link

coveralls commented May 24, 2023

Coverage Status

Coverage: 85.765% (-0.008%) from 85.773% when pulling 9e3521c on sergepetrenko:gh-8497-atomic-promote into e64568b
on tarantool:master
.

Copy link
Contributor

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you patch. LGTM

box_cfg = cg.box_cfg,
}
cg.box_cfg.election_mode = 'voter'
cg.server2 =cg.replica_set:build_and_add_server{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: space after '='

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed.

@@ -1211,6 +1211,7 @@ raft_promote(struct raft *raft)
return;
raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
raft_start_candidate(raft);
raft_sm_schedule_new_vote(raft, raft->self, raft->vclock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use raft_sm_schedule_new_election here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. We have to do raft_sm_schedule_new_term before raft_start_candidate for the sake of 'manual' nodes. Otherwise raft_sm_schedule_new_term would restore a manual node to non-candidate again.
At the same time we have to do raft_sm_schedule_new_vote after raft_start_candidate (again, for the manual nodes).

On the other hand I can do this, because raft_start_candidate is a no-op for a real candidate. I'm not sure it's worth it though. What do you think? cc: @Gerold103

diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index f31cee5b1..e0b073b5c 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -959,8 +959,7 @@ raft_sm_schedule_new_election(struct raft *raft)
        say_info("RAFT: begin new election round");
        assert(raft->is_cfg_candidate);
        /* Everyone is a follower until its vote for self is persisted. */
-       raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
-       raft_sm_schedule_new_vote(raft, raft->self, raft->vclock);
+       raft_promote(raft);
 }
 
 static void

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explanation, I see now, that raft_start_candidate has to be exactly between term and vote when manual promote is called. And I would rather not apply change you proposed, as indeed raft_start_candidate won't have any effect in automatic elections, and it would make the code even harder to understand, I'm on the side of explicit call of new_term and new_vote here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I don't like this hunk either. Let's revert it then:

diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index e0b073b5c..f31cee5b1 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -959,7 +959,8 @@ raft_sm_schedule_new_election(struct raft *raft)
        say_info("RAFT: begin new election round");
        assert(raft->is_cfg_candidate);
        /* Everyone is a follower until its vote for self is persisted. */
-       raft_promote(raft);
+       raft_sm_schedule_new_term(raft, raft->volatile_term + 1);
+       raft_sm_schedule_new_vote(raft, raft->self, raft->vclock);
 }
 
 static void

Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch! Just one comment.

@@ -1,4 +1,4 @@
## bugfix/replication

* Fixed possible failure to promote the desired node by `box.ctl.promote()` on
* Fixed a possible failure to promote the desired node by `box.ctl.promote()` on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change better be in the previous commit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing! Fixed.

Commit c9155ac ("raft: persist new term and vote separately") made
the nodes persist new term and vote separately, using 2 WAL writes.
Writing the term first is needed to flush all the ongoing transactions,
so that the node's vclock is updated and can be checked against the
candidate's vclock. Otherwise it could happen that the node persists a
vote for some candidate only to find that it's vclock would actually
become incomparable with the candidate's.

Actually, this guard is not needed when checking a vote for self,
because a node can always vote for self. Besides, splitting term bump
and vote can lead to increased probability of split-vote. It may happen
that a candidate bumps and broadcasts the new term without a vote,
making other nodes vote for self. Let's go back to writing term and vote
together for self votes.

This change makes raft candidate persist term bump and vote for self in
one WAL write instead of two, so all the tests which count WAL writes or
expect 2 separate state updates for term and vote are rewritten.

Prerequisite tarantool#8497

NO_DOC=not user-visible
NO_CHANGELOG=not user-visible
box.ctl.promote() was implemented as follows: an instance bumps the
term and marks itself a candidate, but doesn't vote for self
immediately. Instead it relies on the machinery which makes a candidate
vote for self as soon as it persists a new term.

This differs from a normal election start due to leader timeout: there
term and vote are bumped at once.

Besides, this increases probability of box.ctl.promote() resulting in
other node getting elected: if a node first broadcasts a term without a
vote, it is not considered a candidate, so other candidates might start
elections and vote for themselves.

Let's bring promote into line with automatic elections.

Closes tarantool#8497

NO_DOC=bugfix
Due to a typo raft candidate counted a vote for another node as a vote
for self in its split-vote detector. This could lead to spurious
split-vote detection in cases when another node wins elections with a bare
minimum of votes for it (exactly a quorum of votes).

Closes tarantool#8698

NO_DOC=bugfix
@sergepetrenko sergepetrenko added the full-ci Enables all tests for a pull request label May 29, 2023
@sergepetrenko sergepetrenko merged commit 2afde5b into tarantool:master May 29, 2023
143 checks passed
@sergepetrenko
Copy link
Collaborator Author

Cherry-picked to 2.11: 0cbe616 .. f796936

@sergepetrenko
Copy link
Collaborator Author

Cherry-picked to 2.10: 657e3f9 .. 3e0229f

@sergepetrenko sergepetrenko deleted the gh-8497-atomic-promote branch May 29, 2023 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-ci Enables all tests for a pull request
Projects
None yet
5 participants