how the leader election compares to raft? #5

Closed

benoitc opened this issue Aug 11, 2014 · 2 comments
benoitc commented Aug 11, 2014

I am curious how the implementation compares with Raft. Have you had a look at it?

Also, can members be added to and removed from the cluster dynamically? How many participants can it handle?

uwiger commented Jan 13, 2015

Sorry, I missed this one.

The Raft leader election protocol is timeout-based, and the paper recommends conservative timeout values of ca. 150-300 ms; in their own tests, this leads to leader election times of 100-300 ms. The locks_leader algorithm is event-triggered and converges much more quickly: when I've tested 5-node scenarios on one machine, convergence time is around 10-20 ms*.

It should be possible to implement the Raft replication mechanism on top of locks_leader. One possibility would be to make use of the {surrender, Other, ModSt} reply if the leader elected by locks decides that another candidate would make a better leader (e.g. one with a higher "term" after a netsplit). This surrender operation uses the basic lock surrender supported by locks, and essentially means that the leader hands its locks to another candidate, making that candidate the new leader.
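To make that concrete, here is a minimal sketch of how a Raft-style callback module might use that reply. It assumes locks_leader's elected/3 callback and candidates/1 helper; the term bookkeeping and the fetch_term/1 stub are purely hypothetical (in practice the terms would travel with the election/sync messages), so treat this as an illustration of the surrender path rather than a working implementation.

```erlang
-module(raft_on_locks).
-behaviour(locks_leader).
-export([elected/3]).

-record(st, {term = 0 :: non_neg_integer()}).

%% Called in the candidate that locks_leader has just elected.
%% (Other required locks_leader callbacks are omitted for brevity.)
elected(#st{term = MyTerm} = St, Election, _FromCandidate) ->
    case higher_term_candidate(Election, MyTerm) of
        undefined ->
            %% Nobody claims a higher term: keep the leadership and
            %% broadcast our state to the other candidates.
            {ok, {sync, St}, St};
        Other ->
            %% Another candidate has seen a higher term (e.g. after a
            %% netsplit): hand over the locks, and with them the
            %% leadership, to that candidate.
            {surrender, Other, St}
    end.

%% Hypothetical helper: find a candidate whose term is higher than ours.
%% In a real implementation the terms would be exchanged as part of the
%% election and synchronisation messages instead of being fetched here.
higher_term_candidate(Election, MyTerm) ->
    Candidates = locks_leader:candidates(Election),
    case [C || C <- Candidates, fetch_term(C) > MyTerm] of
        []          -> undefined;
        [Other | _] -> Other
    end.

fetch_term(_Candidate) ->
    %% Placeholder only; see the comment above.
    0.
```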

  • Granted, since locks makes use of Erlang monitors, the worst case is that the communication link to the leader dies and the net_ticktime timeout must expire before the old leader is declared dead (see the sketch below). However, the argument could be made that this is the better approach in Erlang, since it is consistent with other applications: it can be problematic if different applications use different strategies to determine the state of the network and potentially come to different conclusions, or have significantly different response times to failures.
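For completeness, the timer in question is the standard Erlang distribution parameter net_ticktime (default 60 seconds), not anything provided by locks. A small sketch of how it can be tuned; the module name and the example value of 15 seconds are purely illustrative:

```erlang
-module(net_tick_tuning).   %% illustrative module name
-export([tune/1]).

%% Lowering net_ticktime makes dead-node detection (and hence
%% re-election after a leader's node dies) faster, at the cost of more
%% tick traffic. Statically it can be set in sys.config:
%%   [{kernel, [{net_ticktime, 15}]}].
%% Dynamically, OTP's net_kernel API can phase in a new value:
tune(Seconds) when is_integer(Seconds), Seconds > 0 ->
    %% set_net_ticktime/1 returns unchanged | change_initiated
    %% | {ongoing_change_to, T}; the change takes effect gradually.
    Result = net_kernel:set_net_ticktime(Seconds),
    {Result, net_kernel:get_net_ticktime()}.
```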

Members can be dynamically added and removed. Note, however, that if you use Raft replication on top of locks_leader, Raft has some restrictions on how membership changes are made (called "joint consensus"). In the case of locks_leader electing the leader, there won't ever be two leaders unless there's a netsplit, but specifically for the netsplit case, the leaders need to know what constitutes a majority. Therefore, the Raft callback would still need to commit the membership changes as recommended by the paper.

I haven't run any max configuration tests with locks, but it is designed to be conservative in the number of messages sent. Each candidate tries to lock every involved node, and especially after netsplits, there will be lock surrender actions...

In a unit test with 5 nodes and a netsplit, the two islands (2 and 3 nodes respectively) had converged after 13-14 ms. After the networks were rejoined, all nodes had converged ca. 18 ms after the last nodeup arrived (which itself came 42 ms after the first nodeup). Some 56 surrender actions were performed. ;-)

benoitc commented May 16, 2015

Sorry I missed your answer. It's really informative and actually gives me some inspiration. Thanks!

benoitc closed this as completed May 16, 2015