raft: test join to a raft cluster
There was a bug where a new replica joining a Raft cluster
sometimes tried to register on a non-leader node, which cannot
write to _cluster, so the join failed with an ER_READONLY error.

Now, as part of #5613, the join-master selection algorithm has
changed. A new node looks for writable members of the cluster to
use as a join-master. It does not choose a follower if there is a
leader.
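
The selection rule can be illustrated with a small plain-Lua sketch. This is not Tarantool's actual implementation. The node table layout (`uuid`, `is_ro`, `vclock`), the helper names `vclock_sum`, `is_better` and `select_join_master`, and the tie-breaking order (writable first, then the more advanced vclock, then the smaller UUID) are assumptions inferred from the test comments in this commit. Summing vclock components is a further simplification: real vclocks are compared component-wise.

```lua
-- Hypothetical descriptor of a cluster member as seen by a joining node:
--   uuid   - instance UUID string
--   is_ro  - true if the node is read-only (e.g. a Raft follower)
--   vclock - array of LSNs, one per replica id

-- Simplified "how far ahead is this node" measure: the sum of all
-- vclock components (an assumption for illustration only).
local function vclock_sum(vclock)
    local sum = 0
    for _, lsn in pairs(vclock) do
        sum = sum + lsn
    end
    return sum
end

-- Returns true if node `a` is a better join-master candidate than `b`.
local function is_better(a, b)
    -- 1. A writable node (the leader) beats a read-only one.
    if a.is_ro ~= b.is_ro then
        return not a.is_ro
    end
    -- 2. Among equally writable nodes, prefer the larger vclock sum.
    local sum_a, sum_b = vclock_sum(a.vclock), vclock_sum(b.vclock)
    if sum_a ~= sum_b then
        return sum_a > sum_b
    end
    -- 3. Final tie-break: the lexicographically smaller UUID string.
    return a.uuid < b.uuid
end

-- Picks the best candidate from a list of nodes, or nil if it is empty.
local function select_join_master(nodes)
    local best = nil
    for _, node in ipairs(nodes) do
        if best == nil or is_better(node, best) then
            best = node
        end
    end
    return best
end

-- Scenario modeled on the test below: after promote(), master2 is the
-- writable leader and master1 is a read-only voter. Vclocks are equal;
-- the LSN values here are arbitrary.
local master1 = {uuid = '10f9828d-b5d5-46a9-b698-ddac7cce5e27',
                 is_ro = true, vclock = {1, 1}}
local master2 = {uuid = '20f9828d-b5d5-46a9-b698-ddac7cce5e27',
                 is_ro = false, vclock = {1, 1}}

-- master1 has the smaller UUID, but master2 is writable, so it wins.
print(select_join_master({master1, master2}).uuid)
-- prints 20f9828d-b5d5-46a9-b698-ddac7cce5e27
```

If neither node were writable, the sketch would fall back to the smaller UUID and pick master1, which is the situation that made a join fail with ER_READONLY.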

Closes #6127
Gerold103 committed Jun 6, 2021
1 parent 39e7b39 commit c5eaf86
Showing 6 changed files with 187 additions and 0 deletions.
4 changes: 4 additions & 0 deletions changelogs/unreleased/gh-6127-raft-join-new.md
@@ -0,0 +1,4 @@
## bugfix/raft

* Fixed an error when a new replica in a Raft cluster could try to join from a
  follower instead of a leader and fail with the `ER_READONLY` error (gh-6127).
15 changes: 15 additions & 0 deletions test/replication/gh-6127-master1.lua
@@ -0,0 +1,15 @@
#!/usr/bin/env tarantool

require('console').listen(os.getenv('ADMIN'))
box.cfg({
    listen = 'unix/:./master1.sock',
    replication = {
        'unix/:./master1.sock',
        'unix/:./master2.sock'
    },
    election_mode = 'candidate',
    election_timeout = 0.1,
    instance_uuid = '10f9828d-b5d5-46a9-b698-ddac7cce5e27',
})
box.ctl.wait_rw()
box.schema.user.grant('guest', 'super')
13 changes: 13 additions & 0 deletions test/replication/gh-6127-master2.lua
@@ -0,0 +1,13 @@
#!/usr/bin/env tarantool

require('console').listen(os.getenv('ADMIN'))
box.cfg({
    listen = 'unix/:./master2.sock',
    replication = {
        'unix/:./master1.sock',
        'unix/:./master2.sock'
    },
    election_mode = 'voter',
    election_timeout = 0.1,
    instance_uuid = '20f9828d-b5d5-46a9-b698-ddac7cce5e27',
})
105 changes: 105 additions & 0 deletions test/replication/gh-6127-raft-join-new.result
@@ -0,0 +1,105 @@
-- test-run result file version 2
test_run = require('test_run').new()
| ---
| ...

--
-- gh-6127: the algorithm selecting a node from which to join to a replicaset
-- should take into account who is the leader (is writable and can write to
-- _cluster) and who is a follower/candidate.
--
test_run:cmd('create server master1 with script="replication/gh-6127-master1.lua"')
| ---
| - true
| ...
test_run:cmd('start server master1 with wait=False')
| ---
| - true
| ...
test_run:cmd('create server master2 with script="replication/gh-6127-master2.lua"')
| ---
| - true
| ...
test_run:cmd('start server master2')
| ---
| - true
| ...

test_run:switch('master1')
| ---
| - true
| ...
box.cfg{election_mode = 'voter'}
| ---
| ...
test_run:switch('master2')
| ---
| - true
| ...
-- Perform manual election because it is faster - the automatic one still tries
-- to wait for 'death timeout' first which is several seconds.
box.cfg{ \
election_mode = 'manual', \
election_timeout = 0.1, \
}
| ---
| ...
box.ctl.promote()
| ---
| ...
box.ctl.wait_rw()
| ---
| ...
-- Make sure the other node received the promotion row. Vclocks now should be
-- equal so the new node would select only using read-only state and min UUID.
test_run:wait_lsn('master1', 'master2')
| ---
| ...

-- Min UUID is master1, but it is not writable. Therefore must join from
-- master2.
test_run:cmd('create server replica with script="replication/gh-6127-replica.lua"')
| ---
| - true
| ...
test_run:cmd('start server replica')
| ---
| - true
| ...
test_run:switch('replica')
| ---
| - true
| ...
assert(box.info.leader ~= 0)
| ---
| - true
| ...

test_run:switch('default')
| ---
| - true
| ...
test_run:cmd('stop server replica')
| ---
| - true
| ...
test_run:cmd('delete server replica')
| ---
| - true
| ...
test_run:cmd('stop server master2')
| ---
| - true
| ...
test_run:cmd('delete server master2')
| ---
| - true
| ...
test_run:cmd('stop server master1')
| ---
| - true
| ...
test_run:cmd('delete server master1')
| ---
| - true
| ...
41 changes: 41 additions & 0 deletions test/replication/gh-6127-raft-join-new.test.lua
@@ -0,0 +1,41 @@
test_run = require('test_run').new()

--
-- gh-6127: the algorithm selecting a node from which to join to a replicaset
-- should take into account who is the leader (is writable and can write to
-- _cluster) and who is a follower/candidate.
--
test_run:cmd('create server master1 with script="replication/gh-6127-master1.lua"')
test_run:cmd('start server master1 with wait=False')
test_run:cmd('create server master2 with script="replication/gh-6127-master2.lua"')
test_run:cmd('start server master2')

test_run:switch('master1')
box.cfg{election_mode = 'voter'}
test_run:switch('master2')
-- Perform manual election because it is faster - the automatic one still tries
-- to wait for 'death timeout' first which is several seconds.
box.cfg{ \
election_mode = 'manual', \
election_timeout = 0.1, \
}
box.ctl.promote()
box.ctl.wait_rw()
-- Make sure the other node received the promotion row. Vclocks now should be
-- equal so the new node would select only using read-only state and min UUID.
test_run:wait_lsn('master1', 'master2')

-- Min UUID is master1, but it is not writable. Therefore must join from
-- master2.
test_run:cmd('create server replica with script="replication/gh-6127-replica.lua"')
test_run:cmd('start server replica')
test_run:switch('replica')
assert(box.info.leader ~= 0)

test_run:switch('default')
test_run:cmd('stop server replica')
test_run:cmd('delete server replica')
test_run:cmd('stop server master2')
test_run:cmd('delete server master2')
test_run:cmd('stop server master1')
test_run:cmd('delete server master1')
9 changes: 9 additions & 0 deletions test/replication/gh-6127-replica.lua
@@ -0,0 +1,9 @@
#!/usr/bin/env tarantool

require('console').listen(os.getenv('ADMIN'))
box.cfg({
    replication = {
        'unix/:./master1.sock',
        'unix/:./master2.sock'
    },
})
