replication: prefer to join from booted replicas
The algorithm for picking an instance to join the replicaset
from didn't take into account that some of the instances might
not be bootstrapped yet, but still be perfectly available.

As a result, an absurd situation could occur: an instance could
connect to a cluster consisting only of read-only instances
while itself having box.cfg{read_only = false}. Then, instead of
failing or waiting, it simply booted a brand new cluster, after
which the node started complaining that the others had a
different replicaset UUID.

The patch makes a new instance always prefer a bootstrapped
join source to a non-bootstrapped one, including itself. In the
situation above the new instance now terminates with an error.

In the future it should hopefully enter a retry loop instead.

Closes #5613

@TarantoolBot document
Title: IPROTO_BALLOT rework and a new field

A couple of fields in `IPROTO_BALLOT 0x29` used to have values
that did not match their names. They have been changed.

* `IPROTO_BALLOT_IS_RO 0x01` used to mean "the instance has
  `box.cfg{read_only = true}`". It was renamed in the source code
  to `IPROTO_BALLOT_IS_RO_CFG`. The code `0x01` and the value are
  the same; only the name has changed, and it should change in
  the documentation too.

* `IPROTO_BALLOT_IS_LOADING 0x04` used to mean "the instance has
  finished `box.cfg()` and it has `read_only = true`". The name
  was therefore wrong: even after the instance finished loading,
  the flag was still true for `read_only = true` nodes. Such a
  value is also not very suitable for any sane usage. The name
  was changed to `IPROTO_BALLOT_IS_RO`, the code stayed the same,
  and the value is now "the instance is not writable". The reason
  for not being writable can be anything: the node is an orphan,
  it has `read_only = true`, it is a Raft follower, or anything
  else.

And there is a new field.

`IPROTO_BALLOT_IS_BOOTED 0x06` means the instance has finished its
bootstrap or recovery.
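To make the reworked flag layout concrete, here is an illustrative C++ sketch of the three ballot keys and a decoder. The numeric codes come from the description above; the struct, its field names, and the `ballot_decode` helper are invented for illustration and are not the actual Tarantool source.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// IPROTO_BALLOT keys as described above. The codes are from the
// text; the short names are hypothetical.
enum ballot_key : uint8_t {
	BALLOT_IS_RO_CFG = 0x01, /* box.cfg{read_only = true} is set */
	BALLOT_IS_RO     = 0x04, /* not writable for any reason */
	BALLOT_IS_BOOTED = 0x06, /* finished bootstrap or recovery */
};

// Hypothetical decoded ballot.
struct ballot {
	bool is_ro_cfg = false;
	bool is_ro = false;
	bool is_booted = false;
};

// Turn a raw key -> flag map (as a stand-in for the MessagePack
// body) into a decoded ballot; missing keys default to false.
static ballot
ballot_decode(const std::map<uint8_t, bool> &raw)
{
	ballot b;
	auto get = [&](uint8_t key) {
		auto it = raw.find(key);
		return it != raw.end() && it->second;
	};
	b.is_ro_cfg = get(BALLOT_IS_RO_CFG);
	b.is_ro = get(BALLOT_IS_RO);
	b.is_booted = get(BALLOT_IS_BOOTED);
	return b;
}
```

For example, a booted node that is writable by config but currently read-only (say, a Raft follower) would report `is_ro_cfg = false`, `is_ro = true`, `is_booted = true`.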
Gerold103 committed Jun 10, 2021
1 parent 89f1f63 commit 6a14c8d
Showing 8 changed files with 160 additions and 9 deletions.
6 changes: 6 additions & 0 deletions changelogs/unreleased/gh-5613-bootstrap-prefer-booted.md
@@ -0,0 +1,6 @@
## bugfix/replication

* Fixed an error when a replica, at an attempt to join a cluster with exclusively
read-only replicas available, instead of failing or retrying just decided to
boot its own replicaset. Now it fails with an error about the other nodes
being read-only, so they can't register it (gh-5613).
20 changes: 11 additions & 9 deletions src/box/replication.cc
@@ -951,15 +951,6 @@ replicaset_next(struct replica *replica)
return replica_hash_next(&replicaset.hash, replica);
}

/**
* Compare vclock, read only mode and orphan status
* of all connected replicas and elect a leader.
* Initiallly, skip read-only replicas, since they
* can not properly act as bootstrap masters (register
* new nodes in _cluster table). If there are no read-write
* replicas, choose a read-only replica with biggest vclock
* as a leader, in hope it will become read-write soon.
*/
struct replica *
replicaset_find_join_master(void)
{
@@ -972,12 +963,23 @@ replicaset_find_join_master(void)
const struct ballot *ballot = &applier->ballot;
int score = 0;
/*
* First of all try to ignore non-booted instances. Including
* self if not booted yet. For self it is even dangerous as the
* instance might decide to boot its own cluster if, for
* example, the other nodes are available, but read-only. It
* would be a mistake.
*
* For a new cluster it is ok to use a non-booted instance as it
* means the algorithm tries to find an initial "boot-master".
*
* Prefer instances that are not configured as read-only via
* box.cfg and are not in a read-only state for any other reason.
* The config is the stronger signal: a node configured as
* read-only stays read-only for sure until the config is changed.
*/
if (ballot->is_booted)
score += 10;
if (!ballot->is_ro_cfg)
score += 5;
if (!ballot->is_ro)
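The scoring above can be sketched as a standalone C++ fragment. The +10 and +5 weights are taken from the hunk; the +1 weight for a currently-writable node and the pick-the-maximum loop are assumptions, since the diff is truncated at that point.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the ballot fields used by the scoring.
struct ballot {
	bool is_booted;
	bool is_ro_cfg;
	bool is_ro;
};

// Weighted score of a join-source candidate. A booted instance
// always outscores any combination of the other two flags.
static int
join_master_score(const ballot &b)
{
	int score = 0;
	if (b.is_booted)
		score += 10; /* booted beats everything else */
	if (!b.is_ro_cfg)
		score += 5;  /* writable by config */
	if (!b.is_ro)
		score += 1;  /* assumed weight: currently writable */
	return score;
}

// Pick the index of the highest-scoring candidate (assumed
// tie-breaking: first wins). Returns -1 for an empty list.
static int
find_join_master(const std::vector<ballot> &candidates)
{
	int best = -1, best_score = -1;
	for (int i = 0; i < (int)candidates.size(); i++) {
		int s = join_master_score(candidates[i]);
		if (s > best_score) {
			best = i;
			best_score = s;
		}
	}
	return best;
}
```

With these weights a booted but fully read-only candidate scores 10, while a non-booted writable one scores only 6, so the booted node wins, which is exactly the behavioral change this patch makes for gh-5613.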
78 changes: 78 additions & 0 deletions test/replication/gh-5613-bootstrap-prefer-booted.result
@@ -0,0 +1,78 @@
-- test-run result file version 2
test_run = require('test_run').new()
| ---
| ...

--
-- gh-5613: when a new replica joins a cluster, it must prefer
-- bootstrapped join sources over non-bootstrapped ones, including itself,
-- even if all the bootstrapped ones are read-only. Otherwise the node
-- might decide to bootstrap a new cluster on its own and would never be
-- able to join the existing one.
--

test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
| ---
| - true
| ...
test_run:cmd('start server master with wait=False')
| ---
| - true
| ...
test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
| ---
| - true
| ...
test_run:cmd('start server replica1')
| ---
| - true
| ...
test_run:switch('master')
| ---
| - true
| ...
box.cfg{read_only = true}
| ---
| ...
test_run:switch('default')
| ---
| - true
| ...

test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
| ---
| - true
| ...
-- It returns false, but that is expected.
test_run:cmd('start server replica2 with crash_expected=True')
| ---
| - false
| ...
opts = {filename = 'gh-5613-replica2.log'}
| ---
| ...
assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)
| ---
| - true
| ...

test_run:cmd('delete server replica2')
| ---
| - true
| ...
test_run:cmd('stop server replica1')
| ---
| - true
| ...
test_run:cmd('delete server replica1')
| ---
| - true
| ...
test_run:cmd('stop server master')
| ---
| - true
| ...
test_run:cmd('delete server master')
| ---
| - true
| ...
29 changes: 29 additions & 0 deletions test/replication/gh-5613-bootstrap-prefer-booted.test.lua
@@ -0,0 +1,29 @@
test_run = require('test_run').new()

--
-- gh-5613: when a new replica joins a cluster, it must prefer
-- bootstrapped join sources over non-bootstrapped ones, including itself,
-- even if all the bootstrapped ones are read-only. Otherwise the node
-- might decide to bootstrap a new cluster on its own and would never be
-- able to join the existing one.
--

test_run:cmd('create server master with script="replication/gh-5613-master.lua"')
test_run:cmd('start server master with wait=False')
test_run:cmd('create server replica1 with script="replication/gh-5613-replica1.lua"')
test_run:cmd('start server replica1')
test_run:switch('master')
box.cfg{read_only = true}
test_run:switch('default')

test_run:cmd('create server replica2 with script="replication/gh-5613-replica2.lua"')
-- It returns false, but that is expected.
test_run:cmd('start server replica2 with crash_expected=True')
opts = {filename = 'gh-5613-replica2.log'}
assert(test_run:grep_log(nil, 'ER_READONLY', nil, opts) ~= nil)

test_run:cmd('delete server replica2')
test_run:cmd('stop server replica1')
test_run:cmd('delete server replica1')
test_run:cmd('stop server master')
test_run:cmd('delete server master')
11 changes: 11 additions & 0 deletions test/replication/gh-5613-master.lua
@@ -0,0 +1,11 @@
#!/usr/bin/env tarantool

require('console').listen(os.getenv('ADMIN'))
box.cfg({
listen = 'unix/:./gh-5613-master.sock',
replication = {
'unix/:./gh-5613-master.sock',
'unix/:./gh-5613-replica1.sock',
},
})
box.schema.user.grant('guest', 'super')
13 changes: 13 additions & 0 deletions test/replication/gh-5613-replica1.lua
@@ -0,0 +1,13 @@
#!/usr/bin/env tarantool

require('console').listen(os.getenv('ADMIN'))
box.cfg({
listen = 'unix/:./gh-5613-replica1.sock',
replication = {
'unix/:./gh-5613-master.sock',
'unix/:./gh-5613-replica1.sock',
},
-- Start as read-only so that the bootstrap master is known
-- in advance.
read_only = true,
})
11 changes: 11 additions & 0 deletions test/replication/gh-5613-replica2.lua
@@ -0,0 +1,11 @@
#!/usr/bin/env tarantool

require('console').listen(os.getenv('ADMIN'))
box.cfg({
listen = 'unix/:./gh-5613-replica2.sock',
replication = {
'unix/:./gh-5613-master.sock',
'unix/:./gh-5613-replica1.sock',
'unix/:./gh-5613-replica2.sock',
},
})
1 change: 1 addition & 0 deletions test/replication/suite.cfg
@@ -44,6 +44,7 @@
"gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
"gh-5536-wal-limit.test.lua": {},
"gh-5566-final-join-synchro.test.lua": {},
"gh-5613-bootstrap-prefer-booted.test.lua": {},
"gh-6032-promote-wal-write.test.lua": {},
"gh-6057-qsync-confirm-async-no-wal.test.lua": {},
"gh-6094-rs-uuid-mismatch.test.lua": {},
