With the current implementation of Raft, adding a new instance to the cluster can block all synchronous transactions until that instance is fully joined and up to date.
The simplest example: there is 1 node with the canonical `replication_synchro_quorum = 'N/2 + 1'`. It works fine, but has lots of data. If a second node is now added, the first node won't be able to commit any synchronous data, because the quorum became 2 while the second node is still busy with the long initial and then final join stages.
The same will happen for a cluster of multiple nodes that gets multiple new instances. For example, a cluster of 3 nodes gets 3 new nodes: the quorum becomes 4, and synchronous replication is stalled until at least one of the new nodes finishes its join stages.
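The arithmetic behind both examples can be sketched in a few lines of Python. This is only an illustration, not Tarantool code: `quorum` and `can_commit` are hypothetical helpers, and the sketch assumes that `N` counts every registered instance, including ones that are still joining, and that `N/2` is integer division. That assumption is exactly what this issue still needs to verify.

```python
def quorum(n_registered: int) -> int:
    """Evaluate the 'N/2 + 1' formula with integer division (assumed)."""
    return n_registered // 2 + 1


def can_commit(n_registered: int, n_caught_up: int) -> bool:
    """Return True if enough instances can acknowledge a synchronous transaction.

    n_caught_up counts the instances able to ack new transactions,
    including the leader itself; joining instances are assumed unable to ack.
    """
    return n_caught_up >= quorum(n_registered)


# (registered, caught up) pairs for the two scenarios described above.
scenarios = {
    "1 node": (1, 1),
    "1 node + 1 joining": (2, 1),
    "3 nodes": (3, 3),
    "3 nodes + 3 joining": (6, 3),
}

for name, (registered, caught_up) in scenarios.items():
    print(name, quorum(registered), can_commit(registered, caught_up))
```

Under these assumptions the two "joining" scenarios report that a commit is not possible: the quorum is 2 with 1 instance able to ack, and 4 with 3 instances able to ack.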
In etcd the problem is solved by a special role for such instances, learner: https://etcd.io/docs/v3.3/learning/learner/. Learners do not affect the quorum, can't vote, and can't be leaders. But they can join and catch up with the leader before entering the quorum.
Need to check if this issue actually exists in Tarantool (make a test, perhaps with an error injection). Then try to add the new role.
In the etcd approach, if I understood it correctly, I do not like that a user needs to set the role manually. In Tarantool it might be done automatically, I hope. Make all new nodes learners until they finish at least both join stages, and then turn on the actual `election_mode` specified in the config.
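A rough model of the automatic variant, again in Python and purely illustrative: the `Instance` class, its `joined` flag, and the `voters` and `quorum` helpers are all hypothetical names, not an existing Tarantool or etcd API. The only rule modeled is that an instance is excluded from `N` until it has finished both join stages.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    joined: bool = False  # True once initial and final join are both done

    @property
    def is_learner(self) -> bool:
        # Hypothetical rule: an instance is a learner until it finishes joining.
        return not self.joined


def voters(cluster):
    """Instances that count towards the quorum."""
    return [inst for inst in cluster if not inst.is_learner]


def quorum(cluster) -> int:
    """'N/2 + 1' where N excludes learners."""
    return len(voters(cluster)) // 2 + 1


# 3 established nodes plus 3 freshly added ones that are still joining.
cluster = [Instance(f"old{i}", joined=True) for i in range(3)]
cluster += [Instance(f"new{i}") for i in range(3)]

print(len(voters(cluster)), quorum(cluster))  # learners do not raise the quorum

cluster[3].joined = True  # new0 finishes its join stages and becomes a voter
print(len(voters(cluster)), quorum(cluster))
```

In this model the quorum stays at 2 while all three new nodes are learners, and rises to 3 only after one of them has caught up and can actually acknowledge transactions.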