Remove a peer from the Raft quorum only if the load balancer would allow a new one to be added. #1218
Labels: area/docdb, help wanted, kind/enhancement, priority/medium
Jira Link: DB-2435
Currently, each Raft group independently evicts a failed peer from its quorum. However, adding new peers to a quorum is controlled by the master/load balancer and throttled to limit the number of concurrent remote bootstraps.
We should make this symmetric so that a failed peer is kicked out of the quorum only when the load balancer actually has the bandwidth to add some other node to the quorum.
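A minimal sketch of that gating decision, assuming the load balancer exposes a fixed cap on concurrent remote bootstraps. `RemoteBootstrapThrottle` and `maybe_remove_failed_peer` are hypothetical names, not YugabyteDB APIs, and the sketch models only the decision itself, not the Raft config-change operation that would carry it out:

```python
from dataclasses import dataclass, field


@dataclass
class RemoteBootstrapThrottle:
    """Hypothetical stand-in for the load balancer's cap on concurrent remote bootstraps."""
    max_concurrent: int
    in_flight: set = field(default_factory=set)

    def try_reserve(self, tablet_id: str) -> bool:
        # Reserve a bootstrap slot for this tablet only if capacity remains.
        if len(self.in_flight) >= self.max_concurrent:
            return False
        self.in_flight.add(tablet_id)
        return True

    def release(self, tablet_id: str) -> None:
        # Hypothetical hook: called once the replacement peer has finished bootstrapping.
        self.in_flight.discard(tablet_id)


def maybe_remove_failed_peer(quorum: set, failed_peer: str, tablet_id: str,
                             throttle: RemoteBootstrapThrottle) -> bool:
    """Remove failed_peer from the quorum only if a replacement can be added right away."""
    if failed_peer not in quorum:
        return False
    if not throttle.try_reserve(tablet_id):
        # No bootstrap capacity: keep the failed peer in the config for now.
        return False
    quorum.discard(failed_peer)
    return True
```

With `max_concurrent=1`, the first tablet's failed peer is removed and a slot is reserved; a second tablet keeps its failed peer until `release()` frees the slot. Because removal and the add-slot reservation happen together, eviction can never outpace re-replication.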
So, instead of a failed peer losing all of its tablets after 5 minutes of absence, the new behavior would remove the failed peer from the tablets it hosts in a throttled fashion. If the peer comes back before the system has moved all of its tablets away, it can resume where it left off on the tablets it still holds.
(This may require #1217 to be effective.)
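A toy comparison of the two behaviors, to show what a returning peer keeps. It assumes a hypothetical fixed removal rate (`moves_per_min`) and, for simplicity, ignores any failure-detection delay before throttled removal starts; neither the rate nor the function names come from YugabyteDB:

```python
from typing import List


def retained_current(owned: List[str], minutes_absent: float,
                     timeout_min: float = 5.0) -> List[str]:
    # Current behavior: each Raft group evicts the peer on its own once the
    # failure timeout expires, so the peer loses every tablet at once.
    return [] if minutes_absent >= timeout_min else list(owned)


def retained_throttled(owned: List[str], minutes_absent: float,
                       moves_per_min: int = 2) -> List[str]:
    # Proposed behavior (hypothetical model): the peer is removed from a tablet
    # only when the load balancer can re-replicate it, at most moves_per_min
    # tablets per minute. Tablets not yet moved remain on the peer.
    moved = min(len(owned), int(minutes_absent) * moves_per_min)
    return list(owned[moved:])
```

For a peer hosting 20 tablets that returns after 6 minutes, `retained_current` leaves it with nothing, while `retained_throttled` (2 moves per minute, so 12 moved) leaves it with the 8 tablets not yet re-replicated, which it can catch up on instead of bootstrapping from scratch.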