I have a use case where I need to provide HA between multiple datacenters connected over the public internet. I have keepalived running on a publicly accessible endpoint in each of the DCs.
A VRRP master is elected between these nodes (in L3 Unicast mode), such that all the internal services are accessed via the VRRP master.
If the L3 connectivity between these nodes is severed for the MDT interval, the backup takes over as master (as expected). When the HA cluster network is restored, all the routes must be propagated to all peers before a new master is re-elected.
Before the routing protocols converge among all peers, a given peer does not receive all messages, which can lead it to believe that it is alone and should take over as master.
This results in a split-brain case where more than one VRRP node owns the VRRP VIP and we have more than one master.
startup-delay won't work here because it only applies when the keepalived daemon restarts, and preempt delay does not get applied here
because it is only triggered when the node receives a VRRP advertisement of a higher priority than the instance's local priority.
Increasing the advertisement interval increases the master switchover time, and cannot compensate for BGP convergence time (if one is using BGP, as we are), which can be on the order of tens of seconds or even minutes.
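To put rough numbers on that trade-off, here is a small sketch. It assumes that "MDT" above refers to the VRRP master-down interval and uses the VRRPv3 (RFC 5798) formula, Master_Down_Interval = 3 * Advert_Interval + Skew_Time with Skew_Time = (256 - Priority) * Advert_Interval / 256. The function names and the 60 second convergence figure are illustrative only, not taken from keepalived.

```python
def master_down_interval(advert_int: float, priority: int) -> float:
    """Seconds a backup waits without advertisements before taking over (RFC 5798)."""
    skew_time = (256 - priority) * advert_int / 256
    return 3 * advert_int + skew_time


def advert_int_to_cover(convergence_s: float, priority: int) -> float:
    """Advertisement interval at which the master-down interval equals convergence_s."""
    return convergence_s / (3 + (256 - priority) / 256)


if __name__ == "__main__":
    # With a 1 s advertisement interval, failover is detected in a few seconds.
    print(master_down_interval(1.0, 100))
    # Stretching the timer to ride out a hypothetical 60 s of BGP convergence
    # needs a much larger interval, and every genuine failover then takes 60 s too.
    print(advert_int_to_cover(60.0, 100))
```

In other words, the advertisement interval is a single knob for both failure detection and recovery, which is why a separate delay that only applies after recovery looks more suitable.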
In order to avoid this situation, a new field "route-propagation-delay SECONDS" can be added to the per-instance VRRP config, which basically does the following:
When a VRRP node moves from FAULT state to BACKUP and eventually to MASTER (after the MDT timeout), wait for an additional "route-propagation-delay" time if such a field is set.
During this delay, all the advertisements that are received from the master are ignored (just like startup-delay).
This will be applied every time the node exits FAULT state and does not go directly to master. This feature is not supported with initial state MASTER or with VRRP strict mode (like preempt delay).
Needless to say, there is no behavioral change if "route-propagation-delay" is not set in the configuration.
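The proposed behavior can be sketched as a toy state machine. This is only a model of the description above, not keepalived code: the class, method, and field names are made up for illustration, time is passed in explicitly, and it assumes the extra delay is armed on leaving FAULT and consumed on the first transition to MASTER.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    FAULT = "FAULT"
    BACKUP = "BACKUP"
    MASTER = "MASTER"


@dataclass
class VrrpInstance:
    master_down_interval: float
    route_propagation_delay: float = 0.0  # 0 models "option not set"
    state: State = State.FAULT
    mdt_expiry: float = 0.0
    extra_delay_armed: bool = False

    def leave_fault(self, now: float) -> None:
        """FAULT -> BACKUP: start the master-down timer and arm the extra delay."""
        self.state = State.BACKUP
        self.mdt_expiry = now + self.master_down_interval
        self.extra_delay_armed = self.route_propagation_delay > 0

    def _in_delay_window(self, now: float) -> bool:
        return (self.state is State.BACKUP and self.extra_delay_armed
                and now >= self.mdt_expiry)

    def on_advert(self, now: float) -> bool:
        """Return True if the advertisement was processed, False if ignored."""
        # Handling of adverts in MASTER/FAULT is left out of this sketch.
        if self.state is not State.BACKUP or self._in_delay_window(now):
            return False
        self.mdt_expiry = now + self.master_down_interval  # master is alive
        return True

    def tick(self, now: float) -> State:
        """Advance the clock; become MASTER once all timers have run out."""
        if self.state is State.BACKUP:
            extra = self.route_propagation_delay if self.extra_delay_armed else 0.0
            if now >= self.mdt_expiry + extra:
                self.state = State.MASTER
                self.extra_delay_armed = False
        return self.state
```

With `master_down_interval=3.0` and `route_propagation_delay=30.0`, a node that leaves FAULT at t=0 stays in BACKUP until t=33 and ignores advertisements from t=3 onward, whereas an instance without the option becomes MASTER at t=3 as it does today.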
I can send patches to support this feature if this is useful.