-
-
Notifications
You must be signed in to change notification settings - Fork 749
vrrp: add fault_init_exit_delay to vrrp configuration #2556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I have explained more details about this changeset here: #2555 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very happy with this patch in principle but there a few issues that need to be resolved first.
Unfortunately the patch as submitted doesn't work properly. Consider a VRRP instance with priority 240 already running as master, and another instance with priority 220 starts up. vrrp->sands is calculated to be route_propagation_delay + MDT in the future, and until that time expires any received adverts are ignored. As soon as the sands timer expires, the lower priority instance transitions to master state (because that is what the expiry of the sands timer means). The lower priority VRRP instance does revert to backup state after it next receives an advert from the priority 240 instance, but the transition to master should not have happened in the first place.
There is a simple fix to this problem, and that is in vrrp_init_instance_sands()
set vrrp->route_propagated_time = timer_add_long(time_now, vrrp->route_propagation_delay)
, so that the priority 220 instance then has MDT after the vrrp->route_propagated_time has passed to be able to receive adverts before it decides to transition to master.
There is also a problem with configuration reloads. If an instance is in backup state and its configuration is reloaded, then after the reload the route propagation delay is again applied to the VRRP instance, and of course this is wrong if the old VRRP instance is not waiting for the route_propagated_time to expire. The propagation delay needs to be carried across a reload, more specifically the vrrp->route_propagated_time needs to be recalculated as new vrrp->route_propagated_time = old vrrp->route_propagated_time - old vrrp->route_propagated_delay + new vrrp->route_propagated_delay, but only if old vrrp->route_propagated_time has not passed.
Your implementation makes the vrrp instance enter backup state as normal, but then ignores any received adverts for the configured delay. I think this is the wrong approach, since what you are really wanting to happen is that the vrrp instance remains in the init/fault state after a fault is cleared, or after startup, for the configured delay time, before it then transitions to backup state.
Cosmetic
I think the names of the configuration parameter and the variables are too specific to your purpose - there may be other users who want to use this feature but for other reasons. I think the names should be based on something like "fault_init_exit_delay", so that they are general purpose. Comments need to be adjusted accordingly.
Minor
In vrrp_route_propagation_delay_handler, don't clear current_vrrp->route_propagation_delay on error (this is how most of the similar code behaves - preempt_delay currently does clear the setting, but I will change that to make it consistent with the other parameters).
The log entry Applied vrrp route propagation delay of 10000000 on instance (VI_1)
needs to be changed to Applied vrrp route propagation delay of 10.000000 seconds on instance (VI_1)
.
The new variables need to be included in the dump_vrrp() function in vrrp_data.c, and also in the SNMP variables in the KEEPALIVED.MIB.
Major
doc/man/man5/keepalived.conf.5.in must be updated to add the new keyword (otherwise no-one else will know it exists).
In vrrp_dispatcher_read()
the added return sock->fd_in should be continue.
The change to vrrp_dispatcher_read()
adds a timer check every time a packet is received (if this feature is enabled). This is not the way keepalived handles time. A timer thread needs to be added so that when the timer expires, the thread is run and the state is changed accordingly.
I think this is best implemented by adding a timer thread (thread_add_timer()), with the timer being route_propagation_delay, and the vrrp instance remains in the fault state until the timer expires. When the timer expires the vrrp instance then transitions to backup state. If a new fault occurs before the timer expires, the existing timer thread would need to be cancelled. The timer thread would also need to be carried over a reload (I think the way this is best handled is to the vrrp structure to have a pointer to a fault_exit_timer_thread, which is cleared when the timer expires. If the timer still exists when a reload occurs, then a new timer can be created, using the sands value from the old timer thread suitably adjusted for any changes to the configured route_propagation_delay. The pointer to the fault_exit_timer_thread could also be used to cancel the timer if a new fault occurs before the timer expires).
If you would like me to help with making the changes above, or even me complete them, I am quite happy to do so, but I always prefer to let the originator of a patch complete any necessary changes in the first place.
Thank you for your comments. I will send a new patch soon. |
Thanks for your comments @pqarmitage. In the previous patch, when VRRP exits the FAULT state, a timer is started, and VRRP transitions to the BACKUP state, where it remains for the duration of the fault_init_exit_delay interval. In your previous comment, you suggested that it would be better to stay in the FAULT state instead: However, upon further reflection, this approach adds complexity, particularly in handling scenarios where VRRP is reloaded while the fault_init_exit_delay timer is still running. Since all faults are recomputed after a reload, should we consider a running timer as an active fault? If so, should the new VRRP instance start in the FAULT state until the timer expires, after which it transitions to the BACKUP state (or whichever state is configured)? Please let me know your thoughts on the same. |
Add a new keyword named "fault_init_exit_delay" to vrrp_instance. This will be a per instance parameter like "preempt_delay", and will apply when the instance leaves from a fault or init state.
Add FaultInitExitDelay to SNMP MIB
When router moves from fault to backup state, a variable fault_init_time_needed is set. During the subsequent call to vrrp_init_instance_sands, 1. sands timer (which determines the next timeout notification) is increased by fault_init_exit_delay seconds. 2. fault_init_exit_time is set, which determines the time, before which we don't send out any advertisement (even if we don't receive packets and the MDT time has expired). 3. block_socket_time is set, which determines the timestamp, till which no packets that are received from the corresponding socket are processed.
Apply fault_init_exit_delay, when vrrp instance moves out of init state. The actual delay is applied when the vrrp instance tries to take over as master.
When fault_init_exit_delay is modified, and the config is reloaded, the new fault_init_exit_delay is applied in the following manner: 1. If the fault_init_exit_time has already passed, don't change it.V When fault_init_exit_delay is modified, and the config is reloaded, the new fault_init_exit_delay is applied in the following manner: 1. If the fault_init_exit_time has already passed, don't change it. The new fault_init_exit_delay will only be applied the next time the interface comes out of fault state. 2. If the new fault_init_exit_delay is greater than the old one, and the fault_init_exit_delay timer is presently running, increase the timer by the difference. 3. If the new fault_init_exit_delay is less than the old one, and the fault_init_exit_time timer is in progress, reduce the timer by the difference between the two, provided the reduced time is greater than the present time.
When vrrp state moves out of a fault state, we would like to wait for an additional route propagated delay time, before doing a vrrp master election.
During this interval, the packets that are received from other vrrp nodes, which can alter the state of the vrrp instance are dropped. The present nodes, does not pass to participate in leader election which can result in it becoming the master and taking over the VRRP IP.