Keepalived is losing VIP upon "nmcli c up <con-name>" and failover does not occur #2564

bupt075225 · 2025-03-19T08:58:49Z

Describe the bug
After an interface is brought down and back up via nmcli, NetworkManager deletes the VIP address, and the VIP is not reassigned to the configured interface.

To Reproduce
Step 1: On the backup node, use "nmcli c down bond0.91" to take down the network, then use "nmcli c up bond0.91" to restore the network.
Step 2: On the master node, use "nmcli c down bond0.91" to take down the network, then use "nmcli c up bond0.91" to restore the network.
The VIP is lost and will not be configured on any node running Keepalived.

Expected behavior
The VIP should be configured on the new master node.

Keepalived version
2.2.7

Distro (please complete the following information):

Name: Red Hat Enterprise Linux 9
Linux kernel Version: 5.15.131
Architecture: x86_64

Details of any containerisation or hosted service (e.g. AWS)

keepalived is running in a container on a k8s cluster

Configuration file:

global_defs {
    enable_script_security
    script_user root
    max_auto_priority -1
    vrrp_garp_master_refresh 60
}

vrrp_script chk_cpu_affinity {
    script "/root/keepalived_check_cpu_affinity.sh"
    interval 300
    fall 2
    rise 2
}

vrrp_instance dns-grp {
    interface bond0.91
    state BACKUP
    virtual_router_id 127
    priority 100
    unicast_src_ip 10.255.62.7
    unicast_peer {
        10.255.62.8
        10.255.62.6
    }

    virtual_ipaddress {
        10.255.62.14/27
    }
    notify /root/notify_stor_dns.sh
    track_script {
        chk_cpu_affinity
    }
}

Notify and track scripts

If any notify or track scripts are in use, please provide copies of them

System Log entries

Wed Mar 19 18:45:25 2025: Deassigned address fe80::c425:ed87:9a2d:d81c from interface bond0.91
Wed Mar 19 18:45:25 2025: Deassigned address 10.255.62.7 from interface bond0.91
Wed Mar 19 18:45:25 2025: Netlink reports bond0.91 down
Wed Mar 19 18:45:25 2025: (dns-grp) Entering FAULT STATE
2025-03-19 18:45:25 notify_stor_dns.sh:execute notify dns script
2025-03-19 18:45:25 notify_stor_dns.sh:INSTANCE:dns-grp become fault
Wed Mar 19 18:45:25 2025: Interface vxlan.calico deleted
Wed Mar 19 18:45:25 2025: Interface bond0.91 deleted
Wed Mar 19 18:45:25 2025: Closing vrrp socket fd_in
Wed Mar 19 18:45:25 2025: Closing vrrp socket fd_in
Wed Mar 19 18:45:35 2025: Interface bond0.91 added
Wed Mar 19 18:45:35 2025: (dns-grp) interface bond0.91 is down
Wed Mar 19 18:45:35 2025: Netlink reports bond0.91 up
Wed Mar 19 18:45:35 2025: (dns-grp) Entering BACKUP STATE
Wed Mar 19 18:45:35 2025: dns-grp: sending gratuitous ARP for 10.255.62.7
Wed Mar 19 18:45:35 2025: Sending gratuitous ARP on bond0.91 for 10.255.62.7
Wed Mar 19 18:45:35 2025: Assigned address 10.255.62.7 for interface bond0.91
2025-03-19 18:45:35 notify_stor_dns.sh:execute notify dns script
Wed Mar 19 18:45:35 2025: Assigned address fe80::c425:ed87:9a2d:d81c for interface bond0.91
2025-03-19 18:45:35 notify_stor_dns.sh:INSTANCE:dns-grp become backup

The "Closing vrrp socket fd_in" lines in the log above come from a debug message that was added to investigate this bug.

Root Cause
When Keepalived receives a link delete event via netlink, cleanup_lost_interface() closes the sockets. Even after the network interface is brought back up, the Keepalived nodes no longer send or receive VRRP packets.

pqarmitage · 2025-03-19T09:23:41Z

Could you please try using keepalived v2.3.2 and see if that resolves your issue? There have been a number of improvements in this area since v2.2.7.

bupt075225 · 2025-03-20T01:44:33Z

I have tried with keepalived v2.3.2, and the same issue persists.

pqarmitage · 2025-03-29T09:00:25Z

The versions of keepalived that you tested do not have a problem with interfaces simply being brought down and up. The problem here is rather that, for some reason, the bond0.91 interface is deleted and then recreated 10 seconds later.

To handle interfaces being deleted and recreated, you need to specify dynamic_interfaces in the global_defs section of your configuration. Also, quite a bit of work has been done in the last week to improve the handling of the deletion and recreation of interfaces, so you will probably need to build keepalived from the head of the master branch in order for this to work successfully.
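As a minimal sketch of that suggestion, the global_defs block from the reported configuration could be extended as follows. Only the dynamic_interfaces line is new; the other settings are copied unchanged from the report. Whether this is sufficient on its own has not been confirmed in this thread, and a build from the head of the master branch may still be required.

```
global_defs {
    enable_script_security
    script_user root
    max_auto_priority -1
    vrrp_garp_master_refresh 60
    # Added: let keepalived handle interfaces that are deleted
    # and recreated while it is running (e.g. bond0.91 here)
    dynamic_interfaces
}
```

The vrrp_script and vrrp_instance sections would stay as they are in the original configuration.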

bupt075225 · 2025-03-31T03:35:42Z

Thanks very much, I will give it a try.

pqarmitage · 2025-04-26T11:20:10Z

@bupt075225 Do you have any update on this? Can we close the issue?

pqarmitage added the Awaiting feedback label Apr 26, 2025
