Keepalived is losing VIP upon "nmcli c up <con-name>" and failover does not occur #2564

Open
bupt075225 opened this issue Mar 19, 2025 · 5 comments
Labels
Awaiting feedback Awaiting feedback from the originator of the issue

Comments

@bupt075225

Describe the bug
After an interface is taken down and brought back up via nmcli, NetworkManager deletes the VIP address, and the VIP is not reassigned to the configured interface.

To Reproduce
Step 1: On the backup node, use "nmcli c down bond0.91" to take down the network, then use "nmcli c up bond0.91" to restore the network.
Step 2: On the master node, use "nmcli c down bond0.91" to take down the network, then use "nmcli c up bond0.91" to restore the network.
The VIP is lost and will not be configured on any node running Keepalived.

Expected behavior
The VIP should be configured on the new master node.

Keepalived version
2.2.7

Distro (please complete the following information):

  • Name: Red Hat Enterprise Linux 9
  • Linux kernel Version: 5.15.131
  • Architecture: x86_64

Details of any containerisation or hosted service (e.g. AWS)

keepalived running in a container on k8s cluster

Configuration file:

global_defs {
    enable_script_security
    script_user root
    max_auto_priority -1
    vrrp_garp_master_refresh 60
}

vrrp_script chk_cpu_affinity {
    script "/root/keepalived_check_cpu_affinity.sh"
    interval 300
    fall 2
    rise 2
}

vrrp_instance dns-grp {
    interface bond0.91
    state BACKUP
    virtual_router_id 127
    priority 100
    unicast_src_ip 10.255.62.7
    unicast_peer {
        10.255.62.8
        10.255.62.6
    }

    virtual_ipaddress {
        10.255.62.14/27
    }
    notify /root/notify_stor_dns.sh
    track_script {
        chk_cpu_affinity
    }
}

Notify and track scripts

If any notify or track scripts are in use, please provide copies of them

System Log entries

Wed Mar 19 18:45:25 2025: Deassigned address fe80::c425:ed87:9a2d:d81c from interface bond0.91
Wed Mar 19 18:45:25 2025: Deassigned address 10.255.62.7 from interface bond0.91
Wed Mar 19 18:45:25 2025: Netlink reports bond0.91 down
Wed Mar 19 18:45:25 2025: (dns-grp) Entering FAULT STATE
2025-03-19 18:45:25 notify_stor_dns.sh:execute notify dns script
2025-03-19 18:45:25 notify_stor_dns.sh:INSTANCE:dns-grp become fault
Wed Mar 19 18:45:25 2025: Interface vxlan.calico deleted
Wed Mar 19 18:45:25 2025: Interface bond0.91 deleted
Wed Mar 19 18:45:25 2025: Closing vrrp socket fd_in
Wed Mar 19 18:45:25 2025: Closing vrrp socket fd_in
Wed Mar 19 18:45:35 2025: Interface bond0.91 added
Wed Mar 19 18:45:35 2025: (dns-grp) interface bond0.91 is down
Wed Mar 19 18:45:35 2025: Netlink reports bond0.91 up
Wed Mar 19 18:45:35 2025: (dns-grp) Entering BACKUP STATE
Wed Mar 19 18:45:35 2025: dns-grp: sending gratuitous ARP for 10.255.62.7
Wed Mar 19 18:45:35 2025: Sending gratuitous ARP on bond0.91 for 10.255.62.7
Wed Mar 19 18:45:35 2025: Assigned address 10.255.62.7 for interface bond0.91
2025-03-19 18:45:35 notify_stor_dns.sh:execute notify dns script
Wed Mar 19 18:45:35 2025: Assigned address fe80::c425:ed87:9a2d:d81c for interface bond0.91
2025-03-19 18:45:35 notify_stor_dns.sh:INSTANCE:dns-grp become backup

The "Closing vrrp socket fd_in" lines in the log above come from a log statement that I added to investigate this bug.

Root Cause
When Keepalived receives a link delete event via netlink, cleanup_lost_interface() closes the sockets. Even if the network interface is brought back up, Keepalived nodes will no longer send or receive VRRP packets.

@pqarmitage
Collaborator

Could you please try using keepalived v2.3.2 and see if that resolves your issue. There have been a number of improvements in this area since v2.2.7.

@bupt075225
Author

I have tried with keepalived v2.3.2, and the same issue persists.

@pqarmitage
Collaborator

The versions of keepalived that you tested do not have a problem with interfaces simply being downed and upped. The problem here is that, for some reason, the bond0.91 interface is deleted and then recreated 10 seconds later.

To handle interfaces being deleted and recreated, you need to specify dynamic_interfaces in the global_defs section of your configuration. Also, quite a bit of work has been done in the last week to improve the handling of the deletion and recreation of interfaces, so you will probably need to build keepalived from the head of the master branch in order for this to work successfully.
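
As a minimal sketch of that change, assuming the rest of the global_defs block from the configuration posted above stays as it is, the option is added as a single keyword line. This fragment is illustrative only and has not been tested against the reporter's setup:

```
global_defs {
    enable_script_security
    script_user root
    max_auto_priority -1
    vrrp_garp_master_refresh 60
    # Allow interfaces used in the configuration to be deleted and recreated at runtime
    dynamic_interfaces
}
```

Keepalived needs to be restarted (or the configuration reloaded) for the new global_defs setting to take effect.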

@bupt075225
Author

Thanks very much, I will give it a try.

@pqarmitage
Collaborator

@bupt075225 Do you have any update on this? Can we close the issue?

@pqarmitage pqarmitage added the Awaiting feedback Awaiting feedback from the originator of the issue label Apr 26, 2025