
Possible network issue involving VRRP #112

Closed
purpleidea opened this issue Jan 4, 2014 · 8 comments

@purpleidea
Contributor

Not entirely sure if this is vagrant-libvirt's fault, but I'm not sure where else to look.

I have a four-host vagrant-libvirt deployment. I'm trying to run VRRP (keepalived) to provide a virtual IP address, but for whatever reason I haven't been able to get it working. I've built the same setup before, on bare metal and on non-Vagrant-managed VMs, without issue.

Some details:

hosts: annex{1..4}
ips: 192.168.142.10{1..4}
vip: 192.168.142.3
router: 192.168.142.1

Each host has a similar keepalived configuration:

vrrp_instance VI_GLUSTER {
        interface eth2  # multicast communication link
        mcast_src_ip 192.168.142.101    # multicast source ip address
        state BACKUP            # MASTER or BACKUP
        virtual_router_id 42
        priority 254
        advert_int 3    # advertisement interval in seconds
        authentication {
                auth_type PASS
                auth_pass password
        }
        virtual_ipaddress {
                192.168.142.3/24 dev eth2 label eth2:1  # the VIP to share eg: ip/cidr dev ethx
        }
}

The priority is different for each host. I've left them all as BACKUP for demonstration, but it doesn't change anything if one is MASTER; the state setting only determines the initial state.
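To see which node currently holds the VIP, something like the following can be run on each host (a diagnostic sketch; the interface name and VIP come from the keepalived config above):

```shell
# Show IPv4 addresses on the VRRP interface; the host acting as MASTER
# will have the VIP (192.168.142.3, labelled eth2:1) configured here.
ip -4 addr show dev eth2
```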

# tcpdump -v -i eth2 vrrp
tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
14:36:23.217141 IP (tos 0xc0, ttl 255, id 182, offset 0, flags [none], proto VRRP (112), length 40)
    192.168.142.1 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 42, prio 254, authtype simple, intvl 3s, length 20, addrs: 192.168.142.3 auth "password"

14:36:23.217291 IP (tos 0xc0, ttl 255, id 182, offset 0, flags [none], proto VRRP (112), length 40)
    annex4.example.com > vrrp.mcast.net: VRRPv2, Advertisement, vrid 42, prio 251, authtype simple, intvl 3s, length 20, addrs: 192.168.142.3 auth "password"

This is from annex4. As you can see, it sees its own advertisement traffic (prio 251), and also traffic from annex1 (I know it's annex1 because the priority field is 254, which is unique to annex1), but annex1's source IP appears to be the router! I'm not sure if that's supposed to be the case; I would expect it to be annex1's address. Maybe this is what vagrant-libvirt networking is breaking? If so, it would explain why each host is ignoring the advertisement packets from the others.

Each host has decided to be the MASTER and hold the VIP.

The one strange thing about all this (apart from it not working) is the asymmetry in the tcpdump: each host sees its own traffic plus the traffic from annex1, while annex1 sees only its own traffic.

So maybe something strange is happening with the vagrant-libvirt networking. My config for each vm is:

# this is a red herring network to make vagrant-libvirt put its DHCP traffic here...
vm.vm.network :private_network,
  :ip => "10.10.10.101",          # or 102, 103, 104 ...
  :libvirt__network_name => 'default'

# this is the real network that we'll use...
vm.vm.network :private_network,
  :ip => "192.168.142.101",       # or 102, 103, 104
  :libvirt__dhcp_enabled => false,
  :libvirt__network_name => 'gluster'

Any suggestions are appreciated!

@sciurus
Contributor

sciurus commented Jan 4, 2014

When you say this setup works in "non vagrant managed vm's", does that include VMs managed by libvirt?

I don't think we're doing anything that would break multicast. Libvirt does have a concept of network filters, but vagrant-libvirt doesn't enable any of them.

@purpleidea
Contributor Author

@sciurus In the past I've tested it with VMs that I built manually with Cobbler (that was a long time ago).

My most recent non-Vagrant tests have been on physical iron, where keepalived works as expected.

I'm not sure exactly what's wrong, but seeing as the 192.168.142.1 gateway is involved, I thought it might be related to vagrant.

@sciurus
Contributor

sciurus commented Jan 4, 2014

Okay. I'd call 192.168.142.1 a libvirt, rather than a vagrant, gateway. vagrant-libvirt is just providing a nicer interface to functionality implemented by libvirt; as soon as your vagrant up command finishes we're out of the picture.

You should be able to view the network configuration by running virsh net-dumpxml gluster. The documentation for that configuration is here and an overview of libvirt networking is here. If you find that there's a way we can change how vagrant-libvirt is setting up networks or VMs in libvirt that will make what you're trying with keepalived work, please let us know.
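Concretely, that inspection looks roughly like this (a sketch; the network name comes from the Vagrantfile above, and the exact XML will vary per host):

```shell
# Dump the libvirt XML for the "gluster" network; the <forward mode=...>
# element and the bridge device it lists are the parts relevant to how
# traffic between guests on that network is handled.
virsh net-dumpxml gluster
```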

@purpleidea
Contributor Author

Okay. I'd call 192.168.142.1 a libvirt, rather than a vagrant, gateway. vagrant-libvirt is just providing a nicer interface to functionality implemented by libvirt;

Very true. I'll hack on this more from the libvirt side.

You should be able to view the network configuration by running virsh net-dumpxml gluster. The documentation for that configuration is here and an overview of libvirt networking is here. If you find that there's a way we can change how vagrant-libvirt is setting up networks or VMs in libvirt that will make what you're trying with keepalived work, please let us know.

Sounds reasonable. One thing I'd mention is that it would be nice if vagrant-libvirt could use a single interface, with a static IP, for all of its management traffic; currently that's only possible with DHCP. For the moment I've worked around it with a second, static interface for the VM work, and the eth0 interface for vagrant-libvirt.

Cheers!

@purpleidea
Contributor Author

So I think this turns out to be a libvirt bug:
https://bugzilla.redhat.com/show_bug.cgi?id=876541
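If that's the bug, the mechanism would match the tcpdump above: the host's NAT rules masquerade the multicast adverts as they cross the host, rewriting the source to the gateway IP. As a hedged sketch of a manual workaround (the subnet matches the gluster network above; I believe newer libvirt versions exempt multicast from NAT themselves), an exemption rule could be inserted ahead of the MASQUERADE rule:

```shell
# Exempt multicast-destined traffic from the "gluster" subnet from NAT,
# so VRRP adverts to 224.0.0.18 keep their original source address.
# (Assumes libvirt's MASQUERADE rules live in the nat POSTROUTING chain.)
iptables -t nat -I POSTROUTING -s 192.168.142.0/24 -d 224.0.0.0/4 -j RETURN
```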

I'll be upgrading to F20 later and testing again.

Thanks for your comments.

@sciurus
Contributor

sciurus commented Jan 5, 2014

Okay, @pronix looks like this is ready to be closed then.

@purpleidea
Contributor Author

I can close. Thanks
Sorry for the noise.

@pronix
Member

pronix commented Jan 5, 2014

That was interesting to read. Thanks!
