High availability

Alexander K edited this page Mar 21, 2018 · 8 revisions

Keepalive

A highly available Tempesta FW cluster in a cloud (see the Clouds Wiki page for a description of using Tempesta FW in different cloud environments) or on bare-metal machines can be configured using keepalived. This doc describes the configuration of a two-machine cluster. Each machine must run Tempesta FW and keepalived. Each keepalived instance sends periodic heartbeat messages to the other instance and starts the failover process if that instance doesn't respond. It's recommended to use separate network interfaces for ingress HTTP traffic and internal keepalived (VRRP) traffic: if the cluster receives enormous ingress traffic and uses the same interface for both ingress traffic and VRRP, then VRRP messages can be dropped by the interface and keepalived won't be able to handle possible server failures.

Keepalived is usually available through standard Linux distribution packages. Use

    # yum install keepalived

to install it in CentOS or

    # apt-get install keepalived

to install it in Debian.

If a server fails and reboots, it must restart all required services. Thus, enable keepalived at boot:

    # systemctl enable keepalived

The keepalived configuration files are shown below. Note that an active-active configuration with two virtual (floating) IP addresses (VIPs) is used, i.e. both nodes can process traffic and each node can acquire the VIP of the other node if that node fails. Two VRRP instances are used for active-active mode: one instance is configured as master on the first node and as backup on the second node, and the other instance is configured as backup on the first node and as master on the second one. You can use only one instance for an active-passive configuration.

This doc doesn't describe the keepalived configuration options. However, you can find a good description of them in the user guide (the Web site says that it's deprecated, but it still provides a lot of useful information) or in the keepalived.conf(5) man page. There are also many blog posts on the Internet about keepalived configuration for various use cases.

The first node configuration:

    vrrp_script chk_tfw {
        script "wget -q -O /dev/null http://127.0.0.1/"
        interval 1
    }
    
    vrrp_instance TFW_1 {
        state BACKUP
        interface eth0
        virtual_router_id 1
        priority 100
        advert_int 1
        dont_track_primary
        unicast_src_ip 192.168.100.6
        unicast_peer {
            192.168.100.5
        }
        virtual_ipaddress {
            172.16.0.5/24 dev eth1
        }
        track_script {
            chk_tfw
        }
    }
    
    vrrp_instance TFW_2 {
        state MASTER
        interface eth0
        virtual_router_id 2
        priority 200
        advert_int 1
        dont_track_primary
        unicast_src_ip 192.168.100.6
        unicast_peer {
            192.168.100.5
        }
        virtual_ipaddress {
            172.16.0.6/24 dev eth1
        }
        track_script {
            chk_tfw
        }
    }

Note that eth0 is a private network interface for VRRP communication and eth1 is an external interface. Tempesta FW is assumed to listen on 0.0.0.0:80, and we use wget -q -O /dev/null http://127.0.0.1/ to verify that it works as expected. 0.0.0.0 should be used to allow Tempesta FW to accept traffic at the VIP addresses: when a new address appears on the system, nothing else needs to be done to make Tempesta FW accept connections at that address. For brevity, the example configuration doesn't use authentication, e-mail notifications, or other nice keepalived features.
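
If the health check needs more control than wget offers (for example, an explicit timeout shorter than the check interval), it can be written as a small script. The following is a minimal sketch, not part of Tempesta FW or keepalived: the function names, the 0.5 second timeout, and the idea of installing it under a path such as /usr/local/bin/chk_tfw.py are assumptions made for illustration. Like the wget command, it treats connection errors and HTTP error statuses as a failed check.

```python
import http.client
import sys
import urllib.error
import urllib.request

# Ignore any http_proxy environment settings: the check must reach
# the local Tempesta FW instance directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def tempesta_is_healthy(url="http://127.0.0.1/", timeout=0.5):
    """Return True if `url` answers with a non-error HTTP status in time.

    The default timeout is an assumption; it only needs to stay below
    the `interval` of the vrrp_script block.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, broken response, bad URL.
        return False


def main(argv):
    """Map the check result to an exit code: 0 is healthy, 1 is failed."""
    url = argv[1] if len(argv) > 1 else "http://127.0.0.1/"
    return 0 if tempesta_is_healthy(url) else 1


# When installed as a standalone script, finish the file with:
#     sys.exit(main(sys.argv))
```

keepalived considers a tracked script successful when it exits with code 0, so the script option of chk_tfw could point at such a file instead of the wget command.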

The second node configuration looks similar, except that the master and backup roles are assigned to the opposite instances:

    vrrp_script chk_tfw {
        script "wget -q -O /dev/null http://127.0.0.1/"
        interval 1
    }
    
    vrrp_instance TFW_1 {
        state MASTER
        interface eth0
        virtual_router_id 1
        priority 200
        advert_int 1
        dont_track_primary
        unicast_src_ip 192.168.100.5
        unicast_peer {
            192.168.100.6
        }
        virtual_ipaddress {
            172.16.0.5/24 dev eth1
        }
        track_script {
            chk_tfw
        }
    }
    
    vrrp_instance TFW_2 {
        state BACKUP
        interface eth0
        virtual_router_id 2
        priority 100
        advert_int 1
        dont_track_primary
        unicast_src_ip 192.168.100.5
        unicast_peer {
            192.168.100.6
        }
        virtual_ipaddress {
            172.16.0.6/24 dev eth1
        }
        track_script {
            chk_tfw
        }
    }
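
To make the active-active behaviour of the two configurations easier to follow, here is a deliberately simplified model of VIP ownership. It only assumes that each VIP ends up on the healthy node with the highest priority for the corresponding VRRP instance; real VRRP elections also involve advertisement timers and preemption rules, which are ignored here. Nodes are identified by their unicast_src_ip values, and the function name is hypothetical.

```python
# Priorities copied from the two configurations above:
# VRRP instance -> {node (unicast_src_ip): priority}.
PRIORITIES = {
    "TFW_1": {"192.168.100.6": 100, "192.168.100.5": 200},
    "TFW_2": {"192.168.100.6": 200, "192.168.100.5": 100},
}

# VIP announced by each VRRP instance.
VIPS = {"TFW_1": "172.16.0.5", "TFW_2": "172.16.0.6"}


def vip_owners(healthy_nodes):
    """Return {vip: node}: each VIP goes to the healthy node with the top priority."""
    owners = {}
    for instance, priorities in PRIORITIES.items():
        candidates = [node for node in priorities if node in healthy_nodes]
        if candidates:
            owners[VIPS[instance]] = max(candidates, key=priorities.get)
    return owners


if __name__ == "__main__":
    # Both nodes healthy: every node serves one VIP.
    print(vip_owners({"192.168.100.5", "192.168.100.6"}))
    # One node failed: the survivor serves both VIPs.
    print(vip_owners({"192.168.100.5"}))
```

With both nodes healthy each node holds one VIP; when a node (or the Tempesta FW instance on it) fails, the remaining node holds both.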

Now, let's restart keepalived and ensure that the VIP appears on the node that is the master for TFW_1 (the node with address 192.168.100.5 in the configurations above):

    # systemctl restart keepalived
    # ip addr show dev eth1
    2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
        inet 172.16.0.1/24 brd 172.16.0.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet 172.16.0.5/24 scope global secondary eth1
           valid_lft forever preferred_lft forever
        inet6 fe80::5054:ff:fe12:3456/64 scope link 
           valid_lft forever preferred_lft forever

If we stop Tempesta FW on the other node, we'll see the following in /var/log/messages on this node:

    Mar  5 23:04:38 localhost Keepalived_vrrp[15199]: VRRP_Instance(TFW_2) Transition to MASTER STATE
    Mar  5 23:04:39 localhost Keepalived_vrrp[15199]: VRRP_Instance(TFW_2) Entering MASTER STATE
    Mar  5 23:04:39 localhost Keepalived_vrrp[15199]: VRRP_Instance(TFW_2) setting protocol VIPs.
    Mar  5 23:04:39 localhost Keepalived_vrrp[15199]: VRRP_Instance(TFW_2) Sending gratuitous ARPs on eth1 for 172.16.0.6
    Mar  5 23:04:39 localhost avahi-daemon[662]: Registering new address record for 192.168.100.60 on eth1.IPv4.
    Mar  5 23:04:39 localhost Keepalived_healthcheckers[15198]: Netlink reflector reports IP 192.168.100.60 added
    Mar  5 23:04:44 localhost Keepalived_vrrp[15199]: VRRP_Instance(TFW_2) Sending gratuitous ARPs on eth1 for 172.16.0.6
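
If you want to react to such failovers (for example, to raise an alert), the state changes can be extracted from the log. The sketch below relies only on the message format visible in the excerpt above; the format may differ between keepalived versions, and the function name is hypothetical.

```python
import re

# Matches messages such as:
#   "... Keepalived_vrrp[15199]: VRRP_Instance(TFW_2) Entering MASTER STATE"
_STATE_RE = re.compile(
    r"VRRP_Instance\((?P<instance>[^)]+)\) Entering (?P<state>\w+) STATE"
)


def vrrp_state_changes(lines):
    """Yield (instance, state) for every 'Entering ... STATE' message in `lines`."""
    for line in lines:
        match = _STATE_RE.search(line)
        if match:
            yield match.group("instance"), match.group("state")
```

For example, `list(vrrp_state_changes(open("/var/log/messages")))` collects all such state changes recorded in the file.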

The second VIP appears at the interface:

    # ip addr show dev eth1
    2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
        inet 172.16.0.1/24 brd 172.16.0.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet 172.16.0.5/24 scope global secondary eth1
           valid_lft forever preferred_lft forever
        inet 172.16.0.6/24 scope global secondary eth1
           valid_lft forever preferred_lft forever
        inet6 fe80::5054:ff:fe12:3456/64 scope link 
           valid_lft forever preferred_lft forever

If you restart or just switch off that node, you'll see the same.

You can send an HTTP request to either 172.16.0.5 or 172.16.0.6 and always receive a response from your Tempesta FW cluster. You can stop either node for maintenance without worrying about your site's availability: keepalived takes care of it.

Cross-Zone Load Balancing

In some cases it's recommended to place backend servers in different subnets to achieve better reliability. This is the case for cross-zone load balancing in Amazon Elastic Load Balancing (ELB). Another example is a CDN routing traffic to different data centers of a client. In both cases your private cloud is going to have a structure like the one in the picture below.

There are 2 backend server groups in different subnets, 192.168.100.0/24 and 192.168.200.0/24 respectively. Each Tempesta FW instance balances load among the servers of its own group. However, you can configure Tempesta FW to use the other server group if the current server group fails. To do so, use the backup option, i.e. the configuration for the first Tempesta FW instance will look like this:

    srv_group grp1 {
        server 192.168.100.10;
        server 192.168.100.11;
    }
    
    srv_group grp2 {
        server 192.168.200.10;
        server 192.168.200.11;
        server 192.168.200.12;
    }
    
    sched_http_rules {
        match grp1 * * * backup=grp2;
    }

The second Tempesta FW instance can use almost the same configuration, but the last rule specifies grp2 as the main server group and grp1 as the backup server group:

    srv_group grp1 {
        server 192.168.100.10;
        server 192.168.100.11;
    }
    
    srv_group grp2 {
        server 192.168.200.10;
        server 192.168.200.11;
        server 192.168.200.12;
    }
    
    sched_http_rules {
        match grp2 * * * backup=grp1;
    }
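
The effect of the backup option can be illustrated with a small model. This is only a sketch of the idea, not Tempesta FW code: it assumes that a group counts as failed when none of its servers is reachable, and the function name and the `alive` set are hypothetical.

```python
# Server groups copied from the configurations above.
GROUPS = {
    "grp1": ["192.168.100.10", "192.168.100.11"],
    "grp2": ["192.168.200.10", "192.168.200.11", "192.168.200.12"],
}


def pick_group(main, backup, alive):
    """Return the group that should receive a request, or None if both are down.

    `alive` is the set of server addresses currently considered reachable.
    The main group is used while at least one of its servers is alive.
    """
    for name in (main, backup):
        if any(server in alive for server in GROUPS[name]):
            return name
    return None
```

For the first instance, `pick_group("grp1", "grp2", alive)` keeps returning grp1 until both 192.168.100.10 and 192.168.100.11 are gone; the second instance calls it with the arguments swapped.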

In this scenario both Tempesta FW instances send traffic to their local server groups, and if the local server group fails, the other one is used. This is essentially an active-standby scenario. You can build an active-active scenario by aggregating all the servers into the same group and setting different weights. The configuration for the first instance will be:

    srv_group grp {
        server 192.168.100.10 weight=35;
        server 192.168.100.11 weight=35;
        server 192.168.200.10 weight=10;
        server 192.168.200.11 weight=10;
        server 192.168.200.12 weight=10;
    }
    
    sched_http_rules {
        match grp * * *;
    }

I.e. the servers from the second group will receive 30% of the traffic, while the local group servers will receive 70% of the traffic (note that there are 3 servers in the 2nd group and only 2 in the 1st one). The configuration for the second Tempesta FW instance looks similar, except for different server weights. Note that there is no backup option: the load is automatically redistributed if some of the servers fail.
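
The 70%/30% split follows directly from the weights, assuming that each server receives a share of traffic proportional to its weight. The sketch below (with hypothetical helper names) reproduces the arithmetic and shows how the shares change when one server drops out.

```python
# Weights copied from the configuration of the first instance above.
WEIGHTS = {
    "192.168.100.10": 35,
    "192.168.100.11": 35,
    "192.168.200.10": 10,
    "192.168.200.11": 10,
    "192.168.200.12": 10,
}


def traffic_shares(weights):
    """Return {server: fraction of traffic}, proportional to the weights."""
    total = sum(weights.values())
    return {server: weight / total for server, weight in weights.items()}


def subnet_share(weights, prefix):
    """Sum the shares of all servers whose address starts with `prefix`."""
    shares = traffic_shares(weights)
    return sum(share for server, share in shares.items() if server.startswith(prefix))


if __name__ == "__main__":
    print(subnet_share(WEIGHTS, "192.168.100."))  # local group
    print(subnet_share(WEIGHTS, "192.168.200."))  # remote group
    # If 192.168.100.10 fails, the remaining weights are renormalized.
    remaining = {s: w for s, w in WEIGHTS.items() if s != "192.168.100.10"}
    print(traffic_shares(remaining))
```

Because the shares are computed only over the servers that are still present, the remaining servers absorb the load of a failed one without any extra configuration.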

The weights 35 and 10 may look arbitrary. Indeed, it's clear that servers from the remote group will reply slower than servers from the local group, but usually it's hard to say by how much. You can configure Tempesta FW to automatically estimate the server response time and distribute traffic among all servers from both groups according to their real response time. To do so, use dynamic load balancing. The configuration for the first instance becomes:

    srv_group grp {
        server 192.168.100.10;
        server 192.168.100.11;
        server 192.168.200.10;
        server 192.168.200.11;
        server 192.168.200.12;
        sched dynamic percentile 75;
    }
    
    sched_http_rules {
        match grp * * *;
    }

Note the scheduler option sched dynamic percentile 75, which dynamically calculates the 75th percentile of the response time of each server and assigns weights to the servers according to it (a server with a lower response time percentile gets a higher weight).
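
The idea can be illustrated as follows. This is not Tempesta FW's actual algorithm, only a sketch under two assumptions: the percentile is computed with the nearest-rank method, and weights are taken inversely proportional to that percentile. All names are hypothetical and the response times are expected to be positive.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def dynamic_weights(response_times, p=75):
    """Map {server: [response times]} to {server: weight}; the weights sum to 1.

    A server with a lower p-th percentile of response time gets a higher weight.
    """
    inverse = {
        server: 1.0 / percentile(times, p)
        for server, times in response_times.items()
    }
    total = sum(inverse.values())
    return {server: value / total for server, value in inverse.items()}


if __name__ == "__main__":
    # Made-up response times in milliseconds: one local and one remote server.
    samples = {
        "192.168.100.10": [1, 2, 3, 4],
        "192.168.200.10": [10, 20, 30, 40],
    }
    print(dynamic_weights(samples))
```

In this made-up example the 75th percentiles are 3 ms and 30 ms, so the local server ends up with ten times the weight of the remote one.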
