This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

IP load balancing #1213

Open
squaremo opened this issue Jul 27, 2015 · 5 comments
@squaremo
Contributor

Permuting DNS records is a means of load balancing, but an unsatisfactory one. Resolvers are in general stupid, and do things like remembering records forever and otherwise ignoring TTLs.

A more generally-applicable means of load balancing is to use virtual IPs. In this scheme, DNS resolves to an IP that is not (necessarily) assigned to a particular host or container interface. Routing rules, or a user-space proxy, then determine which of several non-virtual IPs actually gets the traffic.

Kubernetes uses routing rules with a proxy as a fallback: kubernetes/kubernetes#3760.
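The "user-space proxy" half of this scheme is easy to illustrate. Below is a minimal sketch, not anything weave ships: clients connect to one frontend port (standing in for the virtual IP), and the proxy picks a real backend per connection, round-robin. The backend list and ports are invented for the example.

```python
import itertools
import socket
import threading

# Hypothetical backends; in the VIP scheme these are the non-virtual
# addresses that DNS does not hand out directly.
BACKENDS = [("127.0.0.1", 9001), ("127.0.0.1", 9002)]
_next_backend = itertools.cycle(BACKENDS)

def pipe(src, dst):
    # Copy bytes one way until the source closes, then close the sink.
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        dst.close()

def handle(client):
    # One backend per *connection*, so each client sticks to one server.
    backend = socket.create_connection(next(_next_backend))
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

def serve(vip_port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", vip_port))
    srv.listen()
    while True:
        client, _ = srv.accept()
        handle(client)
```

The per-connection choice is the important property: unlike permuted DNS records, a stale resolver cache cannot pin a client to a dead backend, because the balancing decision is made at connect time, behind the stable frontend address.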

@rade
Member

rade commented Jul 27, 2015

strawman:

  • run a "routing" container per host, on the weave network
  • each of these containers gets an IP on the weave network, as normal (or several, if multiple sub-nets are in use)
  • we register all these containers in DNS, so a client resolving the chosen name will get back all IPs, and (due to the RFC magic we discovered recently) connect to the local routing container by preference, and a remote one if the local one is not available.
  • the iptables rules inside the routing container do the load balancing magic, essentially rerouting packets to service container instances using the latter's weave address
  • the rules are populated/updated via weavedns; polling DNS for a name will work for an initial prototype; we can easily switch this to a 'push' based model later (since that is actually what we have underneath weavedns)
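For an initial prototype, the "load balancing magic" in that last-but-one bullet could be plain DNAT rules using iptables' `statistic` match. A hypothetical sketch (the VIP 10.2.0.100 and instance addresses are invented; this needs CAP_NET_ADMIN inside the routing container):

```shell
# Spread new connections for the service VIP across three instances.
# --mode nth fires on every Nth matching packet, so the rules go
# most-specific first: 1-in-3, then 1-in-2 of the rest, then the rest.
iptables -t nat -A PREROUTING -d 10.2.0.100/32 -p tcp \
    -m statistic --mode nth --every 3 --packet 0 \
    -j DNAT --to-destination 10.2.1.1
iptables -t nat -A PREROUTING -d 10.2.0.100/32 -p tcp \
    -m statistic --mode nth --every 2 --packet 0 \
    -j DNAT --to-destination 10.2.1.2
iptables -t nat -A PREROUTING -d 10.2.0.100/32 -p tcp \
    -j DNAT --to-destination 10.2.1.3
```

Note that DNAT in the nat PREROUTING chain is evaluated only for the first packet of a connection; conntrack then applies the same translation to the rest, which bears on the connection-tracking question in the TBD below.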

TBD:

  • what defines a service? In particular, can we define the routing rules without reference to specific ports? (hopefully yes). And, ideally, without reference to protocols? (I doubt it; this sort of 'routing' requires connection tracking)
  • What is the relationship between the routing containers and services? Do we have just one set of routing containers, or one per service? Ideally the former.

@inercia
Contributor

inercia commented Oct 1, 2015

I'm wondering if users are really interested in "load balancing" when what they would probably find more valuable is high availability. Infrastructure users are probably not that interested in resolving, for example, Zookeeper.local to many IP addresses: they probably use some client library that is perfectly happy connecting to several servers. What they want is to reach a Zookeeper server that is alive.

Scenario: a container runs some software that connects to Zookeeper, so it connects to [zookeeper1, zookeeper2]. The resolver obtains 10.0.2.1 and 10.0.2.2 and never resolves these names again (and this is a problem). What users really want is not to load-balance requests between 10.0.2.1 and 10.0.2.2 (the Zookeeper client library probably does this in a more intelligent manner anyway). What they really want is to find a live Zookeeper at (at least) one of those IPs.

So I was wondering if solving the HA problem would be easier for us. Instead of having some routing magic so that any new connection gets to a running Zookeeper, maybe we could use the same mechanism used by HA systems: send gratuitous ARP messages and redirect traffic to some other container. As you guys mentioned, we would probably need the concept of a service (that would be another issue), but maybe the local Weave router could use the information in the distributed database to detect that zookeeper2 has died (when the DNS tombstone is received, maybe) and send a gratuitous ARP message to local containers with another MAC for the same service.
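For concreteness, the gratuitous ARP frame itself is simple to construct: it is an ARP reply, broadcast, in which the sender and target protocol addresses are both the service IP being announced. A minimal builder sketch (the MAC and IP in the test are invented; actually emitting this would need a raw AF_PACKET socket and CAP_NET_RAW, which is omitted here):

```python
import socket
import struct

def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP reply frame announcing that `ip`
    now lives at `mac` (sender IP == target IP == `ip`)."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: dst (broadcast), src, EtherType 0x0806 (ARP).
    ether = broadcast + hw + struct.pack("!H", 0x0806)
    # ARP header: htype=Ethernet, ptype=IPv4, hlen=6, plen=4, op=2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += hw + addr            # sender hardware / protocol address
    arp += broadcast + addr     # target hardware / protocol address
    return ether + arp
```

Containers that see this frame update their ARP caches, so subsequent traffic for the service IP flows to the new MAC without any client-side re-resolution.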

What do you guys think?

@squaremo
Contributor Author

squaremo commented Oct 1, 2015

Load balancing and high availability are certainly intricately linked. Both mechanisms are present in the wild: "I don't care who services my request, so long as someone does", and "I'll try servers myself until one works".

I don't see one as detracting from the other (and I don't think you're suggesting that). I think it's worth putting your ideas in another issue, since tackling one or the other is a matter of priorities rather than supersession.

@rade
Member

rade commented Nov 17, 2015

IPVS might give us more sophisticated in-kernel load balancing than "plain" iptables. See also this talk and twitter exchange.
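The IPVS equivalent of the iptables strawman is a few ipvsadm commands; the kernel then does scheduling and connection tracking itself. A sketch with invented addresses (requires root and the ip_vs kernel module; note that IPVS services are keyed by port or fwmark, which cuts against the port-free ideal in the TBD above):

```shell
# Define a virtual TCP service on the VIP, round-robin scheduling.
ipvsadm -A -t 10.2.0.100:80 -s rr
# Add real servers; -m selects masquerading (NAT) forwarding.
ipvsadm -a -t 10.2.0.100:80 -r 10.2.1.1:80 -m
ipvsadm -a -t 10.2.0.100:80 -r 10.2.1.2:80 -m
```

Beyond round-robin, IPVS offers weighted and least-connection schedulers, which is the "more sophisticated" part plain iptables lacks.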

@faddat

faddat commented Dec 18, 2015

This user is interested in ensuring that he can have a single endpoint, and use virtual hosting on that single IP address, and then have weave send the traffic to the correct container.

@rade rade added this to the 1.7.0 milestone Jun 25, 2016
@rade rade modified the milestones: 1.8.0, overflow Oct 6, 2016

4 participants