Anders Pearson edited this page Oct 18, 2017 · 4 revisions

For Cask to operate most efficiently, it's important that all the nodes converge relatively quickly on the same view of the overall cluster. In other words, each node in the cluster should know about the existence of every other node. If a new node joins the cluster, it needs to be told about all the other nodes, and all the other nodes need to learn about it.

Note: As of 2017, Cask has switched to using hashicorp/memberlist for the gossip and failure detection implementation instead of using its own. The rest of this document is now outdated, but I'm leaving it up in case you're curious. FWIW, the built-in gossip in Cask worked fine; switching to memberlist just let us remove a bunch of code and lean on a more widely used component instead.


One approach is to configure this statically, i.e., the config for each node contains a list of every other node in the cluster. That becomes an administrative nightmare, though, once you have more than a couple of nodes.

Instead, Cask uses a simple, somewhat brute-force and inefficient, but effective gossip algorithm.

There's a small amount of static configuration to bootstrap things; that's more or less unavoidable. The config for each node starts off with a list of one or more other nodes. When the node starts up, the first thing it does is contact each node in that list and announce "here I am and here's my info". Each node it reaches responds by adding the new node to its understanding of the cluster, sending its own info back to the new node (so it can do likewise), and finally going through its list of other neighbor nodes and sending each one a message saying "hey, there's a new node at the following address, you should check it out". Those nodes, if this is the first they've heard of the new node, reach out to it in turn, and the cycle continues as long as any node is learning about the new node for the first time.
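The join handshake above can be sketched in Go (Cask's own language). The `Node` type, `Hello` method, and in-memory `Neighbors` map here are simplifications invented for illustration, not Cask's actual types or wire protocol:

```go
package main

import "fmt"

// Node is a hypothetical in-memory model of one cluster member.
// In the real system each Hello would be a network message, not a method call.
type Node struct {
	Addr      string
	Neighbors map[string]*Node
}

func NewNode(addr string) *Node {
	return &Node{Addr: addr, Neighbors: make(map[string]*Node)}
}

// Hello is the "here I am and here's my info" message.
// The receiver adds the newcomer, sends its own info back,
// and alerts each of its other neighbors about the newcomer.
func (n *Node) Hello(newcomer *Node) {
	if _, known := n.Neighbors[newcomer.Addr]; known {
		return // already heard about this node; ignore the duplicate alert
	}
	n.Neighbors[newcomer.Addr] = newcomer
	newcomer.Neighbors[n.Addr] = n // reply with our own info
	for _, peer := range n.Neighbors {
		if peer.Addr != newcomer.Addr {
			peer.Hello(newcomer) // "hey, there's a new node at ..."
		}
	}
}

func main() {
	// hypothetical addresses; b, c, d form an existing cluster
	a, b, c, d := NewNode("a:9999"), NewNode("b:9999"), NewNode("c:9999"), NewNode("d:9999")
	b.Hello(c)
	b.Hello(d)
	// a joins by contacting only b; gossip spreads the news to c and d
	b.Hello(a)
	fmt.Println(len(a.Neighbors)) // a now knows b, c, and d
}
```

The duplicate check at the top of `Hello` is what terminates the cascade: once every node has heard about the newcomer once, further alerts are ignored and the burst dies down.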

Like I said, this is somewhat brute force and inefficient. When a new node joins the cluster, there's quite a burst of messages between the nodes, and many of those messages are redundant. For example, node A joins and says hi to node B. Node B sends an alert to its neighbors C, D, E, and F. Meanwhile, node A says hi to C, which triggers C to send alerts to B, D, E, and F. And so on. The messages are fairly small, though, and the nodes are smart enough to ignore alerts about a new node after the first one. So there's a burst, everyone gets connected to the new node, and then things die back down. This works OK when the cluster is relatively small (less than a few hundred nodes), which so far is Cask's target. If anyone wants Cask to scale up to a larger cluster than that, the fix is for each node, instead of alerting all of its neighbors, to pick one or two at random and alert only those. That makes the cluster take a few cycles longer to converge, but avoids the explosion of redundant messages.
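That random-fanout fix might look something like the following; `pickFanout` is a hypothetical helper written for this page, not part of Cask:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickFanout returns up to k randomly chosen neighbor addresses to alert
// about a new node, instead of alerting every neighbor. Small clusters
// (len <= k) are returned whole, so nothing changes for them.
func pickFanout(neighbors []string, k int) []string {
	if len(neighbors) <= k {
		return neighbors
	}
	shuffled := make([]string, len(neighbors))
	copy(shuffled, neighbors)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled[:k]
}

func main() {
	neighbors := []string{"b:9999", "c:9999", "d:9999", "e:9999", "f:9999"}
	// alert only two random neighbors instead of all five
	fmt.Println(pickFanout(neighbors, 2))
}
```

With every node doing this, news of a new node still reaches everyone (each alerted node re-gossips in turn), just over a few more rounds and with far fewer total messages.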

Nodes in the cluster also send heartbeats to each other on a regular basis (the default interval is one minute). A heartbeat just tells the other nodes that the sender is still alive and well. Each node keeps track of how long it's been since it heard a heartbeat from each other node. A background "reaper" periodically goes through the list and, if it hasn't gotten a heartbeat from a node in a while (e.g., more than three minutes), assumes that node is dead and removes it from the list. If that node shows up again later (e.g., after a network partition has healed), it will be joined into the cluster and announced to its neighbors all over again.
