Rust implementation of the phi accrual failure detector
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
proto
src
tracetests
.gitignore
Cargo.toml
Dockerfile
README.markdown
build.rs

README.markdown

phifd - Rust implementation of the Phi accrual failure detector

Concepts

The Phi accrual failure detector enables applications to respond to group members' suspicion levels, as opposed to binary {failed,alive} decisions made by the failure detector itself.

For each member in the group, the FD outputs a suspicion value, guaranteed to increase monotonically, and to tend to infinity for failed nodes. Applications can contol the tradeoff between fast detection and high accuracy by tweaking the threshold suspicion level phi at which a node is deemed failed.

A low value of phi will result in fast failure detections, at the expense of increase false alarms, whereas increasing phi will make detections more accurate, but will cause true detections of true failures to be delayed accordingly.

One can hence build a group membership service on top of this FD by thresholding appropriately the suspicion values for each node.

Code

High level

This crate is based on the tokio platform. The entry point is PhiFD::run() in src/lib.rs. The execution is modeled as two main future streams flowing into a future sink. The first stream is a stream of periodic pings sent out to peers. The second stream is the stream of incoming pings from peers, in response to each of which we must update the failure detector state. The fist stream (call it the pinger) doesn't actually send anything, but produces, as events, a Vec of pings that need to be sent out. The second stream (called the ping listener) wraps the stream of incoming pings such that upon each ping, the internal state is update as a side effect, and a confirmation of this is emitted as an event.

These two streams are then combined into one by allowing them to "race" each other -- this new stream emits values as values become available on either the pinger or the ping listener streams. So each value in this new stream is either an acknowledgement of an incoming ping being successfully processed (on the ping listener stream), or a Vec of pings to be sent out. So we derive yet another stream from this by filtering only for values of the latter kind (outstanding pings). This stream is then "forwarded" to a sink wrapping over a UDP socket. This completes the core flow of the code.

It is important to note that with most async frameworks, and with tokio in particular, specifying what your stream does is the bulk of the work, but we need to reduce all of this down to a future that can then be executed to completion by the event loop. As such, the forward method on streams returns a future representing the work of pumping the invocant stream into the sink in question. This future is then run on the event loop, and once it starts running, all the participant streams and sinks crank into action.

Implementation details

State management

All state is kept in the FDState struct, which is held by the PhiFD struct behind a Rc<RefCell<.>>. Read this chapter if you're not familiar with those wrapper types. In short, an immutable RefCell<FDState> can be used to mutate the underlying FDState value, with R/W borrow checks happening at runtime. The Rc smart pointer is a refcounted container that allows multiple owners of a value (which normally is forbidden in Rust). This is necessary when we need to access state mutably from multiple closures that arise as callbacks in futures based code.

Transport

The pinging happens over UDP with each datagram containing a protobuf serialized gossip message. The definition can be found in proto/msg.proto. The protobuf compiler is invoked upon every build in build.rs, and this generates the Rust structs in src/proto.

To decode and encode ping messages transparently, and to make available the stream abstraction for incoming messages, we have a "codec" (tokio terminology) called GossipCodec in src/lib.rs. It basically describes what to do with incoming messages and how to serialize outgoing messages (and to whom should they be sent). In our case, we just use methods on our rust-protobuf generated structs. A minor detail here is that the codec works with tuples of (SocketAddr, Gossip). In other words, it decodes incoming datagrams into a tuple of the sender address and the decoded ping message contents, and encodes tuples of (recipient address, ping message) into datagrams containing the serialized message addressed to the given recipient. Once the codec is in place, tokio's "framed" abstraction gives a stream and a sink representing incoming messages, and a hole for outgoing messages, for the underlying UDP socket, respectively.