dgomezferro edited this page Jun 1, 2012 · 27 revisions

Silent data corruptions are faults that modify the state or the code of a program without being detected. There are several possible reasons for silent data corruptions to occur; if you want to know more and understand why protecting against silent data corruptions is important, have a look at a few examples we collected.

PASC protects distributed systems against the effects of silent data corruptions. Users of PASC only need to design their distributed system to tolerate crashes and message omissions. PASC runs all the checks related to silent data corruptions transparently, in the background.

Silent data corruptions are modeled as Arbitrary State Corruption (ASC) faults, a fault model tailored to realistic random misbehaviors caused by data corruptions. Fault injection experiments on a Paxos implementation indicate that PASC successfully detects faults before they produce observable effects (see the USENIX ATC'12 paper for the results).

PASC requires you to implement your system as a set of deterministic message handlers operating on a process state and activated by messages, and provides templates for implementing these abstractions.

Here is more detailed information on these templates.
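To make the programming model concrete, here is a minimal sketch of a deterministic message handler in plain Python. It is purely illustrative and is not PASC's template API: the names `CounterState`, `AddMessage`, `AckMessage`, `handle_add` and `run` are hypothetical. The only point is that a handler's result depends solely on the current state and the incoming message, so replaying the same messages from the same state gives the same outcome.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CounterState:
    """Hypothetical process state: a running total and the last sender seen."""
    total: int = 0
    last_sender: str = ""


@dataclass(frozen=True)
class AddMessage:
    """Hypothetical input message asking the process to add a value."""
    sender: str
    value: int


@dataclass(frozen=True)
class AckMessage:
    """Hypothetical output message reporting the new total to the sender."""
    receiver: str
    total: int


def handle_add(state: CounterState, msg: AddMessage) -> Tuple[CounterState, List[AckMessage]]:
    """Deterministic handler: the result depends only on (state, msg).

    It uses no clocks, randomness, I/O, or shared mutable data.
    """
    new_state = CounterState(total=state.total + msg.value, last_sender=msg.sender)
    return new_state, [AckMessage(receiver=msg.sender, total=new_state.total)]


def run(state: CounterState, messages: List[AddMessage]) -> Tuple[CounterState, List[AckMessage]]:
    """Activate the handler once per message, threading the state through."""
    outbox: List[AckMessage] = []
    for msg in messages:
        state, out = handle_add(state, msg)
        outbox.extend(out)
    return state, outbox


inbox = [AddMessage("p1", 3), AddMessage("p2", 4)]
final_state, sent = run(CounterState(), inbox)

# Replaying the same inbox from the same initial state gives the same result.
replayed_state, replayed_sent = run(CounterState(), inbox)
```

Because the handler is a pure function of its inputs, a second execution can be compared against the first, which is what makes this style of handler convenient to check.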

Arbitrary State Corruptions vs. Byzantine faults

Unlike the well-known Byzantine model used in Byzantine fault tolerance, the ASC model does not cover security issues or bugs: typical production systems do not handle these issues using fault tolerance techniques. This reasonable reduction in coverage compared to the Byzantine fault model leads to significant performance and flexibility advantages. A thorough comparison of different fault models can be found in the USENIX ATC'12 paper.


We have benchmarked PASC by hardening an implementation of the [Paxos protocol](http://en.wikipedia.org/wiki/Paxos_(computer_science)). Here we compare the latency and throughput of unprotected Paxos, PASC-hardened Paxos, and PBFT, a Byzantine fault-tolerant state machine replication protocol that can be seen as a Byzantine-tolerant version of Paxos. In all cases we tolerate a single fault and send 0-byte requests and replies. We show results both without batching and with batches of at most 100 messages.

Latency vs throughput - 0 bytes

Unprotected Paxos is the most efficient option, but it only tolerates crashes, so data corruptions can result in unpredictable behavior. PASC Paxos also tolerates Arbitrary State Corruption faults and imposes only a moderate performance reduction. PBFT is much less efficient, but it protects against worst-case faults, such as a malicious adversary actively trying to disrupt the system.

More details and a more extensive evaluation are available in the USENIX ATC'12 paper.