Anders Pearson edited this page Jan 25, 2015 · 1 revision

Read repair is another idea borrowed from Riak.

It takes the basic operations that Active Anti Entropy (AAE) performs on a file (checking the file against its hash for corruption and making sure it is replicated to the "right" set of nodes in the cluster) and schedules them to run in the background after a file is retrieved from the cluster.
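A minimal sketch of that flow may help make it concrete. It is not Cask's actual implementation: the function names, the `replicate` callback, the in-memory `store`, and the assumption that a key is the SHA-1 hex digest of the file's content are all illustrative choices made for this example.

```python
import hashlib
import threading


def verify(key, data):
    """Return True if data still hashes to the key it is stored under.

    Assumes (for illustration) that a key is the SHA-1 hex digest of the content.
    """
    return hashlib.sha1(data).hexdigest() == key


def read_repair(key, data, ideal_nodes, holders, replicate):
    """Re-check one file and copy it to every node that should hold it but does not."""
    if not verify(key, data):
        # What to do with a corrupt copy is left out of this sketch.
        return "corrupt"
    for node in ideal_nodes:
        if node not in holders:
            replicate(node, key, data)  # hypothetical callback that uploads a copy
    return "ok"


def retrieve(key, store, ideal_nodes, holders, replicate):
    """Serve a read, then run read repair on a background thread."""
    data = store[key]
    worker = threading.Thread(
        target=read_repair, args=(key, data, ideal_nodes, holders, replicate)
    )
    worker.start()
    # The caller gets the data immediately; the worker is returned only so
    # that callers (or tests) can wait for the repair to finish.
    return data, worker
```

The point of the design is that the reader never waits on the repair: the data is returned right away and the integrity and replication checks happen afterwards.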

The benefits are that "hot" data is kept healthy and that you get a mechanism to "force" active anti entropy from outside the cluster. For example, if you've lost a node from your cluster and you have a lot of files stored, it could take a while for AAE to get around to repairing them all. If you know that some of your data is more critical than the rest, you could write a small script to request that data right away, triggering read repair on it.
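Such a script could look roughly like the following. Treat it as a sketch under assumptions: the `/file/<key>/` URL layout, the `base_url` value, and the function names are hypothetical, so check the retrieval path of your own Cask installation before using anything like it.

```python
import urllib.request


def fetch(base_url, key):
    """GET one key and throw the body away; the read itself is the point."""
    # Assumed URL layout, not confirmed by this page.
    with urllib.request.urlopen(f"{base_url}/file/{key}/") as resp:
        resp.read()
        return resp.status


def touch_keys(base_url, keys, fetch=fetch):
    """Request every key once and return the keys that could not be read."""
    failed = []
    for key in keys:
        try:
            fetch(base_url, key)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            failed.append(key)
    return failed
```

Calling `touch_keys("http://localhost:8080", critical_keys)` with a list of your most important keys would request each one in turn. Any keys that come back in the returned list could not be read and deserve a closer look.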

A proper method for cleanly decommissioning a node is still in the works. For now, it is fairly straightforward to remove a node from the cluster cleanly: shut that node down, then go through its data directory and request each key from the rest of the cluster, triggering read repair on each. In fact, when Cask does get a decommission feature, that's roughly how it will work internally.
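The manual procedure can be sketched as a directory walk that turns each stored file back into a key and hands it to a request function. The on-disk layout is an assumption here: this example pretends that a file's path relative to the data directory, with the separators removed, spells out its key. Pass your own `key_from_path` if the real layout differs. The `request` callback is also hypothetical and stands in for a GET against one of the remaining nodes.

```python
import os


def default_key_from_path(rel_path):
    """Assumed layout: path components concatenated together form the key."""
    return rel_path.replace(os.sep, "")


def keys_in(data_dir, key_from_path=default_key_from_path):
    """Yield one key per file found under the stopped node's data directory."""
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            rel = os.path.relpath(os.path.join(root, name), data_dir)
            yield key_from_path(rel)


def drain(data_dir, request, key_from_path=default_key_from_path):
    """Request every key from the rest of the cluster; return how many were sent."""
    count = 0
    for key in keys_in(data_dir, key_from_path):
        request(key)  # hypothetical: fetch the key from a node that is still up
        count += 1
    return count
```

Run it only after the node has been shut down, so that every request is served (and repaired) by the nodes that remain.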
