Repairing stale data for nodes that were down and now are back online but the nodes are far behind and a read repair would take too long.
Naively splitting up the entire range to calculate checksums for comparison, is not very feasible.
We can use merkle trees = binary tree of hashes, where each internal node is the hash of its two children, and each leaf node is a hash of a portion of the original data
We can use merkle trees by:
- comparing the root hashes of both trees.
- if they are equal, stop.
- recursively checking the left and right children.
This way the amount of data exchanged is minimized. The principal advantage of a Merkle tree is that each branch of the tree can be checked independently without requiring nodes to download the entire tree or the entire data set.