System prototype for USENIX ATC 2015: "Convergent Dispersal Deduplication Datastore"
Updated Apr 1, 2019 - C++
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets on entities that may or may not share a common identifier (e.g., a database key, URI, or national identification number); such identifiers can be missing or inconsistent because of differences in record shape, storage location, or curator style or preference.
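A minimal sketch of the idea, in Python, assuming a hypothetical record schema (`id`, `name`, `birth_year`) and a hand-picked similarity threshold. Records from two sources with no shared key are first grouped ("blocked") on an attribute both sources carry, and a string-similarity score over normalized names then decides which pairs likely refer to the same entity. `difflib.SequenceMatcher` is used only because it ships with Python; dedicated record-linkage tools typically use more specialized comparators.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse whitespace,
    # so "Smith, John" and "SMITH John" both become "smith john".
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in name)
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each record in `left` with its best match in `right`.

    Hypothetical schema: each record is a dict with 'id', 'name', 'birth_year'.
    Blocking on 'birth_year' limits which pairs are compared at all;
    name similarity (>= threshold) decides whether a candidate matches.
    """
    blocks = {}
    for right_rec in right:
        blocks.setdefault(right_rec["birth_year"], []).append(right_rec)

    matches = []
    for left_rec in left:
        best, best_score = None, threshold
        for right_rec in blocks.get(left_rec["birth_year"], []):
            score = similarity(left_rec["name"], right_rec["name"])
            if score >= best_score:
                best, best_score = right_rec, score
        if best is not None:
            matches.append((left_rec["id"], best["id"], round(best_score, 3)))
    return matches
```

Blocking trades recall for speed: a record whose blocking attribute was mistyped in one source is never compared with its true counterpart, which is why real pipelines often block on several keys.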
A command-line application that finds duplicate files and removes them. Duplicate files can also be replaced with symbolic links or hard links.
DeDuplicationKit: Advanced File Storage Deduplication
System prototype for MSST 2019: "Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection"
DBRetina Python package
Virtual Versionized Packaging
Designed, implemented, and tested Reduct, a functional deduplicated distributed file system.
A deduplication library built on [SIMHASH](https://github.com/yanyiwu/simhash).
Fast line de-duplicator for multiple files (or STDIN) that preserves line order.
System prototype for DSN 2016: "REED: Rekeying-aware Encrypted Deduplication Storage"
Yet another block-level btrfs deduplication tool. Other filesystems that support FIEMAP and FIDEDUPERANGE may also be compatible.
zpaqfranz/zpaq SFX module for 32- and 64-bit Windows
HUMID: reference-free FASTQ deduplication
Deduplication over disaggregated memory for serverless computing
A Management Friendly Deduplication Prototype System for Backup
A backup program that does deduplication, compression, and encryption
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
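Several of the tools above deduplicate whole files, for example the command-line application that finds duplicate files and removes them or replaces them with links. A common way to detect exact duplicates is sketched here in Python as a general illustration, not as the method of any listed project: bucket files by size, then hash only the files whose sizes collide. The function names and the 1 MiB read size are assumptions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of regular-file paths under `root` with identical contents."""
    # Pass 1: group by size; files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files whose size matches at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Once groups are found, a tool can keep one path per group and replace the others with `os.link` (hard link) or `os.symlink`. Because a hash match is only probabilistic evidence of equality, a cautious tool compares the bytes directly before deleting anything.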
Created by Halbert L. Dunn
Released 1946