deduplication
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), a situation that can arise from differences in record shape, storage location, or curator style or preference.
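As a rough illustration of the idea above, the following minimal sketch links records from two sources that share no common identifier by comparing a normalized form of their names. The record layout, the `normalize` and `link_records` helpers, and the sample data are all hypothetical, and exact matching on a cleaned key is only the simplest possible approach; practical systems typically add fuzzy or probabilistic comparison.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def link_records(source_a, source_b):
    """Return (id_a, id_b) pairs whose normalized names are identical."""
    # Index the second source by normalized name for constant-time lookup.
    index = defaultdict(list)
    for rec in source_b:
        index[normalize(rec["name"])].append(rec)

    matches = []
    for rec in source_a:
        for other in index.get(normalize(rec["name"]), []):
            matches.append((rec["id"], other["id"]))
    return matches


# Hypothetical records from two systems with unrelated keys.
crm = [
    {"id": "a1", "name": "Acme Corp."},
    {"id": "a2", "name": "Globex Inc"},
]
billing = [
    {"id": "b7", "name": "ACME  corp"},
    {"id": "b9", "name": "Initech"},
]

pairs = link_records(crm, billing)
print(pairs)
```

Here the two spellings of the first company reduce to the same key and are linked, while the remaining records stay unmatched.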
Here are 23 public repositories matching this topic...
Virtual Versionized Packaging
Updated Jun 24, 2024 - C++
System prototype for USENIX ATC 2015: "Convergent Dispersal Deduplication Datastore"
Updated Apr 1, 2019 - C++
Designed, implemented, and tested Reduct, a functional deduplicated distributed file system.
Updated Jun 23, 2020 - C++
System prototype for MSST 2019: "Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection"
Updated Sep 24, 2020 - C++
Yet another block-level btrfs deduplication tool. Other filesystems with FIEMAP and FIDEDUPERANGE may also be compatible.
Updated Oct 21, 2020 - C++
A backup program that does deduplication, compression, and encryption
Updated Jan 12, 2021 - C++
A deduplication library built over [SIMHASH](https://github.com/yanyiwu/simhash).
Updated Feb 25, 2022 - C++
Deduplication over disaggregated memory for serverless computing
Updated Mar 21, 2022 - C++
zpaqfranz/zpaq SFX module for Windows 32 and 64 bit
Updated Sep 20, 2022 - C++
A command line application that finds duplicate files and removes them. Duplicate files can also be replaced with symbolic links or hard links.
Updated Jan 24, 2023 - C++
DeDuplicationKit: Advanced File Storage Deduplication
Updated Mar 31, 2023 - C++
DBRetina Python package
Updated May 20, 2024 - C++
Fast multi-file (or STDIN) de-duplicator that preserves line order.
Updated Oct 7, 2023 - C++
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
Updated Oct 27, 2023 - C++
System prototype for DSN 2016: "REED: Rekeying-aware Encrypted Deduplication Storage"
Updated Jan 20, 2024 - C++
HUMID: reference free FastQ deduplication
Updated Jun 4, 2024 - C++
DedupBench is a benchmarking tool for data chunking techniques used in data deduplication. DedupBench is designed for extensibility, allowing new chunking techniques to be implemented with minimal additional code.
Updated Sep 27, 2024 - C++
Created by Halbert L. Dunn
Released 1946
- Followers: 38
- Organization: entity-resolution
- Wikipedia