deduplication
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number). A shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
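As a rough illustration of joining two sources that share no common identifier, here is a minimal sketch that links records by fuzzy name similarity using Python's standard `difflib`. The record IDs, names, helper functions (`normalize`, `similarity`, `link_records`), and the 0.85 threshold are all assumptions made up for this example; real entity-resolution tools typically add blocking, multiple fields, and trained scoring.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best right match if it clears the threshold.

    Returns a list of (left_id, right_id, score) tuples. The threshold is an
    illustrative assumption, not a recommended value.
    """
    links = []
    for left_id, left_name in left.items():
        scored = [(right_id, similarity(left_name, right_name))
                  for right_id, right_name in right.items()]
        if not scored:
            continue
        best_id, best_score = max(scored, key=lambda pair: pair[1])
        if best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 2)))
    return links


# Hypothetical records from two sources with different keys and name styles.
crm = {"c1": "Maria  Gonzalez-Ruiz", "c2": "Jane Smith"}
billing = {"b7": "maria gonzalezruiz", "b9": "John Smyth"}

if __name__ == "__main__":
    print(link_records(crm, billing))
```

Comparing every left record against every right record is quadratic in the number of records, which is acceptable for a sketch but is usually replaced by a blocking or indexing step on larger data sets.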
Here are 423 public repositories matching this topic...
A simulator and related scripts for measuring efficiency of an encrypted data deduplication solution. https://eprint.iacr.org/2015/455.pdf
Updated Mar 13, 2016 - Python

Tool to securely push incremental (think "rsync --link-dest") backups to tahoe-lafs
Updated Apr 12, 2016 - Python

Fork of the Freely Extensible Biomedical Record Linkage program
Updated Nov 4, 2016 - Python

Mirror of https://bitbucket.org/resteorts/smered
Updated Mar 12, 2017 - Java

Python factory for limiting Celery tasks by configuration
Updated May 24, 2017 - Python

Simple program that finds duplicate files, written in Rust.
Updated Jun 4, 2017 - Rust

xplor
Updated Jun 14, 2017 - R

Deduplication for cfDNA sequencing data
Updated Jul 5, 2017 - Python

A collection of algorithms to generate a signature/fingerprint/hash in order to be used for detecting duplicate/near duplicate documents.
Updated Aug 9, 2017 - Rust

technical details about implementing solr dedupe strategy in solr single node 6.4
Updated Sep 15, 2017

Updated Oct 13, 2017 - Python

Webpack plugin that resolves copies of the same module in different locations to a single path.
Updated Oct 17, 2017 - JavaScript

An efficient backup tool inspired by Git, saving your bandwidth and providing global deduplication at file level.
Updated Oct 18, 2017 - Python
Created by Halbert L. Dunn
Released 1946
Followers: 38
Organization: entity-resolution