deduplication
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
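As a rough illustration of matching records that share no common identifier, here is a minimal Python sketch. It is not taken from any repository listed on this page: the sample records, the `normalize`, `similarity`, and `find_matches` helpers, and the 0.85 threshold are all assumptions chosen for the example. It compares every pair of records with the standard library's `difflib.SequenceMatcher`, which is only practical for small data sets.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lowercase each field and collapse runs of whitespace."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}


def similarity(a, b, fields):
    """Average SequenceMatcher ratio over the compared fields."""
    scores = [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(scores) / len(scores)


def find_matches(records, fields, threshold=0.85):
    """Return index pairs whose average field similarity meets the threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    cleaned = [normalize(r) for r in records]
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(cleaned), 2)
        if similarity(a, b, fields) >= threshold
    ]


# Hypothetical records from two sources with no shared key.
records = [
    {"name": "Jon Smith", "city": "New York"},
    {"name": "John  Smith", "city": "new york"},
    {"name": "Maria Garcia", "city": "Madrid"},
]
matches = find_matches(records, fields=("name", "city"))
print(matches)
```

The all-pairs comparison grows quadratically with the number of records, so larger data sets usually need some way of narrowing the candidate pairs before scoring them.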
Here are 399 public repositories matching this topic...
A UI application for File Deduplication using Hashing
Updated Jan 18, 2018 - Java
Blazing Fast Petabyte Scale Static Web Server + Tools. Serve Billion Files from an Indexed, Compressed and Deduplicated Archive.
Updated Feb 19, 2019 - PHP
Efficiently import pictures while handling duplicates gracefully
Updated Sep 20, 2022 - Haskell
A workflow template for deduplication and record linkage using the Dedupe library
Updated Jun 30, 2020 - Jupyter Notebook
Deduplicate Google Calendar events that were created by Fastmail import
Updated Feb 1, 2022 - Python
ATBU Cloud/Local Backup & File Integrity/Duplication Management Utility
Updated Nov 10, 2023 - Python
System prototype for USENIX ATC 2015: "Convergent Dispersal Deduplication Datastore"
Updated Apr 1, 2019 - C++
Model for data deduplication assignment.
Updated Feb 21, 2018 - Python
A command line application that finds duplicate files and removes them. Duplicate files can also be replaced with symbolic links or hard links.
Updated Jan 24, 2023 - C++
A script which provides a way to create an item set which is de-duplicated by generating an MD5 for each item based on the concatenation of values yielded by a metadata profile for each item
Updated Apr 15, 2021 - Ruby
python script to analyze dedup usage in btrfs
Updated Sep 5, 2019 - Python
Updated Dec 7, 2020 - Python
Deduplication/backup tool with extremely high 'compression' rate
Updated May 30, 2023 - Python
Removes repeating pages with same page number in PDFs prepared for presentation purposes.
Updated Dec 16, 2021 - Python
An extension for ASReview Lab to preprocess the dataset before importing in ASReview
Updated Apr 17, 2023 - Python
Created by Halbert L. Dunn
Released 1946
34 followers
Organization: entity-resolution