All-in-one text de-duplication
-
Updated
May 21, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding entities in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), which may be due to differences in record shape, storage location, or curator style or preference.
All-in-one text de-duplication
CD4Py: Code De-Duplication for Python
Container image converter aiming to minimize image size and speed up boot time dramatically with block-level de-dupliction and lazy-pull technology.
Preprocesses the query logs which can be used by suggesters like Most Popular Suggester (MPS).
A collection of algorithms to generate a signature/fingerprint/hash in order to be used for detecting duplicate/near duplicate documents.
Efficient Event Streamlining and Dynamic De-Duplication Across Message Brokers - A Technology-Agnostic approach
Repository contains java application code to stream records to a Kinesis Stream, consume , de-duplicate and strictly order records and display records on a dashboard
A misc project of python based function to track, survey and manage files from mutiple systems
A secure image storage application written in Python
Created by Halbert L. Dunn
Released 1946