Deduplicating archiver with compression and authenticated encryption.
Updated Jul 19, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number); a shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
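As a rough illustration of matching records that share no common identifier, the sketch below normalizes names and compares them with a string-similarity ratio from Python's standard library. The record ids, the sample names, the helper names (`normalize`, `similarity`, `find_matches`), and the 0.85 threshold are all assumptions made for this example; real toolkits such as those listed on this page typically add blocking, multiple fields, and probabilistic scoring.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(records, threshold=0.85):
    """Return pairs of record ids whose names are at least `threshold` similar.

    Compares every pair, so this is only suitable for small inputs.
    """
    matches = []
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        if similarity(name_a, name_b) >= threshold:
            matches.append((id_a, id_b))
    return matches


# Hypothetical records from three sources; the ids do not line up across sources.
records = [
    ("crm-1", "Acme Corp."),
    ("erp-7", "ACME  corp"),
    ("web-3", "Globex Industries"),
]

print(find_matches(records))  # [('crm-1', 'erp-7')]
```

The all-pairs comparison grows quadratically with the number of records, which is why larger linkage jobs usually restrict comparisons to candidate pairs first.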
A powerful and modular toolkit for record linkage and duplicate detection in Python
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Simple, configuration-driven backup software for servers and workstations
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Scalable toolkit for data curation
Open source project for data preparation of LLM application builders
Record Linkage ToolKit (Find and link entities)
Dedupe/batch geocode addresses and venues around the world with libpostal
Fork of the Freely Extensible Biomedical Record Linkage program
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Fast block-level out-of-band BTRFS deduplication tool.
FastCDC implementation in Python https://pypi.org/project/fastcdc/
CLI utility to find near duplicate images and remove all but the best copy.
Backend (Docker & API) for matchID project
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
Created by Halbert L. Dunn
Released 1946