Deduplicating archiver with compression and authenticated encryption.
Updated Jun 7, 2024 · Python
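The deduplicating archiver described above stores repeated data only once. A common way to do that is content-defined chunking: boundaries are chosen by a rolling hash over the data itself, so an insertion shifts only nearby boundaries, and each chunk is stored under a hash of its contents. The sketch below is a simplified illustration under stated assumptions, not that archiver's implementation. Real tools use their own rolling hash, tuned size parameters, and (for encrypted repositories) keyed chunk IDs. The window, mask, and size limits here, and the helper names `chunk`, `store`, and `restore`, are hypothetical.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not any real tool's defaults).
WINDOW = 48               # bytes covered by the rolling hash
MASK = (1 << 12) - 1      # cut when the low 12 bits are zero: ~4 KiB expected gap
MIN_CHUNK = 1024          # never cut before this many bytes
MAX_CHUNK = 16384         # always cut at this many bytes
BASE = 257
MOD = (1 << 61) - 1
POW = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a Rabin-Karp rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i - start >= WINDOW:
            # Drop the byte that just slid out of the window.
            h = (h - data[i - WINDOW] * POW) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(data: bytes, repo: dict[str, bytes]) -> list[str]:
    """Add data to the chunk store; return the list of chunk IDs that rebuilds it."""
    ids = []
    for c in chunk(data):
        cid = hashlib.sha256(c).hexdigest()
        repo.setdefault(cid, c)  # a chunk already in the store is not stored again
        ids.append(cid)
    return ids


def restore(ids: list[str], repo: dict[str, bytes]) -> bytes:
    """Reassemble the original bytes from their chunk IDs."""
    return b"".join(repo[cid] for cid in ids)


# Usage: back up a file, then a copy with 8 bytes inserted in the middle.
rng = random.Random(0)
original = rng.randbytes(200_000)
edited = original[:100_000] + b"INSERTED" + original[100_000:]

repo: dict[str, bytes] = {}
first = store(original, repo)
stored_before = len(repo)
second = store(edited, repo)
print(f"{len(second)} chunks referenced, {len(repo) - stored_before} newly stored")
```

Because boundaries depend on content rather than fixed offsets, the chunks before and after the edit keep their IDs, so the second backup should add only the few chunks around the inserted bytes.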
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets whose records may or may not share a common identifier (e.g., a database key, URI, or national identification number); a shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
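A deliberately simple, deterministic sketch of the idea: normalize names, use blocking so only records that agree on a cheap key (here, the city) are compared, then score candidate pairs with a string similarity and keep those above a threshold. The records, the 0.85 threshold, and the function names are hypothetical; the probabilistic and active-learning tools listed here use considerably more sophisticated scoring.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Two hypothetical sources that describe some of the same people but share no key.
crm = [
    {"id": "c1", "name": "Jonathan Smith", "city": "Boston"},
    {"id": "c2", "name": "Maria Garcia", "city": "Austin"},
]
billing = [
    {"id": "b7", "name": "SMITH, Jonathan", "city": "boston"},
    {"id": "b8", "name": "Marie Garcia", "city": "Austin"},
    {"id": "b9", "name": "John Smythe", "city": "Denver"},
]


def normalize(name: str) -> str:
    """Lowercase, turn 'Last, First' into 'first last', and collapse whitespace."""
    name = name.strip().lower()
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return " ".join(name.split())


def block_key(record: dict) -> str:
    """Blocking: only records in the same (normalized) city are ever compared."""
    return record["city"].strip().lower()


def resolve(left: list[dict], right: list[dict], threshold: float = 0.85):
    """Return (left_id, right_id, score) for candidate pairs scoring >= threshold."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[block_key(rec)].append(rec)
    matches = []
    for lrec in left:
        for rrec in blocks.get(block_key(lrec), []):
            score = SequenceMatcher(None, normalize(lrec["name"]), normalize(rrec["name"])).ratio()
            if score >= threshold:
                matches.append((lrec["id"], rrec["id"], round(score, 3)))
    return matches


print(resolve(crm, billing))  # [('c1', 'b7', 1.0), ('c2', 'b8', 0.917)]
```

Blocking trades recall for speed: "John Smythe" in Denver is never compared with anyone, which is cheap but would also miss a true match recorded under a different city.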
Simple, configuration-driven backup software for servers and workstations
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
A powerful and modular toolkit for record linkage and duplicate detection in Python
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Fast block-level out-of-band BTRFS deduplication tool.
CLI utility to find near duplicate images and remove all but the best copy.
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
Record Linkage ToolKit (Find and link entities)
Tool for managing data deduplication within extant compressed archive files, along with a relatively performant BK-tree implementation for fuzzy image searching.
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.
Image duplicate checking and deduplication: find/delete duplicate images
Dedupe/batch geocode addresses and venues around the world with libpostal
Python package for deduplication/entity resolution using active learning
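The MinHash-plus-LSH approach named in the near-duplicate text entry above can be sketched with only the standard library. This is a self-written illustration, not that package's code or API: the 5-character shingles, the 128 hash functions, the split into 32 bands of 4 rows, and all function names are assumptions chosen for readability.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

NUM_PERM = 128
BANDS, ROWS = 32, 4          # BANDS * ROWS must equal NUM_PERM
PRIME = (1 << 61) - 1        # Mersenne prime for the universal hash family
_rng = random.Random(42)
# Each (a, b) pair defines h(x) = (a*x + b) mod PRIME, standing in for a random permutation.
HASHES = [(_rng.randrange(1, PRIME), _rng.randrange(PRIME)) for _ in range(NUM_PERM)]


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-grams of the lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(items: set[str]) -> list[int]:
    """One minimum per hash function; Python's built-in hash() is salted, so use SHA-1."""
    xs = [int.from_bytes(hashlib.sha1(s.encode()).digest()[:7], "big") for s in items]
    return [min((a * x + b) % PRIME for x in xs) for a, b in HASHES]


def estimated_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """The fraction of agreeing MinHash slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidates(docs: dict[str, str]):
    """Bucket each signature band; documents sharing any bucket become candidate pairs."""
    sigs, buckets = {}, defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash(shingles(text))
        sigs[doc_id] = sig
        for band in range(BANDS):
            buckets[(band, tuple(sig[band * ROWS:(band + 1) * ROWS]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs, sigs


docs = {
    "a": "The quick brown fox jumps over the lazy dog near the river bank",
    "b": "The quick brown fox jumped over the lazy dog near the river bank",
    "c": "Completely unrelated text about database indexing and query plans",
}
pairs, sigs = lsh_candidates(docs)
for x, y in sorted(pairs):
    print(x, y, round(estimated_jaccard(sigs[x], sigs[y]), 2))
```

With b bands of r rows, a pair whose Jaccard similarity is s becomes a candidate with probability 1 - (1 - s^r)^b, so only pairs that collide in some band need an exact comparison instead of all n(n-1)/2 pairs.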
Created by Halbert L. Dunn
Released 1946
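The BK-tree mentioned in the compressed-archive deduplication entry above supports fuzzy lookup under any metric. A minimal sketch, assuming each image has already been reduced to an integer perceptual hash compared by Hamming distance; computing those hashes is outside this sketch, and the class and method names are hypothetical rather than that tool's API.

```python
from typing import Callable, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Burkhard-Keller tree: each child edge is labeled with its distance to the parent."""

    def __init__(self, distance: Callable[[int, int], int] = hamming):
        self.distance = distance
        self.root: Optional[tuple] = None  # node = (item, {edge_distance: child_node})

    def add(self, item: int) -> None:
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            value, children = node
            d = self.distance(item, value)
            if d == 0:
                return  # identical item already stored
            if d not in children:
                children[d] = (item, {})
                return
            node = children[d]

    def search(self, query: int, max_dist: int) -> list[tuple[int, int]]:
        """Return sorted (distance, item) pairs with distance <= max_dist."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = self.distance(query, value)
            if d <= max_dist:
                results.append((d, value))
            # Triangle inequality: matches can only sit under edges in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for h in (0b10110000, 0b10110001, 0b10110011, 0b01001111, 0b11110000):
    tree.add(h)
print(tree.search(0b10110000, max_dist=1))  # [(0, 176), (1, 177), (1, 240)]
```

The pruning step is what makes the structure useful for near-duplicate search: with a small `max_dist`, most subtrees are skipped instead of comparing the query against every stored hash.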