Deduplicating archiver with compression and authenticated encryption.
Updated Jul 6, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
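As a rough illustration of the idea, the sketch below links records from two sources that share no common identifier by comparing normalized names with a string-similarity score. It is a minimal example using only Python's standard library, not the method of any project listed here; the function names, the sample records, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for every pair scoring >= threshold.

    Compares all pairs, which is quadratic; real systems usually add a
    blocking step to avoid that. The threshold is an illustrative assumption.
    """
    matches = []
    for (lid, lname), (rid, rname) in product(left, right):
        score = similarity(lname, rname)
        if score >= threshold:
            matches.append((lid, rid, round(score, 2)))
    return matches


# Hypothetical records: two sources with unrelated key schemes.
crm = [(1, "Acme Corp."), (2, "Globex Corporation")]
billing = [("A", "ACME Corp"), ("B", "Initech")]

links = link_records(crm, billing)
print(links)
```

Here the two spellings of the Acme record normalize to the same string and are linked, while the remaining pairs fall below the threshold. Probabilistic linkage tools typically go further by weighing several fields and estimating match probabilities rather than applying a single cutoff.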
CLI utility to find near duplicate images and remove all but the best copy.
Face detection and retrieval in image and video files.
Open source project for data preparation, aimed at LLM application builders
Simple, configuration-driven backup software for servers and workstations
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
A dictionary that de-duplicates values.
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Prototype for basic deduplication and aggregation of eCQM data
The SQL/Ibis powered sklearn of record linkage
FastCDC implementation in Python https://pypi.org/project/fastcdc/
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
EROFS documentation repo for https://erofs.docs.kernel.org
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation.
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Tool to remove duplicate text messages (SMS/MMS). RCS support is also available for some clients.
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
Created by Halbert L. Dunn
Released 1946