🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Updated Aug 5, 2024 - Python
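The fuzzy matching and record deduplication described above can be illustrated, very loosely, with a standard-library sketch. This is not dedupe's API and does not reproduce how dedupe learns its matching rules. The first-character blocking rule, the `difflib.SequenceMatcher` similarity, the 0.85 threshold, and the names `normalize`, `block_key`, and `find_duplicates` are all assumptions chosen for illustration. The sketch shows the general shape of the technique: cheap blocking to limit the number of pairs, then a fuzzy score on each remaining pair.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(name.lower().split())


def block_key(name: str) -> str:
    # Hypothetical blocking rule: only compare records sharing a first character.
    return normalize(name)[:1]


def find_duplicates(records: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for record pairs whose similarity meets the threshold."""
    blocks: dict[str, list[int]] = defaultdict(list)
    for index, record in enumerate(records):
        blocks[block_key(record)].append(index)

    matches = []
    for indices in blocks.values():
        # Pairwise comparison happens only inside each block, never across blocks.
        for i, j in combinations(indices, 2):
            score = SequenceMatcher(None, normalize(records[i]), normalize(records[j])).ratio()
            if score >= threshold:
                matches.append((i, j, score))
    return matches
```

For example, `find_duplicates(["Acme Corporation", "ACME  Corporation", "Acme Corp.", "Globex Inc"])` returns `[(0, 1, 1.0)]`: the first two records normalize to the same string, "Acme Corp." scores below the threshold, and "Globex Inc" lands in a different block so it is never compared. Real entity-resolution tools replace the hand-written blocking rule and fixed threshold with learned ones.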
A powerful and modular toolkit for record linkage and duplicate detection in Python
🆔 Command line tool for deduplicating CSV files
🆔 Examples for using the dedupe library
Identifying and removing near-duplicate images using perceptual hashing.
Fast block-level out-of-band BTRFS deduplication tool.
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Fast, scalable dedupe: fuzzy matching with OpenSearch + nmslib + RapidFuzz
Base class for dedupe variables for parsed fields
Project that takes two similar zip files and dedupes files that have the same timestamp in the older file.
Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution that uses active learning to learn blocking configurations, generate comparison pairs, and then classify matches.
Duplicate file finder that also reports the percentage of duplication in folders
Yet another tool to find and remove duplicate files.
The goal of this project is to make a deduper program that anybody can run on their computer to save storage space.
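A common approach to exact duplicate file detection, sketched here under the assumption that "duplicate" means byte-identical contents, is to group files by size and then by a content hash. This is a generic illustration, not the method of any project listed above. The names `file_digest`, `find_duplicate_files`, and `demo` are hypothetical, and SHA-256 with 64 KiB chunks is an arbitrary choice. Near-duplicate detection (for example, perceptual hashing of images) needs a different, similarity-based comparison.

```python
import hashlib
import os
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    # Hash in chunks so large files are never read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root: str) -> list[list[Path]]:
    """Group regular files under root that have byte-identical contents."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so the same file is not reported twice.
            if path.is_file() and not path.is_symlink():
                by_size[path.stat().st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups


def demo() -> list[list[str]]:
    # Throwaway directory: two identical files plus one different file of the same size.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_bytes(b"same contents")
        (root / "b.txt").write_bytes(b"same contents")
        (root / "c.txt").write_bytes(b"different!!!!")
        return [[p.name for p in group] for group in find_duplicate_files(tmp)]
```

Calling `demo()` returns `[["a.txt", "b.txt"]]`. Grouping by size first means files with a unique size are never read at all, so the more expensive hashing step only runs on size collisions such as `c.txt` here. A tool that removes duplicates would then keep one path per group and delete or hard-link the rest.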