Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Updated May 31, 2024 · Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number); a shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
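As a rough illustration of joining two sources that share no common identifier, here is a minimal sketch using only the Python standard library. The record sets, the `link_records` helper, and the 0.85 threshold are all hypothetical choices made for this example; the projects listed on this page typically use more sophisticated probabilistic or learned models, and nothing here reflects any of their APIs.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop commas/periods, and collapse runs of whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] computed on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left: dict, right: dict, threshold: float = 0.85) -> list:
    """For each left record, keep its best right-hand match if it clears the threshold.

    Compares every pair (quadratic cost), which is fine for a toy example;
    the threshold is an arbitrary value chosen for illustration.
    """
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 3)))
    return links


# Two hypothetical sources with different keys and inconsistent formatting.
crm = {"c1": "Jonathan Smith", "c2": "Acme Corp."}
billing = {"b7": "jonathan  smith", "b9": "ACME Corp", "b3": "Zeta Ltd"}

for link in link_records(crm, billing):
    print(link)
```

Comparing every pair of records quickly becomes too slow on large inputs, so real systems usually restrict comparisons to candidate pairs first (often called blocking) before scoring them.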
A powerful and modular toolkit for record linkage and duplicate detection in Python
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
🔎 Finds fuzzy matches between CSV files
Link Wikidata items to large catalogs
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
A maximum-strength name parser for record linkage.
🔎 Finds fuzzy matches between datasets
An extension for ASReview Lab to preprocess the dataset before importing in ASReview
An open-source compound AI toolchain for fast and accurate entity matching, powered by LLMs.
A Single View application aggregates and reconciles data from multiple sources to create a single view of an entity.
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning
Created by Halbert L. Dunn
Released 1946