utilities for working with Entity Resolution models
Updated Feb 14, 2022 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number); a shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
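To make the idea concrete, here is a minimal sketch of linking two small datasets that share no common identifier. It uses only Python's standard library (`difflib.SequenceMatcher`) and is not taken from any project listed on this page. The helper names (`normalize`, `similarity`, `link_records`), the sample records, and the 0.8 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.8):
    """Pair each left record with its most similar right record.

    A pair is kept only if its score reaches the threshold (0.8 is an
    arbitrary illustrative value, not a recommended setting).
    """
    if not right:
        return []
    links = []
    for l_rec in left:
        best = max(right, key=lambda r_rec: similarity(l_rec["name"], r_rec["name"]))
        score = similarity(l_rec["name"], best["name"])
        if score >= threshold:
            links.append((l_rec["id"], best["id"], score))
    return links


# Hypothetical sample data: two sources with different identifier schemes.
customers = [
    {"id": "c1", "name": "Jonathan Smith"},
    {"id": "c2", "name": "Maria Garcia"},
]
patients = [
    {"id": "p7", "name": "Jonathon Smith"},
    {"id": "p9", "name": "Wei Chen"},
]

links = link_records(customers, patients)
for left_id, right_id, score in links:
    print(left_id, right_id, f"{score:.2f}")
```

This sketch compares every left record against every right record, which grows quadratically with dataset size. Real record linkage tools usually add a blocking step to limit the candidate pairs and compare several fields rather than a single name.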
🔎 Finds fuzzy matches between datasets
🕸️ Little helper for handling entity clusters
A maximum-strength name parser for record linkage.
An extension for ASReview Lab to preprocess the dataset before importing in ASReview
Entity Resolution and Record Linkage library
Record linkage - simple, flexible, efficient.
Deduplicate data using fuzzy and deterministic matching rules.
Fast, accurate, open-source geocoding in Python
Interpretable metadata for the results of NHS England record linkage
Range of computer science applications using Python.
Python library for the generation and mutation of realistic personal identification data at scale
The aim of this project is to compare the efficiency of different approximate string matching methods applied to record linkage.
PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification.
🧱 blocking methods for entity resolution
Example scripts for generating data with Gecko
A database management system for restaurant inspection records, restaurant-related tweets, and other relevant data.
A Python package designed to allow health, biomedical and other researchers to clean (standardise) and deduplicate or link data sets of all sizes faster, with less effort and with improved quality.
Tools for improved blocking for historical record linkage
a Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, and then classify matches
Created by Halbert L. Dunn
Released 1946