Similarity and distance measures for clustering and record linkage applications in R
Updated Mar 16, 2022 (R)
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets whose records may or may not share a common identifier (e.g., a database key, URI, or national identification number). A shared identifier may be missing because of differences in record shape, storage location, or curator style or preference.
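When two sources lack a shared key, records are typically compared with a similarity measure and linked when the score clears a threshold. The sketch below illustrates that idea in base R using normalised Levenshtein similarity from `utils::adist`. The records, the column names, the 0.7 threshold, and the helper functions `normalise` and `name_similarity` are all hypothetical, chosen only for illustration; it is not the method of any package listed here.

```r
# Hypothetical example records: the two sources share no common key.
source_a <- data.frame(id_a = 1:3,
                       name = c("Jon Smith", "Maria Garcia", "Wei Zhang"),
                       stringsAsFactors = FALSE)
source_b <- data.frame(id_b = c("x1", "x2", "x3"),
                       name = c("John Smith", "Wei Zhang", "M. Garcia"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: lower-case and trim names before comparing them.
normalise <- function(x) tolower(trimws(x))

# Hypothetical helper: normalised Levenshtein similarity in [0, 1]
# for every (a, b) pair, i.e. 1 - edit distance / length of longer string.
name_similarity <- function(a, b) {
  a <- normalise(a)
  b <- normalise(b)
  d <- adist(a, b)                        # matrix of edit distances
  len <- outer(nchar(a), nchar(b), pmax)  # longer string length per pair
  1 - d / len
}

sim <- name_similarity(source_a$name, source_b$name)

# Link each record in source_a to its most similar record in source_b,
# but only if the similarity clears the (hypothetical) threshold.
threshold <- 0.7
best <- apply(sim, 1, which.max)
best_sim <- sim[cbind(seq_len(nrow(sim)), best)]
links <- data.frame(id_a = source_a$id_a,
                    id_b = ifelse(best_sim >= threshold, source_b$id_b[best], NA),
                    similarity = round(best_sim, 2),
                    stringsAsFactors = FALSE)
print(links)
```

Here "Jon Smith" links to "John Smith" (one insertion), while "Maria Garcia" and "M. Garcia" score about 0.67 and stay unlinked. A plain edit-distance threshold misses abbreviations like this, which is one reason dedicated similarity measures, blocking, and probabilistic models are used in practice.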
Clustering and Link Prediction Evaluation in R
An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbour algorithms.
An R interface for the dblink Spark application
Entity resolution research project exploring what becomes possible when entity resolution is treated as a calibration problem: mapping similarity scores to the log-odds of a true match.
Duplicate detection in R using a Bayesian partitioning approach
Details for reproducing the experiments in our d-blink paper
Pre-processing script for data from the Survey of Household Income and Wealth
Created by Halbert L. Dunn; released 1946