Record Linkage ToolKit (Find and link entities)

RLTK: Record Linkage ToolKit


The Record Linkage ToolKit (RLTK) is a general-purpose, open-source record linkage platform that lets users build powerful Python programs to link records referring to the same underlying entity. Record linkage is an important problem that arises in domains ranging from social networks to bibliographic data and biomedicine. Current open platforms for record linkage either have trouble scaling even to moderately sized datasets or are not easy to use, even for experts. RLTK attempts to address both issues.

RLTK supports a full, scalable record linkage pipeline, including multi-core algorithms for blocking, profiling data, computing a wide variety of features, and training and applying machine learning classifiers based on Python's scikit-learn library. An end-to-end RLTK pipeline can be jump-started with only a few lines of code. RLTK is also designed to be extensible and customizable, giving users control over many of the individual components. For example, you can easily add new features, such as a custom string similarity.
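To make the pipeline stages above more concrete, here is a minimal, library-free sketch in plain Python. It deliberately does not use RLTK's API: the toy records, the `token_jaccard` similarity, the token-based blocking in `candidate_pairs`, and the 0.5 threshold are all illustrative assumptions, and a fixed threshold stands in for the trained classifier a real pipeline would use.

```python
from collections import defaultdict

# Two toy datasets (hypothetical records, for illustration only).
left = [
    {"id": "a1", "name": "Apple iPhone 8"},
    {"id": "a2", "name": "Samsung Galaxy S9"},
]
right = [
    {"id": "b1", "name": "iphone 8 apple"},
    {"id": "b2", "name": "Sony Xperia"},
]


def tokens(text):
    """Lowercase a string and split it into a set of whitespace tokens."""
    return set(text.lower().split())


def token_jaccard(s1, s2):
    """A custom string similarity: Jaccard overlap of the token sets."""
    t1, t2 = tokens(s1), tokens(s2)
    if not t1 and not t2:
        return 1.0
    return len(t1 & t2) / len(t1 | t2)


def candidate_pairs(left, right):
    """Token blocking: only compare records that share at least one token."""
    index = defaultdict(set)
    for i, rec in enumerate(right):
        for tok in tokens(rec["name"]):
            index[tok].add(i)
    for rec in left:
        hits = set()
        for tok in tokens(rec["name"]):
            hits |= index.get(tok, set())
        for i in sorted(hits):
            yield rec, right[i]


def link(left, right, threshold=0.5):
    """Score each candidate pair and keep those at or above the threshold."""
    matches = []
    for l_rec, r_rec in candidate_pairs(left, right):
        score = token_jaccard(l_rec["name"], r_rec["name"])
        if score >= threshold:
            matches.append((l_rec["id"], r_rec["id"], score))
    return matches


matches = link(left, right)
print(matches)
```

Blocking keeps the number of comparisons well below the full cross product, which is the main reason it matters for scalability; here only the pair sharing tokens is ever scored.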

RLTK is being built by the Center on Knowledge Graphs at USC/ISI, with funding from multiple projects under the DARPA LORELEI and MEMEX programs and the IARPA CAUSE program. RLTK is under active maintenance, and we expect to keep adding new features and state-of-the-art record linkage algorithms in the foreseeable future, in addition to continuing to help our adopters integrate the platform into their applications.

Getting Started

Installation (make sure prerequisites are installed):

pip install -U rltk

Example:

>>> import rltk
>>> rltk.levenshtein_distance('abc', 'abd')
1
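For readers unfamiliar with the metric, the result above can be reproduced with a textbook dynamic-programming version of Levenshtein distance. This is a sketch for illustration, not RLTK's implementation; the function name `levenshtein` is our own.

```python
def levenshtein(s1, s2):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s1 into s2."""
    # prev[j] holds the distance between the processed prefix of s1
    # and the first j characters of s2.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to the empty string
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("abc", "abd"))
```

Only the final character differs between 'abc' and 'abd', so a single substitution is enough, which matches the value 1 shown in the example.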

Try RLTK Online

Datasets & Experiments

Documentation
