tumarkin edited this page Mar 6, 2018 · 6 revisions

These instructions assume that yente is installed on your path as described in installation.

Assume you have two data sets, each consisting of a list of entities (people, companies, etc.) with identifiers (a unique code, such as an integer). Each data set should be saved as either a comma-separated or a tab-separated file with a header row. The entity names should be in a column called name and the identifiers in a column called id.
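As an illustration of the layout described above, the following minimal Python sketch writes two small comma-separated files with the required name and id header. The file names from.csv and to.csv, the helper write_entities, and the sample entities are all hypothetical and chosen only for this example; nothing here is part of yente itself.

```python
import csv
from pathlib import Path

# Hypothetical sample data: names, ids, and file names are illustrative only.
FROM_ROWS = [("Acme Corporation", 1), ("Jane Smith", 2)]
TO_ROWS = [("ACME Corp", 101), ("J. Smith", 102), ("Globex Inc", 103)]


def write_entities(path, rows, delimiter=","):
    """Write a header row (name, id) followed by one entity per line.

    Pass delimiter="\t" to produce a tab-separated file instead.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerow(["name", "id"])  # column names the instructions require
        writer.writerows(rows)
    return Path(path)


from_file = write_entities("from.csv", FROM_ROWS)
to_file = write_entities("to.csv", TO_ROWS)
```

Files written this way could then serve as the FROM-FILE and TO-FILE arguments.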

Let's say that for each entity in the FROM-FILE you want to find all the possible matches in the TO-FILE. To do so, open a terminal or DOS prompt and type:

yente FROM-FILE TO-FILE -o OUTPUT-FILE

This will save the results in OUTPUT-FILE. While it runs, a progress bar shows the time remaining in seconds.

By default, yente will output the best match for each entity in the FROM-FILE. Each match is scored between 0 and 1, with 1 being a perfect match.

Further options are available to customize the matching process as described in Advanced use.

Should you require any help, type

yente --help

to display a list of options.