Linked Jazz

Name Directory Creation

This set of scripts is a functional approach to creating a domain-specific LOD name directory. It works with extract files that are processed sequentially; no DB interface is needed, just the extracts and the scripts. The scripts use keywords to build our jazz directory, but the keywords could easily be replaced to create a name directory for other domains. Much of the process is designed to work on a VPS, but some parts need to be done locally.


Requires OS X/Linux command-line tools (grep, wget, etc.).

Extracts Needed:

The process requires a number of extract files from DBpedia and the Library of Congress.


(When a new version of the DBpedia extracts is released, you will need to change the URLs below.)

Library of Congress:

Extract these files into the data directory (you are going to need a lot of space)


Building the directory is just running the scripts in order.


This takes an article-category approach: it starts from everything related to jazz and filters it down to people. It is diagrammed in filterDBpediaJazzFile.pdf.


Takes the enormous LC data file and creates a new, more manageable LC lookup. The first step creates personURIs.nt; this could be done locally and added to the extract data on a server to reduce the space needed. Making this file takes a long time, as it greps a 30GB extract. The process is documented in filter_LOC_filterLOCskos.pdf.
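As a rough illustration of why this step is slow but needs little memory, here is a minimal Python sketch of a grep-style streaming filter over an N-Triples file. It is not the repository's script: the function names, the file paths, and the `PersonalName` marker used in the usage note are assumptions made for the example.

```python
def filter_triples(lines, needle):
    """Yield only the lines that contain `needle`, one line at a time."""
    for line in lines:
        if needle in line:
            yield line


def filter_file(src_path, dst_path, needle):
    """Stream src_path into dst_path, keeping matching lines.

    Only one line is held in memory at a time, so the size of the
    extract affects run time but not memory use. Returns the number
    of lines kept.
    """
    kept = 0
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in filter_triples(src, needle):
            dst.write(line)
            kept += 1
    return kept
```

A call might look like `filter_file("data/lc_extract.nt", "data/personURIs.nt", "PersonalName")`, where both paths and the marker string are hypothetical and would need to match the actual extract.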


This adds birth and death dates to the name directory for people whose dates are not available as structured data but do appear in their abstract. Only the year is used.
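One way to pull years out of an abstract is sketched below in Python. It assumes the dates appear in the first parenthetical of the abstract, as they commonly do in Wikipedia-style abstracts; the regular expressions and the `extract_years` name are illustrative assumptions, not the rules the actual script uses.

```python
import re

# Assumption: a plausible year is a four-digit number from 1500 to 2099.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")
# Assumption: the dates sit in the first parenthetical of the abstract.
PAREN = re.compile(r"\(([^()]*)\)")


def extract_years(abstract):
    """Return (birth_year, death_year); either value may be None."""
    match = PAREN.search(abstract)
    if not match:
        return (None, None)
    years = [int(y) for y in YEAR.findall(match.group(1))]
    if not years:
        return (None, None)
    birth = years[0]
    # Accept a second year as the death year only if it does not precede birth.
    death = years[1] if len(years) > 1 and years[1] >= birth else None
    return (birth, death)


print(extract_years("John Coltrane (September 23, 1926 – July 17, 1967) was a saxophonist."))
print(extract_years("Sonny Rollins (born September 7, 1930) is a saxophonist."))
```

The first call prints `(1926, 1967)` and the second prints `(1930, None)`. Abstracts that put the dates elsewhere would need additional patterns.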


This attempts to merge the two authorities based on name and dates. It produces a number of final name directory sameAs_*.nt files, grouped by the confidence of the match. Documented in mergeLOCandDBpedia.pdf.
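The idea of grading a match by name and dates can be sketched as follows. This is a hypothetical Python illustration only: the record layout, the tier names (`high`, `medium`, `low`), and the name normalization are assumptions, and the real rules are the ones documented in mergeLOCandDBpedia.pdf.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort the tokens,
    so that 'Davis, Miles' and 'Miles Davis' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def _conflict(x, y):
    """True when both values are known and they disagree."""
    return x is not None and y is not None and x != y


def match_confidence(a, b):
    """Return 'high', 'medium', 'low', or None for two person records.

    Each record is a dict with 'name' and optional 'birth'/'death' years
    (a hypothetical layout chosen for this sketch).
    """
    if normalize_name(a["name"]) != normalize_name(b["name"]):
        return None
    if _conflict(a.get("birth"), b.get("birth")) or _conflict(a.get("death"), b.get("death")):
        return None  # same name but contradictory dates: treat as different people
    birth_ok = a.get("birth") is not None and a.get("birth") == b.get("birth")
    death_ok = a.get("death") is not None and a.get("death") == b.get("death")
    if birth_ok and death_ok:
        return "high"
    if birth_ok or death_ok:
        return "medium"
    return "low"  # name matches, but no date is available to confirm it
```

In a scheme like this, each tier would be written to its own sameAs_*.nt file so that lower-confidence matches can be reviewed separately.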


Optional. This script creates an auxiliary file for the sameAs files containing each person's image (if one exists in Wikipedia) and their short abstract.
