Latest commit: 9b6924e (Apr 11, 2018). Contributors: @smalyshev, @JDPUPONT-BNF, @earldouglas.

Getting started

Clone the project

$ git clone  --recurse-submodules wikidata-query-rdf

Build it

$ cd wikidata-query-rdf
$ mvn package

The distributable package can be found in ./dist/target/.

$ cd dist/target
$ unzip service-*
$ cd service-*/

The unzipped package contains a customized Blazegraph, a Jetty launcher, launch scripts, and configuration:

├── blazegraph-service-0.1.0-dist.war
├── docs
├── gui
├── jetty-runner-9.2.9.v20150224.jar
├── lib

Optionally modify

Launch Blazegraph

$ ./

Blazegraph is now running on localhost:9999 with a fresh database.
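Before loading data, it can help to confirm that Blazegraph actually answers HTTP requests. The following is a minimal POSIX shell sketch, assuming curl is installed; wait_for_blazegraph is a hypothetical helper, not part of the distribution, and its default URL is the GUI address http://localhost:9999/bigdata/.

```shell
#!/bin/sh
# Sketch only: poll Blazegraph until it answers HTTP requests.
# wait_for_blazegraph is a hypothetical helper, not a bundled script.
# Usage: wait_for_blazegraph [url] [attempts]
wait_for_blazegraph() {
  local url="${1:-http://localhost:9999/bigdata/}" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail turns HTTP error codes into a non-zero exit status.
    if curl --fail --silent --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "No response from $url after $tries attempts" >&2
  return 1
}

# Example: wait_for_blazegraph && echo "Blazegraph is up"
```

Polling once per second keeps the helper simple; increase the attempt count if Blazegraph starts slowly on your machine.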

Load the dump

  • Create the data directories and munge the dump:
$ mkdir data
$ mkdir data/split
$ ./ -f data/wikidata-20150427-all-BETA.ttl.gz -d data/split -l en -s

The option -l en imports only English labels. The option -s skips sitelinks, which reduces storage and improves performance. If you need labels in other languages, either add them to the list (e.g. -l en,de,ru) or omit the language option altogether. If you need sitelinks, remove the -s option.

  • The Munger produces many data files named wikidump-000000001.ttl.gz, wikidump-000000002.ttl.gz, and so on. To load these files, use the following script:
$ ./ -n wdq -d `pwd`/data/split

This loads the data files one by one into the Blazegraph data store. The script requires curl to be installed.

You can also load specific files:

$ ./ -n wdq -d `pwd`/data/split/wikidump-000000001.ttl.gz
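The loading step can be approximated by hand. The sketch below assumes that each file is loaded by POSTing a SPARQL 1.1 Update LOAD <file://...> request to the namespace endpoint http://localhost:9999/bigdata/namespace/wdq/sparql; it is not the bundled script, and load_update and load_into_blazegraph are hypothetical names. Because Blazegraph opens the file itself, the file:// URI must hold an absolute path on the Blazegraph machine, which is why the commands above pass `pwd`/data/split.

```shell
#!/bin/sh
# Sketch only: load munged dump files into a Blazegraph namespace by sending
# one SPARQL 1.1 Update "LOAD <file://...>" request per file.
# load_update and load_into_blazegraph are hypothetical names.

# Print the SPARQL Update that asks Blazegraph to read one local file.
load_update() {
  printf 'LOAD <file://%s>' "$1"
}

# Usage: load_into_blazegraph <namespace> <directory-or-file>
load_into_blazegraph() {
  local ns="$1" target="$2" endpoint f
  endpoint="http://localhost:9999/bigdata/namespace/${ns}/sparql"

  # Blazegraph resolves the file:// URI itself, so make the path absolute.
  case "$target" in
    /*) ;;
    *) target="$(pwd)/$target" ;;
  esac

  # Either one file, or every chunk; zero-padded names keep the glob in order.
  if [ -f "$target" ]; then
    set -- "$target"
  else
    set -- "$target"/wikidump-*.ttl.gz
  fi

  for f in "$@"; do
    # An unmatched glob stays literal, so check that the file exists.
    if [ ! -e "$f" ]; then
      echo "No dump files found at $target" >&2
      return 1
    fi
    echo "Loading $f"
    # --data-urlencode URL-encodes the update and sends it as a form POST body.
    curl --fail --silent --show-error \
      --data-urlencode "update=$(load_update "$f")" \
      "$endpoint" || return 1
  done
}

# Examples mirroring the commands above:
#   load_into_blazegraph wdq data/split
#   load_into_blazegraph wdq data/split/wikidump-000000001.ttl.gz
```

Stopping at the first failed request makes it easy to resume from the file that failed instead of silently skipping it.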

Run updater

To update the database with fresh edits from Wikidata, open a second terminal and run the updater against the wdq Blazegraph namespace:

$ ./ -n wdq

The updater is designed to run continuously, but it can be interrupted and resumed at any time. If you loaded an old dump, or no dump at all, full synchronization may take a very long time, since the updater only picks up recently edited items. Use the same language and sitelink options that you passed to the Munger, e.g. -l en -s.

Run queries

To query the database, you can use the GUI at http://localhost:9999/bigdata/. If Blazegraph runs on a remote machine, you can configure an SSH tunnel in your ~/.ssh/config:

Host blazegraph-runner
  LocalForward 9999 localhost:9999

The REST query endpoint is located at http://localhost:9999/bigdata/namespace/wdq/sparql. Read more at
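Besides the GUI, the endpoint can be called from the command line. A minimal sketch, assuming curl and the standard SPARQL 1.1 Protocol (a form-encoded POST carrying a query parameter, with JSON results requested through the Accept header); sparql_query and WDQ_ENDPOINT are hypothetical names. The example query assumes the wd:/wdt: prefixes of the current Wikidata RDF mapping and declares them explicitly.

```shell
#!/bin/sh
# Sketch only: send a SPARQL query to the local endpoint and print JSON results.
# sparql_query and WDQ_ENDPOINT are hypothetical names.
WDQ_ENDPOINT="${WDQ_ENDPOINT:-http://localhost:9999/bigdata/namespace/wdq/sparql}"

# Usage: sparql_query '<SPARQL query text>'
sparql_query() {
  # --data-urlencode makes curl POST the query as application/x-www-form-urlencoded.
  curl --fail --silent --show-error \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$1" \
    "$WDQ_ENDPOINT"
}

# Example: five items that are instances of (P31) house cat (Q146).
#   sparql_query 'PREFIX wd: <http://www.wikidata.org/entity/>
#   PREFIX wdt: <http://www.wikidata.org/prop/direct/>
#   SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5'
```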

Examples of SPARQL queries can be found at . For WDQ query translation, use (choose "Wikidata RDF Syntax").