In [1]:
# Parameters
home = "/Users/amandeep"
f_path = "Documents/wikidata-20200504"
wiki_file = "all.tsv"


# Generating Useful Wikidata Files

In [2]:
import io
import os

import numpy as np
import pandas as pd

# from IPython.display import display, HTML, Image
# from pandas_profiling import ProfileReport

## Set up environment and folders to store the files

- `WIKIDATA_HOME` folder where you put your Wikidata data
- `OUT` folder where the output files go
- `TEMP` folder to keep temporary files, including the database
- `kgtk` shortcut to invoke the kgtk software
- `compress` the compression software to use

The current implementation of some of the kgtk commands does not understand compressed files. In particular, `query` often rejects `gz` files.

To dos:

- Make sure that all files have id columns as `query` gets unhappy when files have no ids.
- Create an output folder for a subset of Wikidata without scholarly articles. This is half done: the remaining work is to subtract the scholarly articles from `EDGES` and repeat the workflow.
- Change the naming convention to make it clear which files are a partition of the original `EDGES`, so users know what files they need to get to have a full version.
- Create a qualifier file for the partition files of Wikidata: this is so that if a user gets one of the partitions, they can get the corresponding qualifier file.
- Add pagerank and other stats. We can compute the pagerank from the `all.item` file, so the output could be called `all.item.pagerank.tsv`.

Naming convention: the name `all` is redundant, so we should consider removing it. I recommend using the prefix `part.` to name the partitions of Wikidata, e.g., `part.label`, `part.quantity`. Files such as `P279` are not partitions because each is a subset of `part.item`.

If we create a subset of Wikidata, e.g., no scholarly articles, we could call it `minus.Q13442814`; if we remove galaxies too, we could call it `minus.Q13442814-Q318`, so the files would be `minus.Q13442814-Q318.part.quantity.tsv` (the idea of `all` is in contrast to `minus`). We can also have files that start with Qnodes, e.g., `Q5.part.quantity.tsv`; constructing such files is harder as we don't want dangling nodes in the item file.
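
The proposed naming scheme can be made concrete with a small helper. This is only a sketch: `partition_filename` and its parameters are hypothetical names, not part of kgtk, and it assumes the `all` prefix is dropped as suggested above.

```python
from typing import Optional, Sequence


def partition_filename(part: str,
                       removed: Sequence[str] = (),
                       root: Optional[str] = None) -> str:
    """Build a file name following the proposed convention (hypothetical helper)."""
    pieces = []
    if root:
        # Files restricted to one Qnode start with that Qnode, e.g. Q5.
        pieces.append(root)
    if removed:
        # Subsets list the removed classes after "minus.", joined by "-".
        pieces.append("minus." + "-".join(removed))
    pieces.extend(["part", part, "tsv"])
    return ".".join(pieces)


print(partition_filename("quantity", removed=["Q13442814", "Q318"]))
print(partition_filename("quantity", root="Q5"))
```

Keeping the rule in one function would make it easier to rename every output file at once if the convention changes.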

In [3]:
#home = %env HOME
# supply home from command line
# supply wiki_file from command line as well
# f_path from command line
%env WIKIDATA_HOME=$home/$f_path
wikidata_home = %env WIKIDATA_HOME
%env OUT = $wikidata_home/output
%env TEMP = $wikidata_home/temp

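# Define $kgtk so that we can turn the debugging options on and off.
# Caution: zsh does not word-split unquoted variables, so a multi-word value
# such as "time kgtk --debug" fails with "command not found" when the shell
# behind `!` is zsh; switch to the single-word form below in that case.
%env kgtk = time kgtk --debug
#%env kgtk = kgtk
# Two candidate settings for $compress; the second assignment wins, so output passes through `cat` uncompressed.
%env compress = compress
%env compress = cat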

env: WIKIDATA_HOME=/Users/amandeep/Documents/wikidata-20200504
env: OUT=/Users/amandeep/Documents/wikidata-20200504/output
env: TEMP=/Users/amandeep/Documents/wikidata-20200504/temp
env: kgtk=time kgtk --debug
env: compress=compress
env: compress=cat


In [4]:
cd $wikidata_home

/Users/amandeep/Documents/wikidata-20200504


In [5]:
!mkdir output
!mkdir temp

mkdir: output: File exists


mkdir: temp: File exists


Clean up the output and temp folders before we start

In [6]:
!rm $OUT/*.tsv
!rm $TEMP/*.tsv
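
The `mkdir` and `rm` cells complain when a folder already exists or holds no `.tsv` files. A possible Python equivalent that tolerates both cases is sketched below; `reset_dir` is a hypothetical helper name, not something the notebook defines.

```python
import glob
import os


def reset_dir(path: str, pattern: str = "*.tsv") -> int:
    """Create `path` if needed, delete files matching `pattern`, return how many were removed."""
    os.makedirs(path, exist_ok=True)  # no error when the folder already exists
    removed = 0
    for name in glob.glob(os.path.join(path, pattern)):
        os.remove(name)
        removed += 1
    return removed
```

In this notebook it could be called as `reset_dir(os.environ["OUT"])` and `reset_dir(os.environ["TEMP"])`; the SQLite database in `TEMP` is left alone because it does not match `*.tsv`.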

The `all` file contains 100M edges of the full dump; `all.10` contains 10M edges. Both are meant for testing only: the real run should use the full edges file.

In [7]:
%env STORE=$wikidata_home/temp/wikidata.sqlite3.db
# %env EDGES=$wikidata_home/all.10.tsv
%env EDGES=$wikidata_home/$wiki_file

#%env QUALS=$wikidata_home/wikidata-20200803-all-qualifiers.tsv.gz
#%env LABELS=$wikidata_home/wikidata-20200803-all-labels-en-sorted.tsv.gz

env: STORE=/Users/amandeep/Documents/wikidata-20200504/temp/wikidata.sqlite3.db
env: EDGES=/Users/amandeep/Documents/wikidata-20200504/all.tsv


Uncomment the line below to remove the SQLite database. It takes a long time to load all the data and create indices, so don't remove the database unless you change files that have already been loaded and you need to force a reload.

In [8]:
#rm $TEMP/wikidata.sqlite3.db

### Get a sample and force importing the edge file into the database

In [9]:
!$kgtk query -i $EDGES --limit 10 --graph-cache $STORE

zsh:1: command not found: time kgtk --debug
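
The message above suggests that the shell treats the whole value of `$kgtk` as a single command name, which is how zsh handles unquoted variables by default. One possible workaround, sketched here under that assumption, is to split the launcher string in Python and hand an argument list to `subprocess`. The helper name `kgtk_argv` is hypothetical, and the `time` prefix is left out to keep the example simple.

```python
import shlex
from typing import List


def kgtk_argv(launcher: str, *args: str) -> List[str]:
    """Split a launcher string such as 'kgtk --debug' into words and append the arguments."""
    return shlex.split(launcher) + list(args)


argv = kgtk_argv("kgtk --debug", "query", "-i", "all.tsv", "--limit", "10")
print(argv)
# To actually run it (assumes kgtk is installed and on PATH):
# import subprocess; subprocess.run(argv, check=True)
```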


Force creation of the index on the label column

In [10]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(i)-[:P31]->(c)' \
    --limit 5

zsh:1: command not found: time kgtk --debug


Force creation of the index on the node2 column

In [11]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(i)-[r]->(:Q5)' \
    --limit 5

zsh:1: command not found: time kgtk --debug


### Count the number of edges

In [12]:
!$kgtk query -i $EDGES --graph-cache $STORE \
    --match 'all: ()-[r]->()' \
    --return 'count(r) as count' \
    --limit 10

zsh:1: command not found: time kgtk --debug


### Get the distribution of the label column
I would like to have it sorted numerically, but I don't know how to make that happen.

In [13]:
!$kgtk unique --column label -i $EDGES / sort2 -c node2 -r -o $OUT/all-distribution.tsv 

zsh:1: command not found: time kgtk --debug


In [14]:
!head $OUT/all-distribution.tsv | column -t -s $'\t' 

head: /Users/amandeep/Documents/wikidata-20200504/output/all-distribution.tsv: No such file or directory
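
For small samples, a numerically sorted distribution can also be obtained with pandas, which the notebook already imports. The sketch below uses a made-up six-edge sample and assumes the usual `node1`, `label`, `node2` column names; it is not a replacement for `kgtk unique` on the full dump.

```python
import csv
import io

import pandas as pd

# Made-up sample in node1/label/node2 layout, for illustration only.
SAMPLE_EDGES = (
    "node1\tlabel\tnode2\n"
    "Q42\tP31\tQ5\n"
    "Q44\tP279\tQ154\n"
    "Q42\tlabel\t'Douglas Adams'@en\n"
    "Q64\tP31\tQ515\n"
    "Q44\tlabel\t'beer'@en\n"
    "Q90\tP31\tQ515\n"
)


def label_distribution(tsv_text: str) -> pd.Series:
    """Count edges per label; value_counts sorts by count, largest first."""
    edges = pd.read_csv(io.StringIO(tsv_text), sep="\t", dtype=str,
                        quoting=csv.QUOTE_NONE)
    return edges["label"].value_counts()


dist = label_distribution(SAMPLE_EDGES)
print(dist)
```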


### Compute files with labels, aliases and descriptions
Return the id, node1, label and node2 columns

In [15]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l:label]->(n2)' \
    --return 'l, n1, l.label, n2' \
    | $compress > $OUT/all.label.tsv

zsh:1: command not found: time kgtk --debug


In [16]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l:alias]->(n2)' \
    --return 'l, n1, l.label, n2' \
    | $compress > $OUT/all.alias.tsv

zsh:1: command not found: time kgtk --debug


In [17]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l:description]->(n2)' \
    --return 'l, n1, l.label, n2' \
    | $compress > $OUT/all.description.tsv

zsh:1: command not found: time kgtk --debug


### Now create files with the English labels, aliases and descriptions

In [18]:
!$kgtk query -i $OUT/all.label.tsv --graph-cache $STORE -o - \
    --match '()-[]->(n2)' \
    --where 'n2.kgtk_lqstring_lang = "en"' \
    | kgtk sort2 \
    | $compress > $OUT/all.label.en.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



In [19]:
!$kgtk query -i $OUT/all.alias.tsv --graph-cache $STORE -o - \
    --match '()-[]->(n2)' \
    --where 'n2.kgtk_lqstring_lang = "en"' \
    | kgtk sort2 \
    | $compress > $OUT/all.alias.en.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



In [20]:
!$kgtk query -i $OUT/all.description.tsv --graph-cache $STORE -o - \
    --match '()-[]->(n2)' \
    --where 'n2.kgtk_lqstring_lang = "en"' \
    | kgtk sort2 \
    | $compress > $OUT/all.description.en.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



Let's sample these files to see what they look like:

* We are getting all variants of English; we really want `en` only.
* The labels carry language tags. How do we output only the string without the language tag?
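
Both points could be handled in a post-processing step. The sketch below assumes that language-qualified strings look like `'text'@lang`, with single quotes and a tag after the last `@`; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_lqstring(value: str) -> Tuple[str, str]:
    """Split "'text'@lang" into (text, lang); lang is '' when no tag is found."""
    text, sep, lang = value.rpartition("@")
    if not sep or not (text.startswith("'") and text.endswith("'")):
        return value, ""
    return text[1:-1], lang


def english_only(values: Iterable[str]) -> List[str]:
    """Keep the bare text of values tagged exactly 'en' (not 'en-gb', 'en-ca', ...)."""
    return [text for text, lang in map(split_lqstring, values) if lang == "en"]


print(english_only(["'beer'@en", "'beer'@en-gb", "'Bier'@de"]))
```

Comparing the tag for exact equality is what excludes regional variants such as `en-gb`.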

In [21]:
!head $OUT/all.label.en.tsv | column -t -s $'\t' 

### Compute the distribution of the number of edges for each Wikidata type

In [22]:
!$kgtk unique --column 'node2;wikidatatype' -i $EDGES / sort2 -c node2 -r -o $OUT/all.wikidatatype.distribution.tsv

zsh:1: command not found: time kgtk --debug


In [23]:
!column -t -s $'\t' $OUT/all.wikidatatype.distribution.tsv

column: /Users/amandeep/Documents/wikidata-20200504/output/all.wikidatatype.distribution.tsv: No such file or directory


### Create a file to contain the edges for each Wikidata type

In [24]:
types = [
    "time",
    "wikibase-item",
    "math",
    "wikibase-form",
    "quantity",
    "string",
    "external-id",
    "commonsMedia",
    "globe-coordinate",
    "monolingualtext",
    "musical-notation",
    "geo-shape",
    "wikibase-property",
    "url",
]
command = "$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l]->(n2 {wikidatatype: type})' \
    --return 'l, n1, l.label, n2'\
    --where 'type = \"TYPE\"' \
    | kgtk sort2 | $compress > $OUT/all.TYPE.tsv"
# Use a loop variable that does not shadow the built-in `type`.
for wikidata_type in types:
    cmd = command.replace("TYPE", wikidata_type)
    print(cmd)
    os.system(cmd)

$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "time"'     | kgtk sort2 | $compress > $OUT/all.time.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "wikibase-item"'     | kgtk sort2 | $compress > $OUT/all.wikibase-item.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "math"'     | kgtk sort2 | $compress > $OUT/all.math.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "wikibase-form"'     | kgtk sort2 | $compress > $OUT/all.wikibase-form.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "quantity"'     | kgtk sort2 | $compress > $OUT/all.quantity.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "string"'     | kgtk sort2 | $compress > $OUT/all.string.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "external-id"'     | kgtk sort2 | $compress > $OUT/all.external-id.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "commonsMedia"'     | kgtk sort2 | $compress > $OUT/all.commonsMedia.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "globe-coordinate"'     | kgtk sort2 | $compress > $OUT/all.globe-coordinate.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "monolingualtext"'     | kgtk sort2 | $compress > $OUT/all.monolingualtext.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "musical-notation"'     | kgtk sort2 | $compress > $OUT/all.musical-notation.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "geo-shape"'     | kgtk sort2 | $compress > $OUT/all.geo-shape.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "wikibase-property"'     | kgtk sort2 | $compress > $OUT/all.wikibase-property.tsv


$kgtk query -i $EDGES --graph-cache $STORE -o -     --match '(n1)-[l]->(n2 {wikidatatype: type})'     --return 'l, n1, l.label, n2'    --where 'type = "url"'     | kgtk sort2 | $compress > $OUT/all.url.tsv


### Create a file with the sitelinks

In [25]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l:wikipedia_sitelink]->(n2)' \
    --return 'l, n1, l.label, n2' \
    | kgtk sort2 \
    | $compress > $OUT/all.wikipedia_sitelink.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



### Create a file that specifies for each node whether it is an item or a property

In [26]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l:type]->(n2)' \
    --return 'l, n1, l.label, n2' \
    | kgtk sort2 \
    | $compress > $OUT/all.type.tsv 

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



### Create the P31 and P279 files

In [27]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l:P31]->(n2)' \
    --return 'l, n1, l.label, n2' \
    | kgtk sort2 | $compress > $OUT/all.P31.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



In [28]:
!$kgtk query -i $EDGES --graph-cache $STORE -o - \
    --match '(n1)-[l:P279]->(n2)' \
    --return 'l, n1, l.label, n2' \
    | kgtk sort2 | $compress > $OUT/all.P279.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



In [29]:
!head $OUT/all.P31.tsv | column -t -s $'\t' 

In [30]:
!$kgtk cat -i $OUT/all.P279.tsv -i $OUT/all.P31.tsv -o - \
    | kgtk sort2 | $compress > $OUT/all.P31_P279.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



In [31]:
!head $OUT/all.P31_P279.tsv | column -t -s $'\t' 

### Create the file that contains all nodes reachable via P279 starting from a node2 in P31 or a node1 in P279

First compute the roots

In [32]:
!$kgtk query -i $OUT/all.P279.tsv --graph-cache $STORE -o - \
    --match '(n1)-[]->()' \
    --return 'n1 as node' \
    | kgtk sort -c node > $TEMP/P279.n1.tsv

zsh:1: command not found: time kgtk --debug


INTERNAL ERROR: sh.

Original exception:

    Traceback (most recent call last):
      File "/Users/amandeep/Github/kgtk/kgtk_env/lib/python3.6/site-packages/sh.py", line 2082, in __init__
        preexec_fn()
      File "/Users/amandeep/Github/kgtk/kgtk/cli/sort.py", line 152, in wait_for_key_spec
        raise KGTKException('INTERNAL ERROR: failed to communicate sort key')
    kgtk.exceptions.KGTKException: INTERNAL ERROR: failed to communicate sort key
    




In [33]:
!$kgtk query -i $OUT/all.P31.tsv --graph-cache $STORE  -o - \
    --match '()-[]->(n2)' \
    --return 'n2 as node' \
    | kgtk sort -c node > $TEMP/P31.n2.tsv

zsh:1: command not found: time kgtk --debug


INTERNAL ERROR: sh.

Original exception:

    Traceback (most recent call last):
      File "/Users/amandeep/Github/kgtk/kgtk_env/lib/python3.6/site-packages/sh.py", line 2082, in __init__
        preexec_fn()
      File "/Users/amandeep/Github/kgtk/kgtk/cli/sort.py", line 152, in wait_for_key_spec
        raise KGTKException('INTERNAL ERROR: failed to communicate sort key')
    kgtk.exceptions.KGTKException: INTERNAL ERROR: failed to communicate sort key
    




In [34]:
!$kgtk cat --mode NONE $TEMP/P31.n2.tsv $TEMP/P279.n1.tsv \
    / compact --mode NONE --columns node \
    > $TEMP/P279.roots.tsv

zsh:1: command not found: time kgtk --debug
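
The three cells above amount to a set union: every `node2` of a P31 edge plus every `node1` of a P279 edge, with duplicates removed. A minimal sketch on toy identifiers, with edges given as `(node1, node2)` pairs and a hypothetical function name:

```python
def p279_roots(p31_edges, p279_edges):
    """Union of P31 targets (node2) and P279 sources (node1), sorted and de-duplicated."""
    return sorted({n2 for _, n2 in p31_edges} | {n1 for n1, _ in p279_edges})


# Toy data, not real Wikidata identifiers.
p31 = [("x1", "C1"), ("x2", "C2")]
p279 = [("C1", "C0"), ("C3", "C1")]
print(p279_roots(p31, p279))
```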


Now we can invoke the reachable-nodes command

In [35]:
!$kgtk reachable-nodes \
    --rootfile $TEMP/P279.roots.tsv \
    --rootfilecolumn 0 \
    --subj 1 --pred 2 --obj 3 \
    $OUT/all.P279.tsv \
    | kgtk sort2 \
    | $compress > $TEMP/P279.reachable.tsv

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



The reachable-nodes command produces edges labeled `reachable`, so we need one command to rename them.

In [36]:
!$kgtk query -i $TEMP/P279.reachable.tsv --graph-cache $STORE  -o - \
    --match '(n1)-[]->(n2)' \
    --return 'n1, "P279star" as label, n2 as node2' \
     > $TEMP/P279star.1.tsv

zsh:1: command not found: time kgtk --debug


We also want `P279star` to be reflexive, i.e., to contain `(n1)-[:P279star]->(n1)` for every node1.

In [37]:
!$kgtk query -i $TEMP/P279.reachable.tsv --graph-cache $STORE  -o - \
    --match '(n1)-[]->(n2)' \
    --return 'n1 as node1, "P279star" as label, n1 as node2' \
     > $TEMP/P279star.2.tsv

zsh:1: command not found: time kgtk --debug


In [38]:
!$kgtk query -i $TEMP/P279.reachable.tsv --graph-cache $STORE  -o - \
    --match '(n1)-[]->(n2)' \
    --return 'n2 as node1, "P279star" as label, n2 as node2' \
     > $TEMP/P279star.3.tsv

zsh:1: command not found: time kgtk --debug


In [39]:
!$kgtk query -i $OUT/all.P31.tsv --graph-cache $STORE  -o - \
    --match '(n1)-[]->(n2)' \
    --return 'n2 as node1, "P279star" as label, n2 as node2' \
     > $TEMP/P279star.4.tsv

zsh:1: command not found: time kgtk --debug


Now we can concatenate these files to produce the final output

In [40]:
!$kgtk cat --mode NONE $TEMP/P279star.1.tsv $TEMP/P279star.2.tsv $TEMP/P279star.3.tsv $TEMP/P279star.4.tsv \
    | kgtk compact \
    | kgtk sort2 \
    | kgtk add-id --id-style node1-label-node2-num \
    > $OUT/all.P279star.tsv

zsh:1: command not found: time kgtk --debug


No header line in file


In input header '': Column 0 has an invalid name in the file header
In input header '': Column 0 has an invalid name in the file header
Exit requested
KGTKException found



This is difficult to test with our Wikidata subset because our hierarchy is very sparse.
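
A toy hierarchy makes the intended result easier to check by hand. The sketch below computes the reflexive-transitive closure of P279 in plain Python; it illustrates what `P279star` should contain and is not how `reachable-nodes` is implemented. The function name and the `extra_nodes` parameter (standing in for classes that appear only as P31 targets) are assumptions.

```python
from collections import defaultdict, deque


def p279star(p279_edges, extra_nodes=()):
    """Return the reflexive-transitive closure of P279 as sorted (node1, node2) pairs."""
    parents = defaultdict(set)
    nodes = set(extra_nodes)
    for child, parent in p279_edges:
        parents[child].add(parent)
        nodes.update((child, parent))
    closure = set()
    for start in nodes:
        closure.add((start, start))  # reflexive edge
        seen, queue = {start}, deque([start])
        while queue:  # breadth-first walk up the hierarchy
            for parent in parents.get(queue.popleft(), ()):
                if parent not in seen:
                    seen.add(parent)
                    closure.add((start, parent))
                    queue.append(parent)
    return sorted(closure)


toy = [("lager", "beer"), ("beer", "drink")]
for pair in p279star(toy, extra_nodes=["wine"]):
    print(pair)
```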

This is how we would do the typical `?item P31/P279* ?class` in Kypher.
The example shows how to get all the `n1` that are instances of subclasses of beer (Q44).

In [41]:
!$kgtk query -i $OUT/all.P31.tsv -i $OUT/all.P279star.tsv --graph-cache $STORE  -o - \
    --match 'P31: (n1)-[:P31]->(c), P279star: (c)-[]->(:Q44)' \
    --return 'count(n1) as count'

zsh:1: command not found: time kgtk --debug
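
The join in the query above can be checked on toy data. In this sketch, `p31_edges` holds `(item, class)` pairs and `star_pairs` holds `(class, ancestor)` pairs from a reflexive `P279star`; all names and identifiers are made up for illustration.

```python
def instances_of(target, p31_edges, star_pairs):
    """Return the sorted items whose P31 class reaches `target` through P279star."""
    classes = {c for c, ancestor in star_pairs if ancestor == target}
    return sorted({item for item, c in p31_edges if c in classes})


p31_edges = [("item1", "lager"), ("item2", "stout"), ("item3", "wine")]
star_pairs = [
    ("lager", "lager"), ("lager", "beer"),
    ("stout", "stout"), ("stout", "beer"),
    ("beer", "beer"), ("wine", "wine"),
]
print(instances_of("beer", p31_edges, star_pairs))
```

Because `P279star` is reflexive, direct instances of the target class are included as well as instances of its subclasses.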


### Create a file to do generalized Is-A queries
The idea is that `(n1)-[:isa]->(n2)` holds when `(n1)-[:P31]->(n2)` or `(n1)-[:P279]->(n2)`.

We do this by concatenating the files and renaming the relation

In [42]:
!$kgtk cat $OUT/all.P31.tsv $OUT/all.P279.tsv \
    > $TEMP/isa.1.tsv

zsh:1: command not found: time kgtk --debug


In [43]:
!$kgtk query -i $TEMP/isa.1.tsv --graph-cache $STORE  -o - \
    --match '(n1)-[]->(n2)' \
    --return 'n1, "isa" as label, n2' \
    | kgtk sort2 \
    | $compress > $OUT/all.isa.tsv 

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found
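
The concatenate-and-rename step can be prototyped with pandas on a small sample. This sketch assumes `node1`, `label`, `node2` columns; `build_isa` is a hypothetical name, and dropping duplicate edges is an extra choice the kgtk cells above do not make.

```python
import pandas as pd


def build_isa(p31: pd.DataFrame, p279: pd.DataFrame) -> pd.DataFrame:
    """Concatenate P31 and P279 edges and relabel every edge as 'isa'."""
    isa = pd.concat([p31, p279], ignore_index=True)
    isa["label"] = "isa"
    return (isa[["node1", "label", "node2"]]
            .drop_duplicates()
            .sort_values(["node1", "node2"])
            .reset_index(drop=True))


# Toy edges, not real Wikidata statements.
p31_df = pd.DataFrame({"node1": ["x1"], "label": ["P31"], "node2": ["C1"]})
p279_df = pd.DataFrame({"node1": ["C1"], "label": ["P279"], "node2": ["C0"]})
isa_df = build_isa(p31_df, p279_df)
print(isa_df)
```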



Example of how to use the `isa` relation

In [44]:
!$kgtk query -i $OUT/all.isa.tsv -i $OUT/all.P279star.tsv --graph-cache $STORE  -o - \
    --match 'isa: (n1)-[l:isa]->(c), P279star: (c)-[]->(:Q44)' \
    --return 'distinct n1, l.label, "Q44" as node2' \
    --limit 10

zsh:1: command not found: time kgtk --debug


### Creating a subset of Wikidata without scholarly articles (Q13442814)
First create a file with the scholarly articles

In [45]:
!$kgtk query -i $OUT/all.isa.tsv -i $OUT/all.P279star.tsv --graph-cache $STORE  -o - \
    --match 'isa: (n1)-[l:isa]->(n2:Q13442814)' \
    --return 'distinct n1, l.label, n2' \
    | kgtk sort2 \
    | $compress > $OUT/all.isa.Q13442814.tsv 

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found



Now we need to remove from `$EDGES` any edge where node1 or node2 is in node1 of `$OUT/all.isa.Q13442814.tsv`. The result will be `$OUT/minus.Q13442814.tsv`. We can then run the whole notebook with this new file as `$EDGES` and compute all the product files in a new output directory.
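
The subtraction step is not implemented in this notebook yet. One way it might look on a sample that fits in memory is sketched below with pandas; `remove_items` is a hypothetical helper, and the full dump would likely need a streaming or kgtk-based approach instead.

```python
import pandas as pd


def remove_items(edges: pd.DataFrame, items) -> pd.DataFrame:
    """Drop every edge whose node1 or node2 is in `items`."""
    items = set(items)
    keep = ~(edges["node1"].isin(items) | edges["node2"].isin(items))
    return edges[keep].reset_index(drop=True)


# Toy edges: "paper1" plays the role of a scholarly article.
edges_df = pd.DataFrame({
    "node1": ["paper1", "Q42", "Q42"],
    "label": ["P31", "P800", "P31"],
    "node2": ["Q13442814", "paper1", "Q5"],
})
print(remove_items(edges_df, ["paper1"]))
```

Filtering on both columns avoids leaving edges that point at an article that is no longer in the file.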

In [46]:
!head $OUT/all.isa.Q13442814.tsv | column -t -s $'\t' 

## Summary

In [47]:
!wc -l $OUT/*.tsv $EDGES

       1 /Users/amandeep/Documents/wikidata-20200504/output/all.P279.tsv
       0 /Users/amandeep/Documents/wikidata-20200504/output/all.P279star.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.P31.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.P31_P279.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.alias.en.tsv
       0 /Users/amandeep/Documents/wikidata-20200504/output/all.alias.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.commonsMedia.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.description.en.tsv
       0 /Users/amandeep/Documents/wikidata-20200504/output/all.description.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.external-id.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.geo-shape.tsv
       1 /Users/amandeep/Documents/wikidata-20200504/output/all.globe-coordinate.tsv
       1 /Users/amandeep/Documents/wikidata-2020050

Number of distinct items in our dataset

In [48]:
!$kgtk query -i $EDGES --graph-cache $STORE  -o - \
    --match '(n1)-[]->()' \
    --return 'count(distinct n1) as count'

zsh:1: command not found: time kgtk --debug


## Other Stuff

Little bug: if two files are specified as input, only one is used.

In [49]:
!$kgtk query -i $OUT/all.isa.tsv -i $OUT/all.P279star.tsv --graph-cache $STORE  -o - \
    --match 'P279star: (c)-[]->(:Q44)' \
    --limit 10

zsh:1: command not found: time kgtk --debug


In [50]:
!$kgtk query -i $OUT/all.isa.tsv -i $OUT/all.P279star.tsv --graph-cache $STORE  -o - \
    --match '(c)-[]->(:Q44)' \
    --limit 10

zsh:1: command not found: time kgtk --debug


In [51]:
!$kgtk query -i $OUT/all.isa.tsv -i $OUT/all.P279star.tsv --graph-cache $STORE  -o - \
    --match 'isa: (n1)-[l:isa]->(n2:Q318)' \
    --return 'distinct n1, l.label, n2' \
    | kgtk sort2 \
    | $compress > $OUT/all.isa.Q318.tsv 

zsh:1: command not found: time kgtk --debug


In input header '': Column 0 has an invalid name in the file header
KGTKException found

