# Class Visualization



### Preamble: set up the environment and files used in the tutorial

In [2]:
import io
import os
import subprocess
import sys

import numpy as np
import pandas as pd
from IPython.display import display, HTML

from graph_tool.all import *

from kgtk.configure_kgtk_notebooks import ConfigureKGTK
from kgtk.functions import kgtk, kypher

In [3]:
# Parameters

kgtk_path = "/Users/amandeep/GitHub/kgtk"

# Folder on local machine where to create the output and temporary folders
input_path = "/Volumes/saggu-ssd/wikidata-dwd-v3"
input_path = "/data/amandeep/wikidata-20211027-dwd-v3"
output_path = "/Volumes/saggu-ssd/wikidata-dwd-v3"
output_path = "/data/amandeep/wikidata-20211027-dwd-v3"
project_name = "class-visualization"

Our Wikidata distribution partitions the knowledge in Wikidata into smaller files that make it possible for you to pick and choose which files you want to use. Our tutorial KG is a subset of Wikidata, and is partitioned in the same way as the full Wikidata. The following is a partial list of all the files:

In [4]:
files = [
    "p279",
    "p279star",
    "label"
]

# statistics.Pinstance_count.tsv.gz

ck = ConfigureKGTK(files, kgtk_path=kgtk_path)
ck.configure_kgtk(input_graph_path=input_path,
                  output_path=output_path,
                  project_name=project_name,
                  debug=True
                 )

User home: /nas/home/amandeep
Current dir: /data/amandeep/github/kgtk-notebooks/use-cases
KGTK dir: /Users/amandeep/GitHub/kgtk
Use-cases dir: /Users/amandeep/GitHub/kgtk/use-cases


The KGTK setup command defines environment variables for all the files so that you can reuse the Jupyter notebook when you install it on your local machine.

In [5]:
ck.print_env_variables()

KGTK_LABEL_FILE: /data/amandeep/wikidata-20211027-dwd-v3/labels.en.tsv.gz
STORE: /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/wikidata.sqlite3.db
kypher: kgtk --debug query --graph-cache /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/wikidata.sqlite3.db
kgtk: kgtk --debug
EXAMPLES_DIR: /Users/amandeep/GitHub/kgtk/examples
KGTK_GRAPH_CACHE: /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/wikidata.sqlite3.db
KGTK_OPTION_DEBUG: false
OUT: /data/amandeep/wikidata-20211027-dwd-v3/class-visualization
TEMP: /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization
USE_CASES_DIR: /Users/amandeep/GitHub/kgtk/use-cases
GRAPH: /data/amandeep/wikidata-20211027-dwd-v3
p279: /data/amandeep/wikidata-20211027-dwd-v3/derived.P279.tsv.gz
p279star: /data/amandeep/wikidata-20211027-dwd-v3/derived.P279star.tsv.gz
label: /data/amandeep/wikidata-20211027-dwd-v3/labels.

In [6]:
ck.load_files_into_cache()

kgtk --debug query --graph-cache /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/wikidata.sqlite3.db -i "/data/amandeep/wikidata-20211027-dwd-v3/derived.P279.tsv.gz" --as p279  -i "/data/amandeep/wikidata-20211027-dwd-v3/derived.P279star.tsv.gz" --as p279star  -i "/data/amandeep/wikidata-20211027-dwd-v3/labels.en.tsv.gz" --as label  --limit 3
[2022-04-15 10:04:17 sqlstore]: IMPORT graph directly into table graph_1 from /data/amandeep/wikidata-20211027-dwd-v3/derived.P279.tsv.gz ...
[2022-04-15 10:04:23 sqlstore]: IMPORT graph directly into table graph_2 from /data/amandeep/wikidata-20211027-dwd-v3/derived.P279star.tsv.gz ...
[2022-04-15 10:07:12 sqlstore]: IMPORT graph directly into table graph_3 from /data/amandeep/wikidata-20211027-dwd-v3/labels.en.tsv.gz ...
[2022-04-15 10:08:41 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_1 AS graph_1_c1
     LIMIT ?
  PARAS: [3]
------------------------------

In [7]:
!kgtk --debug query -i p279 --idx mode:monograph --limit 5

[2022-04-15 10:08:43 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_1 AS graph_1_c1
     LIMIT ?
  PARAS: [5]
---------------------------------------------
[2022-04-15 10:08:43 sqlstore]: CREATE INDEX "graph_1_node1_label_node2_idx" ON "graph_1" ("node1", "label", "node2")
[2022-04-15 10:08:46 sqlstore]: ANALYZE "graph_1_node1_label_node2_idx"
[2022-04-15 10:08:46 sqlstore]: CREATE INDEX "graph_1_node2_label_node1_idx" ON "graph_1" ("node2", "label", "node1")
[2022-04-15 10:08:50 sqlstore]: ANALYZE "graph_1_node2_label_node1_idx"
id	node1	label	node2
Q100000030-P279-Q14748-30394205-0	Q100000030	P279	Q14748
Q100000058-P279-Q1622444-bd182663-0	Q100000058	P279	Q1622444
Q1000032-P279-Q1813494-0aa0f1dc-0	Q1000032	P279	Q1813494
Q1000032-P279-Q83602-482a1943-0	Q1000032	P279	Q83602
Q1000039-P279-Q11555767-2dddfd86-0	Q1000039	P279	Q11555767


In [8]:
!kgtk --debug query -i p279star --idx mode:monograph --limit 5

[2022-04-15 10:08:53 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_2 AS graph_2_c1
     LIMIT ?
  PARAS: [5]
---------------------------------------------
[2022-04-15 10:08:53 sqlstore]: CREATE INDEX "graph_2_node1_label_node2_idx" ON "graph_2" ("node1", "label", "node2")
[2022-04-15 10:10:48 sqlstore]: ANALYZE "graph_2_node1_label_node2_idx"
[2022-04-15 10:11:07 sqlstore]: CREATE INDEX "graph_2_node2_label_node1_idx" ON "graph_2" ("node2", "label", "node1")
[2022-04-15 10:14:42 sqlstore]: ANALYZE "graph_2_node2_label_node1_idx"
node1	label	node2	id
Q100000030	P279star	Q100000030	Q100000030-P279star-Q100000030
Q100000030	P279star	Q14748	Q100000030-P279star-Q14748
Q100000030	P279star	Q14745	Q100000030-P279star-Q14745
Q100000030	P279star	Q1357761	Q100000030-P279star-Q1357761
Q100000030	P279star	Q223557	Q100000030-P279star-Q223557


In [9]:
!kgtk --debug query -i label --idx mode:monograph --limit 5

[2022-04-15 10:15:03 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_3 AS graph_3_c1
     LIMIT ?
  PARAS: [5]
---------------------------------------------
[2022-04-15 10:15:03 sqlstore]: CREATE INDEX "graph_3_node1_label_node2_idx" ON "graph_3" ("node1", "label", "node2")
[2022-04-15 10:15:47 sqlstore]: ANALYZE "graph_3_node1_label_node2_idx"
[2022-04-15 10:15:54 sqlstore]: CREATE INDEX "graph_3_node2_label_node1_idx" ON "graph_3" ("node2", "label", "node1")
[2022-04-15 10:17:18 sqlstore]: ANALYZE "graph_3_node2_label_node1_idx"
id	node1	label	node2	lang	rank	node2;wikidatatype
P10-label-en	P10	label	'video'@en	en		
P1000-label-en	P1000	label	'record held'@en	en		
P10000-label-en	P10000	label	'Research Vocabularies Australia ID'@en	en		
P10001-label-en	P10001	label	'Austrian Football Association player ID'@en	en		
P10002-label-en	P10002	label	'Dewan Negara ID'@en	en		


## Get a list of all the classes


First get a list of all the `node1` in p279

In [10]:
kgtk("""
    query -i p279
        --match '(class)-[]->()'
        --return 'distinct class as id'
    -o $TEMP/p279.node1.tsv.gz
""")

In [11]:
!zcat < $TEMP/p279.node1.tsv.gz | wc -l

2591348


Now get a list of all the node2 in p279

In [12]:
kgtk("""
    query -i p279
        --match '()-[]->(class)'
        --return 'distinct class as id'
    -o $TEMP/p279.node2.tsv.gz
""")

In [13]:
!zcat < $TEMP/p279.node2.tsv.gz | wc -l

134276


In [14]:
kgtk("""
    ifnotexists --mode NONE 
        -i $TEMP/p279.node2.tsv.gz
        --filter-on $TEMP/p279.node1.tsv.gz
        --input-keys id
        --filter-keys id
    -o $TEMP/p279.classes-that-are-not-subclasses.tsv.gz
""")

In [15]:
!zcat < $TEMP/p279.classes-that-are-not-subclasses.tsv.gz | wc -l

10876


In [22]:
kgtk("head -i $TEMP/p279.classes-that-are-not-subclasses.tsv.gz -n 25 / add-labels")

Unnamed: 0,id,id;label
0,Q1000068,'Planungsverband'@en
1,Q1000156,'insurance expert'@en
2,Q1000370,'Pinales'@en
3,Q10008206,'Category:Populated places templates'@en
4,Q1001043,'Buddhist modernism'@en
5,Q1001086,'buddy system'@en
6,Q1001150,'fibrillation'@en
7,Q100136988,'XDOMEA file format family'@en
8,Q1002456,'Foggia dialect'@en
9,Q100297305,'Flow Charting file format family'@en


Concatenate the files to get a list of all the classes

In [16]:
kgtk("""
    cat --mode NONE -i $TEMP/p279.node1.tsv.gz -i $TEMP/p279.classes-that-are-not-subclasses.tsv.gz
    / sort --mode NONE --column id
    -o $OUT/classes.tsv.gz
""")

In [17]:
!zcat < $OUT/classes.tsv.gz | wc -l

2602223


## Measure the degree of classes

In [18]:
kgtk("""
    graph-statistics -i "$p279" -o $OUT/statistics.p279.tsv.gz 
    --compute-pagerank False 
    --compute-hits False 
    --page-rank-property Pdirected_pagerank 
    --vertex-in-degree-property Pindegree
    --vertex-out-degree-property Poutdegree
    --output-degrees True 
    --output-pagerank False 
    --output-hits False \
    --output-statistics-only 
    --undirected False 
    --log-file $TEMP/statistics.summary.txt
""")

In [19]:
kgtk("sort -i $OUT/statistics.p279.tsv.gz --columns node2 --numeric --reverse -o $TEMP/p279.indegree.tsv.gz")

In [20]:
kgtk("head -i $TEMP/p279.indegree.tsv.gz -n 25 / add-labels")

Unnamed: 0,node1,label,node2,id,node1;label
0,Q20747295,Pindegree,942044,Q20747295-Pindegree-19858,'protein-coding gene'@en
1,Q8054,Pindegree,764432,Q8054-Pindegree-15452,'protein'@en
2,Q7187,Pindegree,449629,Q7187-Pindegree-5760,'gene'@en
3,Q277338,Pindegree,49936,Q277338-Pindegree-394380,'pseudogene'@en
4,Q427087,Pindegree,47845,Q427087-Pindegree-370926,'non-coding RNA'@en
5,Q382617,Pindegree,40184,Q382617-Pindegree-79490,'mayor of a place in France'@en
6,Q15113603,Pindegree,40179,Q15113603-Pindegree-371428,'municipal councillor'@en
7,Q11173,Pindegree,14305,Q11173-Pindegree-632,'chemical compound'@en
8,Q64698614,Pindegree,8819,Q64698614-Pindegree-2950174,'pseudogenic transcript'@en
9,Q201448,Pindegree,8724,Q201448-Pindegree-453142,'transfer RNA'@en


In [23]:
kgtk("""
    query -i $OUT/statistics.p279.tsv.gz 
        --match '(n1)-[eid]->(degree)' 
        --where 'cast(degree, int) > 500' 
        --order-by 'cast(degree, int) desc'
""")

Unnamed: 0,node1,label,node2,id
0,Q20747295,Pindegree,942044,Q20747295-Pindegree-19858
1,Q8054,Pindegree,764432,Q8054-Pindegree-15452
2,Q7187,Pindegree,449629,Q7187-Pindegree-5760
3,Q277338,Pindegree,49936,Q277338-Pindegree-394380
4,Q427087,Pindegree,47845,Q427087-Pindegree-370926
...,...,...,...,...
96,Q7368,Pindegree,518,Q7368-Pindegree-2190
97,Q3073451,Pindegree,513,Q3073451-Pindegree-29890
98,Q1144278,Pindegree,512,Q1144278-Pindegree-12628
99,Q72091636,Pindegree,509,Q72091636-Pindegree-29080


### Create list of high and low `P279` degree classes 

In [24]:
kgtk("""
    query -i $OUT/statistics.p279.tsv.gz 
        --match '(n1)-[:Pindegree]->(degree)' 
        --where 'cast(degree, int) < 500' 
        --return 'n1 as node1, "few_subclasses" as node_type'
        --order-by 'cast(degree, int) desc'
    -o $OUT/class-browsing.low-degree-nodes.tsv
""")

The `class-browsing.low-degree-nodes.tsv` is simply a list of nodes:

In [26]:
kgtk("head -n 5 -i $OUT/class-browsing.low-degree-nodes.tsv / add-labels")

Unnamed: 0,node1,node_type,node1;label
0,Q1002954,few_subclasses,'Formula One car'@en
1,Q2990946,few_subclasses,'golf tournament'@en
2,Q898273,few_subclasses,'protein domain'@en
3,Q62927,few_subclasses,'digital camera'@en
4,Q11446,few_subclasses,'ship'@en


In [27]:
kgtk("""
    query -i $OUT/statistics.p279.tsv.gz 
        --match '(n1)-[:Pindegree]->(degree)' 
        --where 'cast(degree, int) > 499'
        --return 'n1 as node1, "many_subclasses" as node_type'
        --order-by 'cast(degree, int) desc'
    -o $OUT/class-browsing.high-degree-nodes.tsv
""")

In [28]:
kgtk("head -n 5 -i $OUT/class-browsing.high-degree-nodes.tsv / add-labels")

Unnamed: 0,node1,node_type,node1;label
0,Q20747295,many_subclasses,'protein-coding gene'@en
1,Q8054,many_subclasses,'protein'@en
2,Q7187,many_subclasses,'gene'@en
3,Q277338,many_subclasses,'pseudogene'@en
4,Q427087,many_subclasses,'non-coding RNA'@en


In [29]:
kgtk("""
    cat --use-graph-cache-envar False --mode NONE -i $OUT/class-browsing.low-degree-nodes.tsv -i $OUT/class-browsing.high-degree-nodes.tsv
    -o $OUT/class-browsing.all-nodes.tsv
""")

In [30]:
kgtk("head -i $OUT/class-browsing.all-nodes.tsv -n 4")

Unnamed: 0,node1,node_type
0,Q1002954,few_subclasses
1,Q2990946,few_subclasses
2,Q898273,few_subclasses
3,Q62927,few_subclasses


In [31]:
!kgtk --debug query -i $OUT/class-browsing.all-nodes.tsv --as browsernodes --idx index:node1,node_type --limit 3

[2022-04-15 10:50:40 sqlstore]: IMPORT graph directly into table graph_5 from /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/class-browsing.all-nodes.tsv ...
[2022-04-15 10:50:43 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_5 AS graph_5_c1
     LIMIT ?
  PARAS: [3]
---------------------------------------------
[2022-04-15 10:50:43 sqlstore]: CREATE INDEX "graph_5_node1_node_type_idx" ON "graph_5" ("node1", "node_type")
[2022-04-15 10:50:45 sqlstore]: ANALYZE "graph_5_node1_node_type_idx"
node1	node_type
Q1002954	few_subclasses
Q2990946	few_subclasses
Q898273	few_subclasses


## Create a P279star file that we will use for visualization.



### First create a complete p279star file containing all classes

First create a complete P279star file that contains all classes as our starting point. We do this because in the browser, users can click on any class.

In [32]:
kgtk("""
    reachable-nodes
        --rootfile $OUT/classes.tsv.gz
        --selflink 
        --breadth-first True
        --show-distance True
        --label P279star
        -i "$p279"
        -o $TEMP/derived.p279star.complete.tsv.gz
""")

In [33]:
kgtk("head -i $TEMP/derived.p279star.complete.tsv.gz -n 10")

Unnamed: 0,node1,label,node2,distance
0,Q100000030,P279star,Q100000030,0
1,Q100000030,P279star,Q14748,1
2,Q100000030,P279star,Q14745,2
3,Q100000030,P279star,Q1357761,3
4,Q100000030,P279star,Q2424752,3
5,Q100000030,P279star,Q31807746,3
6,Q100000030,P279star,Q8205328,3
7,Q100000030,P279star,Q223557,4
8,Q100000030,P279star,Q15401930,4
9,Q100000030,P279star,Q28877,4


The complete p279star file has only a few more edges than the default one. We should replace the original one with the complete one in any case.

In [34]:
!zcat < "$p279star" | wc -l

117181418


In [35]:
!zcat < $TEMP/derived.p279star.complete.tsv.gz | wc -l

117191285


Add ids and index for use in queries. The new file has a distance column, which we index too so that we can do index queries quickly.

In [36]:
kgtk("""
    add-id --id-style wikidata -i $TEMP/derived.p279star.complete.tsv.gz
    -o $OUT/derived.p279star.complete.tsv.gz
""")

In [37]:
!kgtk --debug query -i $OUT/derived.p279star.complete.tsv.gz --as p279stard --idx index:node2,node1,distance --limit 3

[2022-04-15 12:14:35 sqlstore]: IMPORT graph directly into table graph_6 from /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/derived.p279star.complete.tsv.gz ...
[2022-04-15 12:17:16 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_6 AS graph_6_c1
     LIMIT ?
  PARAS: [3]
---------------------------------------------
[2022-04-15 12:17:16 sqlstore]: CREATE INDEX "graph_6_node2_node1_distance_idx" ON "graph_6" ("node2", "node1", "distance")
[2022-04-15 12:21:15 sqlstore]: ANALYZE "graph_6_node2_node1_distance_idx"
node1	label	node2	distance	id
Q100000030	P279star	Q100000030	0	Q100000030-P279star-Q100000030
Q100000030	P279star	Q14748	1	Q100000030-P279star-Q14748
Q100000030	P279star	Q14745	2	Q100000030-P279star-Q14745


### Count the number of subclasses 
We eventually want to build the subclass graph for each class, but some may be too large

In [38]:
kgtk("""
    query -i p279stard
        --match '
            (subclass)-[]->(class)'
        --return 'class as node1, "Pcount_subclasses" as label, count(distinct subclass) as node2, class as graph'
        --where 'subclass != class'
        --order-by 'cast(node2, int) desc'
    -o $TEMP/subclass.count.tsv.gz
""")

Get an overview of the file. The top classes have an enormous number of subclasses, which will cause trouble for visualization.
Also, only 126K classes with subclasses, so there are a lot of leaf classes in Wikidata.

In the steps below we exclude the high degree classes, but that won't fix the problem as the top classes have too many subclasses anyway. Sigh. The browser will freeze and the user will be annoyed.

In [39]:
df = kgtk("""
    cat -i $TEMP/subclass.count.tsv.gz / add-labels
""")
df

Unnamed: 0,node1,label,node2,graph,node1;label,graph;label
0,Q35120,Pcount_subclasses,2561483,Q35120,'entity'@en,'entity'@en
1,Q58415929,Pcount_subclasses,2385182,Q58415929,'spatio-temporal entity'@en,'spatio-temporal entity'@en
2,Q99527517,Pcount_subclasses,2295292,Q99527517,'collective entity'@en,'collective entity'@en
3,Q58416391,Pcount_subclasses,1972800,Q58416391,'spatial entity'@en,'spatial entity'@en
4,Q488383,Pcount_subclasses,1448830,Q488383,'object'@en,'object'@en
...,...,...,...,...,...,...
134262,Q99970237,Pcount_subclasses,1,Q99970237,'anthropomorphic deer'@en,'anthropomorphic deer'@en
134263,Q99971015,Pcount_subclasses,1,Q99971015,'anthropomorphic cow or other cattle'@en,'anthropomorphic cow or other cattle'@en
134264,Q99972330,Pcount_subclasses,1,Q99972330,'video game occupation'@en,'video game occupation'@en
134265,Q99974769,Pcount_subclasses,1,Q99974769,,


### Create a subset of p279 that excludes high in-degree classes in node2

File `class-browsing.low-degree-nodes.tsv` has the class with a low number of subclasses, which we call the low degree nodes. Our low degree P279 file will have all P279 edges that arrive at a low degree class, i.e., where the superclass is a low degree class.

In [40]:
kgtk("""
    query -i p279 -i $OUT/class-browsing.low-degree-nodes.tsv
        --match '
            p279: (class)-[eid]->(superclass),
            low: (superclass)'
        --return 'class as node1, eid.label as label, superclass as node2, eid as id'
    -o $OUT/p279.lowdegree.tsv.gz
""")

In [41]:
!zcat < "$p279" | wc -l

3198410


The low degree P279 file has many fewer edges, which is expected as the high degree classes account for a lot of edges.

In [42]:
!zcat < $OUT/p279.lowdegree.tsv.gz | wc -l

710738


### Recompute P279star with the low degree classes
The output will be `derived.p279star.low-degree.complete.tsv.gz`

We start at all classes, and find all superclasses for them, excluding the high degree classes.

In [43]:
kgtk("""
    reachable-nodes
        --rootfile $OUT/classes.tsv.gz
        --selflink 
        --breadth-first True
        --show-distance True
        --label P279star
        -i $OUT/p279.lowdegree.tsv.gz
        -o $TEMP/derived.p279star.low-degree.complete.tsv.gz
""")

Add ids

In [44]:
kgtk("""
    add-id --id-style wikidata -i $TEMP/derived.p279star.low-degree.complete.tsv.gz
    -o $OUT/derived.p279star.low-degree.complete.tsv.gz
""")

Index using node1, node2 and distance. I wonder if we should also index the id column?

In [45]:
!kgtk --debug query -i $OUT/derived.p279star.low-degree.complete.tsv.gz --as p279starlow --idx index:node2,node1,distance --limit 3

[2022-04-15 12:35:55 sqlstore]: IMPORT graph directly into table graph_8 from /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/derived.p279star.low-degree.complete.tsv.gz ...
[2022-04-15 12:36:25 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_8 AS graph_8_c1
     LIMIT ?
  PARAS: [3]
---------------------------------------------
[2022-04-15 12:36:25 sqlstore]: CREATE INDEX "graph_8_node2_node1_distance_idx" ON "graph_8" ("node2", "node1", "distance")
[2022-04-15 12:37:01 sqlstore]: ANALYZE "graph_8_node2_node1_distance_idx"
node1	label	node2	distance	id
Q100000030	P279star	Q100000030	0	Q100000030-P279star-Q100000030
Q100000030	P279star	Q14748	1	Q100000030-P279star-Q14748
Q100000030	P279star	Q14745	2	Q100000030-P279star-Q14745


### Statistics to show in the graph

> We are not computing the statistics file in this notebook as it is computed in the `p1963` project. 
> We need the file here, so Pedro copied it from the `p1963` project and put it in the `$TEMP` folder

File is `statistics.Pinstance_count.tsv.gz`


In [46]:
kgtk("head -i $TEMP/statistics.Pinstance_count.tsv.gz")

Unnamed: 0,node1,label,node2,id
0,Q1000017,Pinstance_count,1,Q1000017-Pinstance_count-6b86b2
1,Q1000091,Pinstance_count,1,Q1000091-Pinstance_count-6b86b2
2,Q1000156,Pinstance_count,16,Q1000156-Pinstance_count-b17ef6
3,Q100023,Pinstance_count,1,Q100023-Pinstance_count-6b86b2
4,Q100026,Pinstance_count,1,Q100026-Pinstance_count-6b86b2
5,Q100029091,Pinstance_count,10,Q100029091-Pinstance_count-4a44dc
6,Q1000300,Pinstance_count,2,Q1000300-Pinstance_count-d4735e
7,Q100034524,Pinstance_count,3,Q100034524-Pinstance_count-4e0740
8,Q1000371,Pinstance_count,3,Q1000371-Pinstance_count-4e0740
9,Q100038174,Pinstance_count,11,Q100038174-Pinstance_count-4fc82b


In [47]:
!kgtk --debug query -i $TEMP/statistics.Pinstance_count.tsv.gz --idx mode:monograph --limit 5

[2022-04-15 12:57:29 sqlstore]: IMPORT graph directly into table graph_9 from /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/statistics.Pinstance_count.tsv.gz ...
[2022-04-15 12:57:29 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_9 AS graph_9_c1
     LIMIT ?
  PARAS: [5]
---------------------------------------------
[2022-04-15 12:57:29 sqlstore]: CREATE INDEX "graph_9_node1_label_node2_idx" ON "graph_9" ("node1", "label", "node2")
[2022-04-15 12:57:30 sqlstore]: ANALYZE "graph_9_node1_label_node2_idx"
[2022-04-15 12:57:30 sqlstore]: CREATE INDEX "graph_9_node2_label_node1_idx" ON "graph_9" ("node2", "label", "node1")
[2022-04-15 12:57:30 sqlstore]: ANALYZE "graph_9_node2_label_node1_idx"
node1	label	node2	id
Q1000017	Pinstance_count	1	Q1000017-Pinstance_count-6b86b2
Q1000091	Pinstance_count	1	Q1000091-Pinstance_count-6b86b2
Q1000156	Pinstance_count	16	Q1000156-Pinstance_count-b17ef6
Q100023	Pinst

## Compute the edge file that contains the graph we want to visualize for each class

The edge file contains `subclass / P279 / class` edges, but we add two columns to support the visualization:

- `graph:` is the id of a class we want to visualize. This columns allows us to quickly fetch all the edges to build the visualization of a class.
- `edge_type`: in the visualization we want to distinguish `subclass` and `superclass` edges so the viewer can easily distinguish subclasses and superclasses.

### Compute the subclass edges

For every class (the graph) we want to find all the P279 edges for subclasses of the given class. We use `class-browsing.low-degree-nodes.tsv` so that we don't include high degree classes that will blow up the browser.

In [48]:
kgtk(f"""
    query -i p279starlow -i p279 -i $OUT/class-browsing.low-degree-nodes.tsv
        --match '
            p279starlow: (subclass1)-[]->(class),
            p279starlow: (subclass2)-[]->(class),
            low: (subclass1),
            low: (subclass2),
            p279: (subclass1)-[]->(subclass2)'
        --return 'distinct subclass1 as node1, "P279" as label, subclass2 as node2, class as graph, "subclass" as edge_type'
    -o $TEMP/all.graph.low.sub.tsv.gz
""")

In [49]:
!zcat < $TEMP/all.graph.low.sub.tsv.gz | wc -l

21150592


We have a lot of edges because we make copies for every graph, i.e., the same edge appears in many graphs. This is annoying, but it allows us to fetch the graphs very quickly, in less than 2 seconds.

In [50]:
kgtk("head -n 5 -i $TEMP/all.graph.low.sub.tsv.gz")

Unnamed: 0,node1,label,node2,graph,edge_type
0,Q100000030,P279,Q14748,Q14748,subclass
1,Q100000030,P279,Q14748,Q14745,subclass
2,Q100000030,P279,Q14748,Q1357761,subclass
3,Q100000030,P279,Q14748,Q2424752,subclass
4,Q100000030,P279,Q14748,Q31807746,subclass


### Compute the superclass edges

The superclass edges are also P279 edges, but they sit above the given class. We don't need to filter to low degree classes because we are going up the P279 hierarchy.

In [51]:
kgtk(f"""
    query -i p279stard -i p279
        --match '
            p279stard: (class)-[]->(superclass1),
            p279stard: (class)-[]->(superclass2),
            p279: (superclass1)-[]->(superclass2)'
        --return 'distinct superclass1 as node1, "P279" as label, superclass2 as node2, class as graph, "superclass" as edge_type'
    -o $TEMP/all.graph.low.super.tsv.gz
""")

In [52]:
!zcat < $TEMP/all.graph.low.super.tsv.gz | wc -l

167563569


In [53]:
kgtk("head -n 5 -i $TEMP/all.graph.low.super.tsv.gz")

Unnamed: 0,node1,label,node2,graph,edge_type
0,Q95079834,P279,Q1000068,Q95079834,superclass
1,Q179985,P279,Q1000156,Q179985,superclass
2,Q17372279,P279,Q100026,Q17372279,superclass
3,Q17372377,P279,Q100026,Q17372377,superclass
4,Q17372377,P279,Q100026,Q17372463,superclass


### Concatenate the subclass and superclass files, and store in `$TEMP/graph.low.tsv.gz`

We keep the file in `$TEMP` because for the final file we want to add he high degree nodes so that the user sees that they exist (we will not add the subclasses). Once we have the complete file, we will put it in `$OUT`.

In [54]:
kgtk(f"""
    cat --use-graph-cache-envar False -i $TEMP/all.graph.low.sub.tsv.gz -i $TEMP/all.graph.low.super.tsv.gz
    -o $TEMP/graph.low.tsv.gz
""")

Index the file to allow fast queries on all columns

In [55]:
!kgtk --debug query -i $TEMP/graph.low.tsv.gz --as graphbrowser --idx index:node1,node2,graph,edge_type --limit 3

[2022-04-15 13:28:54 sqlstore]: IMPORT graph directly into table graph_10 from /data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/graph.low.tsv.gz ...
[2022-04-15 13:32:26 query]: SQL Translation:
---------------------------------------------
  SELECT *
     FROM graph_10 AS graph_10_c1
     LIMIT ?
  PARAS: [3]
---------------------------------------------
[2022-04-15 13:32:26 sqlstore]: CREATE INDEX "graph_10_node1_node2_graph_edge_type_idx" ON "graph_10" ("node1", "node2", "graph", "edge_type")
[2022-04-15 13:36:14 sqlstore]: ANALYZE "graph_10_node1_node2_graph_edge_type_idx"
node1	label	node2	graph	edge_type
Q100000030	P279	Q14748	Q14748	subclass
Q100000030	P279	Q14748	Q14745	subclass
Q100000030	P279	Q14748	Q1357761	subclass


## Compute the node file for visualization

The node file for visualization needs the labels for the nodes, and the `graph` to pull it out quickly. We add:

- `instance_count`: the number of direct instances of the class, as it is interesting for the user to see this information.

### Extract the nodes from the edge file

The reason to use the edge file is that we need the `graph` id. We do it in two steps, first extract `node1` and then extract `node2`

In [56]:
kgtk("""
    query -i label -i $TEMP/statistics.Pinstance_count.tsv.gz -i graphbrowser -i browsernodes
        --match '
            graphbrowser: (c)-[{graph: graph}]->(),
            browsernodes: (c)-[{node_type: nt}]->()'
        --opt 'label: (c)-[]->(class_label)'
        --opt 'Pinstance_count: (c)-[:Pinstance_count]->(instance_count)'
        --return 'distinct c as node1, graph as graph, coalesce(instance_count,0) as instance_count, nt as node_type, class_label as label'
    -o $TEMP/graph.low.node1.tsv.gz
""")


This is what our node file looks like:

In [57]:
kgtk("head -n 5 -i $TEMP/graph.low.node1.tsv.gz")

Unnamed: 0,node1,graph,instance_count,node_type,label
0,Q1002954,Q1002954,5,few_subclasses,'Formula One car'@en
1,Q1002954,Q10273457,5,few_subclasses,'Formula One car'@en
2,Q1002954,Q10323255,5,few_subclasses,'Formula One car'@en
3,Q1002954,Q10393737,5,few_subclasses,'Formula One car'@en
4,Q1002954,Q10393739,5,few_subclasses,'Formula One car'@en


In [58]:
kgtk("""
    query -i label -i $TEMP/statistics.Pinstance_count.tsv.gz -i graphbrowser -i browsernodes
        --match '
            graphbrowser: ()-[{graph: graph}]->(c),
            browsernodes: (c)-[{node_type: nt}]->()'
        --opt 'label: (c)-[]->(class_label)'
        --opt 'Pinstance_count: (c)-[:Pinstance_count]->(instance_count)'
        --return 'distinct c as node1, graph as graph, coalesce(instance_count,0) as instance_count, nt as node_type, class_label as label'
    -o $TEMP/graph.low.node2.tsv.gz
""")

### Concatenate the two node files, deduplicate and index

To-do: try presorting the files to see if compact will run faster, as it is, this command takes over 2.5 hours

In [59]:
kgtk("""
    cat --use-graph-cache-envar False --mode NONE -i $TEMP/graph.low.node1.tsv.gz -i $TEMP/graph.low.node2.tsv.gz
    / compact --mode NONE  --columns node1 graph
    -o $TEMP/graph.low.node.tsv.gz
""")

We only need to index on `graph` as we will not do node queries on it:

## Special handling of high degree nodes

In [60]:
kgtk("head -n 5 -i $OUT/class-browsing.high-degree-nodes.tsv")

Unnamed: 0,node1,node_type
0,Q20747295,many_subclasses
1,Q8054,many_subclasses
2,Q7187,many_subclasses
3,Q277338,many_subclasses
4,Q427087,many_subclasses


### Make a graph file with the `P279` edges where the subclass is a high degree class

Do this only to add edges that connect to the subclasses of our target node, so `class` has to be in `$TEMP/all.graph.low.sub.tsv.gz`

In [65]:
kgtk("""
    query --debug -i $OUT/class-browsing.high-degree-nodes.tsv -i p279 -i $TEMP/all.graph.low.sub.tsv.gz
        --match '
            low: (class)-[{graph: graph}]->(),
            high: (subclass),
            p279: (subclass)-[]->(class)'
        --where 'subclass != class'
        --return 'distinct subclass as node1, "P279" as label, class as node2, graph as graph, "subclass" as edge_type'
    -o $TEMP/graph.high1.tsv.gz
""")

[2022-04-15 15:36:15 query]: SQL Translation:
---------------------------------------------
  SELECT DISTINCT graph_11_c2."node1" "_aLias.node1", ? "_aLias.label", graph_12_c1."node1" "_aLias.node2", graph_12_c1."graph" "_aLias.graph", ? "_aLias.edge_type"
     FROM graph_1 AS graph_1_c3
     INNER JOIN graph_11 AS graph_11_c2, graph_12 AS graph_12_c1
     ON graph_11_c2."node1" = graph_1_c3."node1"
        AND graph_12_c1."node1" = graph_1_c3."node2"
        AND graph_12_c1."graph" = graph_12_c1."graph"
        AND (graph_11_c2."node1" != graph_12_c1."node1")
  PARAS: ['P279', 'subclass']
---------------------------------------------



In [62]:
kgtk("""
    query --debug -i $OUT/class-browsing.high-degree-nodes.tsv -i p279 -i $TEMP/all.graph.low.sub.tsv.gz
        --match '
            low: ()-[{graph: graph}]->(class),
            high: (subclass),
            p279: (subclass)-[]->(class)'
        --where 'subclass != class'
        --return 'distinct subclass as node1, "P279" as label, class as node2, graph as graph, "subclass" as edge_type'
    -o $TEMP/graph.high2.tsv.gz
""")

[2022-04-15 15:33:15 query]: SQL Translation:
---------------------------------------------
  SELECT DISTINCT graph_11_c2."node1" "_aLias.node1", ? "_aLias.label", graph_12_c1."node2" "_aLias.node2", graph_12_c1."graph" "_aLias.graph", ? "_aLias.edge_type"
     FROM graph_1 AS graph_1_c3
     INNER JOIN graph_11 AS graph_11_c2, graph_12 AS graph_12_c1
     ON graph_11_c2."node1" = graph_1_c3."node1"
        AND graph_12_c1."node2" = graph_1_c3."node2"
        AND graph_12_c1."graph" = graph_12_c1."graph"
        AND (graph_11_c2."node1" != graph_12_c1."node2")
  PARAS: ['P279', 'subclass']
---------------------------------------------
[2022-04-15 15:33:15 sqlstore]: CREATE INDEX "graph_12_node2_idx" ON "graph_12" ("node2")
[2022-04-15 15:33:29 sqlstore]: ANALYZE "graph_12_node2_idx"



In [66]:
kgtk(f"""
    cat --use-graph-cache-envar False -i $TEMP/graph.high1.tsv.gz -i  $TEMP/graph.high2.tsv.gz
    -o $TEMP/graph.high.tsv.gz
""")

In [67]:
kgtk("head -n 5 -i $TEMP/graph.high.tsv.gz")

Unnamed: 0,node1,label,node2,graph,edge_type
0,Q10267817,P279,Q18553442,Q1225194,subclass
1,Q107715,P279,Q309314,Q246672,subclass
2,Q107715,P279,Q309314,Q937228,subclass
3,Q107715,P279,Q309314,Q7184903,subclass
4,Q107715,P279,Q309314,Q35120,subclass


### Make a node file with the high degree nodes

We use the edge file because we need to put the `graph` in the node file too.

In [68]:
kgtk("""
    query -i label -i $TEMP/statistics.Pinstance_count.tsv.gz -i $TEMP/graph.high.tsv.gz
        --match 'high: (c)-[{graph: graph}]->()'
        --opt 'label: (c)-[]->(class_label)'
        --opt 'Pinstance_count: (c)-[:Pinstance_count]->(instance_count)'
        --return 'distinct c as node1, graph as graph, coalesce(instance_count,0) as instance_count, "many_subclasses" as node_type, class_label as label'
    -o $TEMP/graph.high.node.tsv.gz
""")

In [69]:
kgtk("head -n 5 -i $TEMP/graph.high.node.tsv.gz")

Unnamed: 0,node1,graph,instance_count,node_type,label
0,Q10267817,Q1225194,1,many_subclasses,'autosomal recessive disease'@en
1,Q10267817,Q18553442,1,many_subclasses,'autosomal recessive disease'@en
2,Q107715,Q246672,95,many_subclasses,'physical quantity'@en
3,Q107715,Q937228,95,many_subclasses,'physical quantity'@en
4,Q107715,Q7184903,95,many_subclasses,'physical quantity'@en


Just to make sure, count the number of sublcasses of one of our supposedly high degree nodes, innocent looking with one instance, but indeed many subclasses.

In [70]:
kgtk("query -i p279 --match '(subclass)-[]->(:Q10267817)' --return 'count(distinct subclass)'")

Unnamed: 0,"count(DISTINCT graph_1_c1.""node1"")"
0,1096


In [71]:
kgtk("query -i p279 --match '(subclass)-[]->(:Q30185)' --return 'count(distinct subclass)'")

Unnamed: 0,"count(DISTINCT graph_1_c1.""node1"")"
0,2417


### Augment the low degree edge and node files with the high degree info

Concatenating without deduplication is sufficient as the files cannot have duplicate edges or nodes.

In [72]:
kgtk("""
    cat --use-graph-cache-envar False -i $TEMP/graph.high.tsv.gz -i $TEMP/graph.low.tsv.gz
    -o $OUT/class-visualization.edge.tsv.gz
""")

In [73]:
kgtk("head -n 5 -i $OUT/class-visualization.edge.tsv.gz")

Unnamed: 0,node1,label,node2,graph,edge_type
0,Q10267817,P279,Q18553442,Q1225194,subclass
1,Q107715,P279,Q309314,Q246672,subclass
2,Q107715,P279,Q309314,Q937228,subclass
3,Q107715,P279,Q309314,Q7184903,subclass
4,Q107715,P279,Q309314,Q35120,subclass


Index the file for query using the `graph` column:

In [74]:
!kgtk query -i $OUT/class-visualization.edge.tsv.gz --as classvizedge --idx index:graph --limit 3

node1	label	node2	graph	edge_type
Q10267817	P279	Q18553442	Q1225194	subclass
Q107715	P279	Q309314	Q246672	subclass
Q107715	P279	Q309314	Q937228	subclass


Concatenate the node files:

In [75]:
kgtk("""
    cat --use-graph-cache-envar False --mode NONE -i $TEMP/graph.high.node.tsv.gz -i $TEMP/graph.low.node.tsv.gz
    -o $TEMP/class-visualization.node.tsv.gz
""")

Add a tooltip with meaningful information

In [76]:
kgtk("""
    query -i $TEMP/class-visualization.node.tsv.gz
        --match '(node)-[{graph: g, instance_count: ic, node_type: nt, label: l}]->()'
        --return 'distinct
            node as node1, g as graph, ic as instance_count, nt as node_type, l as label,
            printf("%s (%s)<BR/>instance count: %s<BR/>node type: %s", kgtk_lqstring_text(l), node, cast(ic, int), nt) as tooltip'
    -o $OUT/class-visualization.node.tsv.gz
""")

In [77]:
kgtk("head -n 5 -i $OUT/class-visualization.node.tsv.gz")

Unnamed: 0,node1,graph,instance_count,node_type,label,tooltip
0,Q10267817,Q1225194,1,many_subclasses,'autosomal recessive disease'@en,autosomal recessive disease (Q10267817)<BR/>in...
1,Q10267817,Q18553442,1,many_subclasses,'autosomal recessive disease'@en,autosomal recessive disease (Q10267817)<BR/>in...
2,Q107715,Q246672,95,many_subclasses,'physical quantity'@en,physical quantity (Q107715)<BR/>instance count...
3,Q107715,Q937228,95,many_subclasses,'physical quantity'@en,physical quantity (Q107715)<BR/>instance count...
4,Q107715,Q7184903,95,many_subclasses,'physical quantity'@en,physical quantity (Q107715)<BR/>instance count...


In [78]:
kgtk("""
    query -i $OUT/class-visualization.edge.tsv.gz
    --match 'edge:()-[label{graph:g}]->()'
    --return 'g as node1, "count" as label, COUNT(g) as node2'
    -o $OUT/class-visualization.edge.count.tsv.gz
""")

In [79]:
kgtk("""
    query -i $OUT/class-visualization.edge.tsv.gz
    --match '()-[label{graph:g, edge_type:et}]->()'
    --where 'et = "subclass"'
    --return 'g as node1, "count" as label, COUNT(g) as node2'
    -o $OUT/class-visualization.edge.sub.count.tsv.gz
""")

In [80]:
kgtk("""
    head -i $OUT/class-visualization.edge.superclass.tsv.gz
""")

Exception in thread background thread for pid 134333:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/class-visualization.edge.superclass.tsv.gz'



In [81]:
kgtk("""
    query -i $OUT/class-visualization.edge.tsv.gz
    --match '()-[label{graph:g, edge_type:et}]->()'
    --where 'et = "superclass"'
    --return 'g as node1, "count" as label, COUNT(g) as node2'
    -o $OUT/class-visualization.edge.super.count.tsv.gz
""")

In [82]:
kgtk("""
    query -i $OUT/class-visualization.edge.tsv.gz
    --match 'edge:()-[label{edge_type:t}]->()'
    --where 't = "subclass"'
    -o $OUT/class-visualization.edge.subclass.tsv.gz
""")

In [83]:
kgtk("""
    query -i $OUT/class-visualization.edge.tsv.gz
    --match 'edge:()-[label{edge_type:t}]->()'
    --where 't = "superclass"'
    -o $OUT/class-visualization.edge.superclass.tsv.gz
""")

Index the file for query using the `graph` column:

In [84]:
!kgtk query -i $OUT/class-visualization.node.tsv.gz --as classviznode --idx index:graph --limit 3

node1	graph	instance_count	node_type	label	tooltip
Q10267817	Q1225194	1	many_subclasses	'autosomal recessive disease'@en	autosomal recessive disease (Q10267817)<BR/>instance count: 1<BR/>node type: many_subclasses
Q10267817	Q18553442	1	many_subclasses	'autosomal recessive disease'@en	autosomal recessive disease (Q10267817)<BR/>instance count: 1<BR/>node type: many_subclasses
Q107715	Q246672	95	many_subclasses	'physical quantity'@en	physical quantity (Q107715)<BR/>instance count: 95<BR/>node type: many_subclasses


Temporary: we need this file for my current version of visualize because it needs labels in the edge file, the new version can have the labels in the node file

Test creation of the node file:

In [85]:
root = "Q11424"
# root="Q391342"
root="Q1420"
# root="Q1107"
# root="Q889821"
# root="Q1549591"
# root="Q188724"
# root="Q946808"
kgtk(f"""
    query -i classviznode
        --match '(class)-[{{graph: "{root}", instance_count: instance_count, label: label}}]->()'
""")

Unnamed: 0,node1,graph,instance_count,node_type,label,tooltip
0,Q10273457,Q1420,397,few_subclasses,'equipment'@en,equipment (Q10273457)<BR/>instance count: 397<...
1,Q106839123,Q1420,1,few_subclasses,'means of transport'@en,means of transport (Q106839123)<BR/>instance c...
2,Q11019,Q1420,129,few_subclasses,'machine'@en,machine (Q11019)<BR/>instance count: 129<BR/>n...
3,Q1150771,Q1420,0,few_subclasses,'output'@en,output (Q1150771)<BR/>instance count: 0<BR/>no...
4,Q1183543,Q1420,242,many_subclasses,'device'@en,device (Q1183543)<BR/>instance count: 242<BR/>...
5,Q12060681,Q1420,0,few_subclasses,'multi-track vehicle'@en,multi-track vehicle (Q12060681)<BR/>instance c...
6,Q1301433,Q1420,15,few_subclasses,'land vehicle'@en,land vehicle (Q1301433)<BR/>instance count: 15...
7,Q1420,Q1420,879,many_subclasses,'motor car'@en,motor car (Q1420)<BR/>instance count: 879<BR/>...
8,Q1485500,Q1420,19,few_subclasses,'tangible good'@en,tangible good (Q1485500)<BR/>instance count: 1...
9,Q1515493,Q1420,0,few_subclasses,'road vehicle'@en,road vehicle (Q1515493)<BR/>instance count: 0<...


## Test creation of visualizations

In [86]:
roots = [
    "Q11424",
    "Q391342",
    "Q1420",
    "Q1107",
    "Q889821",
    "Q1549591",
    "Q188724",
    "Q946808",
    "Q33999",
    "Q483501",
    "Q2221906",
    "Q144",
    "Q516021",
    "Q10494269"
]

for root in roots:
    kgtk(f"""
        query -i classvizedgetest
            --match '(class)-[{{label: property, graph: "{root}", edge_type: edge_type}}]->(superclass)'
        -o $TEMP/browser/{root}.graph.low.tsv
    """)

    kgtk(f"""
        query -i classviznode
            --match '(class)-[{{graph: "{root}", instance_count: instance_count, label: label}}]->()'
        -o $TEMP/browser/{root}.node.graph.low.tsv
    """)

    # kgtk(f"""
    #     visualize-force-graph -i $TEMP/browser/{root}.graph.low.tsv
    #         --direction arrow
    #         -o $TEMP/browser/{root}.graph.low.html
    # """)

Exception in thread background thread for pid 138136:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q11424.graph.low.tsv'




Exception in thread background thread for pid 138180:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q11424.node.graph.low.tsv'




Exception in thread background thread for pid 138235:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q391342.graph.low.tsv'




Exception in thread background thread for pid 138280:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q391342.node.graph.low.tsv'




Exception in thread background thread for pid 138348:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q1420.graph.low.tsv'




Exception in thread background thread for pid 138392:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q1420.node.graph.low.tsv'




Exception in thread background thread for pid 138436:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q1107.graph.low.tsv'




Exception in thread background thread for pid 138480:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q1107.node.graph.low.tsv'




Exception in thread background thread for pid 138524:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q889821.graph.low.tsv'




Exception in thread background thread for pid 138568:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q889821.node.graph.low.tsv'




Exception in thread background thread for pid 138615:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q1549591.graph.low.tsv'




Exception in thread background thread for pid 138668:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q1549591.node.graph.low.tsv'




Exception in thread background thread for pid 138712:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q188724.graph.low.tsv'




Exception in thread background thread for pid 138756:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q188724.node.graph.low.tsv'




Exception in thread background thread for pid 138800:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q946808.graph.low.tsv'




Exception in thread background thread for pid 138844:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q946808.node.graph.low.tsv'




Exception in thread background thread for pid 138888:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q33999.graph.low.tsv'




Exception in thread background thread for pid 138932:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q33999.node.graph.low.tsv'




Exception in thread background thread for pid 138976:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q483501.graph.low.tsv'




Exception in thread background thread for pid 139024:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q483501.node.graph.low.tsv'




Exception in thread background thread for pid 139069:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q2221906.graph.low.tsv'




Exception in thread background thread for pid 139113:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q2221906.node.graph.low.tsv'




Exception in thread background thread for pid 139157:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q144.graph.low.tsv'




Exception in thread background thread for pid 139201:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q144.node.graph.low.tsv'




Exception in thread background thread for pid 139245:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q516021.graph.low.tsv'




Exception in thread background thread for pid 139289:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q516021.node.graph.low.tsv'




Exception in thread background thread for pid 139333:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q10494269.graph.low.tsv'




Exception in thread background thread for pid 139377:
Traceback (most recent call last):
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 1683, in wrap
    fn(*rgs, **kwargs)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2662, in background_thread
    handle_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 2349, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/nas/home/amandeep/miniconda3/envs/kgtk-env/lib/python3.9/site-packages/sh.py", line 905, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /bin/bash -c 'kgtk     

[Errno 2] No such file or directory: '/data/amandeep/wikidata-20211027-dwd-v3/class-visualization/temp.class-visualization/browser/Q10494269.node.graph.low.tsv'




## Tests for individual files

In [87]:
kgtk("""
    query -i $TEMP/graph.low.node.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,graph,instance_count,node_type,label
0,Q10273457,Q1420,397,few_subclasses,'equipment'@en
1,Q106839123,Q1420,1,few_subclasses,'means of transport'@en
2,Q11019,Q1420,129,few_subclasses,'machine'@en
3,Q1150771,Q1420,0,few_subclasses,'output'@en
4,Q1183543,Q1420,242,many_subclasses,'device'@en
5,Q12060681,Q1420,0,few_subclasses,'multi-track vehicle'@en
6,Q1301433,Q1420,15,few_subclasses,'land vehicle'@en
7,Q1420,Q1420,879,many_subclasses,'motor car'@en
8,Q1485500,Q1420,19,few_subclasses,'tangible good'@en
9,Q1515493,Q1420,0,few_subclasses,'road vehicle'@en


In [88]:
kgtk("""
    query -i $TEMP/graph.high.node.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,graph,instance_count,node_type,label


In [89]:
kgtk("""
    query -i $TEMP/class-visualization.node.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,graph,instance_count,node_type,label
0,Q10273457,Q1420,397,few_subclasses,'equipment'@en
1,Q106839123,Q1420,1,few_subclasses,'means of transport'@en
2,Q11019,Q1420,129,few_subclasses,'machine'@en
3,Q1150771,Q1420,0,few_subclasses,'output'@en
4,Q1183543,Q1420,242,many_subclasses,'device'@en
5,Q12060681,Q1420,0,few_subclasses,'multi-track vehicle'@en
6,Q1301433,Q1420,15,few_subclasses,'land vehicle'@en
7,Q1420,Q1420,879,many_subclasses,'motor car'@en
8,Q1485500,Q1420,19,few_subclasses,'tangible good'@en
9,Q1515493,Q1420,0,few_subclasses,'road vehicle'@en


In [90]:
kgtk("""
    query -i classviznode
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,graph,instance_count,node_type,label,tooltip
0,Q10273457,Q1420,397,few_subclasses,'equipment'@en,equipment (Q10273457)<BR/>instance count: 397<...
1,Q106839123,Q1420,1,few_subclasses,'means of transport'@en,means of transport (Q106839123)<BR/>instance c...
2,Q11019,Q1420,129,few_subclasses,'machine'@en,machine (Q11019)<BR/>instance count: 129<BR/>n...
3,Q1150771,Q1420,0,few_subclasses,'output'@en,output (Q1150771)<BR/>instance count: 0<BR/>no...
4,Q1183543,Q1420,242,many_subclasses,'device'@en,device (Q1183543)<BR/>instance count: 242<BR/>...
5,Q12060681,Q1420,0,few_subclasses,'multi-track vehicle'@en,multi-track vehicle (Q12060681)<BR/>instance c...
6,Q1301433,Q1420,15,few_subclasses,'land vehicle'@en,land vehicle (Q1301433)<BR/>instance count: 15...
7,Q1420,Q1420,879,many_subclasses,'motor car'@en,motor car (Q1420)<BR/>instance count: 879<BR/>...
8,Q1485500,Q1420,19,few_subclasses,'tangible good'@en,tangible good (Q1485500)<BR/>instance count: 1...
9,Q1515493,Q1420,0,few_subclasses,'road vehicle'@en,road vehicle (Q1515493)<BR/>instance count: 0<...


In [91]:
kgtk("""
    query -i graphbrowser
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,label,node2,graph,edge_type
0,Q10273457,P279,Q2424752,Q1420,superclass
1,Q10273457,P279,Q8205328,Q1420,superclass
2,Q106839123,P279,Q16889133,Q1420,superclass
3,Q11019,P279,Q1183543,Q1420,superclass
4,Q11019,P279,Q39546,Q1420,superclass
5,Q1150771,P279,Q2995644,Q1420,superclass
6,Q1183543,P279,Q10273457,Q1420,superclass
7,Q1183543,P279,Q16686448,Q1420,superclass
8,Q1183543,P279,Q39546,Q1420,superclass
9,Q12060681,P279,Q42889,Q1420,superclass


In [92]:
kgtk("""
    query -i $TEMP/graph.high.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,label,node2,graph,edge_type


In [93]:
kgtk("""
    query -i $TEMP/graph.low.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,label,node2,graph,edge_type
0,Q10273457,P279,Q2424752,Q1420,superclass
1,Q10273457,P279,Q8205328,Q1420,superclass
2,Q106839123,P279,Q16889133,Q1420,superclass
3,Q11019,P279,Q1183543,Q1420,superclass
4,Q11019,P279,Q39546,Q1420,superclass
5,Q1150771,P279,Q2995644,Q1420,superclass
6,Q1183543,P279,Q10273457,Q1420,superclass
7,Q1183543,P279,Q16686448,Q1420,superclass
8,Q1183543,P279,Q39546,Q1420,superclass
9,Q12060681,P279,Q42889,Q1420,superclass


In [94]:
kgtk("""
    query -i $TEMP/all.graph.low.sub.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,label,node2,graph,edge_type


In [95]:
kgtk("""
    query -i $TEMP/all.graph.low.super.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,label,node2,graph,edge_type
0,Q10273457,P279,Q2424752,Q1420,superclass
1,Q10273457,P279,Q8205328,Q1420,superclass
2,Q106839123,P279,Q16889133,Q1420,superclass
3,Q11019,P279,Q1183543,Q1420,superclass
4,Q11019,P279,Q39546,Q1420,superclass
5,Q1150771,P279,Q2995644,Q1420,superclass
6,Q1183543,P279,Q10273457,Q1420,superclass
7,Q1183543,P279,Q16686448,Q1420,superclass
8,Q1183543,P279,Q39546,Q1420,superclass
9,Q12060681,P279,Q42889,Q1420,superclass


In [96]:
kgtk("""
    query -i $TEMP/graph.low.node.tsv.gz
        --match '(node)-[{graph: "Q1420"}]->()'
        --order-by 'node'
""")

Unnamed: 0,node1,graph,instance_count,node_type,label
0,Q10273457,Q1420,397,few_subclasses,'equipment'@en
1,Q106839123,Q1420,1,few_subclasses,'means of transport'@en
2,Q11019,Q1420,129,few_subclasses,'machine'@en
3,Q1150771,Q1420,0,few_subclasses,'output'@en
4,Q1183543,Q1420,242,many_subclasses,'device'@en
5,Q12060681,Q1420,0,few_subclasses,'multi-track vehicle'@en
6,Q1301433,Q1420,15,few_subclasses,'land vehicle'@en
7,Q1420,Q1420,879,many_subclasses,'motor car'@en
8,Q1485500,Q1420,19,few_subclasses,'tangible good'@en
9,Q1515493,Q1420,0,few_subclasses,'road vehicle'@en


### In progress: Trim the subclasses based on the levels

The idea is to also trim the graph based on the number of levels, this may be difficult as I think some small graphs may have lots of levels, and some graphs may become large with just a small number of levels.

This is our starting point:

In [97]:
kgtk("head -i $OUT/derived.p279star.complete.tsv.gz -n 5")

Unnamed: 0,node1,label,node2,distance,id
0,Q100000030,P279star,Q100000030,0,Q100000030-P279star-Q100000030
1,Q100000030,P279star,Q14748,1,Q100000030-P279star-Q14748
2,Q100000030,P279star,Q14745,2,Q100000030-P279star-Q14745
3,Q100000030,P279star,Q1357761,3,Q100000030-P279star-Q1357761
4,Q100000030,P279star,Q2424752,3,Q100000030-P279star-Q2424752


Let's look at the distribution of distances

In [71]:
kgtk("""
    query -i p279starcomplete
        --match '(class)-[eid {distance: d}]->(superclass)'
        --return 'distinct d as distance, count(eid) as count'
        --order-by 'cast(count, int) desc'
""")

Unnamed: 0,distance,count
0,6,14920344
1,4,12395081
2,5,12068280
3,3,11432165
4,7,8660425
5,2,6976960
6,8,6681393
7,9,4448827
8,1,3077658
9,0,2503943


Filter the `p279starcomplete` file to keep only the subclasses with distance < K=10

In [72]:
kgtk("""
    query -i p279stard
        --match '(subclass)-[eid {distance: d}]->(class)'
        --return 'class as node1, "Pcount_subclasses" as label, count(distinct subclass) as node2'
        --where 'subclass != class and d < 9'
        --order-by 'cast(node2, int) desc'
    -o $TEMP/subclass.count.d10.tsv.gz
""")

`kgtk add-labels` drives me crazy, as it takes sooooo long.

In [73]:
!zcat < $TEMP/subclass.count.d10.tsv.gz | head -20 | kgtk add-labels / table

zcat: error writing to output: Broken pipe
| node1     | label             | node2   | node1;label                 |
| --------- | ----------------- | ------- | --------------------------- |
| Q35120    | Pcount_subclasses | 2366995 | 'entity'@en                 |
| Q99527517 | Pcount_subclasses | 1440970 | 'collection entity'@en      |
| Q16887380 | Pcount_subclasses | 1326944 | 'group'@en                  |
| Q20937557 | Pcount_subclasses | 1255680 | 'series'@en                 |
| Q28813620 | Pcount_subclasses | 1226806 | 'set'@en                    |
| Q488383   | Pcount_subclasses | 1185270 | 'object'@en                 |
| Q4406616  | Pcount_subclasses | 1144700 | 'concrete object'@en        |
| Q223557   | Pcount_subclasses | 1136457 | 'physical object'@en        |
| Q6671777  | Pcount_subclasses | 1110651 | 'structure'@en              |
| Q58415929 | Pcount_subclasses | 1091001 | 'spatio-temporal entity'@en |
| Q219858   | Pcount_subclasses | 1056942 | 'zone'@en                