# Example Scenario 3: Reachable occupations for selected people in Wikidata

*Carol would like to combine two subsets of Wikidata: one containing all subclass relations, and the other containing the occupations of several notable people. The combined file needs to be sorted by subject. She then wants to compute the set of nodes reachable from those people via the properties `occupation (P106)` or `subclass of (P279)`.*

## Preparation (same as in Example 2)

To run this notebook, Carol needs the Wikidata edges file. We will work with version `20200504` of Wikidata. Presumably this file is not yet on Carol's laptop, so she first needs to download and unpack it:
* please download the file [here](https://drive.google.com/file/d/1WIQIYXJC1IdSlPchtqz0NDr2zEEOz-Hb/view?usp=sharing)
* unpack it by running: `gunzip wikidata_edges_20200504.tsv.gz`

You are all set!

*Note*: We assume here that the Wikidata file has already been converted to KGTK format from Wikidata's `json.bz2` dump. This can be done with the following KGTK command, which we skip in this demonstration because it takes around 11 hours to run: `kgtk import_wikidata -i wikidata-20200504-all.json.bz2 --node wikidata_nodes_20200504.tsv --edge wikidata_edges_20200504.tsv --qual wikidata_qualifiers_20200504.tsv`

## Implementation in KGTK

First, Carol needs to extract the two subsets with the `filter` operation:

```bash
# Keep every edge whose predicate is P279 (subclass of); subject and object are unrestricted
kgtk filter -p ' ; P279 ; ' wikidata_edges_20200504.tsv > subclass.tsv
```

```bash
# Keep only the P106 (occupation) edges of the three people of interest
kgtk filter -p ' Q8023,Q483203,Q1426 ; P106 ; ' wikidata_edges_20200504.tsv > people.tsv
```
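To make the pattern syntax concrete: as we read it, an empty slot in `' ; P279 ; '` matches any value, and comma-separated values such as `Q8023,Q483203,Q1426` are alternatives. The following is a minimal sketch of that matching logic in `awk`, not KGTK's own implementation. It assumes a tab-separated edge file whose header names the subject, predicate and object columns `node1`, `label` and `node2`; the helper `filter_edges` and the sample rows are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch, not KGTK's implementation: emulate the two filter
# patterns with awk. Assumes a tab-separated edge file whose header names
# the subject, predicate and object columns node1, label and node2.
set -euo pipefail

edges=$(mktemp)
trap 'rm -f "$edges"' EXIT

# Placeholder rows, not real Wikidata statements.
printf 'id\tnode1\tlabel\tnode2\n'  >  "$edges"
printf 'e1\tQ8023\tP106\tQ100\n'    >> "$edges"
printf 'e2\tQ42\tP106\tQ200\n'      >> "$edges"
printf 'e3\tQ100\tP279\tQ300\n'     >> "$edges"
printf 'e4\tQ1426\tP27\tQ400\n'     >> "$edges"

# filter_edges SUBJECTS PREDICATES FILE  (hypothetical helper)
# An empty argument is a wildcard, like an empty slot in ' ; P279 ; '.
# A comma-separated argument lists alternatives, as in 'Q8023,Q1426'.
filter_edges() {
  awk -F'\t' -v subj="$1" -v pred="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      s = col["node1"]; p = col["label"]
      print
      next
    }
    (subj == "" || index("," subj ",", "," $s ",")) &&
    (pred == "" || index("," pred ",", "," $p ","))
  ' "$3"
}

echo '# pattern: ; P279 ;'
filter_edges '' 'P279' "$edges"
echo '# pattern: Q8023,Q483203,Q1426 ; P106 ;'
filter_edges 'Q8023,Q483203,Q1426' 'P106' "$edges"
```

On the sample data, the first call keeps only the `P279` edge, and the second keeps only the `P106` edge of `Q8023`: the `Q42` row has an unlisted subject and the `Q1426` row has a different property.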


Then she can merge the two files into one, sort the result, and compute the set of nodes reachable from the three people of interest.

```bash
# Concatenate both subsets, then pipe (" / ") the result into sort, ordering by node1 (the subject)
kgtk cat people.tsv subclass.tsv / sort -c "node1" -o cat.tsv
```
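To see what the merged, sorted file looks like, a similar result can be approximated with standard Unix tools. This is a hedged sketch rather than what `kgtk cat` and `kgtk sort` do internally: it assumes both inputs share an identical header (it does not try to reconcile files whose columns differ), sorts the data rows lexicographically under `LC_ALL=C`, and uses a hypothetical helper `cat_and_sort` with placeholder data.

```bash
#!/usr/bin/env bash
# Minimal sketch with standard tools: concatenate KGTK edge files that share
# the same header, then sort the data rows by node1, keeping the header first.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Placeholder files standing in for people.tsv and subclass.tsv.
printf 'id\tnode1\tlabel\tnode2\ne1\tQ8023\tP106\tQ100\n' > "$dir/people.tsv"
printf 'id\tnode1\tlabel\tnode2\ne3\tQ100\tP279\tQ300\n'  > "$dir/subclass.tsv"

# cat_and_sort OUT IN1 [IN2 ...]  (hypothetical helper)
cat_and_sort() {
  local out=$1
  shift
  local key
  # 1-based position of the node1 column in the first file's header.
  key=$(head -n 1 "$1" | tr '\t' '\n' | grep -n -x 'node1' | cut -d: -f1)
  {
    head -n 1 "$1"                      # header once
    for f in "$@"; do
      tail -n +2 "$f"                   # data rows of every input
    done | LC_ALL=C sort -t "$(printf '\t')" -k "${key},${key}"
  } > "$out"
}

cat_and_sort "$dir/cat.tsv" "$dir/people.tsv" "$dir/subclass.tsv"
cat "$dir/cat.tsv"
```

Keeping the header out of the sort matters: otherwise the `id`/`node1` header line would be ordered among the data rows.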

```bash
# Compute the nodes reachable from the three roots along P106 and P279 edges
kgtk reachable_nodes cat.tsv --subj 1 --pred 2 --obj 3 --props P106,P279 --root "Q8023,Q483203,Q1426" -o reachable.tsv
```
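The reachability step is a graph traversal: starting from each root, follow only edges whose predicate is one of the listed properties, and collect every node reached. Below is a minimal breadth-first-search sketch in `awk`, assuming the same `node1`/`label`/`node2` header and made-up edges. The helper `reachable` and its `root<TAB>node` output format are hypothetical and may not match what `kgtk reachable_nodes` writes to `reachable.tsv`.

```bash
#!/usr/bin/env bash
# Minimal breadth-first-search sketch, not KGTK's implementation.
# Assumes tab-separated edges whose header names node1, label and node2;
# an edge node1 -> node2 is followed only if its label is in PROPS.
set -euo pipefail

edges=$(mktemp)
trap 'rm -f "$edges"' EXIT

# Placeholder edges, not real Wikidata statements.
printf 'id\tnode1\tlabel\tnode2\n' >  "$edges"
printf 'e1\tQ8023\tP106\tQ100\n'   >> "$edges"  # person -> occupation
printf 'e2\tQ100\tP279\tQ300\n'    >> "$edges"  # occupation -> superclass
printf 'e3\tQ300\tP279\tQ500\n'    >> "$edges"  # superclass -> superclass
printf 'e4\tQ8023\tP27\tQ600\n'    >> "$edges"  # property not in PROPS

# reachable ROOTS PROPS FILE  (hypothetical helper)
# Prints one "root<TAB>node" line per node reachable from each root.
# A root itself is printed only if some path leads back to it.
reachable() {
  awk -F'\t' -v OFS='\t' -v roots="$1" -v props="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      s = col["node1"]; p = col["label"]; o = col["node2"]
      next
    }
    # Keep only edges with an allowed property, as adjacency lists.
    index("," props ",", "," $p ",") { adj[$s] = adj[$s] " " $o }
    END {
      n = split(roots, r, ",")
      for (k = 1; k <= n; k++) {
        root = r[k]
        qh = 1; qt = 1; queue[1] = root
        while (qh <= qt) {
          u = queue[qh++]
          m = split(adj[u], nbrs, " ")
          for (j = 1; j <= m; j++) {
            v = nbrs[j]
            if (!((root, v) in seen)) {
              seen[root, v] = 1
              queue[++qt] = v
              print root, v
            }
          }
        }
      }
    }
  ' "$3"
}

reachable 'Q8023,Q483203,Q1426' 'P106,P279' "$edges"
```

With the sample data, `Q8023` reaches the placeholder occupation `Q100` and then its superclasses `Q300` and `Q500`, while the `P27` edge is ignored. Because this sketch holds all adjacency lists in memory, it does not depend on the input order.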