# Partitioning a subset of Wikidata

This notebook illustrates how to partition a Wikidata KGTK edges file.

Parameters are set up in the first cell so that we can run this notebook in batch mode. Example invocation command:

```
papermill partition-wikidata.ipynb partition-wikidata.out.ipynb \
-p wikidata_input_path /data3/rogers/kgtk/gd/kgtk_public_graphs/cache/wikidata-20201130/data/all.tsv.gz \
-p wikidata_parts_path /data3/rogers/kgtk/gd/kgtk_public_graphs/cache/wikidata-20201130/parts
```

Here is a sample of the records that might appear in the input KGTK file:
```
id	node1	label	node2	rank	node2;wikidatatype	lang
Q1-P1036-418bc4-78f5a565-0	Q1	P1036	"113"	normal	external-id	
Q1-P1343-Q19190511-ab132b87-0   Q1      P1343   Q19190511       normal  wikibase-item   
Q1-P18-92a7b3-0dcac501-0        Q1      P18     "Hubble ultra deep field.jpg"   normal  commonsMedia    
Q1-P2386-cedfb0-0fdbd641-0      Q1      P2386   +880000000000000000000000Q828224        normal  quantity        
Q1-P580-a2fccf-63cf4743-0       Q1      P580    ^-13798000000-00-00T00:00:00Z/3 normal  time    
Q1-P920-47c0f2-52689c4e-0       Q1      P920    "LEM201201756"  normal  string  
Q1-P1343-Q19190511-ab132b87-0-P805-Q84065667-0  Q1-P1343-Q19190511-ab132b87-0   P805    Q84065667               wikibase-item   
Q1-P1343-Q88672152-5080b9e2-0-P304-5724c3-0     Q1-P1343-Q88672152-5080b9e2-0   P304    "13-36"         string  
Q1-P2670-Q18343-030eb87e-0-P1107-ce87f8-0       Q1-P2670-Q18343-030eb87e-0      P1107   +0.70           quantity        
Q1-P793-Q273508-1900d69c-0-P585-a2fccf-0        Q1-P793-Q273508-1900d69c-0      P585    ^-13798000000-00-00T00:00:00Z/3         time    
P10-alias-en-282226-0   P10     alias   'gif'@en
P10-description-en      P10     description     'relevant video. For images, use the property P18. For film trailers, qualify with \"object has role\" (P3831)=\"trailer\" (Q622550)'@en                        en
P10-label-en    P10     label   'video'@en                      en
Q1-addl_wikipedia_sitelink-19e42a-0     Q1      addl_wikipedia_sitelink http://enwikiquote.org/wiki/Universe                    en
Q1-addl_wikipedia_sitelink-19e42a-0-language-0  Q1-addl_wikipedia_sitelink-19e42a-0     sitelink-language       en                      en
Q1-addl_wikipedia_sitelink-19e42a-0-site-0      Q1-addl_wikipedia_sitelink-19e42a-0     sitelink-site   enwikiquote                     en
Q1-addl_wikipedia_sitelink-19e42a-0-title-0     Q1-addl_wikipedia_sitelink-19e42a-0     sitelink-title  "Universe"                      en
Q1-wikipedia_sitelink-5e459a-0  Q1      wikipedia_sitelink      http://en.wikipedia.org/wiki/Universe                   en
Q1-wikipedia_sitelink-5e459a-0-badge-Q17437798  Q1-wikipedia_sitelink-5e459a-0  sitelink-badge  Q17437798                       en
Q1-wikipedia_sitelink-5e459a-0-language-0       Q1-wikipedia_sitelink-5e459a-0  sitelink-language       en                      en
Q1-wikipedia_sitelink-5e459a-0-site-0   Q1-wikipedia_sitelink-5e459a-0  sitelink-site   enwiki                  en
Q1-wikipedia_sitelink-5e459a-0-title-0  Q1-wikipedia_sitelink-5e459a-0  sitelink-title  "Universe"                      en
```
Here are some constraints on the contents of the input file:
- The input file starts with a KGTK header record.
  - In addition to the `id`, `node1`, `label`, and `node2` columns, the file may contain the `node2;wikidatatype` column.
  - The `node2;wikidatatype` column is used to partition claims by Wikidata property datatype.
  - If it does not exist, it will be created during the partitioning process and populated using `datatype` relationships.
  - If it does exist, any empty values in the column will be populated using `datatype` relationships.
- The `id` column must contain a nonempty value.
- The first section of an `id` value must be the `node1` value for the record.
  - The qualifier extraction operations depend upon this constraint.
- In addition to the claims and qualifiers, the input file is expected to contain:
  - English language labels for all property entities appearing in the file.
- The input file ought to contain the following:
  - claims records,
  - qualifier records,
  - alias records in appropriate languages,
  - description records in appropriate languages,
  - label records in appropriate languages,
  - sitelink records in appropriate languages, and
  - `datatype` records that map Wikidata property entities to Wikidata property datatypes. These records are required if the input file does not contain the `node2;wikidatatype` column.
- Additionally, this script provides for the appearance of `type` records in the input file.
  - `type` records that list all `entityId` values and identify them as properties or items. These records provide a correctness check on the operation of `kgtk import-wikidata`, and may be deprecated in the future.
- The input file is assumed to be unsorted. If it is already sorted on the (`id` `node1` `label` `node2`) columns, then set the `presorted` parameter to `True` to shorten the execution time of this script.
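
The `id` constraint above can be checked on individual records before partitioning. The following is a minimal sketch, not part of KGTK; the function name `id_starts_with_node1` is hypothetical, and it assumes that the sections of an `id` are separated by `-`, as in the sample records.

```python
def id_starts_with_node1(row: dict) -> bool:
    """Return True when the record's id begins with its node1 value.

    Assumption: sections of an id are separated by '-', as in the sample
    records (for example, 'Q1-P1036-418bc4-78f5a565-0').
    """
    record_id = row.get("id", "")
    node1 = row.get("node1", "")
    if not record_id or not node1:
        # An empty id violates the nonempty-id constraint.
        return False
    return record_id == node1 or record_id.startswith(node1 + "-")


claim = {"id": "Q1-P18-92a7b3-0dcac501-0", "node1": "Q1"}
qualifier = {
    "id": "Q1-P1343-Q19190511-ab132b87-0-P805-Q84065667-0",
    "node1": "Q1-P1343-Q19190511-ab132b87-0",
}
mismatch = {"id": "Q2-P18-92a7b3-0dcac501-0", "node1": "Q1"}

for row in (claim, qualifier, mismatch):
    print(row["id"], id_starts_with_node1(row))
```

Appending `-` to `node1` before the prefix test keeps an `id` that starts with `Q10-` from being accepted for `node1` `Q1`.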

### Parameters for invoking the notebook

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| `wikidata_input_path` | The KGTK file containing the Wikidata edges to partition. | '/data4/rogers/elicit/cache/datasets/wikidata-20200803/data/all.tsv.gz' |
| `wikidata_parts_path` | A folder to receive the partitioned Wikidata files, such as `claims.wikibase-item.tsv.gz`. | '/data4/rogers/elicit/cache/datasets/wikidata-20200803/parts' |
| `temp_folder_path` |    A folder that may be used for temporary files. | wikidata_parts_path + '/temp' |
| `gzip_command` |        The compression command for sorting. | 'pigz' (Note: use version 2.4 or later) |
| `kgtk_command` |        The kgtk command. | 'time kgtk' |
| `kgtk_options` |        The kgtk command options. | '--debug --timing' |
| `kgtk_extension` |      The file extension for generated KGTK files. Appending `.gz` implies gzip compression. | 'tsv.gz' |
| `presorted` |           When True, the input file is already sorted on the (`id` `node1` `label` `node2`) columns. | 'False' |
| `sort_extras` |         Extra parameters for the sort program.  The default specifies a path for temporary files. Other useful parameters include '--buffer-size' and '--parallel'. | '--parallel 24 --buffer-size 30% --temporary-directory ' + temp_folder_path |
| `use_mgzip` |           When True, use the mgzip program where appropriate for faster compression. | 'True' |
| `verbose` |             When True, produce additional feedback messages. | 'True' |

Note: if `pigz` version 2.4 (or later) is not available on your system, use `gzip`.
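
One way to apply this fallback when setting `gzip_command` is sketched below. The helper `choose_gzip_command` is hypothetical. It only checks whether `pigz` is on the `PATH`; it does not verify the version, so the 2.4 requirement still has to be confirmed separately.

```python
import shutil


def choose_gzip_command(preferred: str = "pigz",
                        fallback: str = "gzip",
                        which=shutil.which) -> str:
    """Return `preferred` when it is found on PATH, otherwise `fallback`.

    The `which` argument defaults to shutil.which and can be replaced
    in tests. Only availability is checked, not the program version.
    """
    return preferred if which(preferred) else fallback


gzip_command = choose_gzip_command()
print('gzip_command = %s' % repr(gzip_command))
```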


In [13]:
# Parameters
wikidata_input_path = '/data3/rogers/kgtk/gd/kgtk_public_graphs/cache/wikidata-20201130/data/all.tsv.gz'
wikidata_parts_path = '/data3/rogers/kgtk/gd/kgtk_public_graphs/cache/wikidata-20201130/parts'
temp_folder_path =    wikidata_parts_path + '/temp'
gzip_command =        'pigz'
kgtk_command =        'time kgtk'
kgtk_options =        '--debug --timing'
kgtk_extension =      'tsv.gz'
presorted =           'False'
sort_extras =         '--parallel 24 --buffer-size 30% --temporary-directory ' + temp_folder_path
use_mgzip =           'True'
verbose =             'True'


In [3]:
print('wikidata_input_path = %s' % repr(wikidata_input_path))
print('wikidata_parts_path = %s' % repr(wikidata_parts_path))
print('temp_folder_path = %s' % repr(temp_folder_path))
print('gzip_command = %s' % repr(gzip_command))
print('kgtk_command = %s' % repr(kgtk_command))
print('kgtk_options = %s' % repr(kgtk_options))
print('kgtk_extension = %s' % repr(kgtk_extension))
print('presorted = %s' % repr(presorted))
print('sort_extras = %s' % repr(sort_extras))
print('use_mgzip = %s' % repr(use_mgzip))
print('verbose = %s' % repr(verbose))


### Create working folders and empty them

In [2]:
!mkdir -p {wikidata_parts_path}
!mkdir -p {temp_folder_path}

In [3]:
!rm -f {wikidata_parts_path}/*.tsv {wikidata_parts_path}/*.tsv.gz
!rm -f {temp_folder_path}/*.tsv {temp_folder_path}/*.tsv.gz

### Sort the Input Data Unless Presorted
Sort the input data file by (id, node1, label, node2).
This may take a while.
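
To decide whether `presorted` can safely be set to `True`, a sample of rows can be tested for sort order. This is a rough sketch with a hypothetical helper, `is_sorted`. It compares Python strings by code point, which may differ from the ordering the sort program produces under some locales, so treat a positive result as a hint rather than a guarantee.

```python
from typing import Iterable, Sequence

SORT_COLUMNS = ("id", "node1", "label", "node2")


def is_sorted(rows: Iterable[dict],
              columns: Sequence[str] = SORT_COLUMNS) -> bool:
    """Return True when rows are in nondecreasing order on `columns`.

    Assumption: plain Python string comparison approximates the
    ordering used by the external sort program.
    """
    previous = None
    for row in rows:
        key = tuple(row.get(column, "") for column in columns)
        if previous is not None and key < previous:
            return False
        previous = key
    return True


rows = [
    {"id": "Q1-P18-92a7b3-0dcac501-0", "node1": "Q1", "label": "P18", "node2": '"x.jpg"'},
    {"id": "Q1-P580-a2fccf-63cf4743-0", "node1": "Q1", "label": "P580", "node2": "^2000"},
]
print(is_sorted(rows))
print(is_sorted(list(reversed(rows))))
```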

In [None]:
if presorted.lower() == "true": 
    print('Using a presorted input file %s.' % repr(wikidata_input_path))
    partition_input_file = wikidata_input_path 
else: 
    print('Sorting the input file %s.' % repr(wikidata_input_path))
    partition_input_file = wikidata_parts_path + '/all.' + kgtk_extension 
    !{kgtk_command} {kgtk_options} sort2 --verbose={verbose} --gzip-command={gzip_command} \
 --input-file {wikidata_input_path} \
 --output-file {partition_input_file} \
 --columns     id node1 label node2 \
 --extra       "{sort_extras}"

### Partition the Claims, Qualifiers, and Entity Data
Split out the entity data (alias, description, label, and sitelinks) and additional metadata (datatype, type).  Separate the qualifiers from the claims.
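
The routing performed by the filter command in this step can be pictured as a first-match lookup on the `label` column, with unmatched rows going to the reject file. The sketch below mirrors the patterns and output names used in the command; the function `route` and the in-memory table are illustrative only and are not how KGTK implements the filter.

```python
# Each entry pairs a set of label values with the output part name.
ROUTES = [
    ({"datatype"}, "metadata.property.datatypes"),
    ({"alias"}, "aliases"),
    ({"description"}, "descriptions"),
    ({"label"}, "labels"),
    ({"addl_wikipedia_sitelink", "wikipedia_sitelink"}, "sitelinks"),
    ({"sitelink-badge", "sitelink-language", "sitelink-site", "sitelink-title"},
     "sitelinks.qualifiers"),
    ({"type"}, "metadata.types"),
]
REJECT = "claims-and-qualifiers.sorted-by-id"


def route(label: str) -> str:
    """Return the part name for a row's label; the first match wins."""
    for labels, part in ROUTES:
        if label in labels:
            return part
    # Anything else is a claim or a qualifier.
    return REJECT


for label in ("alias", "sitelink-site", "P18"):
    print(label, "->", route(label))
```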


In [8]:
!{kgtk_command} {kgtk_options} filter --verbose={verbose} --use-mgzip={use_mgzip} --first-match-only \
 --input-file {partition_input_file} \
 -p '; datatype ;'        -o {wikidata_parts_path}/metadata.property.datatypes.{kgtk_extension} \
 -p '; alias ;'           -o {wikidata_parts_path}/aliases.{kgtk_extension} \
 -p '; description ;'     -o {wikidata_parts_path}/descriptions.{kgtk_extension} \
 -p '; label ;'           -o {wikidata_parts_path}/labels.{kgtk_extension} \
 -p '; addl_wikipedia_sitelink,wikipedia_sitelink ;' \
                          -o {wikidata_parts_path}/sitelinks.{kgtk_extension} \
 -p '; sitelink-badge,sitelink-language,sitelink-site,sitelink-title ;' \
                          -o {wikidata_parts_path}/sitelinks.qualifiers.{kgtk_extension} \
 -p '; type ;'            -o {wikidata_parts_path}/metadata.types.{kgtk_extension} \
 --reject-file {temp_folder_path}/claims-and-qualifiers.sorted-by-id.{kgtk_extension}

### Sort the claims and qualifiers on Node1
Sort the combined claims and qualifiers file by the node1 column.
This may take a while.

In [None]:
!{kgtk_command} {kgtk_options} sort2 --verbose={verbose} --gzip-command={gzip_command} \
 --input-file {temp_folder_path}/claims-and-qualifiers.sorted-by-id.{kgtk_extension} \
 --output-file {temp_folder_path}/claims-and-qualifiers.sorted-by-node1.{kgtk_extension}\
 --columns     node1 \
 --extra       "{sort_extras}"

### Split the claims and qualifiers
If row A's node1 value matches some other row's id value, then row A is a qualifier.
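
The rule can be illustrated in memory on two of the sample records. This is a sketch of the idea only: the hypothetical `split_claims_and_qualifiers` builds a set of all `id` values, which would not be practical at the scale of a full Wikidata dump. The sorted, streaming `ifexists` command in this step does the real work.

```python
def split_claims_and_qualifiers(rows):
    """Split rows into (claims, qualifiers).

    A row is a qualifier when its node1 value equals the id of some
    row in the same collection; otherwise it is a claim.
    """
    ids = {row["id"] for row in rows}
    claims, qualifiers = [], []
    for row in rows:
        if row["node1"] in ids:
            qualifiers.append(row)
        else:
            claims.append(row)
    return claims, qualifiers


rows = [
    {"id": "Q1-P1343-Q19190511-ab132b87-0", "node1": "Q1",
     "label": "P1343", "node2": "Q19190511"},
    {"id": "Q1-P1343-Q19190511-ab132b87-0-P805-Q84065667-0",
     "node1": "Q1-P1343-Q19190511-ab132b87-0",
     "label": "P805", "node2": "Q84065667"},
]
claims, qualifiers = split_claims_and_qualifiers(rows)
print(len(claims), len(qualifiers))  # prints: 1 1
```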

In [None]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file {temp_folder_path}/claims-and-qualifiers.sorted-by-node1.{kgtk_extension} \
 --filter-file {temp_folder_path}/claims-and-qualifiers.sorted-by-id.{kgtk_extension} \
 --output-file {temp_folder_path}/qualifiers.sorted-by-node1.{kgtk_extension}\
 --reject-file {temp_folder_path}/claims.sorted-by-node1.{kgtk_extension}\
 --input-keys node1 \
 --filter-keys id

### Sort the claims by ID
Sort the split claims by id, node1, label, node2.
This may take a while.

In [None]:
!{kgtk_command} {kgtk_options} sort2 --verbose={verbose} --gzip-command={gzip_command} \
 --input-file {temp_folder_path}/claims.sorted-by-node1.{kgtk_extension} \
 --output-file {temp_folder_path}/claims.no-datatype.{kgtk_extension}\
 --columns     id node1 label node2 \
 --extra       "{sort_extras}"

### Merge the Wikidata Property Datatypes into the claims
Merge the Wikidata property datatypes into the claims rows as `node2;wikidatatype`. This column will be used to partition the claims by Wikidata property datatype in a later step. If the claims file already has a `node2;wikidatatype` column, lift only when that column has an empty value.
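
The effect of this merge can be sketched as a dictionary lookup keyed on each claim's `label` (the property). The helper `fill_datatypes` is hypothetical, and the sketch assumes that a `datatype` record has the property in `node1` and the datatype name in `node2`. Existing nonempty values are left alone, matching the `--overwrite False` option used in the command.

```python
def fill_datatypes(claims, datatype_edges, column="node2;wikidatatype"):
    """Return copies of claims with `column` filled where it is empty.

    Assumption: each datatype edge looks like
    {"node1": <property>, "label": "datatype", "node2": <datatype>}.
    """
    datatypes = {
        edge["node1"]: edge["node2"]
        for edge in datatype_edges
        if edge["label"] == "datatype"
    }
    result = []
    for claim in claims:
        row = dict(claim)
        if not row.get(column):
            # Missing or empty: look up the datatype of the property.
            row[column] = datatypes.get(row["label"], "")
        result.append(row)
    return result


datatype_edges = [
    {"node1": "P18", "label": "datatype", "node2": "commonsMedia"},
    {"node1": "P580", "label": "datatype", "node2": "time"},
]
claims = [
    {"id": "Q1-P18-92a7b3-0dcac501-0", "node1": "Q1", "label": "P18",
     "node2": '"Hubble ultra deep field.jpg"'},
    {"id": "Q1-P580-a2fccf-63cf4743-0", "node1": "Q1", "label": "P580",
     "node2": "^-13798000000-00-00T00:00:00Z/3",
     "node2;wikidatatype": "time"},
]
for row in fill_datatypes(claims, datatype_edges):
    print(row["label"], row["node2;wikidatatype"])
```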


In [None]:
!{kgtk_command} {kgtk_options} lift --verbose={verbose} --use-mgzip={use_mgzip} \
 --input-file {temp_folder_path}/claims.no-datatype.{kgtk_extension} \
 --columns-to-lift label \
 --overwrite False \
 --label-file {wikidata_parts_path}/metadata.property.datatypes.{kgtk_extension}\
 --label-value datatype \
 --output-file {wikidata_parts_path}/claims.{kgtk_extension}\
 --columns-to-write 'node2;wikidatatype'

### Sort the qualifiers by ID
Sort the split qualifiers by id, node1, label, node2.
This may take a while.

In [None]:
!{kgtk_command} {kgtk_options} sort2 --verbose={verbose} --gzip-command={gzip_command} \
 --input-file {temp_folder_path}/qualifiers.sorted-by-node1.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.{kgtk_extension}\
 --columns     id node1 label node2 \
 --extra       "{sort_extras}"

### Extract the English aliases, descriptions, labels, and sitelinks
Aliases, descriptions, and labels are extracted by selecting rows where the `node2` value ends in the language suffix for English (`@en`) in a KGTK language-qualified string. This is an abbreviated pattern; a more general pattern would include the single quotes used to delimit a KGTK language-qualified string. If `kgtk import-wikidata` has executed properly, the abbreviated pattern should be sufficient.
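
The difference between the abbreviated pattern and a stricter one can be seen with Python's `re` module. The `GENERAL` pattern below is only one possible form of the "more general pattern" mentioned above and is an assumption, not a pattern taken from KGTK.

```python
import re

# The abbreviated pattern used in the filter commands in this section.
ABBREVIATED = re.compile(r"^.*@en$")

# An assumed stricter pattern that also requires the single quotes
# delimiting a KGTK language-qualified string.
GENERAL = re.compile(r"^'.*'@en$")

samples = ["'video'@en", "'video'@en-gb", "user@en"]
for value in samples:
    print(value, bool(ABBREVIATED.match(value)), bool(GENERAL.match(value)))
```

Both patterns reject a value with a longer language tag such as `@en-gb`, because `@en` must be followed immediately by the end of the string. Only the stricter pattern rejects an unquoted value that happens to end in `@en`.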

Sitelink rows do not have a language-specific marker in the `node2` value. We use the `lang` column to provide the language code for English ('en').  The `lang` column is an additional column created by `kgtk import-wikidata`.

In [9]:
!{kgtk_command} {kgtk_options} filter --verbose={verbose} --use-mgzip={use_mgzip} --regex \
 --input-file {wikidata_parts_path}/aliases.{kgtk_extension} \
 -p ';; ^.*@en$' -o {wikidata_parts_path}/aliases.en.{kgtk_extension}

In [9]:
!{kgtk_command} {kgtk_options} filter --verbose={verbose} --use-mgzip={use_mgzip} --regex \
 --input-file {wikidata_parts_path}/descriptions.{kgtk_extension} \
 -p ';; ^.*@en$' -o {wikidata_parts_path}/descriptions.en.{kgtk_extension}

In [9]:
!{kgtk_command} {kgtk_options} filter --verbose={verbose} --use-mgzip={use_mgzip} --regex \
 --input-file {wikidata_parts_path}/labels.{kgtk_extension} \
 -p ';; ^.*@en$' -o {wikidata_parts_path}/labels.en.{kgtk_extension}

In [9]:
!{kgtk_command} {kgtk_options} filter --verbose={verbose} --use-mgzip={use_mgzip} \
 --input-file {wikidata_parts_path}/sitelinks.qualifiers.{kgtk_extension} \
 -p '; sitelink-language ; en' -o {temp_folder_path}/sitelinks.language.en.{kgtk_extension}

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file {wikidata_parts_path}/sitelinks.{kgtk_extension} \
 --filter-on {temp_folder_path}/sitelinks.language.en.{kgtk_extension} \
 --output-file {wikidata_parts_path}/sitelinks.en.{kgtk_extension} \
 --input-keys  id \
 --filter-keys node1

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file {wikidata_parts_path}/sitelinks.qualifiers.{kgtk_extension} \
 --filter-on {temp_folder_path}/sitelinks.language.en.{kgtk_extension} \
 --output-file {wikidata_parts_path}/sitelinks.qualifiers.en.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys node1

### Partition the claims by Wikidata Property Datatype
Wikidata has two names for each Wikidata property datatype: the name that appears in the JSON dump file, and the name that appears in the TTL dump file. `kgtk import-wikidata` currently imports rows from Wikidata JSON dump files, so the JSON names are the ones that appear below.

The `claims.other` file catches any records that have an unknown Wikidata property datatype. Additional Wikidata property datatypes may occur when processing data from certain Wikidata extensions.
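
The catch-all behavior can be sketched as grouping on the `node2;wikidatatype` value, with any value outside the known list going to an `other` bucket. The datatype names are the ones used in the filter command in this step; the function `partition_by_datatype` is hypothetical and works on in-memory rows only.

```python
from collections import defaultdict

KNOWN_DATATYPES = {
    "commonsMedia", "external-id", "geo-shape", "globe-coordinate", "math",
    "monolingualtext", "musical-notation", "quantity", "string",
    "tabular-data", "time", "url", "wikibase-form", "wikibase-item",
    "wikibase-lexeme", "wikibase-property", "wikibase-sense",
}


def partition_by_datatype(claims, column="node2;wikidatatype"):
    """Group claims by datatype; unknown or empty values go to 'other'."""
    parts = defaultdict(list)
    for claim in claims:
        datatype = claim.get(column, "")
        part = datatype if datatype in KNOWN_DATATYPES else "other"
        parts["claims." + part].append(claim)
    return dict(parts)


claims = [
    {"id": "a", "node2;wikidatatype": "time"},
    {"id": "b", "node2;wikidatatype": "quantity"},
    {"id": "c", "node2;wikidatatype": "some-extension-type"},
    {"id": "d", "node2;wikidatatype": ""},
]
for name, rows in sorted(partition_by_datatype(claims).items()):
    print(name, [row["id"] for row in rows])
```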

In [9]:
!{kgtk_command} {kgtk_options} filter --verbose={verbose} --use-mgzip={use_mgzip} --first-match-only \
 --input-file {wikidata_parts_path}/claims.{kgtk_extension} \
 --obj 'node2;wikidatatype' \
 -p ';; commonsMedia'      -o {wikidata_parts_path}/claims.commonsMedia.{kgtk_extension} \
 -p ';; external-id'       -o {wikidata_parts_path}/claims.external-id.{kgtk_extension} \
 -p ';; geo-shape'         -o {wikidata_parts_path}/claims.geo-shape.{kgtk_extension} \
 -p ';; globe-coordinate'  -o {wikidata_parts_path}/claims.globe-coordinate.{kgtk_extension} \
 -p ';; math'              -o {wikidata_parts_path}/claims.math.{kgtk_extension} \
 -p ';; monolingualtext'   -o {wikidata_parts_path}/claims.monolingualtext.{kgtk_extension} \
 -p ';; musical-notation'  -o {wikidata_parts_path}/claims.musical-notation.{kgtk_extension} \
 -p ';; quantity'          -o {wikidata_parts_path}/claims.quantity.{kgtk_extension} \
 -p ';; string'            -o {wikidata_parts_path}/claims.string.{kgtk_extension} \
 -p ';; tabular-data'      -o {wikidata_parts_path}/claims.tabular-data.{kgtk_extension} \
 -p ';; time'              -o {wikidata_parts_path}/claims.time.{kgtk_extension} \
 -p ';; url'               -o {wikidata_parts_path}/claims.url.{kgtk_extension} \
 -p ';; wikibase-form'     -o {wikidata_parts_path}/claims.wikibase-form.{kgtk_extension} \
 -p ';; wikibase-item'     -o {wikidata_parts_path}/claims.wikibase-item.{kgtk_extension} \
 -p ';; wikibase-lexeme'   -o {wikidata_parts_path}/claims.wikibase-lexeme.{kgtk_extension} \
 -p ';; wikibase-property' -o {wikidata_parts_path}/claims.wikibase-property.{kgtk_extension} \
 -p ';; wikibase-sense'    -o {wikidata_parts_path}/claims.wikibase-sense.{kgtk_extension} \
 --reject-file {wikidata_parts_path}/claims.other.{kgtk_extension}

### Partition the qualifiers
Extract the qualifier records for each of the Wikidata property datatype partition files.
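
The commands in this section differ only in the datatype name, so they could also be generated in a loop. The sketch below only builds the command strings and does not run them. The helper `qualifier_command` is hypothetical, uses only options that appear in the cells of this section, and omits the `--verbose` and `--use-mgzip` settings for brevity.

```python
DATATYPES = [
    "commonsMedia", "external-id", "geo-shape", "globe-coordinate", "math",
    "monolingualtext", "musical-notation", "quantity", "string",
    "tabular-data", "time", "url", "wikibase-form", "wikibase-item",
    "wikibase-lexeme", "wikibase-property", "wikibase-sense",
]


def qualifier_command(datatype: str, parts_path: str,
                      extension: str = "tsv.gz",
                      kgtk_command: str = "kgtk") -> str:
    """Build the ifexists command line for one datatype partition."""
    return " ".join([
        kgtk_command, "ifexists", "--presorted",
        "--input-file", f"{parts_path}/qualifiers.{extension}",
        "--filter-on", f"{parts_path}/claims.{datatype}.{extension}",
        "--output-file", f"{parts_path}/qualifiers.{datatype}.{extension}",
        "--input-keys", "node1",
        "--filter-keys", "id",
    ])


# '/path/to/parts' is a placeholder, not a real location.
commands = [qualifier_command(d, "/path/to/parts") for d in DATATYPES]
print(commands[0])
```

Keeping one cell per datatype, as this notebook does, has the advantage that each partition's timing and diagnostic output stays visible separately.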

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.commonsMedia.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.commonsMedia.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.external-id.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.external-id.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.geo-shape.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.geo-shape.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.globe-coordinate.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.globe-coordinate.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.math.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.math.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.monolingualtext.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.monolingualtext.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.musical-notation.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.musical-notation.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.quantity.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.quantity.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.string.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.string.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.tabular-data.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.tabular-data.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.time.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.time.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.url.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.url.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.wikibase-form.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.wikibase-form.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.wikibase-item.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.wikibase-item.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.wikibase-lexeme.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.wikibase-lexeme.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.wikibase-property.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.wikibase-property.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id

In [9]:
!{kgtk_command} {kgtk_options} ifexists --verbose={verbose} --use-mgzip={use_mgzip} --presorted \
 --input-file  {wikidata_parts_path}/qualifiers.{kgtk_extension} \
 --filter-on   {wikidata_parts_path}/claims.wikibase-sense.{kgtk_extension} \
 --output-file {wikidata_parts_path}/qualifiers.wikibase-sense.{kgtk_extension} \
 --input-keys  node1 \
 --filter-keys id