# CSKG use case

In this notebook, we will *compute statistics and embeddings over a 0.1% random sample of our Commonsense Knowledge Graph (CSKG).* 

This sample contains 17,234 edges.
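
For readers who want to draw a comparable sample from their own edge file, here is a minimal sketch of uniform random sampling. It is not necessarily how the provided sample was produced. The function name `sample_edges`, the seed, and the toy edges are all illustrative assumptions.

```python
import random


def sample_edges(lines, rate, seed=0):
    """Keep the header line, then keep each edge line with probability `rate`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    lines = iter(lines)
    header = next(lines)
    sampled = [header]
    for line in lines:
        if rng.random() < rate:
            sampled.append(line)
    return sampled


# Toy input: a header plus 10,000 made-up edges (not real CSKG data).
toy = ["node1\tlabel\tnode2"] + [
    f"n{i}\t/r/RelatedTo\tn{i + 1}" for i in range(10_000)
]
subset = sample_edges(toy, rate=0.001)
print(len(subset) - 1, "edges kept out of", len(toy) - 1)
```

Because each edge is kept independently, the sample size only approximates `rate` times the number of edges.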

## Preparation

Before running this notebook:
* Download the subset of the CSKG graph provided [here]().
* Place it in the same directory as this notebook so that the commands can read it as `cskg_sample.tsv`.

## Computing statistics over the graph

Let's compute graph statistics: degrees, PageRank and HITS centrality, and other general graph descriptors.

In [1]:
%%bash
kgtk graph_statistics --directed --degrees --pagerank --hits --log cskg_summary.txt cskg_sample.tsv > cskg_stats.tsv
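
To make the requested measures more concrete, the following toy sketch computes in-degree, out-degree, and PageRank for a four-edge directed graph in plain Python. It is an illustration of the concepts only, not KGTK's implementation. The damping factor of 0.85, the iteration count, and the toy edges are assumptions chosen for the example.

```python
from collections import defaultdict


def degrees(edges):
    """Return (in_degree, out_degree) dicts for a list of (source, target) edges."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(in_deg), dict(out_deg)


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Toy directed graph; edge labels are ignored in this sketch.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
in_deg, out_deg = degrees(toy_edges)
ranks = pagerank(toy_edges)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, in_deg.get(node, 0), out_deg.get(node, 0), round(ranks[node], 4))
```

Node `c` ends up with the highest PageRank because it receives links from both `a` and `b`.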

### Inspecting the output

The command wrote per-node degrees, HITS hub and authority values, and PageRank scores for all nodes to `cskg_stats.tsv`. Here are the last 10 lines of the file:

In [None]:
%%bash
tail cskg_stats.tsv
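
If you prefer to explore the per-node statistics in Python, the sketch below pivots the rows into one dictionary per node and ranks nodes by a chosen measure. It assumes the file is a tab-separated KGTK edge file with `node1`, `label`, and `node2` columns. The label names `vertex_pagerank` and `vertex_in_degree` and all values in the sample text are placeholders, so check them against the actual output of `tail` before relying on this.

```python
import csv
import io
from collections import defaultdict


def load_node_stats(handle):
    """Pivot (node1, label, node2) rows into {node: {label: value}}."""
    stats = defaultdict(dict)
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        try:
            value = float(row["node2"])
        except (TypeError, ValueError):
            value = row["node2"]  # keep non-numeric values as-is
        stats[row["node1"]][row["label"]] = value
    return dict(stats)


def top_nodes(stats, label, k=10):
    """Return the k nodes with the highest numeric value for `label`."""
    scored = [
        (node, values[label])
        for node, values in stats.items()
        if isinstance(values.get(label), float)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Made-up rows in the assumed layout; real labels and values may differ.
sample_text = (
    "node1\tlabel\tnode2\n"
    "/c/en/dog\tvertex_in_degree\t3\n"
    "/c/en/dog\tvertex_pagerank\t0.002\n"
    "/c/en/cat\tvertex_in_degree\t1\n"
    "/c/en/cat\tvertex_pagerank\t0.004\n"
)
stats = load_node_stats(io.StringIO(sample_text))
print(top_nodes(stats, "vertex_pagerank", k=2))

# With the real file (assuming the layout above):
# with open("cskg_stats.tsv", newline="") as handle:
#     stats = load_node_stats(handle)
```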

The command also wrote an aggregate summary of these and other graph statistics to `cskg_summary.txt`, the file passed to `--log`. Let's print the contents of this file:

In [None]:
%%bash
cat cskg_summary.txt

## Computing embeddings

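Another common operation is computing BERT-large embeddings over CSKG knowledge. The command below uses the `bert-large-nli-cls-token` model and builds each node's input text from its `/r/Synonym`, `/r/IsA`, `/r/DefinedAs`, `/r/Causes`, and `/r/UsedFor` edges: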

In [7]:
%%bash
kgtk text_embedding --debug --embedding-projector-metadata-path none \
                    --label-properties "/r/Synonym" \
                    --isa-properties "/r/IsA" \
                    --description-properties "/r/DefinedAs" \
                    --property-value "/r/Causes" "/r/UsedFor" \
                    --has-properties "" \
                    -f kgtk_format \
                    --output-format kgtk_format \
                    --use-cache \
                    --model bert-large-nli-cls-token -i cskg_sample.tsv > cskg_embeddings.txt

100%|██████████| 14184/14184 [33:19<00:00,  7.09it/s]


You can now inspect the embeddings in `cskg_embeddings.txt`.
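
A typical next step is comparing nodes by the cosine similarity of their embeddings. The sketch below assumes each embedding row has three tab-separated columns: the node, a property name, and the vector as comma-separated floats. The property name `text_embedding`, the three-dimensional vectors, and the helper names are illustrative assumptions, so inspect the file and adjust them as needed.

```python
import math


def parse_embeddings(lines, label="text_embedding"):
    """Collect {node: vector} from rows whose second column equals `label`."""
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3 or parts[1] != label:
            continue  # skips the header and any non-embedding rows
        vectors[parts[0]] = [float(x) for x in parts[2].split(",")]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up rows in the assumed layout; real vectors are much longer.
toy_rows = [
    "node\tproperty\tvalue",
    "/c/en/dog\ttext_embedding\t1.0,0.0,0.0",
    "/c/en/puppy\ttext_embedding\t2.0,0.0,0.0",
    "/c/en/car\ttext_embedding\t0.0,1.0,0.0",
]
vectors = parse_embeddings(toy_rows)
print(cosine(vectors["/c/en/dog"], vectors["/c/en/puppy"]))
print(cosine(vectors["/c/en/dog"], vectors["/c/en/car"]))

# With the real file (assuming the layout above):
# with open("cskg_embeddings.txt") as handle:
#     vectors = parse_embeddings(handle)
```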