# Example Scenario 1: Generating embeddings for ConceptNet nodes

*Alice wishes to import the English subset of ConceptNet in KGTK format. She then extracts the subset of ConceptNet in which two concepts are connected by a precise semantic relation, such as `/r/Causes` or `/r/UsedFor` (as opposed to weaker relations such as `/r/RelatedTo`). Text embeddings are computed for all nodes in this subset and saved in a file called `emb.txt`.*

**Note on the expected running time:** Running this notebook takes around half an hour on a MacBook Pro laptop with macOS Catalina 10.15, a 2.3 GHz 8-core Intel Core i9 processor, a 2 TB SSD, and 64 GB of 2667 MHz DDR4 memory.

## Preparation

To run this notebook, Alice needs the ConceptNet graph file. We will work with ConceptNet v5.7.0, the latest release at the time of writing. Presumably, this file is not present on Alice's laptop, so we need to download and unpack it first (note: Mac users might need to install `wget` first with `brew install wget`):

In [None]:
%%bash
wget https://s3.amazonaws.com/conceptnet/downloads/2019/edges/conceptnet-assertions-5.7.0.csv.gz

In [None]:
%%bash
gunzip conceptnet-assertions-5.7.0.csv.gz

## Implementation in KGTK

We will select the relevant edges from ConceptNet and sort them. Besides `/r/Causes` and `/r/UsedFor`, we keep three more relations (`/r/Synonym`, `/r/DefinedAs`, and `/r/IsA`), which our embedding generator below uses to extract labels, descriptions, and inheritance, respectively.

Then we compute text embeddings. For demonstration purposes, we restrict the computation to the first 30k lines of the sorted edge file (`head -30000`).
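To make the selection step concrete, here is a minimal Python sketch of what filtering by language and relation amounts to. It is not how KGTK implements `import_conceptnet` or `filter`; it assumes the raw ConceptNet assertions layout (tab-separated columns: edge URI, relation, start node, end node, JSON metadata), and the names `KEPT_RELATIONS`, `select_edges`, and `sample` are hypothetical.

```python
# Relations kept by the filter pattern used in this notebook.
KEPT_RELATIONS = {"/r/Causes", "/r/UsedFor", "/r/Synonym", "/r/DefinedAs", "/r/IsA"}


def select_edges(lines, relations=KEPT_RELATIONS):
    """Yield (start, relation, end) for English-only edges with a kept relation.

    Assumption: each line is a raw ConceptNet assertion with tab-separated
    columns: edge URI, relation, start node, end node, JSON metadata.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        _, relation, start, end = fields[:4]
        if relation not in relations:
            continue  # drops weaker relations such as /r/RelatedTo
        if start.startswith("/c/en/") and end.startswith("/c/en/"):
            yield start, relation, end


# Tiny made-up sample, only to illustrate the behaviour.
sample = [
    "/a/[/r/UsedFor/,/c/en/knife/,/c/en/cutting/]\t/r/UsedFor\t/c/en/knife\t/c/en/cutting\t{}",
    "/a/[/r/RelatedTo/,/c/en/knife/,/c/en/fork/]\t/r/RelatedTo\t/c/en/knife\t/c/en/fork\t{}",
    "/a/[/r/IsA/,/c/fr/chat/,/c/en/animal/]\t/r/IsA\t/c/fr/chat\t/c/en/animal\t{}",
]

# Sorting the tuples is a rough stand-in for the sort step of the pipeline.
selected = sorted(select_edges(sample))
print(selected)
```

Only the `/r/UsedFor` edge survives: the `/r/RelatedTo` edge is rejected by relation, and the `/r/IsA` edge is rejected because its start node is not English.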

In [1]:
%%bash
kgtk import_conceptnet --english_only conceptnet-assertions-5.7.0.csv / \
            filter -p " ; /r/Causes,/r/UsedFor,/r/Synonym,/r/DefinedAs,/r/IsA ; " / sort -c 1,2,3 \
            | head -30000 |
            kgtk text_embedding --debug --embedding-projector-metadata-path none \
                    --label-properties "/r/Synonym" \
                    --isa-properties "/r/IsA" \
                    --description-properties "/r/DefinedAs" \
                    --property-value "/r/Causes" "/r/UsedFor" \
                    --has-properties "" \
                    -f kgtk_format \
                    --output-format kgtk_format \
                    --use-cache \
                    --model bert-large-nli-cls-token \
                    > emb.txt                 

100%|██████████| 19571/19571 [29:14<00:00, 11.16it/s]


Let's inspect the result by printing the header line and the first embedding:

In [2]:
!head -2 emb.txt

node	property	value
/c/en/astragalus_glycyphyllos/n/wn/plant	text_embedding	-0.38157165,-0.021805033,0.7940887,-1.5922968,0.52496123,-0.16233969,-0.19431037,1.0408834,0.8114325,0.3559178,0.61059636,-0.24603112,0.5337883,0.4534494,-0.29937816,0.090129025,-0.30235052,-0.6983496,-1.171757,0.9471463,0.9576315,0.6795303,-1.1980538,0.65520096,-0.59407276,0.28939876,-0.6164435,-0.2264376,1.5879735,0.31625852,-0.42459768,-0.43198207,0.22300366,-0.2425214,-0.5070722,-0.08494526,-0.6393699,0.18749073,0.48675346,-0.3822635,-0.22630893,-0.54952407,0.9476757,-0.4083498,0.83604693,0.043933608,-0.14449579,0.11305623,0.97173285,-0.39300558,-0.11612919,-0.055423833,1.3205411,-0.31344137,-0.75986946,-0.4466562,1.2067158,-0.5779269,-0.6896182,0.9956776,0.1057688,1.0690029,0.6207034,-1.3134612,0.643423,-0.040023815,-0.016186167,0.020421714,0.1719726,-0.41306853,-0.39602634,-0.28910923,-0.23621234,-0.056197245,-0.797897,0.20073217,0.3226663,-0.36887905,0.48313624,0.76007056,0.052330434,0.2493825,1.0534264,
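Each data row of `emb.txt` shown above has three tab-separated fields: the node, the property `text_embedding`, and a comma-separated vector. As a minimal sketch of how such a file could be consumed downstream, the following Python snippet parses rows of this shape and compares two vectors by cosine similarity. The helper names `parse_embeddings` and `cosine`, and the toy rows in `demo`, are hypothetical and not part of KGTK.

```python
import math


def parse_embeddings(lines):
    """Map each node to its vector, reading rows of node<TAB>property<TAB>value."""
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or parts[1] != "text_embedding":
            continue  # skips the header and any row that is not an embedding
        # A trailing comma would produce an empty piece; ignore such pieces.
        vectors[parts[0]] = [float(x) for x in parts[2].split(",") if x.strip()]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy rows in the same layout as emb.txt (values are made up).
demo = [
    "node\tproperty\tvalue\n",
    "/c/en/a\ttext_embedding\t1.0,0.0,2.0\n",
    "/c/en/b\ttext_embedding\t2.0,0.0,4.0\n",
]
vecs = parse_embeddings(demo)
similarity = cosine(vecs["/c/en/a"], vecs["/c/en/b"])
print(similarity)
```

For the real file, the same function can be fed an open file handle, for example `with open("emb.txt") as f: vecs = parse_embeddings(f)`.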