# Example Scenario 1: Generating embeddings for ConceptNet nodes

*Alice wants to import the English subset of ConceptNet into KGTK format. She then extracts the subset in which two concepts are connected by a precise semantic relation, such as `/r/Causes` or `/r/UsedFor` (as opposed to weaker relations like `/r/RelatedTo`). Finally, text embeddings are computed for all nodes in this subset and saved in a file called `emb.txt`.*

## Preparation

To run this notebook, Alice needs the ConceptNet graph file. We work with ConceptNet v5.7.0, the latest release at the time of writing. Presumably this file is not yet present on Alice's laptop, so we download and unpack it first (note: macOS users might need to install `wget` first: `brew install wget`):

In [None]:
%%bash
wget https://s3.amazonaws.com/conceptnet/downloads/2019/edges/conceptnet-assertions-5.7.0.csv.gz

In [None]:
%%bash
gunzip conceptnet-assertions-5.7.0.csv.gz
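
Before importing, it can help to look at what the unpacked file contains. The sketch below assumes the assertions file is tab-separated with five fields per line (assertion URI, relation, start node, end node, JSON metadata) and that "English only" means both endpoints live under `/c/en`. The sample rows, `read_assertions`, and `is_english` are hypothetical names made up for illustration; they are not part of KGTK.

```python
import csv
import io
import json

# Hypothetical sample rows in the assumed five-column, tab-separated layout:
# assertion URI, relation, start node, end node, JSON metadata.
SAMPLE = (
    "/a/[/r/UsedFor/,/c/en/knife/,/c/en/cutting/]\t/r/UsedFor\t"
    "/c/en/knife\t/c/en/cutting\t" '{"weight": 1.0}\n'
    "/a/[/r/RelatedTo/,/c/fr/chat/,/c/en/cat/]\t/r/RelatedTo\t"
    "/c/fr/chat\t/c/en/cat\t" '{"weight": 0.5}\n'
)


def read_assertions(handle):
    """Yield (relation, start, end, metadata) tuples from a tab-separated stream."""
    # QUOTE_NONE keeps the double quotes inside the JSON field untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for _uri, relation, start, end, info in reader:
        yield relation, start, end, json.loads(info)


def is_english(node):
    """True when a concept URI sits in the English namespace (assumed to be /c/en)."""
    return node == "/c/en" or node.startswith("/c/en/")


rows = list(read_assertions(io.StringIO(SAMPLE)))
english_rows = [r for r in rows if is_english(r[1]) and is_english(r[2])]
print(len(rows), len(english_rows))
```

To try this on the real data, replace `io.StringIO(SAMPLE)` with `open("conceptnet-assertions-5.7.0.csv", encoding="utf-8", newline="")` and iterate lazily instead of building a list, since the full file is large.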

## Implementation in KGTK

### Separate sort

In [5]:
%%bash
kgtk import_conceptnet --english_only conceptnet-assertions-5.7.0.csv / \
            filter -p " ; /r/Causes,/r/UsedFor,/r/Synonym,/r/DefinedAs,/r/IsA ; " > tmp.tsv
kgtk newsort -c 1,2,3 tmp.tsv > sorted.tsv
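
To make the two steps above concrete, here is a minimal Python sketch of the same logic: keep only the edges whose relation is in the chosen set, then sort the surviving rows by their first three columns while leaving the header in place. It is not how KGTK implements `filter` or `newsort`. The column names and sample edges are assumptions for illustration, and Python's string ordering may differ from the ordering `newsort` uses.

```python
import csv
import io

# Relations kept by the filter pattern in the cell above.
KEEP = {"/r/Causes", "/r/UsedFor", "/r/Synonym", "/r/DefinedAs", "/r/IsA"}

# Hypothetical edge file; the column names are assumed, not taken from KGTK output.
SAMPLE = (
    "node1\trelation\tnode2\n"
    "/c/en/rain\t/r/Causes\t/c/en/wet_ground\n"
    "/c/en/knife\t/r/UsedFor\t/c/en/cutting\n"
    "/c/en/cat\t/r/RelatedTo\t/c/en/kitten\n"
    "/c/en/cat\t/r/IsA\t/c/en/animal\n"
)


def filter_and_sort(handle, keep=KEEP):
    """Return (header, rows): rows whose second column is in `keep`,
    sorted lexicographically by their first three columns."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # the header line is never sorted with the data
    kept = [row for row in reader if row[1] in keep]
    kept.sort(key=lambda row: tuple(row[:3]))
    return header, kept


header, kept = filter_and_sort(io.StringIO(SAMPLE))
for row in kept:
    print("\t".join(row))
```

The weak `/r/RelatedTo` edge is dropped, and the remaining edges come out ordered by subject, then relation, then object.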

### Sort together

In [None]:
# %%bash
# kgtk import_conceptnet --english_only conceptnet-assertions-5.7.0.csv / \
#             filter -p " ; /r/Causes,/r/UsedFor,/r/Synonym,/r/DefinedAs,/r/IsA ; " / \
#             newsort -c 1,2,3 -o sorted.tsv #tmp.tsv

### Compute embeddings

In [None]:
%%bash
kgtk text_embedding --debug --embedding-projector-metadata-path none \
                    --label-properties "/r/Synonym" \
                    --isa-properties "/r/IsA" \
                    --description-properties "/r/DefinedAs" \
                    --property-value "/r/Causes" "/r/UsedFor" \
                    --has-properties "" \
                    -f kgtk_format \
                    --output-format kgtk_format \
                    --use-cache \
                    --model bert-large-nli-cls-token -i sorted.tsv > emb.txt
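
Once `emb.txt` exists, a quick sanity check is to compare a few vectors. The sketch below assumes a layout that should be verified against the actual file: a header line followed by tab-separated rows of node, property, and value, where rows with the property `text_embedding` hold a comma-separated vector. The sample data, `load_vectors`, and `cosine` are hypothetical and only illustrate the idea.

```python
import csv
import io
import math

# Hypothetical excerpt in the ASSUMED layout: node, property, value.
SAMPLE = (
    "node\tproperty\tvalue\n"
    "/c/en/knife\ttext_embedding\t1.0,0.0,1.0\n"
    "/c/en/knife\tembedding_sentence\tknife is used for cutting\n"
    "/c/en/blade\ttext_embedding\t1.0,0.0,1.0\n"
    "/c/en/rain\ttext_embedding\t0.0,1.0,0.0\n"
)


def load_vectors(handle, vector_property="text_embedding"):
    """Map each node to its vector, ignoring rows with any other property."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header line
    vectors = {}
    for node, prop, value in reader:
        if prop == vector_property:
            vectors[node] = [float(x) for x in value.split(",")]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = load_vectors(io.StringIO(SAMPLE))
print(round(cosine(vectors["/c/en/knife"], vectors["/c/en/blade"]), 6))
print(round(cosine(vectors["/c/en/knife"], vectors["/c/en/rain"]), 6))
```

If the real file uses different column names or a different vector separator, only `load_vectors` needs to change.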
                    

## Remarks

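* `newsort` does not work inside a KGTK pipe (see the commented-out "Sort together" cell), so the filtered edges are written to `tmp.tsv` and sorted in a separate command.
* The embeddings rely on unnatural relation mappings: for example, `/r/Synonym` stands in for labels and `/r/DefinedAs` for descriptions.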