# Embedding
<div style="position: absolute; right:0;top:0"><a href="../evaluation.py.ipynb" style="text-decoration: none"> <font size="5">↑</font></a></div>

The Embedding Module is responsible for loading word and phrase embeddings and providing them to the other modules.

---

## General

Each embedding is specified as an entry in the configuration. Word embedding models are listed in `config.embeddings['C']` as 

```json
"identifier": {
    "name": STRING,
    "run": BOOLEAN,
    "mod": STRING,
    "cls": STRING,
    "token_info": {
        "mod": STRING,
        "cls": STRING
    },
    OPTIONAL_PARAMETERS
}
```

and phrase embedding models in `config.embeddings['P']` as

```json
"identifier": {
    "name": STRING,
    "run": BOOLEAN,
    "mod": STRING,
    "cls": STRING,
    OPTIONAL_PARAMETERS
}
```

with the following entries:

- `identifier` identifies the embedding. It must be unique within each class (`C` and `P`).
- `name` is the full name used when printing output.
- `run` defines whether the evaluation script runs anything related to this embedding.
- `mod` is the module path of the embedding model.
- `cls` is the class name of the embedding model.
- `token_info` specifies the tokenizer used for tokenization. It appears only in word embedding entries. For its parameters, see [tokenizer](../tokenizer/tokenizer.ipynb).
- `OPTIONAL_PARAMETERS` are model-specific and will usually be a filename or a URL of the trained model.
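
To illustrate how the `run`, `mod` and `cls` entries could be consumed, here is a minimal sketch that skips disabled entries and resolves the class by name with `importlib`. It is not the module's actual loader: the helper names `resolve_entry` and `enabled_entries`, the `RESERVED_KEYS` set and the example entries are hypothetical, and the standard-library `json` module only stands in for a real embedding module so that the snippet runs on its own.

```python
import importlib

# Keys with a fixed meaning; everything else is treated as an optional parameter.
# (Assumption: this is one plausible way to separate OPTIONAL_PARAMETERS.)
RESERVED_KEYS = {"name", "run", "mod", "cls", "token_info"}


def resolve_entry(entry):
    """Return (class object, optional parameters) for one config entry."""
    module = importlib.import_module(entry["mod"])
    model_cls = getattr(module, entry["cls"])
    params = {k: v for k, v in entry.items() if k not in RESERVED_KEYS}
    return model_cls, params


def enabled_entries(entries):
    """Yield (identifier, entry) pairs whose `run` flag is true."""
    for identifier, entry in entries.items():
        if entry.get("run", False):
            yield identifier, entry


# Hypothetical entries: `json.JSONDecoder` / `json.JSONEncoder` are stand-ins
# for embedding classes, chosen only because they are always importable.
example_entries = {
    "demo_a": {
        "name": "Demo A",
        "run": True,
        "mod": "json",
        "cls": "JSONDecoder",
        "filename": "data/embedding/demo.bin",
    },
    "demo_b": {"name": "Demo B", "run": False, "mod": "json", "cls": "JSONEncoder"},
}

resolved = {
    identifier: resolve_entry(entry)
    for identifier, entry in enabled_entries(example_entries)
}
```

Only `demo_a` ends up in `resolved`, because `demo_b` has `run` set to false. How the real module passes the optional parameters to the model class is not specified here.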

---

## Word Embeddings (C)

During evaluation, additional files are created, e.g. `<id>.vocab.txt`, which contains the vocabulary of the embedding for faster lookups.
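
The idea behind such a vocabulary file can be sketched as follows. This is an illustration only: the one-token-per-line format and the helper names `write_vocab`, `read_vocab` and `cached_vocab` are assumptions, not the format or API the module actually uses.

```python
import os
import tempfile


def write_vocab(path, tokens):
    """Write one token per line (assumed file format)."""
    with open(path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")


def read_vocab(path):
    """Read the vocabulary file back into a set for fast membership tests."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}


def cached_vocab(path, build_tokens):
    """Build the vocabulary file only if it does not exist yet, then load it."""
    if not os.path.exists(path):
        write_vocab(path, build_tokens())
    return read_vocab(path)


# Demo with a hypothetical identifier "demo" and a tiny vocabulary.
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "demo.vocab.txt")
    vocab = cached_vocab(vocab_path, lambda: ["king", "queen", "apple"])
    has_queen = "queen" in vocab
```

With a file like this, checking whether a word is covered does not require loading the full embedding model, which is presumably the lookup speed-up meant above.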

### [Word2vec](./word2vec.py)
**Mod.Cls**  
`word2vec.Word2vecModel`  
**Parameters**  
`filename`  
**Install**  
Download a pretrained model from 
https://code.google.com/archive/p/word2vec/
(see "The archive is available here: GoogleNews-vectors-negative300.bin.gz")
and extract it to `data/embedding`.
Alternatively, you may train a model yourself. In either case, set the `filename` entry in your config to the resulting model file.
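
If you prefer to do the extraction step from Python, a minimal sketch using only the standard library could look like this. The helper name `extract_gzip` is hypothetical, and the commented call assumes the archive was downloaded to the current working directory.

```python
import gzip
import os
import shutil


def extract_gzip(archive_path, target_dir):
    """Decompress a .gz archive into target_dir and return the extracted path."""
    os.makedirs(target_dir, exist_ok=True)
    name = os.path.basename(archive_path)
    if name.endswith(".gz"):
        name = name[:-3]  # drop the ".gz" suffix
    target_path = os.path.join(target_dir, name)
    # Stream the data so the large model never has to fit in memory at once.
    with gzip.open(archive_path, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target_path


# Example call (assumes the archive is in the current directory):
# extract_gzip("GoogleNews-vectors-negative300.bin.gz", "data/embedding")
```

The returned path is the value to use for the `filename` entry.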

### GloVe (Currently not implemented)
Download one or multiple pretrained models from
https://nlp.stanford.edu/projects/glove/


### fastText (Currently not implemented)
Pretrained English word vectors are available from
https://fasttext.cc/docs/en/english-vectors.html


---

## Phrase Embeddings (P)

### [Universal Sentence Encoder](./use.py)
**Mod.Cls**  
`use.USEModel`  
**Parameters**  
`module_url`  
**Install**  
The model is downloaded automatically as a *TensorFlow Hub* module from the `module_url` specified in the config. See
https://tfhub.dev/google/universal-sentence-encoder/2
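
For illustration, a phrase embedding entry for this model might look like the following, written as a Python dictionary. The identifier `use` and the `name` value are hypothetical choices; `mod`, `cls` and the URL are taken from the section above. The same keys apply if the configuration is stored as JSON.

```python
# Hypothetical content of config.embeddings['P'] with a single entry.
phrase_embeddings = {
    "use": {  # identifier (assumed; any unique name within 'P' works)
        "name": "Universal Sentence Encoder",  # used for printed output
        "run": True,
        "mod": "use",
        "cls": "USEModel",
        # Optional parameter of this model:
        "module_url": "https://tfhub.dev/google/universal-sentence-encoder/2",
    }
}
```

Unlike word embedding entries, this entry has no `token_info` block.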