## Installation
We'll use *spaCy* for our word embeddings.

Install spaCy:

```
conda install -c conda-forge spacy
```

Then download the medium English model, which includes word vectors:
```
python -m spacy download en_core_web_md
```


## Resources
* [shanelynn blog: Python embeddings with Spacy and Gensim](https://www.shanelynn.ie/word-embeddings-in-python-with-spacy-and-gensim/)
* [Spacy docs on vector similarity](https://spacy.io/usage/vectors-similarity)

## Working with Word Vectors

In [1]:
import os

# Set this before importing spaCy: spaCy v2 reads the ignore list at import time.
os.environ['SPACY_WARNING_IGNORE'] = 'W008'

import spacy

nlp = spacy.load('en_core_web_md')

In [2]:
doc = nlp('The car needed to be washed badly.')

In [3]:
doc[1]

car

Each token carries a word vector. In this model it is a 300-dimensional NumPy array.

In [4]:
type(doc[1].vector), len(doc[1].vector)

(numpy.ndarray, 300)

## Finding similar words

Adapted from this [Reddit post](https://www.reddit.com/r/spacynlp/comments/63a5th/how_do_you_find_similar_word_vectors/) by Xeoncross.

In [55]:
def most_similar(word):
    queries = [w for w in word.vocab if w.has_vector and w.is_lower == word.is_lower and w.prob >= -15]
    by_similarity = sorted(queries, key=lambda w: word.similarity(w), reverse=True)
    return by_similarity[:10]

def word_strings(words):
    return [w.lower_ for w in words]

def similar_words(word_str):
    # Look up the Lexeme; nlp(word_str) would return a Doc, which has no is_lower.
    word = nlp.vocab[word_str]
    return word_strings(most_similar(word))

In [6]:
doc[1]

car

In [46]:
results = most_similar(nlp.vocab['car'])

In [47]:
[w.lower_ for w in results]

['car',
 'automobile',
 'automobiles',
 'cars',
 'vehicle',
 'truck',
 'auto',
 'driving',
 'vehicles',
 'showroom']

In [48]:
thimble, mailbox, barrel, pool, lake, ocean = nlp('thimble mailbox barrel pool lake ocean')

In [52]:
d = lake.vector - pool.vector

In [53]:
m = mailbox.vector + d

In [58]:
most_similar(m)

AttributeError: 'numpy.ndarray' object has no attribute 'vocab'
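
The call fails because `most_similar` expects a spaCy lexeme (something with `.vocab` and `.similarity`), while `m` is a bare vector. One possible workaround is to rank candidates by cosine similarity directly. The sketch below is not from the Reddit post; `cosine` and `most_similar_to_vector` are hypothetical helper names, and the toy 2-d vectors are made up so the example runs without a model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length sequences of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat all-zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def most_similar_to_vector(vector, candidates, n=10):
    """Return the n candidate names whose vectors are closest to `vector`.

    `candidates` is an iterable of (name, vector) pairs.
    """
    scored = [(cosine(vector, vec), name) for name, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n]]


# Toy 2-d vectors standing in for real embeddings (made up for illustration).
toy = [("east", [1.0, 0.0]), ("north", [0.0, 1.0]), ("northeast", [1.0, 1.0])]
print(most_similar_to_vector([0.9, 0.1], toy, n=2))  # ['east', 'northeast']
```

With the objects from this notebook, the candidates could be built as `((w.lower_, w.vector) for w in nlp.vocab if w.has_vector and w.prob >= -15)` and passed along with `m`. The pure-Python loop is slow on a full vocabulary; a NumPy matrix product would likely be faster.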