# Word Embedding

The most commonly used models for word embeddings are [word2vec](https://github.com/dav/word2vec/) and [GloVe](https://nlp.stanford.edu/projects/glove/). Both are unsupervised approaches based on the distributional hypothesis: words that occur in the same contexts tend to have similar meanings.


![Alt text](https://developers.google.com/machine-learning/crash-course/images/linear-relationships.svg)
Image from [developers.google.com](https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space)

word2vec vectors even support simple arithmetic. For example, replacing each word on the left below with its word2vec vector gives a result whose nearest neighbours include the vector for the word on the right:

**king - man + woman ≈ queen**

Another word embedding method is **GloVe** (“Global Vectors”). It is based on matrix factorization of the word-context matrix. It first constructs a large (word x context) co-occurrence matrix: for each word (the rows), it counts how frequently that word appears in each context (the columns) in a large corpus. This matrix is then factorized into a lower-dimensional (word x features) matrix, in which each row stores the vector representation of one word. In general, this is done by minimizing a “reconstruction loss”, which looks for the lower-dimensional representations that explain most of the variance in the high-dimensional data.
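The count-then-factorize idea can be sketched on a toy corpus. This is only an illustration under stated assumptions: the corpus, the window size `window` and the number of features `k` are made up, and a truncated SVD of the log counts stands in for the factorization step. GloVe itself fits its vectors with a weighted least-squares objective on log co-occurrence counts rather than with an SVD.

```python
from collections import Counter

import numpy as np

# Toy corpus, made up for illustration.
corpus = [
    "the king rules the land",
    "the queen rules the land",
    "the dog chases the cat",
    "the cat chases the mouse",
]
window = 2  # hypothetical context window size (tokens on each side)

sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each word appears within `window` tokens of another word.
counts = Counter()
for tokens in sentences:
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(index[word], index[tokens[j]])] += 1

# Dense (word x context) co-occurrence matrix.
cooc = np.zeros((len(vocab), len(vocab)))
for (row, col), n in counts.items():
    cooc[row, col] = n

# Factorize log(1 + counts) and keep only the k strongest features per word.
k = 2  # hypothetical embedding size
u, s, _ = np.linalg.svd(np.log1p(cooc))
embeddings = u[:, :k] * s[:k]  # one k-dimensional row per word

print(embeddings.shape)
print(embeddings[index["king"]], embeddings[index["queen"]])
```

In this toy corpus "king" and "queen" occur in exactly the same contexts, so their rows in the co-occurrence matrix are identical and they end up with the same vector, which is the distributional hypothesis in miniature.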

In [None]:
!pip install gensim

In [None]:
import gensim.downloader

In [None]:
# Download (on first use) and load pretrained 50-dimensional GloVe vectors
model = gensim.downloader.load("glove-wiki-gigaword-50")

In [None]:
# Dimensionality of a single word vector (50 for this model)
len(model['tower'])

In [None]:
model['man']

In [None]:
model['woman']

#### Perform the vector arithmetic: "king" - "man" + "woman"

In [None]:


# Perform the vector arithmetic: "king" - "man" + "woman"
result_vector = model['king'] - model['man'] + model['woman']

# Find the words most similar to the resulting vector.
# Note: similar_by_vector does not exclude the query words, so "king" itself may appear.
similar_words = model.similar_by_vector(result_vector, topn=3)

print(similar_words)
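Because `similar_by_vector` ranks every word in the vocabulary, the query words themselves can show up in the output. gensim's `most_similar(positive=['king', 'woman'], negative=['man'])` skips the input words, though it also normalizes the vectors before combining them, so its scores are not identical. The filtering idea can be sketched without any model download; the vectors below are made up for illustration and are not real GloVe values, and `analogy` is a hypothetical helper, not a gensim function.

```python
import math

# Made-up 3-dimensional vectors, for illustration only (not real GloVe values).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, topn=3):
    """Rank words by cosine similarity to a - b + c, skipping a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    ranked = sorted(candidates, key=lambda w: cosine(vectors[w], target), reverse=True)
    return [(w, cosine(vectors[w], target)) for w in ranked[:topn]]


print(analogy("king", "man", "woman"))
```

With these toy vectors the top-ranked word is "queen", and "king" can no longer appear in the result because it is filtered out before ranking.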


#### Perform the vector arithmetic: "father" - "man" + "woman"

In [None]:


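# Perform the vector arithmetic: "father" - "man" + "woman"
result_vector = model['father'] - model['man'] + model['woman']

# Find the words most similar to the resulting vector.
# Note: similar_by_vector does not exclude the query words, so "father" itself may appear.
similar_words = model.similar_by_vector(result_vector, topn=3)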

print(similar_words)