# Part 1: Unsupervised learning

© Anatolii Stehnii, 2018

## Lecture 3: Other approaches

```python
from IPython.core.display import HTML

def css_styling():
    # Load the notebook's custom stylesheet and return it as renderable HTML
    with open("../custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()
```

Word2vec is a classic example of how deep learning can be used to find word embeddings. For practical use, however, you can consider newer and more sophisticated methods. In this lecture I will give an overview of possible approaches.

### Gensim word2vec implementation

First of all, you probably should not reinvent the wheel and write your own word2vec for a practical task. Gensim has already done this for you.

Besides a fast and accurate implementation of Word2vec, the Gensim model also provides useful tools, such as finding the words most similar to a given set of positive and negative words:
```python
>>> model.wv.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]
```
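To make the query above less of a black box, here is a minimal pure-Python sketch of the idea behind it: add the unit vectors of the positive words, subtract those of the negative words, and rank the remaining words by cosine similarity to the result. The `vectors` table and its values are invented for illustration, and this local `most_similar` function is an assumption about the general technique, not Gensim's actual code.

```python
import math

# Toy 3-d vectors (hypothetical values, not taken from a trained model).
# The dimensions loosely stand for: royalty, male, female.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar(positive, negative, topn=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        query = [q + x for q, x in zip(query, unit(vectors[word]))]
    for word in negative:
        query = [q - x for q, x in zip(query, unit(vectors[word]))]
    query = unit(query)
    exclude = set(positive) | set(negative)  # never return the query words
    scores = [
        (word, sum(q * x for q, x in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

result = most_similar(positive=["woman", "king"], negative=["man"])
print(result)  # 'queen' is ranked first, ahead of 'apple'
```

With real embeddings the table would hold hundreds of dimensions and tens of thousands of words, but the ranking logic stays the same.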

A practical example of Word2vec trained on a custom corpus can be found in Yuriy Guts's [Thrones2Vec](https://github.com/YuriyGuts/thrones2vec) project.

[Original paper](https://arxiv.org/pdf/1301.3781.pdf)

[Another paper](https://arxiv.org/pdf/1310.4546.pdf)

[Project page](https://radimrehurek.com/gensim/models/word2vec.html)

### GloVe: global vector representation

This method, developed at Stanford, also learns word vectors from an unlabeled corpus, but its approach is different. Word2vec minimizes the cross-entropy loss of a classifier that predicts whether a word occurs in the presence of another word. GloVe first builds a square matrix of word co-occurrence counts and then optimizes the word embeddings so that the dot product of two word vectors approximates the logarithm of the co-occurrence value for those two words.

> The training objective of GloVe is to learn word vectors such that their dot product equals the logarithm of the words’ probability of co-occurrence.

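The two stages described above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the reference implementation: the function names are invented here, the counting uses a plain symmetric window (the paper additionally down-weights distant pairs), and the per-pair loss follows the weighted least-squares form from the GloVe paper, with bias terms and the weighting function using `x_max = 100` and `alpha = 0.75`.

```python
import math
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count symmetric co-occurrences X[(word, context)] within `window` tokens."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), i):
            counts[(word, tokens[j])] += 1
            counts[(tokens[j], word)] += 1
    return counts

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function: damps rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_pair_loss(w, c, b_w, b_c, x):
    """Weighted squared error between (w . c + biases) and log(x) for one pair."""
    dot = sum(a * b for a, b in zip(w, c))
    return glove_weight(x) * (dot + b_w + b_c - math.log(x)) ** 2

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("the", "sat")])

# The loss vanishes when the dot product (plus biases) equals log(count).
print(glove_pair_loss([1.0, 0.0], [math.log(2.0), 5.0], 0.0, 0.0, 2.0))
```

Training would sum `glove_pair_loss` over all non-zero entries of the matrix and minimize it with gradient descent over the vectors and biases.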

![GloVe vs Word2vec performance](glove-vs-word2vec.png)

[Original paper](https://nlp.stanford.edu/pubs/glove.pdf)

[Project page](https://nlp.stanford.edu/projects/glove/)

[Nice article about topic](https://blog.acolyer.org/2016/04/22/glove-global-vectors-for-word-representation/)

### FastText

This method from the Facebook research group trains a skip-gram model not on whole words but on character n-grams: each word is represented as a bag of character n-grams, and its vector is the sum of its n-gram vectors. For example, the word `where`, wrapped in the boundary markers `<` and `>`, is represented by the following 3-grams:

```
<wh
whe
her
ere
re>
```
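The decomposition above can be reproduced with a short helper. This is a minimal sketch: the name `char_ngrams` is made up for this example, and it extracts n-grams of a single length, whereas the FastText paper uses all lengths from 3 to 6 plus the whole word.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of `word` wrapped in '<' and '>' markers."""
    marked = f"<{word}>"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

grams = char_ngrams("where", n=3)
print(grams)  # ['<wh', 'whe', 'her', 'ere', 're>']
```

The boundary markers let the model tell a prefix or suffix apart from the same characters in the middle of a word: the 3-gram `her` inside `where` is different from the whole word `<her>`.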

This approach has multiple advantages:

1. Input sparsity is reduced.
2. The model can capture morphological similarity between words and incorporate this information.
3. Vectors for unknown words can be approximated.

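The third advantage can be illustrated with a toy lookup table. In this sketch the n-gram vectors are hand-picked hypothetical values, the helper names are invented, and n-grams missing from the table are simply skipped. The known n-gram vectors are averaged here, which differs from the sum described in the paper only by a scale factor. Since `when` shares the n-grams `<wh` and `whe` with words seen in training, it still gets a vector even though it never appeared as a whole word.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of `word` wrapped in '<' and '>' markers."""
    marked = f"<{word}>"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

# Hypothetical 2-d n-gram vectors, as if learned from words such as "where".
ngram_vectors = {
    "<wh": [1.0, 0.0],
    "whe": [0.0, 1.0],
    "ere": [1.0, 1.0],
}

def approximate_vector(word, table, n=3):
    """Average the vectors of the word's known n-grams; None if none are known."""
    known = [table[g] for g in char_ngrams(word, n) if g in table]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]

print(approximate_vector("when", ngram_vectors))  # [0.5, 0.5]
print(approximate_vector("xyz", ngram_vectors))   # None
```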
[Original paper](https://arxiv.org/abs/1607.04606)

[Project GitHub](https://github.com/facebookresearch/fastText/#introduction)


### MUSE

This project addresses the problem of multilingual word representations. MUSE describes methods for aligning vector spaces across multiple languages:

>Supervised: using a train bilingual dictionary (or identical character strings as anchor points), learn a mapping from the source to the target space using (iterative) Procrustes alignment.

>Unsupervised: without any parallel data or anchor point, learn a mapping from the source to the target space using adversarial training and (iterative) Procrustes refinement.

![Words alignment](https://camo.githubusercontent.com/e8a19eb6772e722fb3fe2cd787e14ed7c4e17ddd/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6172726976616c2f6f75746c696e655f616c6c2e706e67)

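To give a feeling for the supervised Procrustes step, here is a deliberately simplified sketch in two dimensions, where the best rotation has a closed form. All words and vectors are invented for illustration, and the function names are not part of MUSE. Real embedding spaces have hundreds of dimensions, where the orthogonal mapping is obtained from a singular value decomposition rather than from a single angle.

```python
import math

# Hypothetical 2-d embeddings for a tiny bilingual dictionary.
# The target space is the source space rotated by 90 degrees.
source = {"cat": (1.0, 0.0), "dog": (0.0, 1.0), "house": (1.0, 1.0)}
target = {"gato": (0.0, 1.0), "perro": (-1.0, 0.0), "casa": (-1.0, 1.0)}
dictionary = [("cat", "gato"), ("dog", "perro"), ("house", "casa")]

def fit_rotation(pairs):
    """Angle of the rotation R that minimises sum ||R x - y||^2 over the pairs."""
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    return math.atan2(cross, dot)

def rotate(v, theta):
    """Apply a 2-d rotation by `theta` radians to the vector `v`."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

pairs = [(source[s], target[t]) for s, t in dictionary]
theta = fit_rotation(pairs)
mapped_cat = rotate(source["cat"], theta)
print(math.degrees(theta), mapped_cat)
```

After the fit, `mapped_cat` lands on the vector of `gato` up to floating-point error, so a nearest-neighbour search in the target space can translate words that were not in the dictionary.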

[Project GitHub](https://github.com/facebookresearch/MUSE)

### Word embeddings problems

1. Embeddings for multi-word phrases ('New York', 'North Korea')
2. Polysemy: the same word has different meanings in different contexts ('river bank', 'financial bank')

### Further reading

0. Yet another [article](https://www.shanelynn.ie/get-busy-with-word-embeddings-introduction/) about word embeddings and word2vec.
1. Sebastian Ruder's [article](http://ruder.io/word-embeddings-2017/) about recent advances in word embeddings.
2. A wonderful [article](https://medium.com/@madrugado/advances-in-nlp-in-2017-part-ii-d8da391a3f01) by Valentin Malykh about other unsupervised methods used in NLP.
