# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
Word2Vec was introduced in two papers: [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for the Gensim implementation is [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)

In [8]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [9]:
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)
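
Gensim's `Word2Vec` expects an iterable of sentences, where each sentence is a list of string tokens. The structure of `train_data.pkl` is not shown in this section, so the check below is only a sketch under the assumption that it holds a list of token lists; `check_corpus_shape` and `toy_corpus` are hypothetical names used for illustration.

```python
def check_corpus_shape(corpus, sample_size=100):
    """Return True if the first `sample_size` items look like lists of string tokens."""
    for i, sentence in enumerate(corpus):
        if i >= sample_size:
            break
        # A raw string would be iterated character by character, so reject it explicitly.
        if isinstance(sentence, str):
            return False
        if not all(isinstance(token, str) for token in sentence):
            return False
    return True


# Hypothetical stand-in for train_data: one token list per recipe.
toy_corpus = [["lemon", "chicken", "garlic"], ["onion", "thyme"]]
print(check_corpus_shape(toy_corpus))
print(check_corpus_shape(["lemon chicken garlic"]))
```

Running the same check on `train_data` before building the vocabulary can catch the common mistake of passing untokenized strings.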

In [24]:
model = Word2Vec()

In [25]:
model?

Type:           Word2Vec
String form:    Word2Vec(vocab=0, size=100, alpha=0.025)
File:           ~/opt/anaconda3/lib/python3.7/site-packages/gensim/models/word2vec.py
Docstring:
Train, use and evaluate neural networks described in https://code.google.com/p/word2vec/.

Once you're finished training a model (=no more updates, only querying)
store and use only the :class:`~gensim.models.keyedvectors.KeyedVectors` instance in `self.wv` to reduce memory.

The model can be stored/loaded via its :meth:`~gensim.models.word2vec.Word2Vec.save` and
:meth:`~gensim.models.word2vec.Word2Vec.load` methods.

The trained word vectors can also be stored/loaded from a format compatible with the
original word2vec implementation via `self.wv.save_word2vec_format`
and :meth:`gensim.models.keyedvectors.KeyedVectors.load_word2vec_format`.

Some important attributes are the following:

Attributes
----------
wv : :class:`~gensim.models.keyedvectors.Word2VecKeyed

In [26]:
# Scan the corpus once to collect word counts and build the vocabulary
model.build_vocab(train_data)

In [27]:
%%time
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

CPU times: user 1min 52s, sys: 1.04 s, total: 1min 53s
Wall time: 38.2 s


(68100723, 81403200)
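
The tuple returned by `train` is, as far as the Gensim documentation describes it, the effective number of words trained on followed by the total raw word count across all epochs. The arithmetic below is a small sketch of how to read the two numbers; `assumed_epochs = 5` is an assumption based on Gensim's default, since `model.epochs` is not printed in this notebook.

```python
# Values copied from the training output above.
trained_words, raw_words = 68100723, 81403200

# Assumption: the model uses Gensim's default of 5 epochs.
assumed_epochs = 5

# Share of raw tokens that actually contributed to training.
kept_fraction = trained_words / raw_words

# Raw tokens seen in a single pass over the corpus, under the epoch assumption.
raw_words_per_epoch = raw_words // assumed_epochs

print(f"kept fraction: {kept_fraction:.4f}")
print(f"raw words per epoch: {raw_words_per_epoch}")
```

Roughly 84% of the raw tokens were used. The remainder is most likely explained by rare words dropped from the vocabulary and by subsampling of very frequent words, both of which are enabled by default.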

In [29]:
# Query with several ingredients at once; they are treated as positive examples
res = model.wv.most_similar('lemon chicken garlic onion'.split(), topn=20)
res

[('shallot', 0.6585192084312439),
 ('shallots', 0.6159201860427856),
 ('onions', 0.548940896987915),
 ('scallions', 0.5399220585823059),
 ('camphor', 0.509620726108551),
 ('oregano', 0.5023354887962341),
 ('curry', 0.4905508756637573),
 ('tomato', 0.4837029278278351),
 ('chile', 0.48241376876831055),
 ('pepper', 0.481892466545105),
 ('thyme', 0.4789288341999054),
 ('chili', 0.4683789312839508),
 ('parsley', 0.4667717218399048),
 ('cumin', 0.465359091758728),
 ('tomatoes', 0.4570711851119995),
 ('lime', 0.45315617322921753),
 ('marjoram', 0.45168212056159973),
 ('fennel', 0.448289155960083),
 ('jalapeño', 0.44708752632141113),
 ('leek', 0.4469062089920044)]
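
When several words are passed to `most_similar`, the query is built by combining their vectors, and the scores are cosine similarities against every other word. The following is a simplified pure-Python sketch of that idea, not Gensim's actual implementation: it averages the unit-normalized vectors of the query words and excludes those words from the results. The three-dimensional `toy_vectors` are invented for illustration and are not taken from the trained model.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def most_similar_sketch(vectors, positive, topn=3):
    """Rank words by cosine similarity to the mean of the `positive` word vectors."""
    normed = {word: unit(vec) for word, vec in vectors.items()}
    dim = len(next(iter(normed.values())))
    mean = [sum(normed[w][i] for w in positive) / len(positive) for i in range(dim)]
    query = unit(mean)
    scores = [
        (word, sum(a * b for a, b in zip(query, vec)))
        for word, vec in normed.items()
        if word not in positive  # the query words themselves are never returned
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Invented toy embeddings, for illustration only.
toy_vectors = {
    "onion": [1.0, 0.1, 0.0],
    "garlic": [0.9, 0.2, 0.1],
    "shallot": [0.95, 0.15, 0.05],
    "sugar": [0.0, 0.1, 1.0],
    "lime": [0.3, 0.9, 0.1],
}

toy_result = most_similar_sketch(toy_vectors, ["onion", "garlic"], topn=2)
print(toy_result)
```

This also explains why neither `lemon`, `chicken`, `garlic`, nor `onion` appears in the list above: input words are filtered out of the ranking.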

In [31]:
model.save('Data/w2v.model')