# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for the Gensim implementation is [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)

In [2]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [3]:
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

In [4]:
train_data[0]

['place',
 'chicken',
 'butter',
 'soup',
 'onion',
 'slow',
 'cooker',
 'water',
 'covercover',
 'cook',
 'hours',
 'high',
 'minutes',
 'serving',
 'place',
 'torn',
 'biscuit',
 'dough',
 'slow',
 'cooker',
 'cook',
 'dough',
 'longer',
 'raw',
 'center']
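
Each training document is a flat list of tokens like the one above. As a rough illustration of what Word2Vec learns from such a list, the sketch below builds CBOW-style examples (context words used to predict a target word). The helper `cbow_examples` and the window size of 1 are assumptions made for illustration, not Gensim code; Gensim's own implementation also subsamples frequent words and can randomly shrink the window.

```python
def cbow_examples(tokens, window=1):
    """Return (context, target) pairs for a CBOW-style objective.

    Hypothetical helper for illustration only; not part of Gensim.
    """
    examples = []
    for i, target in enumerate(tokens):
        # Neighbours within `window` positions on either side of the target.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        examples.append((tuple(left + right), target))
    return examples


# First four tokens of the recipe shown above.
tokens = ['place', 'chicken', 'butter', 'soup']
examples = cbow_examples(tokens, window=1)
for context, target in examples:
    print(context, '->', target)
```

Skip-gram reverses the direction: each target word is used to predict its context words one at a time.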

In [5]:
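# Instantiate with Gensim's defaults
# (in Gensim 4.x: vector_size=100, window=5, min_count=5, sg=0 for CBOW, epochs=5)
model = Word2Vec()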

In [6]:
model.build_vocab(train_data)
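
Conceptually, `build_vocab` scans the corpus once, counts every token, and keeps only tokens that occur at least `min_count` times. The sketch below mimics that filtering step in plain Python. The function `build_vocab_sketch` and the toy corpus are hypothetical, and the real method does more (for example, it prepares the tables used for sampling during training).

```python
from collections import Counter


def build_vocab_sketch(corpus, min_count=5):
    """Count tokens across all documents and drop the rare ones.

    Hypothetical stand-in for the frequency filter; not Gensim's code.
    """
    counts = Counter(token for doc in corpus for token in doc)
    return {word: n for word, n in counts.items() if n >= min_count}


# Toy corpus (made up for this example).
toy_corpus = [
    ['cook', 'chicken', 'slow'],
    ['cook', 'dough'],
    ['cook', 'chicken'],
]
vocab = build_vocab_sketch(toy_corpus, min_count=2)
print(vocab)
```

With `min_count=2`, `'slow'` and `'dough'` are dropped because each appears only once. On the real model, the size of the retained vocabulary can be checked with `len(model.wv)` in Gensim 4.x.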

In [7]:
%%time
# Train for the model's configured number of epochs over the full corpus
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

CPU times: user 3min 6s, sys: 1.6 s, total: 3min 7s
Wall time: 1min 44s


(67227093, 80190450)
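
According to the Gensim documentation, the tuple returned by `train` holds the effective word count (after dropping out-of-vocabulary words and downsampling frequent ones) followed by the raw word count, both summed over all epochs. The arithmetic below is a small sanity check on the numbers printed above. It assumes the model ran with the default of 5 epochs, which the notebook does not print explicitly.

```python
# Values copied from the output of model.train(...) above.
effective_words, raw_words = 67227093, 80190450

# Assumption: the default number of epochs (5) was used.
epochs = 5

# Raw tokens seen in a single pass over the corpus.
words_per_epoch = raw_words // epochs

# Share of raw tokens that actually contributed to training.
kept_fraction = effective_words / raw_words

print(words_per_epoch)
print(f"{kept_fraction:.1%}")
```

Under that assumption, the corpus holds about 16 million tokens, and roughly 84% of them were used in training updates.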

In [9]:
model.wv.most_similar(positive=['salad', 'chicken'], topn=20)

[('dressing', 0.5827581882476807),
 ('pheasant', 0.5737828016281128),
 ('mesclun', 0.5582780241966248),
 ('caesar', 0.5325864553451538),
 ('turkey', 0.5268767476081848),
 ('slaw', 0.5225558280944824),
 ('vinaigrette', 0.520498514175415),
 ('watercress', 0.5151207447052002),
 ('romaine', 0.5108035206794739),
 ('salads', 0.5034649968147278),
 ('frisee', 0.4967920184135437),
 ('squab', 0.4952894151210785),
 ('lettuces', 0.48918211460113525),
 ('mizuna', 0.4856293797492981),
 ('coleslaw', 0.4844737648963928),
 ('frisée', 0.48388203978538513),
 ('quail', 0.4796750247478485),
 ('dressed', 0.46689847111701965),
 ('duck', 0.4657539129257202),
 ('greens', 0.4642626643180847)]
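
When several words are passed as positive examples, `most_similar` combines them into a single query vector and ranks the remaining vocabulary by cosine similarity to it. The sketch below reproduces that idea with plain Python lists. The three-dimensional vectors are invented for illustration, and `most_similar_sketch` is a hypothetical helper, not Gensim's implementation.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def most_similar_sketch(vectors, positive, topn=3):
    """Rank words by cosine similarity to the mean of the query words."""
    # Average the unit-normalised query vectors, then renormalise the mean.
    units = [unit(vectors[w]) for w in positive]
    mean = unit([sum(col) / len(units) for col in zip(*units)])
    scores = [
        (word, sum(a * b for a, b in zip(unit(vec), mean)))
        for word, vec in vectors.items()
        if word not in positive  # the query words themselves are excluded
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up 3-d embeddings; real Word2Vec vectors have 100 dimensions by default.
toy_vectors = {
    'salad': [1.0, 0.0, 0.0],
    'chicken': [0.0, 1.0, 0.0],
    'dressing': [0.9, 0.3, 0.1],
    'turkey': [0.2, 0.9, 0.1],
    'cake': [0.0, 0.1, 1.0],
}
ranked = most_similar_sketch(toy_vectors, ['salad', 'chicken'])
for word, score in ranked:
    print(word, round(score, 3))
```

This is why the real results above mix salad-related terms (`dressing`, `romaine`) with poultry (`pheasant`, `turkey`): both lie close to the averaged query vector.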

In [10]:
# Persist the trained model; it can be reloaded later with Word2Vec.load('Data/w2v.model')
model.save('Data/w2v.model')