In this notebook, we will learn how to train a word2vec model in TensorFlow.

* Code adapted from https://github.com/chiphuyen/stanford-tensorflow-tutorials/blob/master/examples/04_word2vec_eager.py

* Images adapted from http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

![title](w2vec.png)

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.python.ops import lookup_ops
import tensorflow.contrib.eager as tfe
import time
tf.enable_eager_execution()

In [2]:
tf.set_random_seed(42)

In [3]:
vocab_file = 'vocab.txt'
src_tgt_file = 'wiki.1M.txt.tokenized.src_tgt'

Let us check out the file we just created! **How many lines does it have?**
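
One way to answer this question is to count the lines directly in Python. The sketch below is only illustrative: `count_lines` is a hypothetical helper that is not part of the original notebook, and it assumes the file is UTF-8 text with one example per line.

```python
def count_lines(path):
    """Return the number of lines in a text file without loading it all into memory."""
    with open(path, encoding='utf-8') as f:
        return sum(1 for _ in f)
```

Calling `count_lines(src_tgt_file)` would then report how many lines the training file contains.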

Now let us create the dataset. We will use `vocab_table` to convert each word to an integer!

### P2: Define dataset
* Convert words to indices
* Batch the examples
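
To make the two steps above concrete, here is a minimal pure-Python sketch of the same idea. It is not the TensorFlow pipeline used in this notebook: `build_vocab`, `lookup`, and `batch` are hypothetical helpers, the toy vocabulary is invented, and placing `<unk>` at index 0 is an assumption of this example.

```python
def build_vocab(words):
    """Map each word to its position in the list, like a vocabulary file with one word per line."""
    return {word: index for index, word in enumerate(words)}

def lookup(vocab, words, default_value=0):
    """Convert words to indices; unknown words fall back to default_value."""
    return [vocab.get(word, default_value) for word in words]

def batch(items, batch_size):
    """Split a list into consecutive batches; the last one may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# Toy data (assumption): 'dog' is not in the vocabulary, so it maps to index 0.
vocab = build_vocab(['<unk>', 'the', 'cat', 'sat'])
pairs = [('the', 'cat'), ('cat', 'sat'), ('sat', 'dog')]
id_pairs = [tuple(lookup(vocab, pair)) for pair in pairs]
batches = batch(id_pairs, batch_size=2)
print(batches)  # [[(1, 2), (2, 3)], [(3, 0)]]
```

Falling back to a fixed index for unknown words keeps every line usable, at the cost of lumping all rare words together.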

In [4]:
vocab_table = lookup_ops.index_table_from_file(vocab_file, default_value=0)

In [5]:
BATCH_SIZE = 1024

In [6]:
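# Read the file line by line; each line is expected to hold a source word and a target word.
dataset = tf.data.TextLineDataset(src_tgt_file)
# Shuffle with a 1000-element buffer and repeat the data 4 times.
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=1000, count=4, seed=42))
# Split each line on whitespace into its words.
dataset = dataset.map(lambda line: tf.string_split([line]).values)
# Map words to integer ids; out-of-vocabulary words get id 0.
dataset = dataset.map(lambda words: vocab_table.lookup(words))
# Unpack each line into a (source id, target id) pair.
dataset = dataset.map(lambda words: (words[0], words[1]))
dataset = dataset.batch(BATCH_SIZE)

# Prepare the next batch while the current one is being consumed.
dataset = dataset.prefetch(1)
dataset_iter = iter(dataset)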

In [7]:
next(dataset_iter) 

(<tf.Tensor: id=49, shape=(1024,), dtype=int64, numpy=array([1241,  508, 3151, ...,  329,   33,    8])>,
 <tf.Tensor: id=50, shape=(1024,), dtype=int64, numpy=array([22641,    57,   508, ...,    33,   145,     0])>)
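
Each batch holds 1024 source ids and 1024 target ids. Because the pipeline repeats the data before batching, batches can straddle epoch boundaries, so only the very last batch can be smaller. Under that reading, the number of training steps can be estimated with the small sketch below; `expected_steps` is a hypothetical helper and the numbers in the example are made up.

```python
def expected_steps(num_lines, epochs, batch_size):
    """Number of batches when the data is repeated first and batched afterwards."""
    total_examples = num_lines * epochs
    # Integer ceiling division: a final partial batch still counts as one step.
    return (total_examples + batch_size - 1) // batch_size

print(expected_steps(num_lines=10, epochs=4, batch_size=16))  # 3
```

For this notebook, one would pass the line count of `src_tgt_file`, `epochs=4`, and `batch_size=BATCH_SIZE`.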

In [8]:
V = vocab_table.size()
d = 128

In [9]:
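class Word2Vec(tf.keras.Model):
    def __init__(self, V, d, num_sampled):
        super(Word2Vec, self).__init__()
        # Input word embeddings: one d-dimensional row per vocabulary word.
        self.W = tfe.Variable(tf.random_uniform([V, d]))
        # Output-side weights and biases used by the NCE loss.
        self.nce_W = tfe.Variable(tf.truncated_normal([V, d]))
        self.nce_b = tfe.Variable(tf.zeros(V))
        self.V = V
        # Number of negative classes sampled per batch.
        self.num_sampled = num_sampled

    def compute_loss(self, src_words, tgt_words):
        # Look up the embedding of every source word in the batch.
        word_vectors = tf.nn.embedding_lookup(self.W, src_words)
        # nce_loss expects labels of shape [batch_size, num_true], hence expand_dims.
        loss = tf.reduce_mean(tf.nn.nce_loss(weights=self.nce_W, biases=self.nce_b,
                              labels=tf.expand_dims(tgt_words, axis=1),
                              inputs=word_vectors,
                              num_sampled=self.num_sampled, num_classes=self.V))
        return loss

    def call(self, inputs):
        # Unused: training goes through compute_loss directly.
        pass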
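
To give some intuition for what the loss in `compute_loss` measures, here is a simplified pure-Python sketch for a single (source, target) pair. It is not the `tf.nn.nce_loss` implementation: as I understand it, the TensorFlow op also draws the negatives itself (with a log-uniform sampler by default) and corrects the logits for the sampling probabilities, and this sketch omits both. All names (`sampled_loss`, `toy_V`, `toy_d`, and so on) are hypothetical, and the negative ids are simply handed in.

```python
import math
import random

def sigmoid_cross_entropy(logit, label):
    """Numerically stable sigmoid cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def sampled_loss(word_vector, out_W, out_b, target, negatives):
    """Loss for one pair: the target is labelled 1, each sampled negative is labelled 0."""
    loss = sigmoid_cross_entropy(dot(word_vector, out_W[target]) + out_b[target], 1.0)
    for neg in negatives:
        loss += sigmoid_cross_entropy(dot(word_vector, out_W[neg]) + out_b[neg], 0.0)
    return loss

random.seed(0)
toy_V, toy_d = 6, 4  # toy vocabulary size and embedding size (assumptions)
out_W = [[random.uniform(-1, 1) for _ in range(toy_d)] for _ in range(toy_V)]
out_b = [0.0] * toy_V
word_vector = [random.uniform(-1, 1) for _ in range(toy_d)]
negatives = [i for i in range(toy_V) if i != 2][:3]  # fixed negatives instead of sampling
print(sampled_loss(word_vector, out_W, out_b, target=2, negatives=negatives))
```

The point of sampling is that each example touches only a handful of output rows rather than all `V` of them, which is what makes training with a large vocabulary affordable.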

In [10]:
w2vec = Word2Vec(V, d, num_sampled=5)

In [11]:
grad_fun = tfe.implicit_value_and_gradients(w2vec.compute_loss)
opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
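
Plain gradient descent moves every parameter a small step against its gradient. The following pure-Python sketch shows that update rule on a toy one-parameter problem; `sgd_step` is a hypothetical helper, not the TensorFlow optimizer, and the quadratic objective is chosen only for illustration.

```python
def sgd_step(params, grads, learning_rate=0.01):
    """Return the parameters after one plain gradient descent update."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(1000):
    grads = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grads)
print(round(w[0], 3))  # 3.0
```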

In [12]:
train_step = 0
total_loss = 0.
STATS_STEP = 1000  # report the average loss every STATS_STEP batches

start_time = time.time()
for src_words, tgt_words in dataset:
    # grad_fun returns the batch loss and a list of (gradient, variable) pairs.
    loss_batch, gradients = grad_fun(src_words, tgt_words)
    total_loss += loss_batch
    opt.apply_gradients(gradients)
    train_step += 1
    if train_step % STATS_STEP == 0:
        time_taken = time.time() - start_time
        print(f'Step: {train_step} Loss: {total_loss/STATS_STEP} Time: {time_taken: 0.3f}')
        # Reset the running loss and the timer for the next reporting window.
        total_loss = 0.
        start_time = time.time()
print(f'Final Steps: {train_step}')

Step: 1000 Loss: 30.501628875732422 Time:  38.626
Step: 2000 Loss: 27.82444190979004 Time:  38.145
Step: 3000 Loss: 26.081783294677734 Time:  37.772
Step: 4000 Loss: 26.036054611206055 Time:  37.963
Step: 5000 Loss: 24.60239601135254 Time:  33.482
Step: 6000 Loss: 24.336790084838867 Time:  26.205
Step: 7000 Loss: 24.18866539001465 Time:  25.957
Step: 8000 Loss: 23.636003494262695 Time:  26.333
Step: 9000 Loss: 23.27335548400879 Time:  26.065
Step: 10000 Loss: 22.916406631469727 Time:  26.038
Step: 11000 Loss: 21.81801414489746 Time:  25.967
Step: 12000 Loss: 22.540122985839844 Time:  25.976
Step: 13000 Loss: 22.07505226135254 Time:  26.018
Step: 14000 Loss: 21.853620529174805 Time:  25.989
Step: 15000 Loss: 21.907264709472656 Time:  26.063
Step: 16000 Loss: 21.146833419799805 Time:  25.954
Step: 17000 Loss: 22.04213523864746 Time:  26.093
Step: 18000 Loss: 21.222883224487305 Time:  25.965
Step: 19000 Loss: 20.93439483642578 Time:  26.048
Step: 20000 Loss: 20.833341598510742 Time:  25.9

In [13]:
import os
checkpoint_prefix = os.path.join('ckpt', 'w2vec')

In [14]:
ckpt = tfe.Checkpoint(model=w2vec)

In [15]:
print(ckpt)

<tensorflow.python.training.checkpointable.util.Checkpoint object at 0x12c86ff28>


In [16]:
p = ckpt.save(checkpoint_prefix)

In [17]:
print(p)

ckpt/w2vec-1
