# Skip-gram Word2Vec

In this notebook, I'll lead you through using PyTorch to implement the [Word2Vec algorithm](https://en.wikipedia.org/wiki/Word2vec) using the skip-gram architecture. By implementing this, you'll learn about embedding words for use in natural language processing. This will come in handy when dealing with things like machine translation.

## Readings

Here are the resources I used to build this notebook. I suggest reading these either beforehand or while you're working on this material.

* A really good [conceptual overview](http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/) of Word2Vec from Chris McCormick 
* [First Word2Vec paper](https://arxiv.org/pdf/1301.3781.pdf) from Mikolov et al.
* [Neural Information Processing Systems paper](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) with improvements for Word2Vec, also from Mikolov et al.

---
## Word embeddings

When you're dealing with words in text, you end up with tens of thousands of word classes to analyze, one for each word in a vocabulary. Trying to one-hot encode these words is massively inefficient because all but one of the values in a one-hot vector are zero. So, in the matrix multiplication between a one-hot input vector and the first hidden layer, almost every product involves a zero and contributes nothing to the hidden outputs.

To solve this problem and greatly increase the efficiency of our networks, we use what are called **embeddings**. Embeddings are just a fully connected layer like you've seen before. We call this layer the embedding layer and the weights are embedding weights. We skip the multiplication into the embedding layer by instead directly grabbing the hidden layer values from the weight matrix. We can do this because the multiplication of a one-hot encoded vector with a matrix returns the row of the matrix corresponding to the index of the "on" input unit.

<img src='assets/lookup_matrix.png' width=50%>

Instead of doing the matrix multiplication, we use the weight matrix as a lookup table. We encode the words as integers, for example "heart" is encoded as 958, "mind" as 18094. Then to get hidden layer values for "heart", you just take the 958th row of the embedding matrix. This process is called an **embedding lookup** and the number of hidden units is the **embedding dimension**.
 
There is nothing magical going on here. The embedding lookup table is just a weight matrix. The embedding layer is just a hidden layer. The lookup is just a shortcut for the matrix multiplication. The lookup table is trained just like any weight matrix.
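
To make the shortcut concrete, here is a minimal sketch in plain Python (no PyTorch) showing that multiplying a one-hot vector by a weight matrix gives the same values as reading off one row. The matrix sizes and values, and the helper names `one_hot` and `vec_mat_mul`, are made up for illustration.

```python
# Toy embedding matrix: 4 words in the vocabulary, 3 hidden units (assumed sizes).
embedding = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

def one_hot(index, size):
    """Return a one-hot list with a 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def vec_mat_mul(vec, mat):
    """Multiply a row vector by a matrix, both stored as plain Python lists."""
    n_cols = len(mat[0])
    return [sum(vec[r] * mat[r][c] for r in range(len(mat))) for c in range(n_cols)]

word_idx = 2  # integer code of some word

# The slow way: one-hot encode, then do the full matrix multiplication.
via_matmul = vec_mat_mul(one_hot(word_idx, len(embedding)), embedding)

# The embedding lookup: just index the row.
via_lookup = embedding[word_idx]

print(via_matmul)
print(via_lookup)
```

Both prints show the same row. The lookup does no arithmetic at all, which is where the efficiency gain comes from.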

Embeddings aren't only used for words of course. You can use them for any model where you have a massive number of classes. A particular type of model called **Word2Vec** uses the embedding layer to find vector representations of words that contain semantic meaning.

---
## Word2Vec

The Word2Vec algorithm finds much more efficient representations by finding vectors that represent the words. These vectors also contain semantic information about the words.

<img src="assets/context_drink.png" width=40%>

Words that show up in similar **contexts**, such as "coffee", "tea", and "water" will have vectors near each other. Words that appear in different contexts will be farther away from one another, and relationships between words can be represented by distance in vector space.


There are two architectures for implementing Word2Vec:
>* CBOW (Continuous Bag-Of-Words) and 
* Skip-gram

<img src="assets/word2vec_architectures.png" width=60%>

In this implementation, we'll be using the **skip-gram architecture** with **negative sampling** because skip-gram tends to perform better than CBOW and negative sampling makes training faster. Here, we pass in a word and try to predict the words surrounding it in the text. In this way, we can train the network to learn representations for words that show up in similar contexts.
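
As a rough illustration of "pass in a word and predict the words surrounding it", the sketch below lists the (input word, context word) training pairs for a short sentence. It uses a fixed window for simplicity, whereas the notebook later samples the window size at random. The function name `skipgram_pairs` and the example sentence are my own.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs using a fixed window on each side."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "i drink hot coffee daily".split()
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Each pair is one training example: the first word is the network input and the second is a word it should learn to predict.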

---
## Loading Data

Next, we'll ask you to load in data and place it in the `data` directory:

1. Load the [text8 dataset](https://s3.amazonaws.com/video.udacity-data.com/topher/2018/October/5bbe6499_text8/text8.zip); a file of cleaned up *Wikipedia article text* from Matt Mahoney. 
2. Place that data in the `data` folder in the home directory.
3. Then you can extract it and delete the zip archive to save storage space.

After following these steps, you should have one file in your data directory: `data/text8`.

In [1]:
# read in the extracted text file      
with open('data/text8') as f:
    text = f.read()

# print out the first 100 characters
print(text[:100])

 anarchism originated as a term of abuse first used against early working class radicals including t


## Pre-processing

Here I'm fixing up the text to make training easier. This comes from the `utils.py` file. The `preprocess` function does a few things:
>* It converts any punctuation into tokens, so a period is changed to ` <PERIOD> `. In this data set, there aren't any periods, but it will help in other NLP problems. 
* It removes all words that show up five or *fewer* times in the dataset. This will greatly reduce issues due to noise in the data and improve the quality of the vector representations. 
* It returns a list of words in the text.

This may take a few seconds to run, since our text file is quite large. If you want to write your own functions for this stuff, go for it!
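
If you do want to write your own version, here is a minimal sketch of the steps listed above. It is not the code in `utils.py`: the function name `preprocess_sketch`, the `min_count` parameter, and the small punctuation map are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical punctuation-to-token map; the real utils.py may cover more symbols.
PUNCT_TOKENS = {".": " <PERIOD> ", ",": " <COMMA> ", "?": " <QUESTION_MARK> "}

def preprocess_sketch(text, min_count=5):
    """Tokenize `text` and drop words that appear `min_count` times or fewer."""
    text = text.lower()
    for symbol, token in PUNCT_TOKENS.items():
        text = text.replace(symbol, token)
    words = text.split()
    counts = Counter(words)
    return [w for w in words if counts[w] > min_count]

# Tiny example with a low cutoff so that something survives the filter.
sample = "the cat sat. the dog sat. the bird flew."
kept = preprocess_sketch(sample, min_count=1)
print(kept)
```

With `min_count=1`, the words that appear only once ("cat", "dog", "bird", "flew") are dropped and the rest keep their original order.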

In [2]:
import utils

# get list of words
words = utils.preprocess(text)
print(words[:30])

['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution', 'whilst']


In [3]:
# print some stats about this word data
print("Total words in text: {}".format(len(words)))
print("Unique words: {}".format(len(set(words)))) # `set` removes any duplicate words

Total words in text: 16680599
Unique words: 63641


### Dictionaries

Next, I'm creating two dictionaries to convert words to integers and back again (integers to words). This is again done with a function in the `utils.py` file. `create_lookup_tables` takes in a list of words in a text and returns two dictionaries.
>* The integers are assigned in descending frequency order, so the most frequent word ("the") is given the integer 0 and the next most frequent is 1, and so on. 

Once we have our dictionaries, the words are converted to integers and stored in the list `int_words`.
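
For reference, a lookup-table builder along these lines could be sketched as follows. This is an assumption about what such a function does, not the actual `utils.py` code. The name `create_lookup_tables_sketch` and the alphabetical tie-breaking are mine.

```python
from collections import Counter

def create_lookup_tables_sketch(words):
    """Return (vocab_to_int, int_to_vocab), with 0 assigned to the most frequent word."""
    counts = Counter(words)
    # Sort by descending count; ties are broken alphabetically here (an assumption).
    sorted_vocab = sorted(counts, key=lambda w: (-counts[w], w))
    int_to_vocab = dict(enumerate(sorted_vocab))
    vocab_to_int = {word: idx for idx, word in int_to_vocab.items()}
    return vocab_to_int, int_to_vocab

toy_words = ["the", "cat", "the", "dog", "the", "cat"]
v2i, i2v = create_lookup_tables_sketch(toy_words)
encoded = [v2i[w] for w in toy_words]
print(v2i)
print(encoded)
```

Because integers follow frequency rank, a small integer always means a common word. The validation code later in the notebook relies on this when it picks "common" and "uncommon" words by index range.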

In [4]:
vocab_to_int, int_to_vocab = utils.create_lookup_tables(words)
int_words = [vocab_to_int[word] for word in words]

print(int_words[:30])

[5233, 3080, 11, 5, 194, 1, 3133, 45, 58, 155, 127, 741, 476, 10571, 133, 0, 27349, 1, 0, 102, 854, 2, 0, 15067, 58112, 1, 0, 150, 854, 3580]


## Subsampling

Words that show up often, such as "the", "of", and "for", don't provide much context to the nearby words. If we discard some of them, we can remove some of the noise from our data and in return get faster training and better representations. Mikolov et al. call this process subsampling. For each word $w_i$ in the training set, we'll discard it with probability given by 

$$ P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}} $$

where $t$ is a threshold parameter and $f(w_i)$ is the frequency of word $w_i$ in the total dataset.
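
To get a feel for the formula, here is a small worked example with $t = 10^{-5}$. The total word count and the count for "anarchism" (token 5233) are taken from this notebook's outputs. The other two frequencies are hypothetical. For words rarer than the threshold the formula goes negative, so I clamp it at 0, which has the same effect as never discarding them.

```python
import math

def discard_probability(freq, threshold=1e-5):
    """P(w_i) = 1 - sqrt(t / f(w_i)), clamped at 0 for words rarer than the threshold."""
    return max(0.0, 1.0 - math.sqrt(threshold / freq))

total_words = 16680599   # total words in the text, from the stats cell above
anarchism_count = 303    # occurrences of token 5233 ("anarchism") in this dataset

p_mid = discard_probability(anarchism_count / total_words)
p_frequent = discard_probability(0.05)   # hypothetical word making up 5% of the text
p_rare = discard_probability(5e-6)       # hypothetical word rarer than the threshold

print("anarchism:    ", round(p_mid, 3))
print("very frequent:", round(p_frequent, 3))
print("very rare:    ", p_rare)
```

A moderately common word like "anarchism" is dropped roughly a quarter of the time, a very frequent word is dropped almost always, and rare words are always kept.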

> Implement subsampling for the words in `int_words`. That is, go through `int_words` and discard each word with the probability $P(w_i)$ shown above. Note that $P(w_i)$ is the probability that a word is discarded. Assign the subsampled data to `train_words`.

In [15]:
from collections import Counter
import random
import numpy as np

threshold = 1e-5
word_counts = Counter(int_words)
print(list(word_counts.items())[0])  # dictionary of int_words, how many times they appear

# discard some frequent words, according to the subsampling equation
# create a new list of words for training
total_count = len(int_words)
freqs = {word: count/total_count for word, count in word_counts.items()}
probabilities = 1 - np.sqrt(threshold / (np.array(list(word_counts.values())) / total_count))
probability_lookup = { word: prob for word, prob in zip(word_counts.keys(), probabilities) }
train_words = [word for word in int_words if random.random() > probability_lookup[word]]

print("Before: ", len(int_words))
print("After:", len(train_words))
print(train_words[:30])

(5233, 303)
Before:  16680599
After: 4628989
[5233, 3133, 58, 741, 10571, 27349, 15067, 58112, 3580, 190, 10712, 3672, 7088, 5233, 44611, 2877, 186, 5233, 2621, 8983, 279, 4147, 141, 1137, 4860, 6753, 7573, 247, 11064, 51]


## Making batches

Now that our data is in good shape, we need to get it into the proper form to pass it into our network. With the skip-gram architecture, for each word in the text, we want to define a surrounding _context_ and grab all the words in a window around that word, with size $C$. 

From [Mikolov et al.](https://arxiv.org/pdf/1301.3781.pdf): 

"Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples... If we choose $C = 5$, for each training word we will select randomly a number $R$ in range $[ 1: C ]$, and then use $R$ words from history and $R$ words from the future of the current word as correct labels."

> **Exercise:** Implement a function `get_target` that receives a list of words, an index, and a window size, then returns a list of words in the window around the index. Make sure to use the algorithm described above, where you choose a random number of words from the window.

Say, we have an input and we're interested in the idx=2 token, `741`: 
```
[5233, 58, 741, 10571, 27349, 0, 15067, 58112, 3580, 58, 10712]
```

For `R=2`, `get_target` should return a list of four values:
```
[5233, 58, 10571, 27349]
```
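
The example above can be reproduced with a deterministic helper that takes $R$ as an argument instead of sampling it. This is only a sketch for checking the slicing logic. The name `window_around` is mine, and the exercise itself still asks for a random $R$.

```python
def window_around(words, idx, r):
    """Return up to `r` words on each side of `idx`, excluding the word at `idx`."""
    start = max(0, idx - r)  # don't run off the front of the list
    return words[start:idx] + words[idx + 1:idx + 1 + r]

tokens = [5233, 58, 741, 10571, 27349, 0, 15067, 58112, 3580, 58, 10712]

print(window_around(tokens, idx=2, r=2))   # [5233, 58, 10571, 27349]
print(window_around(tokens, idx=0, r=2))   # [58, 741]
```

At `idx=0` there is no history to draw from, so only the two "future" words come back. Python slicing already handles running past the end of the list, so only the start index needs clamping.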

In [6]:
def get_target(words, idx, window_size=5):
    ''' Get a list of words in a window around an index. '''

    word_range = random.randint(1, window_size)
    
    return words[max(0, idx - word_range):idx] + words[idx + 1:min(len(words), idx + 1 + word_range)]

In [7]:
# test your code!

# run this cell multiple times to check for random window selection
int_text = [i for i in range(10)]
print('Input: ', int_text)
idx=5 # word index of interest

target = get_target(int_text, idx=idx, window_size=5)
print('Target: ', target)  # you should get some indices around the idx

Input:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Target:  [3, 4, 6, 7]


### Generating Batches 

Here's a generator function that returns batches of input and target data for our model, using the `get_target` function from above. The idea is that it grabs `batch_size` words from a words list. Then, for each word in that batch, it gets the target words in a window around it. Because the window is taken from within the batch, words near the edges of a batch get fewer targets.

In [8]:
def get_batches(words, batch_size, window_size=5):
    ''' Create a generator of word batches as a tuple (inputs, targets) '''
    
    n_batches = len(words)//batch_size
    
    # only full batches
    words = words[:n_batches*batch_size]
    
    for idx in range(0, len(words), batch_size):
        x, y = [], []
        batch = words[idx:idx+batch_size]
        for ii in range(len(batch)):
            batch_x = batch[ii]
            batch_y = get_target(batch, ii, window_size)
            y.extend(batch_y)
            x.extend([batch_x]*len(batch_y))
        yield x, y
    

In [9]:
int_text = [i for i in range(20)]
x,y = next(get_batches(int_text, batch_size=4, window_size=5))

print('x\n', x)
print('y\n', y)

x
 [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
y
 [1, 2, 3, 0, 2, 3, 1, 3, 0, 1, 2]


---
## Validation

Here, I'm creating a function that will help us observe our model as it learns. We're going to choose a few common words and a few uncommon words. Then, we'll print out the closest words to them using the cosine similarity: 

<img src="assets/two_vectors.png" width=30%>

$$
\mathrm{similarity} = \cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}||\vec{b}|}
$$


We can encode the validation words as vectors $\vec{a}$ using the embedding table, then calculate the similarity with each word vector $\vec{b}$ in the embedding table. With the similarities, we can print out the validation words and words in our embedding table semantically similar to those words. It's a nice way to check that our embedding table is grouping together words with similar semantic meanings.
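
Before the PyTorch version, here is the same idea in plain Python on a toy embedding table. The three 3-dimensional vectors are invented for illustration, and `cosine` and `nearest` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity: (a . b) / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings; real ones would come from the trained embedding table.
toy_embeddings = {
    "coffee": [0.9, 0.1, 0.3],
    "tea":    [0.8, 0.2, 0.4],
    "war":    [-0.5, 0.9, 0.0],
}

def nearest(word, table):
    """Return the other word in `table` with the highest cosine similarity to `word`."""
    query = table[word]
    scores = {w: cosine(query, v) for w, v in table.items() if w != word}
    return max(scores, key=scores.get)

print(nearest("coffee", toy_embeddings))
print(round(cosine(toy_embeddings["coffee"], toy_embeddings["war"]), 3))
```

Vectors pointing in nearly the same direction score close to 1, and vectors pointing in opposing directions score below 0, regardless of their lengths.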

In [10]:
def cosine_similarity(embedding, valid_size=16, valid_window=100, device='cpu'):
    """ Returns the cosine similarity of validation words with words in the embedding matrix.
        Here, embedding should be a PyTorch embedding module.
    """
    
    # Here we're calculating the cosine similarity between some random words and 
    # our embedding vectors. With the similarities, we can look at what words are
    # close to our random words.
    
    # sim = (a . b) / |a||b|
    
    embed_vectors = embedding.weight
    
    # magnitude of embedding vectors, |b|
    magnitudes = embed_vectors.pow(2).sum(dim=1).sqrt().unsqueeze(0)
    
    # pick N words from our ranges (0,window) and (1000,1000+window). lower id implies more frequent 
    valid_examples = np.array(random.sample(range(valid_window), valid_size//2))
    valid_examples = np.append(valid_examples,
                               random.sample(range(1000,1000+valid_window), valid_size//2))
    valid_examples = torch.LongTensor(valid_examples).to(device)
    
    valid_vectors = embedding(valid_examples)
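    # (a . b) / |b| for every validation vector a and every embedding vector b
    # we skip dividing by |a|: it is constant within a row, so each word's ranking is unchanged
    similarities = torch.mm(valid_vectors, embed_vectors.t())/magnitudes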
        
    return valid_examples, similarities

---
# SkipGram model

Define and train the SkipGram model. 
> You'll need to define an [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) and a final softmax output layer.

An Embedding layer takes in a number of inputs, importantly:
* **num_embeddings** – the size of the dictionary of embeddings, or how many rows you'll want in the embedding weight matrix
* **embedding_dim** – the size of each embedding vector; the embedding dimension

Below is an approximate diagram of the general structure of our network.
<img src="assets/skip_gram_arch.png" width=60%>

>* The input words are passed in as batches of input word tokens. 
* This will go into a hidden layer of linear units (our embedding layer). 
* Then, finally into a softmax output layer. 

We'll use the softmax layer to make a prediction about the context words by sampling, as usual.

---
## Negative Sampling

For every example we give the network, we train it using the output from the softmax layer. That means for each input, we're making very small changes to millions of weights even though we only have one true example. This makes training the network very inefficient. We can approximate the loss from the softmax layer by only updating a small subset of all the weights at once. We'll update the weights for the correct example and for only a small number of incorrect, or noise, examples. This is called ["negative sampling"](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf). 
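
A quick back-of-the-envelope count shows the scale of the saving on the output side. It uses this notebook's numbers: 63641 unique words, a 300-dimensional embedding, and 5 noise samples per input (the values used in the training cell). It only counts output-layer weights touched per training example, which is a simplification.

```python
n_vocab = 63641   # unique words reported earlier in the notebook
n_embed = 300     # embedding_dim used in the training cell
n_noise = 5       # noise samples per input used in the training cell

# Full softmax: every output word's vector receives a gradient.
full_softmax_weights = n_vocab * n_embed

# Negative sampling: only the true word and the noise words do.
neg_sampling_weights = (1 + n_noise) * n_embed

print(full_softmax_weights)
print(neg_sampling_weights)
print(full_softmax_weights // neg_sampling_weights)
```

That is about 19 million output weights per example with a full softmax versus 1800 with negative sampling.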

There are two modifications we need to make. First, since we're not taking the softmax output over all the words, we're really only concerned with one output word at a time. Similar to how we use an embedding table to map the input word to the hidden layer, we can now use another embedding table to map the hidden layer to the output word. Now we have two embedding layers, one for input words and one for output words. Secondly, we use a modified loss function where we only care about the true example and a small subset of noise examples.

$$
- \large \log{\sigma\left(u_{w_O}\hspace{0.001em}^\top v_{w_I}\right)} -
\sum_i^N \mathbb{E}_{w_i \sim P_n(w)}\log{\sigma\left(-u_{w_i}\hspace{0.001em}^\top v_{w_I}\right)}
$$

This is a little complicated so I'll go through it bit by bit. $u_{w_O}\hspace{0.001em}^\top$ is the embedding vector for our "output" target word (transposed, that's the $^\top$ symbol) and $v_{w_I}$ is the embedding vector for the "input" word. Then the first term 

$$\large \log{\sigma\left(u_{w_O}\hspace{0.001em}^\top v_{w_I}\right)}$$

says we take the log-sigmoid of the inner product of the output word vector and the input word vector. Now the second term, let's first look at 

$$\large \sum_i^N \mathbb{E}_{w_i \sim P_n(w)}$$ 

This means we're going to take a sum over words $w_i$ drawn from a noise distribution $w_i \sim P_n(w)$. The noise distribution is basically our vocabulary of words that aren't in the context of our input word. In effect, we can randomly sample words from our vocabulary to get these words. $P_n(w)$ is an arbitrary probability distribution though, which means we get to decide how to weight the words that we're sampling. This could be a uniform distribution, where we sample all words with equal probability. Or it could be according to the frequency that each word shows up in our text corpus, the unigram distribution $U(w)$. The authors found the best distribution to be $U(w)^{3/4}$, empirically. 
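
Here is a small sketch of what raising the unigram distribution to the $3/4$ power does. The word counts are invented, and `noise_distribution` is a hypothetical helper, not part of the notebook's code.

```python
def noise_distribution(counts, power=0.75):
    """Normalize counts raised to `power` into a probability distribution."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}

# Made-up counts: one very common word, one ordinary word, one rare word.
toy_counts = {"the": 1000, "coffee": 10, "zymurgy": 1}

unigram = noise_distribution(toy_counts, power=1.0)   # plain U(w)
smoothed = noise_distribution(toy_counts)             # U(w)^(3/4), renormalized

for word in toy_counts:
    print(word, round(unigram[word], 4), round(smoothed[word], 4))
```

The power flattens the distribution: the most common word is sampled a little less often than its raw frequency suggests, and rare words are sampled more often.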

Finally, in 

$$\large \log{\sigma\left(-u_{w_i}\hspace{0.001em}^\top v_{w_I}\right)},$$ 

we take the log-sigmoid of the negated inner product of a noise vector with the input vector. 

<img src="assets/neg_sampling_loss.png" width=50%>

To give you an intuition for what we're doing here, remember that the sigmoid function returns a probability between 0 and 1. The first term in the loss pushes the probability that our network will predict the correct word $w_O$ towards 1. In the second term, since we are negating the sigmoid input, we're pushing the probabilities of the noise words towards 0.
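
To check that intuition numerically, here is the loss for a single input word written in plain Python. The 2-dimensional vectors are made up, and `log_sigmoid`, `dot`, and `neg_sampling_loss` are illustrative helpers rather than the notebook's implementation.

```python
import math

def log_sigmoid(x):
    """log(sigmoid(x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_loss(v_in, u_out, noise_vectors):
    """-log s(u_out . v_in) - sum over noise words of log s(-u_noise . v_in)."""
    positive = log_sigmoid(dot(u_out, v_in))
    negative = sum(log_sigmoid(-dot(u_n, v_in)) for u_n in noise_vectors)
    return -(positive + negative)

v_in = [1.0, 0.0]

# Good embeddings: the true output word aligns with the input, the noise word opposes it.
good = neg_sampling_loss(v_in, u_out=[2.0, 0.0], noise_vectors=[[-2.0, 0.0]])

# Bad embeddings: the roles are swapped.
bad = neg_sampling_loss(v_in, u_out=[-2.0, 0.0], noise_vectors=[[2.0, 0.0]])

print(round(good, 4), round(bad, 4))
```

The loss is small when the true word's vector points the same way as the input vector and the noise vectors point away, and large in the opposite case. That is the behavior the two terms are designed to reward.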

In [11]:
import torch
from torch import nn
import torch.optim as optim

In [12]:
class SkipGramNeg(nn.Module):
    def __init__(self, n_vocab, n_embed, noise_dist=None):
        super().__init__()
        
        self.n_vocab = n_vocab
        self.n_embed = n_embed
        self.noise_dist = noise_dist
        
        # define embedding layers for input and output words
        self.in_embed = nn.Embedding(n_vocab, n_embed)
        self.out_embed = nn.Embedding(n_vocab, n_embed)
        
        # Initialize both embedding tables with uniform distribution
        # I believe this helps with convergence
        self.in_embed.weight.data.uniform_(-1, 1)
        self.out_embed.weight.data.uniform_(-1, 1)
        
    def forward_input(self, input_words):
        # return input vector embeddings
        return self.in_embed(input_words)
    
    def forward_output(self, output_words):
        # return output vector embeddings
        return self.out_embed(output_words)
    
    def forward_noise(self, batch_size, n_samples):
        """ Generate noise vectors with shape (batch_size, n_samples, n_embed)"""
        if self.noise_dist is None:
            # Sample words uniformly
            noise_dist = torch.ones(self.n_vocab)
        else:
            noise_dist = self.noise_dist
            
        # Sample words from our noise distribution
        noise_words = torch.multinomial(noise_dist,
                                        batch_size * n_samples,
                                        replacement=True)
        
        # move the sampled indices to whichever device holds the output embeddings
        noise_words = noise_words.to(self.out_embed.weight.device)
        
        ## get the noise embeddings
        # reshape the embeddings so that they have dims (batch_size, n_samples, n_embed)
        return self.out_embed(noise_words).view(batch_size, n_samples, self.n_embed)

In [13]:
class NegativeSamplingLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input_vectors, output_vectors, noise_vectors):
        
        batch_size, embed_size = input_vectors.shape
        
        # Input vectors should be a batch of column vectors
        input_vectors = input_vectors.view(batch_size, embed_size, 1)
        
        # Output vectors should be a batch of row vectors
        output_vectors = output_vectors.view(batch_size, 1, embed_size)
        
        # bmm = batch matrix multiplication
        # correct log-sigmoid loss
        out_loss = torch.bmm(output_vectors, input_vectors).sigmoid().log()
        out_loss = out_loss.squeeze()
        
        # incorrect log-sigmoid loss
        noise_loss = torch.bmm(noise_vectors.neg(), input_vectors).sigmoid().log()
        noise_loss = noise_loss.squeeze().sum(1)  # sum the losses over the sample of noise vectors

        # negate and sum correct and noisy log-sigmoid losses
        # return average batch loss
        return -(out_loss + noise_loss).mean()

### Training

Below is our training loop, and I recommend that you train on GPU, if available.

In [16]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Get our noise distribution
# Using word frequencies calculated earlier in the notebook
word_freqs = np.array(sorted(freqs.values(), reverse=True))
unigram_dist = word_freqs/word_freqs.sum()
noise_dist = torch.from_numpy(unigram_dist**(0.75)/np.sum(unigram_dist**(0.75)))

# instantiating the model
embedding_dim = 300
model = SkipGramNeg(len(vocab_to_int), embedding_dim, noise_dist=noise_dist).to(device)

# using the loss that we defined
criterion = NegativeSamplingLoss() 
optimizer = optim.Adam(model.parameters(), lr=0.003)

print_every = 1500
steps = 0
epochs = 5

# train for some number of epochs
for e in range(epochs):
    
    # get our input, target batches
    for input_words, target_words in get_batches(train_words, 32):
        steps += 1
        inputs, targets = torch.LongTensor(input_words), torch.LongTensor(target_words)
        inputs, targets = inputs.to(device), targets.to(device)
        
        # input, output, and noise vectors
        input_vectors = model.forward_input(inputs)
        output_vectors = model.forward_output(targets)
        noise_vectors = model.forward_noise(inputs.shape[0], 5)

        # negative sampling loss
        loss = criterion(input_vectors, output_vectors, noise_vectors)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # loss stats
        if steps % print_every == 0:
            print("Epoch: {}/{}".format(e+1, epochs))
            print("Loss: ", loss.item()) # avg batch loss at this point in training
            valid_examples, valid_similarities = cosine_similarity(model.in_embed, device=device)
            _, closest_idxs = valid_similarities.topk(6)

            valid_examples, closest_idxs = valid_examples.to('cpu'), closest_idxs.to('cpu')
            for ii, valid_idx in enumerate(valid_examples):
                closest_words = [int_to_vocab[idx.item()] for idx in closest_idxs[ii]][1:]
                print(int_to_vocab[valid_idx.item()] + " | " + ', '.join(closest_words))
            print("...\n")

Epoch: 1/5
Loss:  13.300207138061523
if | iulia, cryptanalysis, arching, informational, gj
have | panzers, gamespot, erasmus, commandments, naturally
and | belgique, experiential, nausica, code, windmill
after | noonan, mirror, sieves, predominate, kruspe
war | delos, constitutionally, chalons, pic, triumphal
they | fredrik, carabiners, liebert, impassable, remini
between | bikers, eks, pronounced, refers, miners
of | romanisation, patriarchate, low, cori, objections
orthodox | integrator, salomon, greenfacts, sloppy, madelyne
universe | startled, congreve, supranationalism, auric, shining
behind | striped, mag, emmanuelle, tengri, hermannus
powers | injunctions, amn, pulsed, underway, signage
question | ethiopian, portmanteaus, nearer, signed, tournament
event | ardant, artworks, endemol, pod, weissmuller
troops | familycolor, brotherly, usefulness, marceau, drove
...

Epoch: 1/5
Loss:  10.79105281829834
people | juggalos, ed, scientology, astrakhan, julmust
american | some, rethinkin

Epoch: 1/5
Loss:  5.484482765197754
history | mcelroy, atheism, newspaper, corp, comedian
d | eight, six, one, four, french
b | eight, parity, seven, nine, writer
six | seven, nine, d, three, win
war | ingold, ringing, cornerstone, tomb, learnt
would | hawai, calculator, archimedes, midian, arafat
years | person, agreements, bipedal, greenwich, elk
where | derby, asparagales, theorems, motorway, melanippe
hold | logie, ifv, task, airport, caloric
know | classified, webmineral, owes, stays, launched
event | refrained, beginning, judge, centrality, void
bill | akh, otto, backside, elvish, danish
animals | iud, supervisor, facial, rykodisc, malmsteen
universe | golgi, startled, enumerative, xxiii, shining
applied | mohawk, hymns, action, stosunku, rampart
heavy | pilate, landholding, expenditures, hitler, australis
...

Epoch: 1/5
Loss:  4.977888584136963
use | curae, squads, vanderbilt, ethereal, imbued
or | have, mariculture, to, is, agreed
than | journalists, enhanced, also, untimely, 

Epoch: 1/5
Loss:  5.392176151275635
on | blue, simulation, dendrites, three, axles
that | cases, hands, in, is, vs
years | hoffmann, gaku, specifically, proportionally, their
are | devoutly, harmless, length, into, offshoot
only | soprano, concerto, bypassed, significantly, of
seven | zero, companionate, john, film, isbn
when | sandow, have, ecuador, valence, assume
i | bysshe, burley, reimbursement, jurors, boxers
recorded | debunking, quarreled, maneuvers, electric, elmira
applied | be, shamil, kishore, bouncer, banat
something | hindering, slacks, vagantes, nemo, ascertained
hold | ripping, sierpinski, fillings, scapa, demyelinating
question | devin, lessons, wakefulness, oceanian, terminates
creation | saarland, bean, accompli, bek, beaver
police | along, problematic, turbofans, cambria, antonym
ocean | sheffield, twisted, fifty, mebyon, west
...

Epoch: 1/5
Loss:  5.024831295013428
can | brain, ability, each, number, produce
between | receive, quantization, applesoft, financial, o

Epoch: 1/5
Loss:  4.395859241485596
the | and, of, zero, with, or
not | concerning, that, drugs, although, cannot
had | adopted, grew, offices, synergistic, he
other | with, caused, about, swelling, statistically
first | five, official, kr, abusive, aesop
on | when, kelsey, sportsman, of, into
with | the, light, to, easily, into
its | those, in, both, with, uncles
articles | sentinel, necessitated, sammet, scholars, proteles
discovered | managed, aid, around, changing, scorned
applied | mika, spreadsheet, nomen, demonstrated, usage
accepted | distorted, transference, ideological, unified, insubordination
mathematics | theorem, if, theory, omphalos, g
bbc | radio, durkheim, www, canadian, projects
derived | word, thoughts, frans, can, anthropologists
powers | indeed, indictment, pavilion, discordian, misinterpretation
...

Epoch: 1/5
Loss:  3.7877464294433594
world | united, arts, capture, gorge, goldsworthy
people | from, community, whether, next, of
years | university, expending, live

Epoch: 1/5
Loss:  3.8311712741851807
were | mask, have, support, actions, practices
however | being, from, work, practice, good
zero | two, four, five, nine, one
they | did, during, claims, except, turgot
this | because, explicitly, systems, alternative, and
and | the, to, part, of, in
over | independent, ru, commercial, oil, gold
known | the, intensified, was, other, backpack
universe | primordial, originally, fictional, vision, matter
additional | manner, sequence, modern, random, scroll
joseph | nobel, harold, born, jan, tommy
paris | oscars, freescale, sobel, mbe, prefixing
assembly | government, elections, provincial, foreign, court
brother | corrie, anniversary, song, his, series
heavy | machines, surfaces, mongi, isolated, explosions
channel | subculture, mhz, columbus, five, satellite
...

Epoch: 1/5
Loss:  4.216150283813477
b | z, c, h, y, x
were | old, last, typically, lived, confederate
while | to, different, because, there, confusion
that | what, these, criticism, avoid, po

Epoch: 1/5
Loss:  3.7045204639434814
as | neutral, by, have, identify, a
however | proposed, physical, phase, although, equations
or | a, are, some, occur, it
in | one, eight, nine, from, six
one | nine, three, eight, six, four
his | fellow, death, b, brother, became
have | as, nevertheless, used, argue, controversies
known | large, and, direction, jolie, complex
older | per, copy, poverty, move, larger
ice | wound, losses, baum, cheer, mench
nobel | prize, american, philosopher, physicist, thomas
police | brought, plans, mr, promote, independent
pope | emperor, testament, nicene, bishop, ecumenical
applications | interface, data, application, encoding, software
liberal | political, conservatives, party, affairs, parties
applied | crick, tools, exactly, weaker, data
...

Epoch: 1/5
Loss:  3.2987215518951416
on | commonly, cause, ball, third, with
use | is, providing, example, computers, stored
system | data, standard, info, operating, signal
one | nine, three, six, four, eight
have | d

Epoch: 1/5
Loss:  4.174583435058594
first | cross, crowned, soldier, constellation, belgae
also | spins, in, religious, alternate, thus
of | their, family, the, were, by
this | to, the, power, their, attack
are | precisely, pronoun, for, accuracy, situations
years | colleagues, waited, economic, officials, age
b | calculated, f, h, we, mathcal
zero | three, nine, seven, one, six
applied | element, activation, psychology, manipulation, biological
numerous | provide, southeast, territories, most, divided
units | gtk, retailers, construction, product, imports
magazine | http, mix, mp, frank, website
channel | providers, wireless, broadcasts, serial, team
shown | coworkers, compiler, spectral, above, previous
recorded | showing, gorge, aired, save, gt
articles | page, http, com, documents, web
...

Epoch: 1/5
Loss:  2.7881016731262207
on | is, further, when, s, so
however | warned, testimony, by, which, and
in | the, and, had, one, organization
most | example, chinese, initial, legal, asse

Epoch: 1/5
Loss:  3.3079795837402344
state | liberal, party, delegates, states, belgium
people | that, protestants, dancing, your, rabbis
first | eight, four, five, detroit, manager
of | the, according, a, to, are
american | d, b, eight, six, actor
one | d, six, b, nine, seven
their | helped, guests, mails, nails, armenians
no | s, yorker, and, altered, bauer
older | inflation, householder, poverty, styrofoam, total
mean | whole, method, shall, argument, elliptic
experience | theme, feelings, met, extensive, described
centre | mexico, railway, coastal, cork, paved
except | shavuot, inhabitants, hypothetical, seats, adenauer
ice | sandy, dry, wound, tennis, choice
brother | composer, elder, edward, succession, she
taking | challenge, restoring, ideology, condemned, supporter
...

Epoch: 1/5
Loss:  4.041265964508057
from | the, on, this, in, bertold
known | duchies, one, lessons, aloe, as
two | seven, nine, june, b, six
an | subsequently, did, promote, ars, signs
his | he, for, the, seri

Epoch: 1/5
Loss:  4.3788557052612305
were | materials, require, format, entire, squirrels
had | advance, warfare, tagg, colby, unrest
american | deaths, musicians, entertainers, novelists, geneticist
which | especially, for, least, to, confirmed
also | referred, human, includes, may, grammar
into | on, today, buckham, idiosyncratic, book
be | different, because, exists, risk, method
to | next, with, a, and, legally
quite | item, added, correctly, typically, optical
cost | decentralization, process, infrastructure, regulation, contribute
numerous | productive, community, society, concentrated, offbeat
institute | sciences, educational, external, graduate, association
scale | product, is, battery, frequency, measured
square | inland, metres, east, limestone, extends
road | bridges, lane, uss, indiana, scott
behind | money, guarino, held, okedo, typically
...

Epoch: 1/5
Loss:  4.546967029571533
often | systems, simple, hapless, mathematics, scientifically
in | zero, examples, division, r

Epoch: 1/5
Loss:  5.134140968322754
the | to, of, years, addition, i
i | the, them, shenouda, asked, authenticity
after | ramsay, carlito, threw, spy, f
these | correctly, than, ears, and, to
many | lugo, each, any, break, intensity
three | eight, one, zero, five, seven
four | eight, one, zero, seven, three
into | korfball, badges, moreover, has, is
taking | illusion, velocity, constants, ideally, adding
brother | duke, bourbon, henry, heir, rivera
magazine | thanks, itv, magazines, television, guide
proposed | coincides, between, probes, economic, developing
accepted | dogma, isolation, this, zvi, monastic
operating | computer, computers, version, pc, unix
pre | development, linguistic, reasons, asian, doubling
older | versions, notable, words, viswanathan, receive
...

Epoch: 1/5
Loss:  4.385861873626709
up | falcons, raiders, christmas, less, gun
war | fought, civilians, commanders, country, military
if | we, case, commutative, log, simple
has | acoustically, truncation, fairly, add

Epoch: 2/5
Loss:  4.402277946472168
first | consecutive, and, at, th, equivalent
over | accession, went, almagro, unstable, coal
they | legs, distinctive, sink, more, variations
six | three, eight, five, zero, one
years | the, honor, qur, israelites, old
after | condemned, deacon, frankfurt, gallus, anastasius
b | d, physicist, composer, pioneer, laureate
eight | six, seven, nine, two, one
account | buried, kotorska, revealing, orthodox, calvinism
prince | king, britain, princess, vi, iv
alternative | systems, information, widely, monitoring, compatible
behind | phoenix, played, legs, shoulder, pyrrhus
lived | ascension, madhya, son, aleksandr, elijah
bill | ricky, jr, songwriter, footballer, rick
writers | ancient, christian, bible, places, bibliography
square | junction, ft, plateaus, gardens, miles
...

Epoch: 2/5
Loss:  4.495152473449707
that | sound, unable, protecting, is, be
were | eventually, well, officially, charlemagne, keep
four | seven, one, zero, five, nine
as | even, loa

Epoch: 2/5
Loss:  4.115614414215088
is | and, i, universe, has, not
th | five, afc, miami, nfl, divisional
s | book, of, i, israel, mark
who | him, mary, job, the, god
people | lives, adjective, testimony, influence, story
for | most, than, taken, considered, on
it | to, beyond, if, would, messenger
its | has, reach, almost, and, wings
consists | horizontal, subunits, vowels, relative, consonants
liberal | liberals, parliament, mps, compromise, policies
hold | position, throw, oujda, balancing, compulsory
recorded | tracks, hendrix, piercy, band, companionship
police | workers, policy, involvement, reagan, justice
placed | into, surface, lower, use, boundary
proposed | predicted, ousterhout, expression, lieu, proving
question | consider, ethical, explicit, atheist, discussed
...

Epoch: 2/5
Loss:  4.200388431549072
has | point, usually, multiple, level, contact
would | poor, nazareth, draw, lead, fairest
war | treaty, ascent, retaliation, attacks, drafted
be | says, our, sure, bit, qui

Epoch: 2/5
Loss:  3.9268102645874023
is | usually, can, distinct, size, values
has | identified, into, all, foregrip, funds
state | college, university, wisconsin, intercity, graduate
have | term, itself, shown, larwood, projects
war | guerrilla, dictatorship, regime, fascism, commanders
five | four, one, eight, nine, zero
if | we, finite, frac, subsets, equivalence
may | rather, long, this, used, see
san | francisco, university, california, county, bay
mathematics | fundamental, algebra, reasoning, proofs, geometric
grand | pez, carnival, winner, snooker, navy
older | using, lowering, median, compared, identify
liberal | party, liberals, mps, democrats, parties
experience | medications, sensations, ought, encourages, respond
construction | entire, unprofitable, networks, buildings, device
pope | constantinople, dukes, archbishop, iv, pius
...

Epoch: 2/5
Loss:  3.7851803302764893
six | two, zero, one, nine, four
new | york, area, zealand, park, victoria
is | usually, known, distinct, 

Epoch: 2/5
Loss:  4.631026744842529
where | given, dimensional, if, surface, field
th | rd, nd, nine, nfc, playoffs
one | eight, six, five, zero, nine
other | faces, related, types, using, instruments
been | has, appeared, aware, causing, led
it | quality, they, shell, end, sound
united | canada, national, states, association, charter
some | by, structure, has, the, of
pressure | vapor, heat, measured, shock, particle
hold | insists, metaphysical, kou, ten, casting
smith | glenn, robinson, peter, anthony, politician
active | nearly, trips, extremely, fermium, communities
derived | speakers, characteristic, separate, this, ionic
alternative | engineering, electronic, reproduction, improvements, register
older | recently, julian, females, format, words
award | warren, awards, braves, champion, league
...

Epoch: 2/5
Loss:  4.537525177001953
were | by, was, songs, his, led
than | designs, speed, easier, needed, execute
into | ranging, car, from, generation, across
so | transformation, top

Epoch: 2/5
Loss:  4.010573387145996
been | maintain, characteristics, greatly, their, material
people | pym, essais, pan, asian, imitate
this | if, shows, sequences, weak, category
only | others, segments, becoming, systems, physical
who | and, her, beloved, person, his
from | france, eight, war, administration, one
other | are, complex, case, of, such
known | apertures, conventionally, infinity, artois, block
units | unit, density, q, kw, coefficient
smith | massachusetts, glenn, eddie, laureate, simmons
heavy | wax, oil, bomb, derisively, prepared
issue | threaten, amended, communist, freedoms, democrats
operating | platforms, controllers, os, dos, windows
mathematics | logic, generalization, rational, mathematical, introductory
prince | emperor, throne, married, sister, emperors
arts | michigan, education, schools, buchanan, dance
...

Epoch: 2/5
Loss:  3.900251626968384
be | but, does, any, same, an
where | based, aspx, closed, klaip, condorcet
it | said, a, possibility, must, any


Epoch: 2/5
Loss:  4.6911444664001465
two | one, eight, six, october, three
by | against, defeated, armies, egyptian, led
when | is, have, a, case, sky
at | peak, on, warsaw, time, most
known | horace, finger, christians, valentinian, garibaldi
will | do, involved, single, manually, speed
have | are, origin, themselves, primarily, any
has | substances, an, other, have, of
existence | calculating, reality, universe, explain, evolution
san | santa, armenia, avenue, town, madrid
running | stroke, receivers, timing, ride, pad
behind | upset, make, looks, examination, asked
except | legionnaires, distinguish, aspartic, round, consists
pope | constantinople, papacy, patriarch, byzantine, princes
stage | parkman, culture, figures, naa, humor
account | what, torah, intended, extinction, claimed
...

Epoch: 2/5
Loss:  3.8836374282836914
also | churches, at, orthodox, aqua, shem
people | sought, polls, agreements, declaration, interests
with | j, is, him, others, words
was | against, that, war, h

Epoch: 2/5
Loss:  4.2930803298950195
and | attributed, formations, in, marine, insects
world | dominated, influence, nomadic, spent, decades
been | reporting, reform, responsible, purposes, promoting
be | each, processed, not, usually, encode
his | jack, affair, child, alleged, lover
most | meaning, range, shaivism, also, means
six | seven, zero, eight, one, nine
who | diplomatically, would, role, child, happy
egypt | arab, palestine, empire, tajikistan, mesopotamia
freedom | semitism, inquiry, totalitarian, individualism, anarchists
orthodox | churches, religions, holy, spirit, prayer
articles | archive, article, review, com, documents
know | want, really, immoral, instances, inference
mainly | populated, surrounding, settlers, inhabiting, vassal
discovered | treatise, asteroid, graf, fossils, cloths
lived | scandinavian, ethnically, descent, underworld, tribe
...

Epoch: 2/5
Loss:  4.474511623382568
may | procedure, thereafter, senate, when, does
nine | one, eight, seven, zero, five


Epoch: 2/5
Loss:  4.502342224121094
new | home, series, united, usa, train
some | frequently, itself, ensure, though, established
united | washington, national, republican, ottawa, states
two | four, zero, three, six, one
if | be, fixed, dimensional, finite, electrons
b | d, e, frac, sum, l
state | lebanon, jewry, assembly, presidents, government
often | evolved, such, various, philosophies, riding
channel | broadcasters, television, signals, cinemas, station
primarily | areas, kent, most, has, mining
gold | copper, bronze, precious, iron, wood
pressure | increases, air, heat, velocity, joint
older | patronymic, females, course, attributed, sold
freedom | degree, united, cultural, welfare, website
powers | hoped, impose, campaigning, security, constitution
joseph | alexander, eug, cult, historian, wendell
...

Epoch: 2/5
Loss:  5.561293125152588
two | four, zero, five, three, r
were | launched, vast, civilians, raids, few
six | four, eight, three, january, five
its | the, function, for

Epoch: 2/5
Loss:  5.13018274307251
war | refugees, troops, albania, corps, occupation
i | nice, just, redundant, great, p
people | americans, muslims, politically, largely, writers
by | army, ty, was, become, of
while | ability, stages, as, against, seeing
this | it, do, mommy, bundles, oersted
to | microsoft, for, been, additional, be
there | tree, depending, offices, games, counties
applications | hardware, api, integrated, version, users
shown | if, form, self, respectively, replaces
shows | stories, kermit, channel, dances, trio
ocean | coral, southwest, inland, island, islands
smith | singer, robinson, american, jeremy, howard
prince | duke, brabant, ivan, german, regent
orthodox | judaism, religions, prayers, denominations, scriptural
active | news, drafted, hiram, current, mainly
...

Epoch: 2/5
Loss:  5.630888938903809
it | a, moral, stance, the, this
the | and, in, other, only, lived
united | members, elected, presidents, efforts, elections
as | is, distinguishing, have, air, 

Epoch: 3/5
Loss:  4.702545642852783
from | between, a, or, described, within
to | or, this, which, it, for
the | this, of, it, held, three
four | three, one, five, eight, two
there | existence, itself, as, nor, common
were | did, to, him, allowing, priesthood
two | six, zero, three, one, four
use | hardware, available, system, iec, implementation
centre | tombs, architectural, buildings, garden, scenic
rise | seventh, historical, characteristics, extreme, agricultural
gold | silver, copper, cotton, worn, diana
road | highway, adjacent, urban, towns, hiking
accepted | practices, modern, agenda, permanently, given
articles | pdf, privileges, sacraments, review, legal
know | told, pain, robertson, agile, clinic
bible | testament, tanakh, biblical, oracles, judah
...

Epoch: 3/5
Loss:  4.324503421783447
only | sometimes, own, bore, by, when
eight | three, laureate, nine, one, politician
but | divine, able, into, reason, leave
are | or, other, term, referred, pre
over | addition, our, givin

Epoch: 3/5
Loss:  4.423160076141357
its | time, larger, wine, slope, threat
with | shift, make, only, a, required
where | is, an, b, about, installed
war | prisoners, aftermath, brutal, refugee, conscription
used | types, pcm, valves, effects, meanings
while | his, others, as, to, religious
d | statesman, politician, sch, dancer, gu
but | that, his, refused, so, him
square | meters, district, surrounded, football, south
mainly | largest, extending, congo, provinces, overwhelmingly
discovered | discovery, radiometric, squirrels, mrna, perfumes
professional | football, champions, teams, baseball, manchester
primarily | categories, verner, oily, european, trigram
orthodox | christianity, saints, churches, church, anglicans
derived | indo, attested, germanic, names, celtic
something | contrast, what, hearts, merely, absolutely
...

Epoch: 3/5
Loss:  3.878047466278076
up | struck, chances, punch, closing, foul
state | them, napoleonic, ayckbourn, governors, accession
of | the, in, banks, pa

Epoch: 3/5
Loss:  3.845402717590332
four | two, five, eight, three, zero
or | used, are, much, smaller, with
years | months, seven, population, votes, fifty
i | p, frac, cos, x, z
called | broken, stir, by, rightmost, aromatic
seven | nine, two, three, eight, five
nine | two, seven, ers, one, team
where | relates, rangle, image, well, underneath
ice | dry, sediments, brandy, soils, humidity
assembly | elected, legislative, legislature, cabinet, executive
powers | property, appoint, judiciary, legislative, thus
cost | improved, reducing, improvements, banking, storage
paris | et, de, warsaw, sur, hochschule
dr | academy, williams, hollywood, amy, michael
applications | microcode, memory, standard, portable, systems
centre | shore, squares, maps, occupies, surrey
...

Epoch: 3/5
Loss:  3.772275686264038
d | m, louis, codebreaking, vi, althea
were | white, legend, later, helped, yankees
new | university, brooklyn, county, college, eight
there | unlike, meaning, it, swallowing, chinese
the

Epoch: 3/5
Loss:  5.218574523925781
s | thomas, hara, where, college, as
from | trains, courthouse, late, voyager, armstrong
also | or, is, classes, invalidate, incontinence
is | x, differential, theorem, topology, also
b | eight, d, one, laureate, anthony
an | at, the, which, as, to
over | nearly, opened, discrete, join, focal
of | the, office, s, to, by
troops | militia, guerrilla, ulster, battalion, soviets
institute | carnegie, j, graduate, college, lecture
recorded | write, academically, codex, gilded, empiricist
hit | gets, song, pitch, roll, front
square | rectangle, meters, rotterdam, tallest, triangular
assembly | exercised, constitutional, decisions, quorum, elected
cost | gears, console, expensive, specifications, explosive
gold | timber, cement, silver, copper, nickel
...

Epoch: 3/5
Loss:  3.412914991378784
american | screenwriter, actor, johnson, edwin, clark
other | some, which, world, decades, investigates
world | universal, other, earths, with, heritage
eight | b, laur

Epoch: 3/5
Loss:  3.3005497455596924
at | width, firing, gauges, zero, cylindrical
be | of, notations, consists, that, hash
where | the, ordered, called, example, powered
people | thinking, madagascar, aryan, fled, indo
called | wavelength, chiral, waves, photon, cinematographers
states | president, presidential, membership, democratic, citizens
its | wheels, from, conventions, repayment, standard
into | distinguishing, developments, ho, mediterranean, plastic
except | numerals, integers, clarify, document, may
issue | ongoing, issues, term, google, productions
proposed | poincar, relation, argument, new, theory
additional | add, commonplace, beta, diagram, eth
lived | barrymore, wrote, story, literature, comics
award | miniseries, film, actors, kerry, acclaim
channel | cable, channels, tv, radio, video
bible | talmud, testament, gospels, deuterocanonical, theology
...

Epoch: 3/5
Loss:  4.0373945236206055
new | zero, zealand, graduated, and, newark
in | the, a, religious, of, foreshad

Epoch: 3/5
Loss:  3.2259480953216553
after | in, holland, portugal, east, siege
their | countryside, completely, necessary, preserve, its
about | rejecting, prescription, working, mind, currently
one | zero, five, eight, six, three
called | meiosis, off, vacuoles, capital, mitosis
more | might, thousands, beyond, independent, dawkins
used | depending, arrows, mhz, simulated, substance
zero | one, six, two, five, sq
mathematics | theorem, physics, boole, einstein, newton
mainly | muslim, ethnic, sunni, agriculture, invasions
shown | approximating, am, myself, tattoo, cm
issue | issues, citing, financial, bill, exoplanet
smith | bob, stewart, actress, journalist, anita
running | network, driver, democracy, bike, purley
accepted | transubstantiation, tribes, patriarch, interpretations, chalcedonian
operations | ordered, force, divide, numbers, police
...

Epoch: 3/5
Loss:  3.5081684589385986
time | other, relative, lost, ship, forty
are | use, verb, less, muscle, forms
or | other, cation,

Epoch: 3/5
Loss:  3.7069106101989746
other | rules, that, local, so, today
there | are, common, given, type, simply
more | irreversible, signals, conversion, highly, logical
most | used, biggest, generally, attacks, narwhal
were | ensure, some, campaign, as, called
has | for, usenet, would, process, membership
his | he, elizabeth, fellow, albright, refused
b | bilinear, y, homomorphisms, bb, circ
event | elders, evidence, seating, victories, honorum
stage | getting, movies, riffs, speed, accompaniment
active | occurs, thirds, had, supporters, concluded
units | unit, battalion, sd, army, one
primarily | trends, copra, much, various, resources
operating | proprietary, interface, virtual, databases, mainframe
joseph | wortley, leon, marshal, conrad, rudolf
running | platforms, virtual, network, pulled, pc
...

Epoch: 3/5
Loss:  4.2538557052612305
called | except, distribution, would, every, reverse
however | it, relative, way, or, the
eight | five, four, na, seven, zero
in | the, of, to, 

Epoch: 3/5
Loss:  3.8611996173858643
may | occur, that, or, they, back
while | is, certainly, if, dangerous, correctly
by | years, nine, was, cities, continued
a | to, with, for, traffic, in
state | democrats, elected, party, parliament, bank
eight | nine, one, two, three, five
five | three, nine, eight, one, m
th | bce, centuries, bc, kingdoms, prehistory
consists | types, above, length, hardwoods, belong
alternative | serialism, uqbar, introduction, web, simpler
bill | clinton, executive, senator, elected, reelection
ocean | islands, populated, bay, island, coral
heavy | captain, rocket, climates, dug, targets
magazine | journalism, films, publish, doo, incarnation
proposed | meanwhile, proposals, danger, anti, principle
woman | believe, sex, marriage, loving, walk
...

Epoch: 3/5
Loss:  4.00860595703125
been | a, an, but, however, intervention
from | six, on, has, same, also
over | with, back, in, three, offshore
years | women, male, repent, nationality, males
a | of, known, change,

Epoch: 3/5
Loss:  3.7149460315704346
than | discussed, balance, identification, traditionally, phrases
also | official, lawless, groups, provides, largo
often | are, similar, theoretical, a, particular
used | aluminium, devices, optical, electroplating, commonly
after | castle, great, officially, troops, western
with | a, as, particular, is, s
seven | two, one, nine, zero, six
for | formula, by, allows, any, the
ice | melted, cooled, basins, orissa, vanadium
prince | duchy, duke, queen, eldest, afonso
woman | child, gender, marry, never, feminism
know | everyone, asking, you, memories, answer
lived | supplication, batcave, bartholomew, undisturbed, gaugamela
professional | wrestler, hockey, baseball, aalborg, hired
existence | monotheism, universe, behave, mantra, satan
applied | scientific, axiomatization, whereas, laws, criticized
...

Epoch: 3/5
Loss:  4.722816467285156
when | vertical, was, punched, multiple, because
or | such, aluminium, lubricants, unsound, hk
but | thread, story

Epoch: 4/5
Loss:  3.342148780822754
between | spacetime, zout, dimension, divergence, quantum
d | laureate, b, statesman, politician, composer
six | five, eight, one, four, isbn
th | century, renaissance, hindu, palace, frescoes
of | in, the, as, church, tabernacle
nine | one, five, seven, eight, american
history | reading, academic, biography, dramatists, fiction
as | of, texts, in, writings, roman
resources | economics, directory, free, geology, database
joseph | b, composer, american, physicist, observances
older | better, beli, wray, load, programmer
orthodox | church, scripture, oriental, religions, faithful
behind | rid, thumb, flaw, neutral, bounds
applied | veiled, precise, cartridges, broadly, differing
bible | biblical, moses, divine, scriptures, translators
pressure | spectrum, shocks, pressures, controls, aligned
...

Epoch: 4/5
Loss:  3.4080300331115723
united | president, rights, cfa, panama, promoted
history | biography, concepts, harry, reading, stephen
a | to, with, th

Epoch: 4/5
Loss:  3.8690831661224365
history | wiki, links, retrieved, york, geography
about | grassland, coastline, places, ridges, climate
some | ibos, removing, those, term, belief
many | traditions, religious, fewer, umbrella, using
from | of, down, to, at, and
war | nazi, rebels, guerrilla, soviet, allies
by | and, common, at, also, included
often | legalization, tactics, with, avoid, might
grand | abbey, near, polgar, tourists, floors
test | cricket, points, acquisition, beacons, canberra
woman | story, trick, disapproving, smile, makin
something | nonsense, essence, describing, saying, fuck
file | url, audio, accesses, rfc, compression
question | dogmatic, theologically, unclear, doctrine, undermines
road | roads, connects, bay, airports, railway
quite | octaves, partially, difference, classified, handling
...

Epoch: 4/5
Loss:  4.292620658874512
people | culture, news, from, jiang, across
states | federation, territorial, branches, united, commission
between | pair, slanted, eq

Epoch: 4/5
Loss:  4.1529741287231445
he | inhuman, accomplishments, his, mysteries, hope
is | used, compact, it, usually, object
has | a, equal, these, be, are
called | the, and, using, input, divided
will | cash, thereby, positron, mond, cycle
a | and, with, has, some, special
up | mame, zombie, these, similarly, could
was | national, october, eight, no, ii
woman | wife, children, protagonist, concubine, confessed
animals | animal, fish, mammals, insects, insect
notes | article, theory, cryptanalysis, astrophysical, flattened
consists | number, filter, districts, extending, unpaired
older | aged, females, poverty, median, area
pressure | expensive, reactive, leak, chloride, oxygen
prince | duc, neburg, pomerania, elizabeth, mamet
mean | values, numbers, mathrm, arithmetic, mathcal
...

Epoch: 4/5
Loss:  3.2860350608825684
b | c, f, mathbb, p, r
zero | five, two, nine, four, eight
i | cdots, current, mathrm, y, infinity
these | all, development, destabilized, fighting, thing
american |

Epoch: 4/5
Loss:  4.3898820877075195
or | found, can, detection, simply, specified
his | man, her, hear, he, was
d | e, physiologist, vicomte, c, n
an | within, relationship, act, self, development
during | was, genetic, died, pows, sortavala
was | and, down, oates, carter, her
s | nine, one, john, two, gordon
see | usually, though, rarely, site, classical
animals | insects, animal, mammals, eating, larvae
experience | emotion, criteria, rigorous, facts, intellect
notes | chord, recordings, greatly, studies, grasshopper
centre | shore, newcastle, boroughs, inland, brighton
prince | gloucester, municipality, marie, arch, labrador
shows | sets, unique, words, shown, create
account | reasonably, famine, whosoever, baseman, quantifier
police | constabulary, militias, convicted, positions, commander
...

Epoch: 4/5
Loss:  4.286561965942383
five | four, zero, seven, two, nine
which | scent, long, although, bipedal, sensory
into | electrode, oxide, instead, insulating, macroblock
have | apply

Epoch: 4/5
Loss:  4.076704502105713
may | however, those, requires, has, judged
no | or, yes, follows, faster, arbitrary
but | those, he, does, opposition, believe
not | it, choose, harmony, secular, and
into | its, the, to, been, it
was | august, emperor, england, lord, and
when | have, chemical, to, than, reaction
has | the, of, may, to, cce
older | median, households, spread, total, family
instance | misleading, be, are, gamma, is
construction | building, located, connect, unusable, installation
proposed | cognitivism, initial, trials, strongly, stratified
liberal | democratic, democrats, conservatism, libertarian, liberals
orthodox | orthodoxy, catholics, baptism, communion, anglicans
gold | excellent, plastics, silver, cement, plated
powers | democracy, constitutional, participation, clauses, judiciary
...

Epoch: 4/5
Loss:  2.5733327865600586
where | performing, normalized, square, indicates, axis
five | two, zero, six, seven, four
it | not, less, rather, or, as
most | is, differ

Epoch: 4/5
Loss:  4.687714576721191
who | openly, hunting, litovsk, father, sons
d | l, henri, w, ivan, claude
between | isolated, east, scandinavia, visible, strongly
has | indexing, is, lower, given, whose
their | present, can, would, second, public
will | able, ability, they, ends, typically
system | extended, systems, designed, precise, unix
where | crest, rome, altars, east, forbidden
something | individual, man, questioned, platonist, tale
mainly | aids, neonatal, stabilize, the, vivisection
bill | owen, democrat, presidents, detroit, judiciary
scale | instruments, levels, global, measurements, compared
units | gigabyte, joule, canadian, largest, studios
powers | appointed, matters, appoint, mandate, democracy
magazine | website, dvd, cgi, comics, debuted
alternative | how, stand, inclusion, guidelines, delimit
...

Epoch: 4/5
Loss:  3.9680593013763428
than | higher, more, though, like, fiber
after | fought, formally, resulted, sadd, relationship
so | performed, reasons, order, l

Epoch: 4/5
Loss:  3.3942055702209473
also | of, are, network, spatial, frame
seven | six, five, one, eight, zero
history | similarities, philosophy, ad, origins, isbn
however | because, be, directly, over, than
would | that, eukaryotes, instance, majority, replicate
will | retrieve, than, expensive, so, whatever
eight | seven, one, three, six, two
united | government, citizens, legislative, nations, treaties
experience | addiction, experiences, recalled, psychotherapy, continuing
frac | phi, cdot, cos, mathrm, equiv
issue | agreement, kyoto, resources, negotiations, un
numerous | become, manufacturing, assyrians, lands, large
dr | isbn, randall, johnson, physicist, garland
smith | ellen, cullen, johnson, denis, isbn
bible | christ, christians, hebrews, corinthians, testament
defense | personnel, military, police, organization, agency
...

Epoch: 4/5
Loss:  4.479379653930664
first | small, the, time, commonwealth, today
is | the, of, or, epimorphism, does
where | can, differs, they, bec

Epoch: 4/5
Loss:  5.097496509552002
he | his, against, emma, divorced, returned
first | nine, dortmund, rendition, only, point
called | or, greek, a, refer, sometimes
into | repeatedly, boundaries, result, bordering, that
during | resentment, albanians, uprising, emerged, immediate
often | some, very, rarely, mainly, or
has | s, a, other, keeping, still
world | airways, haskalah, development, afghanistan, industrialized
something | asks, let, boys, t, cry
running | rivers, gerald, mike, herbert, bridge
shown | value, equal, run, must, nag
centre | station, west, tower, saarc, municipalities
square | coastline, tallest, territorial, carolina, petropavlovsk
quite | minimum, simply, rarely, saddle, merely
bible | scriptures, biblical, vulgate, testament, septuagint
nobel | prize, laureate, chemist, ernest, psychologist
...

Epoch: 4/5
Loss:  3.946023941040039
when | occasionally, forced, early, it, family
if | possible, rather, alternatively, pantheistic, value
its | in, is, over, and, in

Epoch: 4/5
Loss:  4.211816310882568
time | some, takes, girl, shocks, nesuhi
there | until, does, rest, due, allow
two | zero, four, one, nine, eight
d | four, mathematician, one, six, b
united | grenada, guatemala, nations, cuba, council
can | easily, simple, any, aliasing, such
or | such, retina, any, depends, cranial
who | astruc, vandals, fellow, antigonus, them
creation | political, fundamentally, polytheistic, nafta, fundamental
test | tests, clitoridotomy, thyroiditis, proteomics, patient
report | environmental, census, percent, accounted, https
road | parking, london, tunnels, traffic, train
question | debates, seeking, arguing, attitudes, learning
brother | nephew, teresa, grandmother, died, wife
marriage | married, widowed, intercourse, daughters, adultery
versions | minix, files, desktop, filename, graphics
...

Epoch: 4/5
Loss:  4.046407222747803
but | be, although, being, those, preoccupied
and | in, present, which, the, over
world | monaco, european, site, northern, scarp

Epoch: 5/5
Loss:  2.4591846466064453
the | and, it, a, only, was
while | but, understand, is, changing, those
to | an, all, part, these, only
their | more, to, for, and, avoided
had | his, fool, supposed, sassanid, after
two | six, seven, five, three, zero
used | stroke, injection, common, database, equipment
called | dynamically, injective, functions, representations, it
professional | championships, volleyball, manager, wrestler, association
powers | election, continuity, imperium, kingdoms, parliamentary
experience | profound, meditation, induce, generation, developing
event | might, original, standing, clear, problems
universe | multiverse, singularity, phenomena, cosmological, cosmic
additional | office, retained, won, optionally, council
governor | democrat, chief, appointed, justices, portillo
creation | occult, exploitation, indonesian, online, other
...

Epoch: 5/5
Loss:  3.2834997177124023
one | eight, five, nine, politician, four
their | to, most, whether, by, that
four | ze

Epoch: 5/5
Loss:  5.9227213859558105
was | the, and, s, she, became
over | merchant, kingdom, provides, financial, which
time | by, biruni, hidden, meant, tomb
united | banking, breaching, rights, protea, hampshire
so | thoughts, substance, everything, analogy, opposite
many | that, seen, such, than, a
where | ceremony, bahasa, andalus, albert, restored
often | nestorian, abstain, racist, sects, zwi
existence | explain, theism, historicist, afterlife, eschatology
http | www, com, htm, org, html
event | attempted, relocating, statistical, leave, corrie
stage | launch, engines, launched, duo, beers
san | francisco, vegas, cruz, dakota, houston
liberal | conservatives, coalition, libertarian, liberals, reform
bill | jockeys, american, activist, gerald, milligan
except | entries, represent, acropolis, truncated, handwriting
...

Epoch: 5/5
Loss:  3.9127228260040283
would | spot, firing, times, but, her
more | to, half, including, development, operated
three | two, five, zero, eight, seven


Epoch: 5/5
Loss:  3.484099864959717
to | the, on, and, nine, for
this | allow, between, fide, levels, make
people | cameroons, expelled, official, region, to
there | or, is, in, usage, some
between | provinces, wars, until, this, to
more | over, less, to, for, population
the | to, of, also, after, by
can | levels, is, do, subjective, not
smith | andrew, jones, david, joseph, starr
liberal | conservatives, government, coalitions, conservative, opposition
numerous | pillars, to, industries, china, throughout
bible | tanakh, testament, torah, isaiah, nevi
universe | baryons, cosmological, nucleosynthesis, galaxies, conserved
arts | colleges, academy, schools, confucius, martial
orthodox | catholicism, denominations, apostolic, catholic, orthodoxy
event | milestone, impair, destroyed, tunguska, attempted
...

Epoch: 5/5
Loss:  3.746878147125244
his | he, was, sir, memoirs, professors
the | of, as, had, in, at
is | be, all, math, to, a
united | international, security, affairs, british, pea

Epoch: 5/5
Loss:  3.6533210277557373
people | students, kuomintang, official, atheists, etonians
use | variations, purpose, pdp, design, require
only | is, each, means, recursively, in
or | fact, offense, hit, have, betting
where | each, arrangement, before, how, cookies
was | s, at, established, england, despot
up | by, scott, thirty, in, pulled
the | a, between, of, in, to
report | annual, assessment, content, wiki, cnn
hit | roll, earnhardt, yankee, win, scoreless
universe | cosmological, singularity, cosmology, cosmologists, creationism
writers | novelists, illustrators, lgbt, comics, weblog
gold | silver, sheet, jewelry, materials, coins
channel | cable, signals, tv, television, broadcast
ice | water, dry, acidic, melts, hot
dr | isbn, ralph, hopkins, malden, attended
...

Epoch: 5/5
Loss:  3.981797695159912
one | five, three, eight, two, six
more | by, formed, short, are, prolonged
nine | zero, four, five, six, one
use | powder, brine, rheumatoid, variations, pdp
is | are, only, 

Epoch: 5/5
Loss:  4.217324256896973
who | that, cromwellian, were, fabrizio, a
six | zero, four, eight, five, c
a | to, of, the, with, property
of | the, seven, a, one, in
s | nine, tribute, one, brother, first
be | integral, however, problem, formula, allows
is | defined, spectral, of, e, energy
if | suppose, then, must, impossible, be
test | surgical, dive, certification, risk, tests
mathematics | mathematicians, theorems, equations, properties, mathematical
numerous | assyria, zealots, khedive, s, please
marriage | niece, sisters, elector, boleyn, luther
paris | france, lyon, palace, sur, ville
channel | tv, cable, mbit, catv, mxy
joseph | louise, nicholas, horatio, walter, nine
ocean | kilometers, coastal, plateau, tributaries, subtropical
...

Epoch: 5/5
Loss:  3.9425337314605713
state | equations, democracy, relations, general, fructose
system | application, practical, computers, an, cpu
see | photon, distance, computation, applying, solar
not | these, are, by, or, become
only | 

Epoch: 5/5
Loss:  2.7496798038482666
after | not, death, antipopes, pius, bledsoe
at | seven, university, he, four, visited
has | the, though, to, of, only
zero | seven, two, four, five, eight
united | country, military, trade, airport, u
d | l, sident, physiologist, k, georgi
will | joram, order, take, purely, what
into | types, effect, derived, atoms, liberating
taking | finally, therapist, promotion, appropriate, boomerangs
hold | must, hypocrisy, their, within, righteousness
channel | cable, broadcast, shortwave, transmitters, broadband
joseph | impressed, d, alexander, composer, walter
magazine | games, cgi, sold, developer, comics
ocean | arctic, atlantic, intelsat, windward, nicobar
professional | women, amateur, profit, graduate, varsity
http | www, htm, com, org, portal
...

Epoch: 5/5
Loss:  4.310762882232666
after | bledsoe, bibulus, everlasting, gave, bologna
or | substituents, simply, not, the, cases
only | of, thus, evolve, has, using
however | instances, say, term, since

Epoch: 5/5
Loss:  4.405395984649658
used | different, derive, pronunciation, usage, languages
this | known, caste, offer, etiology, the
seven | two, six, three, eight, five
would | another, on, potential, regained, ever
is | not, meaning, often, non, kinds
where | into, field, then, hilbert, line
up | it, have, foot, daugava, sneaking
had | after, been, kitty, renamed, reestablishing
mainly | dispersed, beaker, bitterness, meo, bua
brother | mcduck, wife, daughter, elder, matilda
ocean | coast, nicobar, km, islands, atlantic
magazine | journals, publisher, interview, novels, animator
something | feeling, strange, feels, very, rude
versions | version, ported, mouse, proprietary, syndication
liberal | conservative, leaders, nationalist, liberalism, parties
bible | tanakh, biblical, ezra, esdras, nevi
...

Epoch: 5/5
Loss:  4.360540390014648
s | a, development, railway, demographically, during
system | integrated, systems, newer, interconnected, zero
is | if, to, single, howto, as
so | th

Epoch: 5/5
Loss:  5.03341817855835
no | t, script, label, like, you
this | still, widely, and, sometimes, doubt
war | surrender, military, panzer, forces, nazi
about | may, at, man, her, his
five | zero, one, nine, six, three
these | frequently, single, a, lacks, specific
on | from, the, nine, man, retrial
he | his, in, face, from, a
experience | cause, moral, questions, drugs, emotional
mean | horizon, climatology, probability, vector, graph
freedom | united, citizen, policy, school, prohibition
behind | stuntman, harris, alcs, closest, devdas
scale | inflation, slide, viaducts, cheddar, workforce
road | cars, biography, front, sumner, york
engine | engines, fuel, piston, turbines, thrust
ocean | atlantic, islands, coast, bordering, inland
...

Epoch: 5/5
Loss:  4.428940296173096
up | already, entire, gina, revenues, finnish
four | five, two, zero, july, links
these | are, a, increasingly, frequently, for
or | is, may, the, pythons, either
history | one, historian, heavily, eastern, h

Epoch: 5/5
Loss:  4.5938029289245605
two | four, zero, six, seven, three
only | number, some, is, substantial, of
up | did, side, having, entire, hamon
th | century, east, western, history, asia
years | est, july, birth, male, age
for | national, self, the, brazzaville, soon
three | one, zero, four, eight, seven
eight | six, four, nine, seven, three
hit | pitcher, hits, offseason, yankee, hitter
universe | cosmological, newton, hubble, theory, relativity
powers | assisting, subjugation, gains, entente, temporary
mean | sum, harmonic, graph, equivalently, variable
pressure | internal, water, armies, temperature, explode
woman | wife, girl, wear, women, parents
writers | novelists, journal, etonians, satire, artists
construction | harbors, corridor, residences, skyscrapers, railway
...

Epoch: 5/5
Loss:  3.4299685955047607
three | seven, nine, one, eight, zero
th | asia, history, east, roman, thereafter
while | struggle, allegation, relieve, coat, show
who | he, him, devote, debater, cou
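
Each line in the log above pairs a validation word with its five closest words in the embedding space. Here is a minimal sketch of how such a list could be produced, assuming the neighbors are ranked by cosine similarity between rows of the embedding matrix. The helper `nearest_words`, the toy vocabulary, and the 2-dimensional toy vectors are all hypothetical and exist only for illustration; they are not the trained weights from this notebook.

```python
import numpy as np


def nearest_words(word, vocab, embeddings, top_k=2):
    """Return the top_k words whose vectors are most cosine-similar to `word`."""
    idx = vocab.index(word)
    # normalize every row to unit length so a dot product equals cosine similarity
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    order = np.argsort(-sims)[:top_k]
    return [vocab[i] for i in order]


# hypothetical toy data: number words point one way, ocean words another
toy_vocab = ["one", "two", "three", "ocean", "atlantic"]
toy_embeddings = np.array([
    [1.0, 0.1],
    [0.9, 0.2],
    [0.8, 0.3],
    [0.1, 1.0],
    [0.2, 0.9],
])

print("one |", ", ".join(nearest_words("one", toy_vocab, toy_embeddings)))
# prints: one | two, three
```

With real embeddings the same idea applies to a matrix with one row per vocabulary word, which is why frequent function words tend to get noisier neighbors than topical words like `ocean` or `bible`.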

In [17]:
torch.save(model, "negative_sampling.pth")

In [None]:
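# loads the whole pickled model, so the model class must already be defined; newer PyTorch releases may need weights_only=False here
model = torch.load("negative_sampling.pth")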
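
Saving the whole model object works, but it ties the file to the exact class definition. A common alternative is to save only the `state_dict`. The sketch below shows that pattern on a small stand-in model; `TinySkipGram`, `save_and_reload`, and the file name `demo_state.pth` are hypothetical names chosen for this example, and the stand-in only assumes the real model exposes an `in_embed` embedding layer as used later in this notebook.

```python
import os
import tempfile

import torch
from torch import nn


class TinySkipGram(nn.Module):
    """Hypothetical stand-in for the notebook's model (not the real class)."""

    def __init__(self, n_vocab, n_embed):
        super().__init__()
        self.in_embed = nn.Embedding(n_vocab, n_embed)
        self.out_embed = nn.Embedding(n_vocab, n_embed)


def save_and_reload(model, path):
    # save only the parameters, not the pickled Python object
    torch.save(model.state_dict(), path)
    # rebuild an untrained model of the same shape, then copy the weights in
    restored = TinySkipGram(model.in_embed.num_embeddings,
                            model.in_embed.embedding_dim)
    restored.load_state_dict(torch.load(path, map_location="cpu"))
    return restored


trained = TinySkipGram(n_vocab=10, n_embed=4)
with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_reload(trained, os.path.join(tmp, "demo_state.pth"))

weights_match = torch.equal(trained.in_embed.weight, restored.in_embed.weight)
print(weights_match)
```

The trade-off is that you have to construct the model yourself before loading, but the saved file no longer depends on how the class was pickled.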

## Visualizing the word vectors

Below we'll use t-SNE to visualize how our high-dimensional word vectors cluster together. t-SNE projects these vectors into two dimensions while preserving local structure, so words that are close in the embedding space should also land near each other in the plot. Check out [this post from Christopher Olah](http://colah.github.io/posts/2014-10-Visualizing-MNIST/) to learn more about t-SNE and other ways to visualize high-dimensional data.
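
Before running t-SNE on the real embeddings, it can help to see what the projection does on data where the answer is known. This is a small sketch on synthetic vectors, not on the trained model: the two clusters, the 20-dimensional size, and the `perplexity` and `random_state` values are assumptions picked for the example.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# two well-separated synthetic clusters standing in for groups of related words
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(30, 20))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(30, 20))
toy_vectors = np.vstack([cluster_a, cluster_b])

# perplexity must be smaller than the number of samples;
# random_state makes the layout repeatable between runs
toy_tsne = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(toy_vectors)

print(toy_tsne.shape)  # one 2-D point per input vector: (60, 2)
```

Keep in mind that t-SNE layouts vary with `perplexity` and the random seed, so distances between far-apart clusters in the plot shouldn't be read too literally.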

In [34]:
%matplotlib inline
%config InlineBackend.figure_format = 'png'

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

In [19]:
# getting embeddings from the embedding layer of our model, by name
# detach() drops the autograd graph before converting to a NumPy array
embeddings = model.in_embed.weight.detach().cpu().numpy()

In [27]:
viz_words = 380
tsne = TSNE()
embed_tsne = tsne.fit_transform(embeddings[:viz_words, :])



In [None]:
fig, ax = plt.subplots(figsize=(12, 12))
for idx in range(viz_words):
    plt.scatter(*embed_tsne[idx, :], color='steelblue')
    plt.annotate(int_to_vocab[idx], (embed_tsne[idx, 0], embed_tsne[idx, 1]), alpha=0.7)