## Natural Language Processing

### Continuous Bag of Words (CBOW)

In [1]:
import torch
from torch import nn, optim
from torch.autograd import Variable
import torch.nn.functional as F

The Continuous Bag-of-Words (CBOW) model is frequently used in deep learning for NLP. It predicts a target word from its context: a few words before it and a few words after it. This is distinct from language modeling, since CBOW is not sequential and does not have to be probabilistic. CBOW is typically used to train word embeddings quickly, and these embeddings then initialize the embedding layer of a more complicated model. This is usually called pretraining embeddings, and it almost always improves performance by a couple of percent.
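As a rough illustration of what "initializing the embeddings of a more complicated model" can look like in PyTorch, the sketch below loads a weight matrix into a new embedding layer with `nn.Embedding.from_pretrained`. The `pretrained` tensor is a random stand-in (an assumption for this example), not weights from the model trained in this notebook.

```python
import torch
from torch import nn

# Stand-in for embeddings learned by a CBOW model: 5 words, 3 dimensions.
# In practice this would be a copy of the trained embedding weights.
pretrained = torch.randn(5, 3)

# freeze=False lets the downstream model keep fine-tuning the vectors.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

# Look up two word indices; each row is the pretrained vector for that word.
vectors = embedding(torch.tensor([0, 4]))
print(vectors.shape)
```

Setting `freeze=True` instead would keep the pretrained vectors fixed while the rest of the downstream model trains.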

In [2]:
CONTEXT_SIZE = 2 # 2 words to the left, 2 to the right
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

In [3]:
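vocab = set(raw_text)
# sort the vocabulary so that the word indices are reproducible across runs
word_to_idx = {word: i for i, word in enumerate(sorted(vocab))}

data = []
for i in range(CONTEXT_SIZE, len(raw_text) - CONTEXT_SIZE):
    # CONTEXT_SIZE words to the left of position i, then CONTEXT_SIZE to the right
    context = raw_text[i - CONTEXT_SIZE:i] + raw_text[i + 1:i + CONTEXT_SIZE + 1]
    target = raw_text[i]
    data.append((context, target))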
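To make the windowing concrete, here is a minimal, self-contained sketch of the same idea on a seven-word sentence. The helper name `make_cbow_pairs` is hypothetical and introduced only for this illustration.

```python
def make_cbow_pairs(tokens, context_size):
    """Return (context, target) pairs with context_size words on each side."""
    pairs = []
    for i in range(context_size, len(tokens) - context_size):
        context = tokens[i - context_size:i] + tokens[i + 1:i + context_size + 1]
        pairs.append((context, tokens[i]))
    return pairs

tokens = "We are about to study the idea".split()
pairs = make_cbow_pairs(tokens, context_size=2)
print(pairs[0])    # (['We', 'are', 'to', 'study'], 'about')
print(len(pairs))  # 3
```

The first and last `context_size` words never appear as targets, because they do not have a full window on both sides.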

In [4]:
class CBOW(nn.Module):
    def __init__(self, n_word, n_dim, context_size):
        super(CBOW, self).__init__()
        self.embedding = nn.Embedding(n_word, n_dim)
        self.project = nn.Linear(n_dim, n_dim, bias=False)
        self.linear1 = nn.Linear(n_dim, 128)
        self.linear2 = nn.Linear(128, n_word)
        
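    def forward(self, x):
        x = self.embedding(x)              # (n_context, n_dim)
        x = self.project(x)                # (n_context, n_dim)
        x = torch.sum(x, 0, keepdim=True)  # sum over context words: (1, n_dim)
        x = self.linear1(x)
        x = F.relu(x, inplace=True)
        x = self.linear2(x)                # (1, n_word)
        # pass dim explicitly; the implicit dim is deprecated and raises a warning
        x = F.log_softmax(x, dim=1)
        return x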
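The shapes flowing through the forward pass can be checked with a small sketch like the one below. It uses toy sizes and a single linear layer in place of the full `CBOW` class, so it is a simplified stand-in rather than the model defined above.

```python
import torch
from torch import nn
import torch.nn.functional as F

n_word, n_dim = 10, 6  # toy sizes, chosen only for illustration
embedding = nn.Embedding(n_word, n_dim)
linear = nn.Linear(n_dim, n_word)

context = torch.tensor([1, 2, 4, 5])               # four context word indices
embedded = embedding(context)                      # shape (4, n_dim)
summed = embedded.sum(dim=0, keepdim=True)         # shape (1, n_dim)
log_probs = F.log_softmax(linear(summed), dim=1)   # shape (1, n_word)

print(embedded.shape, summed.shape, log_probs.shape)
print(log_probs.exp().sum().item())  # approximately 1.0
```

Summing over the context dimension is what makes the model a "bag" of words: the order of the four context indices does not affect the result.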

In [5]:
model = CBOW(len(word_to_idx), 100, CONTEXT_SIZE)
if torch.cuda.is_available():
    model = model.cuda()

In [6]:
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)
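`nn.NLLLoss` expects log-probabilities, which is why the model ends in `log_softmax`. As a quick sanity check on toy values (not taken from this notebook), the combination matches `F.cross_entropy` applied to the raw scores:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 5)   # raw scores for a 5-word toy vocabulary
target = torch.tensor([3])   # index of the true word

nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
ce = F.cross_entropy(logits, target)
print(torch.allclose(nll, ce))  # True
```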

In [7]:
for epoch in range(100):
    print(f'epoch {epoch}')
    print('*' * 10)
    running_loss = 0
    for word in data:
        context, target = word
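        # plain tensors are enough; the Variable wrapper is no longer needed
        context = torch.tensor([word_to_idx[i] for i in context], dtype=torch.long)
        target = torch.tensor([word_to_idx[target]], dtype=torch.long)
        if torch.cuda.is_available():
            context = context.cuda()
            target = target.cuda()
        # forward
        out = model(context)
        loss = criterion(out, target)
        # .item() extracts the Python float; loss.data[0] fails on 0-dim tensors
        running_loss += loss.item()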
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Loss: {running_loss/len(data):.6f}')

epoch 0
**********
Loss: 4.016587
epoch 1
**********
Loss: 3.947102
epoch 2
**********
Loss: 3.879901
epoch 3
**********
Loss: 3.814908
epoch 4
**********
Loss: 3.751083
epoch 5
**********
Loss: 3.687998
epoch 6
**********
Loss: 3.625721
epoch 7
**********
Loss: 3.563147
epoch 8
**********
Loss: 3.500686
epoch 9
**********
Loss: 3.438193
epoch 10
**********
Loss: 3.375107
epoch 11
**********
Loss: 3.311542
epoch 12
**********
Loss: 3.247390
epoch 13
**********
Loss: 3.182537
epoch 14
**********
Loss: 3.116996
epoch 15
**********
Loss: 3.051152
epoch 16
**********
Loss: 2.985310
epoch 17
**********
Loss: 2.919339
epoch 18
**********
Loss: 2.853139
epoch 19
**********
Loss: 2.786763
epoch 20
**********
Loss: 2.720018
epoch 21
**********
Loss: 2.653478
epoch 22
**********
Loss: 2.586878
epoch 23
**********
Loss: 2.519999
epoch 24
**********
Loss: 2.453214
epoch 25
**********
Loss: 2.385933
epoch 26
**********
Loss: 2.318097
epoch 27
**********
Loss: 2.250030
epoch 28
**********
Loss: 2.181800
epoch 29
*****