FastText: a PyTorch implementation of the paper

Bag of Tricks for Efficient Text Classification

https://arxiv.org/abs/1607.01759

In [None]:
!pip install torchtext --upgrade

Collecting torchtext
  Downloading https://files.pythonhosted.org/packages/f2/17/e7c588245aece7aa93f360894179374830daf60d7ed0bbb59332de3b3b61/torchtext-0.6.0-py3-none-any.whl (64kB)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
Installing collected packages: sentencepiece, torchtext
  Found existing installation: torchtext 0.3.1
    Uninstalling torchtext-0.3.1:
      Successfully uninstalled torchtext-0.3.1
Successfully installed sentencepiece-0.1.91 torchtext-0.6.0


Importing Libraries:

In [None]:
import torch
from torchtext import data
from torchtext import datasets

SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True


Generate Trigrams:

In [None]:
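def generate_trigrams(x):
    # Append every distinct trigram of the token list x to x itself (in place).
    # Going through a set removes duplicates, so the appended order is arbitrary.
    n_grams = set(zip(*[x[i:] for i in range(3)]))
    for n_gram in n_grams:
        x.append(' '.join(n_gram))
    return x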

Example:

In [None]:
generate_trigrams(['I', 'have', 'watched', 'a', 'lot', 'of', 'TV', 'in', 'my', 'life'])

['I',
 'have',
 'watched',
 'a',
 'lot',
 'of',
 'TV',
 'in',
 'my',
 'life',
 'a lot of',
 'of TV in',
 'lot of TV',
 'in my life',
 'have watched a',
 'TV in my',
 'watched a lot',
 'I have watched']
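The helper above changes the list it is given and appends the trigrams in set order, which can differ between runs. The sketch below is a hypothetical variant (the name `generate_ngrams` and the parameter `n` are not part of the notebook) that returns a new list and keeps the n-grams in first-occurrence order. It is one possible way to write it, not the notebook's method.

```python
def generate_ngrams(tokens, n=3):
    """Return a new list: the original tokens followed by their n-grams.

    Hypothetical variant of generate_trigrams: it leaves `tokens` untouched
    and keeps n-grams in first-occurrence order, with duplicates dropped.
    """
    n_grams = zip(*[tokens[i:] for i in range(n)])
    # dict.fromkeys drops duplicates while preserving insertion order
    unique = dict.fromkeys(' '.join(gram) for gram in n_grams)
    return list(tokens) + list(unique)


tokens = ['a', 'lot', 'of', 'TV']
print(generate_ngrams(tokens))
# ['a', 'lot', 'of', 'TV', 'a lot of', 'lot of TV']
```

Because the input is not modified, the same token list can be reused safely, for example when the function is called once during preprocessing and again at prediction time.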

In [None]:

TEXT = data.Field(tokenize = 'spacy', preprocessing = generate_trigrams)
LABEL = data.LabelField(dtype = torch.float)

Loading the IMDB dataset and splitting it into train, validation and test sets

In [None]:
import random

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)

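# random.seed() returns None, so seed first and pass the generator state to split()
random.seed(SEED)
train_data, valid_data = train_data.split(random_state = random.getstate())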

downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:07<00:00, 11.0MB/s]
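To illustrate what a seeded train/validation split does, here is a minimal standard-library sketch. It assumes a 70/30 ratio (which, to my knowledge, is the default `split_ratio` in torchtext 0.6) and the 25,000 reviews of the IMDB training set; `split_examples` is a hypothetical helper, not a torchtext function.

```python
import random


def split_examples(examples, split_ratio=0.7, seed=1234):
    """Shuffle a copy of `examples` with a private RNG and cut it in two."""
    rng = random.Random(seed)      # private generator; global state is untouched
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(round(split_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Integers stand in for the 25,000 IMDB training reviews.
train, valid = split_examples(range(25_000))
print(len(train), len(valid))   # 17500 7500
```

With a fixed seed the same examples land in the same split on every run, which keeps validation numbers comparable between experiments.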


Building the Vocabulary and Loading Pre-trained Word Embeddings

In [None]:
MAX_VOCAB_SIZE = 20_000

TEXT.build_vocab(train_data, 
                 max_size = MAX_VOCAB_SIZE, 
                 vectors = "glove.6B.100d", 
                 unk_init = torch.Tensor.normal_)

LABEL.build_vocab(train_data)
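The vocabulary keeps only the most frequent tokens and adds two special tokens, which is why it ends up with 20,002 entries rather than 20,000. The sketch below mimics that idea with `collections.Counter`; `build_vocab` and `lookup` are hypothetical helpers written for illustration, and the tie-breaking order may differ from torchtext's.

```python
from collections import Counter


def build_vocab(token_lists, max_size, specials=('<unk>', '<pad>')):
    """Keep the `max_size` most frequent tokens, placed after the special tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = list(specials) + [tok for tok, _ in counts.most_common(max_size)]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return itos, stoi


def lookup(stoi, token):
    """Map a token to its index, falling back to the unknown token."""
    return stoi.get(token, stoi['<unk>'])


corpus = [['good', 'film'], ['bad', 'film'], ['good', 'good', 'plot']]
itos, stoi = build_vocab(corpus, max_size=2)
print(itos)                   # ['<unk>', '<pad>', 'good', 'film']
print(lookup(stoi, 'plot'))   # 0
```

Tokens that fall outside the size limit, such as 'plot' here, share the index of the unknown token.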

In [None]:
BATCH_SIZE = 64

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, valid_data, test_data), 
    batch_size = BATCH_SIZE, 
    device = device)

Model Building:


In [None]:
import torch.nn as nn
import torch.nn.functional as F

class FastText(nn.Module):
    def __init__(self, vocab_size, embedding_dim, output_dim, pad_idx):
        
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        
        self.fc = nn.Linear(embedding_dim, output_dim)
        
    def forward(self, text):
        
        #text = [sent len, batch size]
        
        embedded = self.embedding(text)
                
        #embedded = [sent len, batch size, emb dim]
        
        embedded = embedded.permute(1, 0, 2)
        
        #embedded = [batch size, sent len, emb dim]
        
        pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1) 
        
        #pooled = [batch size, embedding_dim]
                
        return self.fc(pooled)
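The forward pass above amounts to averaging the embedding vectors of a sentence and feeding the average to a linear layer. The plain-Python sketch below shows that computation for one sentence; the function name and the toy numbers are made up for illustration.

```python
def fasttext_forward(token_ids, embedding, weight, bias):
    """Average the embeddings of one sentence, then apply a linear layer."""
    vectors = [embedding[i] for i in token_ids]
    dim = len(vectors[0])
    # Mean over the sentence dimension, one value per embedding dimension
    pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return sum(w * p for w, p in zip(weight, pooled)) + bias


# Row 0 plays the role of the padding token (all zeros).
embedding = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
logit = fasttext_forward([1, 2, 0, 0], embedding, weight=[1.0, 1.0], bias=0.5)
print(logit)  # 3.0
```

Note that the two padding positions count in the denominator here, as they do when the pooling window spans the full padded sentence length, so heavily padded sentences get a smaller average.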

In [None]:
INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
OUTPUT_DIM = 1
PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]

model = FastText(INPUT_DIM, EMBEDDING_DIM, OUTPUT_DIM, PAD_IDX)

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 2,000,301 trainable parameters
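The reported count can be checked by hand: the embedding table has one 100-dimensional row per vocabulary entry (20,000 tokens plus the unknown and padding tokens), and the linear layer has one weight per embedding dimension plus a bias. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def count_fasttext_parameters(vocab_size, embedding_dim, output_dim):
    """Parameters of an embedding table followed by one linear layer."""
    embedding = vocab_size * embedding_dim
    linear = embedding_dim * output_dim + output_dim   # weights + biases
    return embedding + linear


vocab_size = 20_000 + 2   # MAX_VOCAB_SIZE plus the unknown and padding tokens
print(f'{count_fasttext_parameters(vocab_size, 100, 1):,}')  # 2,000,301
```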


In [None]:
# Copy the pre-trained vectors into the embedding layer:

pretrained_embeddings = TEXT.vocab.vectors

model.embedding.weight.data.copy_(pretrained_embeddings)



tensor([[-0.4811,  1.0463,  1.9048,  ..., -1.2565, -0.9292, -1.7791],
        [-0.0812, -0.1946, -0.8601,  ..., -1.5480,  0.1313, -2.4751],
        [-0.0382, -0.2449,  0.7281,  ..., -0.1459,  0.8278,  0.2706],
        ...,
        [ 2.0467,  0.6702, -2.2263,  ...,  0.5931,  0.0623,  1.4688],
        [ 0.0219, -1.7445, -0.3752,  ..., -0.9976, -0.5819, -0.2369],
        [ 0.2297, -0.1678,  0.3385,  ...,  0.6222, -0.9155,  1.6893]])

In [None]:
# Zero the embeddings of the unknown and padding tokens:

UNK_IDX = TEXT.vocab.stoi[TEXT.unk_token]

model.embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
model.embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)



Model Training:

In [None]:
import torch.optim as optim

optimizer = optim.Adam(model.parameters())

In [None]:
criterion = nn.BCEWithLogitsLoss()

model = model.to(device)
criterion = criterion.to(device)
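`BCEWithLogitsLoss` takes raw logits and combines the sigmoid with binary cross-entropy. The sketch below shows, for a single example, a numerically stable way to write that loss and compares it with the direct formula. It is a plain-Python illustration of the math, not PyTorch's actual implementation, and the function names are made up.

```python
import math


def bce_with_logits(logit, target):
    """Stable form: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def bce_naive(logit, target):
    """Direct form: -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(x)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


print(round(bce_with_logits(0.0, 1.0), 4))  # 0.6931
print(abs(bce_with_logits(2.0, 1.0) - bce_naive(2.0, 1.0)) < 1e-9)  # True
```

The stable form never exponentiates a large positive number, so it stays finite for extreme logits where the direct formula would hit `log(0)`.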

Calculating Accuracy

In [None]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 7/10 right, this returns 0.7, NOT 7
    """

    #round predictions to the closest integer
    rounded_preds = torch.round(torch.sigmoid(preds))
    correct = (rounded_preds == y).float() #convert into float for division 
    acc = correct.sum() / len(correct)
    return acc
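The accuracy function squashes each logit with a sigmoid and rounds the result, which is the same as predicting the positive class whenever the logit is above zero. Here is a plain-Python sketch of the same idea for a small batch; the helper name and the numbers are illustrative only.

```python
import math


def binary_accuracy_from_logits(logits, labels):
    """Fraction of examples whose thresholded sigmoid matches the label."""
    correct = 0
    for logit, label in zip(logits, labels):
        prob = 1.0 / (1.0 + math.exp(-logit))
        pred = 1.0 if prob > 0.5 else 0.0   # a tie at exactly 0.5 counts as negative
        correct += pred == label
    return correct / len(labels)


print(binary_accuracy_from_logits([2.0, -1.0, 0.3, -0.2], [1.0, 0.0, 0.0, 0.0]))  # 0.75
```

The third example has a positive logit but a negative label, so three of the four predictions are correct.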

In [None]:
def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()
        
        predictions = model(batch.text).squeeze(1)
        
        loss = criterion(predictions, batch.label)
        
        acc = binary_accuracy(predictions, batch.label)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [None]:
def evaluate(model, iterator, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.eval()
    
    with torch.no_grad():
    
        for batch in iterator:

            predictions = model(batch.text).squeeze(1)
            
            loss = criterion(predictions, batch.label)
            
            acc = binary_accuracy(predictions, batch.label)

            epoch_loss += loss.item()
            epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

In [None]:
%%time
N_EPOCHS = 10

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

    
    train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
    
    
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'tut3-model.pt')
    
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. Acc: {valid_acc*100:.2f}%')

	Train Loss: 0.386 | Train Acc: 88.46%
	 Val. Loss: 0.419 |  Val. Acc: 85.37%
	Train Loss: 0.353 | Train Acc: 89.43%
	 Val. Loss: 0.430 |  Val. Acc: 85.90%
	Train Loss: 0.325 | Train Acc: 90.28%
	 Val. Loss: 0.445 |  Val. Acc: 86.47%
	Train Loss: 0.303 | Train Acc: 90.79%
	 Val. Loss: 0.457 |  Val. Acc: 86.80%
	Train Loss: 0.281 | Train Acc: 91.41%
	 Val. Loss: 0.469 |  Val. Acc: 87.26%
	Train Loss: 0.265 | Train Acc: 91.83%
	 Val. Loss: 0.487 |  Val. Acc: 87.50%
	Train Loss: 0.251 | Train Acc: 92.23%
	 Val. Loss: 0.498 |  Val. Acc: 88.00%
	Train Loss: 0.236 | Train Acc: 92.68%
	 Val. Loss: 0.511 |  Val. Acc: 88.23%
	Train Loss: 0.222 | Train Acc: 93.18%
	 Val. Loss: 0.526 |  Val. Acc: 88.31%
	Train Loss: 0.210 | Train Acc: 93.59%
	 Val. Loss: 0.538 |  Val. Acc: 88.52%
CPU times: user 1min 57s, sys: 18.5 s, total: 2min 15s
Wall time: 2min 16s
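In the log above the validation loss rises in every epoch while the validation accuracy keeps improving. Because the checkpoint is saved only when the validation loss improves, replaying the rule on the logged losses suggests that only the first epoch's weights were written to disk in this run. A minimal sketch of that replay:

```python
# Validation losses copied from the training log above, one per epoch.
valid_losses = [0.419, 0.430, 0.445, 0.457, 0.469,
                0.487, 0.498, 0.511, 0.526, 0.538]

best_valid_loss = float('inf')
best_epoch = None
for epoch, valid_loss in enumerate(valid_losses, start=1):
    if valid_loss < best_valid_loss:   # same rule as the training loop
        best_valid_loss = valid_loss
        best_epoch = epoch             # the notebook would save the model here

print(best_epoch, best_valid_loss)  # 1 0.419
```

Selecting on validation accuracy instead would keep the last epoch in this run, so the choice of criterion matters for which model is evaluated on the test set.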


Test data:

In [None]:
model.load_state_dict(torch.load('tut3-model.pt'))

test_loss, test_acc = evaluate(model, test_iterator, criterion)

print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc*100:.2f}%')


Test Loss: 0.410 | Test Acc: 86.03%


User Input:

I took some reviews of the series "Breaking Bad" from IMDB, both low-rated and high-rated.

In [None]:
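import spacy
# 'en' is a shortcut link from older spaCy releases;
# with spaCy 3+ use spacy.load('en_core_web_sm') instead.
nlp = spacy.load('en')

def predict_sentiment(model, sentence):
    model.eval()
    # Tokenize and add trigrams exactly as during training
    tokenized = generate_trigrams([tok.text for tok in nlp.tokenizer(sentence)])
    indexed = [TEXT.vocab.stoi[t] for t in tokenized]
    tensor = torch.LongTensor(indexed).to(device)
    tensor = tensor.unsqueeze(1)  # [sent len] -> [sent len, batch size = 1]
    with torch.no_grad():  # no gradients are needed for inference
        prediction = torch.sigmoid(model(tensor))
    if prediction.item() < 0.5:
        print("Prediction :", prediction.item(), "and its a Negative Review")
    else:
        print("Prediction :", prediction.item(), "and its a Positive Review")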



Negative Example:

In [None]:
predict_sentiment(model, "I decided to watch this after hearing how good it was - I hadn't watched any of it by the time the last episode had aired.After watching the pilot I thought that the show had potential and was looking forward to getting into a new show - (having just finished Dexter and whilst The Walking Dead and Game of Thrones were both on a break).The short first season spent its time building us up for nothing to actually happen. But as it was quite short and all I had heard was how amazing this show is I opened up Season 2. This was very much of the same - every episode building up to something but not much actually happening.Admittedly Season 3 was a bit better but still quite boring in comparison to many people telling me it's the best TV show ever. In all honesty I haven't bothered with the last 2 seasons as I feel that I have lost several hours of my life watching the first 3 and it's not worth putting the effort in watching the rest. There are plenty of TV shows worth watching before this one. I cannot honestly see why this program has high reviews.")

Prediction : 0.4620707631111145 and its a Negative Review


In [None]:
predict_sentiment(model, "Finally was convinced to watch the show due to all the hype. Was watching with the wife and reached the third season...wow, it felt like forever. After an episode in season 3 i just came out and told the wifey I can't do this anymore, I can't watch this. She looked me in my eyes and the relief I saw was heartwarming, she agreed! I seriously don't know what all that hype was about!")

Prediction : 0.1796533465385437 and its a Negative Review


Positive Example:

In [None]:
predict_sentiment(model, "It was very good up to the second season, but the third just spoiled it. Slow paced AF, crappy explanations, ending pretty meh. Random exponential complexity. The multiple worlds stuff, introduced on 3rd season, sucks and really made me lose interest.")

Prediction : 0.9168309569358826 and its a Positive Review


In [None]:
predict_sentiment(model, "I highly recommend this show. I don't want to compare it to any other show but it reminded me of Twin Peaks in terms of its darkness. Each episode raises audiences' suspense, which is a good thing. However, you should note every character's name on a paper, prepare a family tree otherwise it will be harder to remember. Great show!!")

Prediction : 0.9997377991676331 and its a Positive Review


References:

https://arxiv.org/abs/1607.01759

http://bentrevett.com