<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width="500" height="450">
<h3 style="text-align: center;"><b>Phystech School of Applied Mathematics and Informatics (FPMI), MIPT</b></h3>

---

# Assignment 3

## Text Classification

In this assignment you will try several methods used for classification, and examine how well the model understands the meaning of words and which words in an example influence the result.

In [None]:
import numpy as np
import pandas as pd
import torch

# .legacy added to solve the dependency error
from torchtext.legacy import datasets
from torchtext.legacy.data import Field, LabelField, BucketIterator

from torchtext.vocab import Vectors, GloVe

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random
from tqdm.autonotebook import tqdm

In this assignment we will use the torchtext library. It is fairly simple to use and lets us concentrate on the task rather than on writing a Dataloader.
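
To make concrete what the library takes off our hands, here is a minimal pure-Python sketch of the two steps a text field performs: building a vocabulary and turning text into indices. The helper names `build_vocab` and `numericalize`, the whitespace tokenizer, and the ordering of the special tokens are assumptions made for illustration, not torchtext's actual implementation.

```python
from collections import Counter

UNK, PAD = "<unk>", "<pad>"  # assumed special tokens, placed first in the vocabulary


def build_vocab(texts):
    """Map each lowercased, whitespace-split token to an integer index."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    itos = [UNK, PAD] + [tok for tok, _ in counts.most_common()]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return stoi, itos


def numericalize(text, stoi):
    """Convert a raw string to indices; unseen tokens fall back to UNK."""
    return [stoi.get(tok, stoi[UNK]) for tok in text.lower().split()]


stoi, itos = build_vocab(["A great film", "a dull film"])
ids = numericalize("A GREAT unseen film", stoi)
print([itos[i] for i in ids])
```

The word `unseen` never appeared when the vocabulary was built, so it maps to the unknown token; the same happens to any token whose casing differs from the vocabulary's.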

In [None]:
TEXT = Field(sequential=True, lower=True, include_lengths=True)  # text field
LABEL = LabelField(dtype=torch.float)                            # label field

In [None]:
SEED = 1234

torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

The dataset we will experiment on consists of movie reviews from the IMDB website.

In [None]:
train, test = datasets.IMDB.splits(TEXT, LABEL)             # load the dataset
train, valid = train.split(random_state=random.seed(SEED))  # split into train and validation

In [None]:
TEXT.build_vocab(train)
LABEL.build_vocab(train)

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, valid_iter, test_iter = BucketIterator.splits(
    (train, valid, test), 
    batch_size = 64,
    sort_within_batch = True,
    device = device
)

Let's explore what we have in the iterator.

In [None]:
for batch in train_iter:
    print("Embedded features dimensions: \n[sent len, batch size] =", batch.text[0].shape, batch.text[0].device)
    print()
    print("Text lengths for all elements in the batch:\n", batch.text[1].cpu(), batch.text[1].cpu().device)
    print()
    print("Batch labels:\n", batch.label)
    break

Embedded features dimensions: 
[sent len, batch size] = torch.Size([114, 64]) cuda:0

Text lengths for all elements in the batch:
 tensor([114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114,
        114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114, 114,
        114, 114, 114, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113,
        113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113, 113,
        113, 113, 113, 113, 113, 113, 113, 113]) cpu

Batch labels:
 tensor([1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 1.,
        0., 1., 1., 0., 0., 1., 0., 1., 0., 1., 0., 1., 1., 1., 0., 1., 0., 1.,
        0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 0.,
        0., 1., 1., 1., 1., 0., 1., 1., 1., 0.], device='cuda:0')


Sentence length is the maximum sentence length in the batch. You can check for yourself that it equals the maximum of the values printed under text lengths. **Note:** we need to move `text_lengths` to the `cpu`, see: https://github.com/pytorch/pytorch/issues/43227
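
As a rough illustration of where the `[sent len, batch size]` shape comes from, the sketch below pads a batch of index lists to the longest one and transposes it. The function `pad_batch` and the padding index `1` are assumptions for this example; the real iterator does this work internally.

```python
PAD_IDX = 1  # assumed index of the padding token


def pad_batch(sequences, pad_idx=PAD_IDX):
    """Sort by length (longest first), pad, and return time-major rows plus lengths."""
    ordered = sorted(sequences, key=len, reverse=True)
    lengths = [len(seq) for seq in ordered]
    max_len = lengths[0]
    padded = [seq + [pad_idx] * (max_len - len(seq)) for seq in ordered]
    # Transpose to [sent len, batch size], the layout used when batch_first=False.
    time_major = [list(step) for step in zip(*padded)]
    return time_major, lengths


batch, lengths = pad_batch([[5, 6], [7, 8, 9, 10], [11]])
print(len(batch), len(batch[0]), lengths)
```

The first dimension of the padded batch equals `max(lengths)`, which is the check suggested above.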

## RNN

To begin, let's try recurrent neural networks. You met GRU in the seminar; you can also try LSTM. For classification you can use either the hidden_state or the output of the last token.
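
When the hidden state of a multi-layer bidirectional network is used, it helps to know which row belongs to which layer and direction. The sketch below assumes the layout described in the PyTorch documentation, where the first dimension of the hidden state can be viewed as `(num_layers, num_directions)`; the helper `hidden_index` is hypothetical.

```python
def hidden_index(layer, direction, num_directions):
    """Row of `hidden` for a 0-based layer and a direction (0 = forward, 1 = backward)."""
    return layer * num_directions + direction


n_layers, num_directions = 2, 2
total = n_layers * num_directions

last_forward = hidden_index(n_layers - 1, 0, num_directions)
last_backward = hidden_index(n_layers - 1, 1, num_directions)

# Expressed as negative indices, these are the rows of the top layer.
print(last_forward - total, last_backward - total)  # prints: -2 -1
```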

In [None]:
class RNNBaseline(nn.Module):
    def __init__(self,
                 vocab_size, embedding_dim, hidden_dim, output_dim,
                 n_layers, bidirectional, dropout, pad_idx):
        
        super().__init__()
        self.bidirectional = bidirectional
        self.dropout = dropout

        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        
        # YOUR CODE GOES HERE
        self.rnn = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, 
                           num_layers=n_layers, dropout=dropout, bidirectional=bidirectional)
        
        # YOUR CODE GOES HERE
        if self.bidirectional:
            self.fc = nn.Linear(2*hidden_dim, output_dim)
        else:
            self.fc = nn.Linear(hidden_dim, output_dim)
        
        
    def forward(self, text, text_lengths):
        
        # text dims      [sent len, batch size]
        # embedded dims  [sent len, batch size, emb dim]
        embedded = self.embedding(text)
        
        # pack sequence
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu())
        
        # cell arg for LSTM, remove for GRU
        packed_output, (hidden, cell) = self.rnn(packed_embedded)
        # unpack sequence
        output, output_lengths = nn.utils.rnn.pad_packed_sequence(packed_output)  

        # output dims   [sent len, batch size, hid dim * num directions]
        # output over padding tokens are zero tensors
        
        # hidden dims   [num layers * num directions, batch size, hid dim]
        # cell dims     [num layers * num directions, batch size, hid dim]
        
        # concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers
        # recall how the bidirectional network looks to understand why we get indexes -1 and -2
        # hidden dims both cases [batch size, hid dim * num directions]
        if self.bidirectional:
            hidden = torch.cat([hidden[-2,:,:], hidden[-1,:,:]], dim=1)
        else:
            # only one direction, take the last one
            hidden = hidden[-1,:,:]


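        # apply dropout (p = probability of zeroing an element); F.dropout with
        # training=self.training is disabled in eval mode, unlike a freshly created nn.Dropout
        hidden = F.dropout(hidden, p=self.dropout, training=self.training)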
            
        return self.fc(hidden)

Experiment with the hyperparameters.

In [None]:
# # model hyperparameters
# vocab_size = len(TEXT.vocab)
# emb_dim = 100
# hidden_dim = 256
# output_dim = 1
# n_layers = 2
# bidirectional = True
# dropout = 0.2
# PAD_IDX = TEXT.vocab.stoi[TEXT.pad_token]


Train the network! Use whatever tools you find convenient: Catalyst, PyTorch Lightning, or your own home-grown code.
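
One ingredient worth understanding before writing a training loop is early stopping with a patience counter. The sketch below is a pure-Python illustration on made-up validation losses; it uses the common variant that resets the counter after every improvement, which is a design choice and may differ from a given training loop. The function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(valid_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            bad_epochs = 0  # reset after an improvement (a design choice)
        else:
            bad_epochs += 1
            if bad_epochs == patience:
                return epoch, best_epoch
    return len(valid_losses), best_epoch


# Hypothetical validation losses, one per epoch.
losses = [0.63, 0.54, 0.45, 0.46, 0.40, 0.51, 0.52, 0.50]
stop, best = early_stopping_epoch(losses, patience=3)
print(stop, best)
```

Whichever variant is used, the weights restored at the end should be those saved at `best_epoch`, not those of the last epoch.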

In [None]:
def training_rnn(model, criterion, optimizer, max_grad_norm,
             max_epochs, patience,
             train_iter, valid_iter, device="cpu"):
    # note, criterion assumes reduction='sum' to account for batches of different sizes


    # -------------------------------------------------------------------------
    # Training
    # -------------------------------------------------------------------------
    min_loss = np.inf
    cur_patience = 0

    for epoch in range(1, max_epochs + 1):
        train_loss = 0.0
        train_objs = 0

        model.train()
        pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        for it, batch in pbar: 
            
                optimizer.zero_grad()
                input_embeds = batch.text[0].to(device)
                text_lengths = batch.text[1].to(device)
                labels = torch.unsqueeze(batch.label, 1).to(device)

                prediction = model(input_embeds, text_lengths)
                loss = criterion(prediction, labels)
                loss.backward()

                train_loss += loss.item()

                # new step: gradient clipping
                if max_grad_norm is not None:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
                
                # update the parameters
                optimizer.step()
                
                # display statistics
                pbar.set_description(f"Epoch {epoch}. Train Loss: {loss.item() / len(labels):.4}")

                # record how many samples we had in this batch
                train_objs += len(labels)

        train_loss /= train_objs

        # ---------------------------------------------------------------------
        # Validation
        # ---------------------------------------------------------------------
        valid_loss = 0.0
        valid_objs = 0

        pbar = tqdm(enumerate(valid_iter), total=len(valid_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        # test mode and no gradient calculation needed
        model.eval()
        with torch.no_grad():
            for it, batch in pbar:
                input_embeds = batch.text[0].to(device)
                text_lengths = batch.text[1].to(device)
                labels = torch.unsqueeze(batch.label, 1).to(device)
                prediction = model(input_embeds, text_lengths)
                loss = criterion(prediction, labels)
                valid_loss += loss.item()
                valid_objs += len(labels)

                # display statistics
                pbar.set_description(f"Epoch {epoch}. Valid Loss: {loss.item() / len(labels):.4}")

        valid_loss /= valid_objs

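        # record best model if doing better
        if valid_loss < min_loss:
            min_loss = valid_loss
            # clone the tensors: state_dict() returns references that later updates would overwrite
            best_model = {k: v.detach().clone() for k, v in model.state_dict().items()}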
        # check for early stopping otherwise
        else:
            cur_patience += 1
            if cur_patience == patience:
                cur_patience = 0
                break
        
        print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, train_loss, valid_loss))
    model.load_state_dict(best_model)

In [None]:
def testing_rnn(model, criterion, test_iter, device="cpu"):
    # ---------------------------------------------------------------------
    # Testing
    # ---------------------------------------------------------------------
    test_loss = 0.0
    correct_preds = 0

    test_tp = 0.0
    test_tn = 0.0
    test_fp = 0.0
    test_fn = 0.0

    test_objs = 0

    pbar = tqdm(enumerate(test_iter), total=len(test_iter), leave=False)
    # test mode and no gradient calculation needed
    model.eval()
    with torch.no_grad():
        for it, batch in pbar:
            input_embeds = batch.text[0].to(device)
            text_lengths = batch.text[1].to(device)
            labels = torch.unsqueeze(batch.label, 1).to(device)
            prediction = model(input_embeds, text_lengths)
            loss = criterion(prediction, labels)
            test_loss += loss.item()
            test_objs += len(labels)

            preds = torch.sigmoid(prediction)       # [batch size, 1]
            preds = (preds > 0.5).to(torch.float)

            preds = preds.reshape(-1, 1).cpu()
            labs = labels.reshape(-1, 1).cpu()

            correct_preds += (labs == preds).sum()

            test_tp += torch.logical_and(preds==1, labs==1).sum()
            test_tn += torch.logical_and(preds==0, labs==0).sum()
            test_fp += torch.logical_and(preds==1, labs==0).sum()
            test_fn += torch.logical_and(preds==0, labs==1).sum()
        
            # display statistics
            pbar.set_description(f"Test Loss: {loss.item() / len(labels):.4}")

    test_loss /= test_objs
    test_acc = correct_preds / test_objs
    test_f1 = test_tp / (test_tp + 0.5*(test_fp + test_fn))

    print(f'Test Loss: {test_loss:.4}')
    print(f'Test Accuracy: {test_acc:.4}')
    # print(f'TP rate: {test_tp / test_objs:.4}')
    # print(f'TN rate: {test_tn / test_objs:.4}')
    # print(f'FP rate: {test_fp / test_objs:.4}')
    # print(f'FN rate: {test_fn / test_objs:.4}')
    print(f'F1 score: {test_f1:.4}')


In [None]:
model = RNNBaseline(
    vocab_size=len(TEXT.vocab),
    embedding_dim=100,
    hidden_dim=256,
    output_dim=1,
    n_layers=2,
    bidirectional=True,
    dropout=0.2,
    pad_idx=TEXT.vocab.stoi[TEXT.pad_token]
)

model = model.to(device)

In [None]:
criterion = nn.BCEWithLogitsLoss(reduction='sum') # to account for batches of different sizes
optimizer = torch.optim.Adam(model.parameters())

In [None]:
# training hyperparameters as arguments
training_rnn(model=model, criterion=criterion, optimizer=optimizer, max_grad_norm=2,
         max_epochs=20, patience=3,
         train_iter=train_iter, valid_iter=valid_iter, device=device)

Epoch: 1, Training Loss: 0.6501353975568499, Validation Loss: 0.6270638387044271
Epoch: 2, Training Loss: 0.5928358008248465, Validation Loss: 0.5412094528834025
Epoch: 3, Training Loss: 0.4375962215968541, Validation Loss: 0.44616806405385334
Epoch: 4, Training Loss: 0.30271724583762033, Validation Loss: 0.4533872182210286
Epoch: 5, Training Loss: 0.19917298927307128, Validation Loss: 0.4039086194356283
Epoch: 6, Training Loss: 0.12693643804277693, Validation Loss: 0.5141480538050334

In [None]:
testing_rnn(model=model, criterion=criterion,
        test_iter=test_iter, device=device)

Test Loss: 0.6352
Test Accuracy: 0.8255
F1 score: 0.8147


Compute the f1-score of your classifier on the test dataset.

**Answer**: 0.8147
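
For reference, the F1 score reported by the testing function is `TP / (TP + 0.5 * (FP + FN))`, which is the same as `2 * TP / (2 * TP + FP + FN)`. The sketch below computes it in plain Python on a small made-up set of predictions; the helper names and the data are illustrative assumptions.

```python
def confusion_counts(preds, labels):
    """Count true positives, false positives and false negatives for binary labels."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical predictions and ground-truth labels.
preds = [1, 1, 0, 1, 0, 0]
labels = [1, 0, 0, 1, 1, 0]

tp, fp, fn = confusion_counts(preds, labels)
score = f1_score(tp, fp, fn)
print(tp, fp, fn, round(score, 4))
```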

## CNN

![](https://www.researchgate.net/publication/333752473/figure/fig1/AS:769346934673412@1560438011375/Standard-CNN-on-text-classification.png)

Convolutional neural networks are also often used for text classification. The idea is that sentiment is usually carried by phrases of two or three words, such as "very good film" or "incredible boredom". Sliding a convolution over these words yields a high score, which MaxPool then picks out. An ordinary fully connected network follows. An important point: the convolutions are applied in parallel, not sequentially. Let's try it!
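
The idea can be sketched without any deep learning library. The example below slides one filter per kernel size over toy 2-dimensional "embeddings", applies ReLU, and keeps the maximum over time; the filters run independently on the same input and their pooled outputs are concatenated. All values, the single filter per size, and the missing bias term are simplifying assumptions.

```python
def conv1d_max(embeddings, kernel):
    """Slide `kernel` (k x emb_dim weights) over the sequence, apply ReLU, max-pool over time."""
    k = len(kernel)
    scores = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        s = sum(w * x
                for w_row, x_row in zip(kernel, window)
                for w, x in zip(w_row, x_row))
        scores.append(max(s, 0.0))  # ReLU
    return max(scores)


# Toy embeddings for a 4-token sentence (hypothetical values).
sentence = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]

kernels = {
    2: [[1.0, 0.0], [1.0, 1.0]],              # responds to a particular bigram
    3: [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],  # responds to a particular trigram
}

# Parallel branches: every kernel sees the same input; pooled features are concatenated.
features = [conv1d_max(sentence, kernels[k]) for k in sorted(kernels)]
print(features)
```

The trigram filter matches the first three tokens exactly, so its pooled score is the largest, regardless of where in the sentence the phrase occurs.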

In [None]:
TEXT = Field(sequential=True, lower=True, batch_first=True)  # batch_first because we use conv  
LABEL = LabelField(batch_first=True, dtype=torch.float)

train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split(random_state=random.seed(SEED))

TEXT.build_vocab(trn)
LABEL.build_vocab(trn)

device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
train_iter, valid_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 256, 256),
        sort=False,
        sort_key=lambda x: len(x.text),  # IMDB examples have a .text field, not .src
        sort_within_batch=False,
        device=device,
        repeat=False,
)

In [None]:
for batch in train_iter:
    print("Embedded features dimensions: \n[batch size, sent len] =", batch.text.shape, batch.text.device)
    print()
    print("Batch labels:\n", batch.label)
    break

Embedded features dimensions: 
[batch size, sent len] = torch.Size([128, 1475]) cuda:0

Batch labels:
 tensor([0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1.,
        0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 1., 1.,
        1., 1., 0., 0., 0., 1., 0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 1.,
        0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1.,
        0., 1., 0., 1., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1.,
        1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0.,
        0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.,
        1., 0.], device='cuda:0')


You can use Conv2d with `in_channels=1, kernel_size=(kernel_sizes[0], emb_dim)` or Conv1d with `in_channels=emb_dim, kernel_size=kernel_sizes[0]`. Either way, think carefully about the shapes.
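
For the Conv1d variant, the shape bookkeeping can be checked with the output-length formula given in the `torch.nn.Conv1d` documentation. The sketch below applies it in plain Python; the helper `conv1d_out_len` is hypothetical, and the sequence length 877 and embedding size 300 are example values only.

```python
def conv1d_out_len(seq_len, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a 1D convolution, following the formula in the Conv1d docs."""
    return (seq_len + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


batch_size, seq_len, emb_dim, out_channels = 128, 877, 300, 64

shapes = {}
for k in (3, 4, 5):
    l_out = conv1d_out_len(seq_len, k, padding=1)
    # Conv1d takes [batch, emb_dim, seq_len] and returns [batch, out_channels, l_out].
    shapes[k] = (batch_size, out_channels, l_out)

print(shapes)
```

Because the three branches produce different lengths, they can only be concatenated after pooling over the time dimension, which reduces each of them to `[batch, out_channels]`.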

In [None]:
class CNN(nn.Module):
    def __init__(
        self,
        vocab_size,
        emb_dim,
        out_channels,
        kernel_sizes,
        dropout=0.5,
    ):
        super().__init__()
        
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.conv_0 = nn.Conv1d(in_channels=emb_dim, out_channels=out_channels,
                                kernel_size=kernel_sizes[0], padding=1)  # YOUR CODE GOES HERE
        
        self.conv_1 = nn.Conv1d(in_channels=emb_dim, out_channels=out_channels,
                                kernel_size=kernel_sizes[1], padding=1)  # YOUR CODE GOES HERE
        
        self.conv_2 = nn.Conv1d(in_channels=emb_dim, out_channels=out_channels,
                                kernel_size=kernel_sizes[2], padding=1)  # YOUR CODE GOES HERE
        
        self.fc = nn.Linear(len(kernel_sizes) * out_channels, 1)
        
        self.dropout = nn.Dropout(dropout)
        
        
    def forward(self, text):
        
        #print(text.shape) # torch.Size([128, 877])
        embedded = self.embedding(text)
        #print(embedded.shape) # torch.Size([128, 877, 300])
        embedded = embedded.permute(0, 2, 1)  # may be reshape here
        #print(embedded.shape) # torch.Size([128, 300, 877])
        
        conved_0 = F.relu(self.conv_0(embedded))  # may be reshape here
        conved_1 = F.relu(self.conv_1(embedded))  # may be reshape here
        conved_2 = F.relu(self.conv_2(embedded))  # may be reshape here
        
        pooled_0 = F.max_pool1d(conved_0, conved_0.shape[2]).squeeze(2)
        pooled_1 = F.max_pool1d(conved_1, conved_1.shape[2]).squeeze(2)
        pooled_2 = F.max_pool1d(conved_2, conved_2.shape[2]).squeeze(2)
        
        cat = self.dropout(torch.cat((pooled_0, pooled_1, pooled_2), dim=1))
            
        return self.fc(cat)

In [None]:
# # model hyperparameters

# kernel_sizes = [3, 4, 5]
# vocab_size = len(TEXT.vocab)
# out_channels=64
# dropout = 0.5
# dim = 300

model = CNN(vocab_size=len(TEXT.vocab),
            emb_dim=300,
            out_channels=64,
            kernel_sizes=[3, 4, 5],
            dropout=0.5)

model.to(device)

CNN(
  (embedding): Embedding(202268, 300)
  (conv_0): Conv1d(300, 64, kernel_size=(3,), stride=(1,), padding=(1,))
  (conv_1): Conv1d(300, 64, kernel_size=(4,), stride=(1,), padding=(1,))
  (conv_2): Conv1d(300, 64, kernel_size=(5,), stride=(1,), padding=(1,))
  (fc): Linear(in_features=192, out_features=1, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

In [None]:
criterion = nn.BCEWithLogitsLoss(reduction='sum') # to account for batches of different sizes
optimizer = torch.optim.Adam(model.parameters())

Train it!

In [None]:
def training_cnn(model, criterion, optimizer, max_grad_norm,
             max_epochs, patience,
             train_iter, valid_iter, device="cpu"):
    # note, criterion assumes reduction='sum' to account for batches of different sizes


    # -------------------------------------------------------------------------
    # Training
    # -------------------------------------------------------------------------
    min_loss = np.inf
    cur_patience = 0

    for epoch in range(1, max_epochs + 1):
        train_loss = 0.0
        train_objs = 0

        model.train()
        pbar = tqdm(enumerate(train_iter), total=len(train_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        for it, batch in pbar: 
            
                optimizer.zero_grad()
                input_embeds = batch.text.to(device)
                labels = torch.unsqueeze(batch.label, 1).to(device)

                prediction = model(input_embeds)
                loss = criterion(prediction, labels)
                loss.backward()

                train_loss += loss.item()

                # new step: gradient clipping
                if max_grad_norm is not None:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
                
                # update the parameters
                optimizer.step()
                
                # display statistics
                pbar.set_description(f"Epoch {epoch}. Train Loss: {loss.item() / len(labels):.4}")

                # record how many samples we had in this batch
                train_objs += len(labels)

        train_loss /= train_objs

        # ---------------------------------------------------------------------
        # Validation
        # ---------------------------------------------------------------------
        valid_loss = 0.0
        valid_objs = 0

        pbar = tqdm(enumerate(valid_iter), total=len(valid_iter), leave=False)
        pbar.set_description(f"Epoch {epoch}")
        # test mode and no gradient calculation needed
        model.eval()
        with torch.no_grad():
            for it, batch in pbar:
                input_embeds = batch.text.to(device)
                labels = torch.unsqueeze(batch.label, 1).to(device)
                prediction = model(input_embeds)
                loss = criterion(prediction, labels)
                valid_loss += loss.item()
                valid_objs += len(labels)

                # display statistics
                pbar.set_description(f"Epoch {epoch}. Valid Loss: {loss.item() / len(labels):.4}")

        valid_loss /= valid_objs

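        # record best model if doing better
        if valid_loss < min_loss:
            min_loss = valid_loss
            # clone the tensors: state_dict() returns references that later updates would overwrite
            best_model = {k: v.detach().clone() for k, v in model.state_dict().items()}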
        # check for early stopping otherwise
        else:
            cur_patience += 1
            if cur_patience == patience:
                cur_patience = 0
                break
        
        print('Epoch: {}, Training Loss: {}, Validation Loss: {}'.format(epoch, train_loss, valid_loss))
    model.load_state_dict(best_model)

In [None]:
# training hyperparameters as arguments
training_cnn(model=model, criterion=criterion, optimizer=optimizer, max_grad_norm=None,
         max_epochs=20, patience=3,
         train_iter=train_iter, valid_iter=valid_iter, device=device)

Epoch: 1, Training Loss: 0.6495634680611747, Validation Loss: 0.476577348836263
Epoch: 2, Training Loss: 0.49900211552211216, Validation Loss: 0.42731297200520835
Epoch: 3, Training Loss: 0.43236524527413506, Validation Loss: 0.3941449203491211
Epoch: 4, Training Loss: 0.3708957024710519, Validation Loss: 0.36383080774943033
Epoch: 5, Training Loss: 0.31528231549944197, Validation Loss: 0.34818512827555337
Epoch: 6, Training Loss: 0.2680716057913644, Validation Loss: 0.337379541015625
Epoch: 7, Training Loss: 0.19849421702793665, Validation Loss: 0.3340445218404134
Epoch: 8, Training Loss: 0.1432853064945766, Validation Loss: 0.33707023366292316
Epoch: 9, Training Loss: 0.1025441073008946, Validation Loss: 0.3427331952412923

In [None]:
def testing_cnn(model, criterion, test_iter, device="cpu"):
    # ---------------------------------------------------------------------
    # Testing
    # ---------------------------------------------------------------------
    test_loss = 0.0
    correct_preds = 0

    test_tp = 0.0
    test_tn = 0.0
    test_fp = 0.0
    test_fn = 0.0

    test_objs = 0

    pbar = tqdm(enumerate(test_iter), total=len(test_iter), leave=False)
    # test mode and no gradient calculation needed
    model.eval()
    with torch.no_grad():
        for it, batch in pbar:
            input_embeds = batch.text.to(device)
            labels = torch.unsqueeze(batch.label, 1).to(device)
            prediction = model(input_embeds)
            loss = criterion(prediction, labels)
            test_loss += loss.item()
            test_objs += len(labels)

            preds = torch.sigmoid(prediction)       # [batch size, 1]
            preds = (preds > 0.5).to(torch.float)

            preds = preds.reshape(-1, 1).cpu()
            labs = labels.reshape(-1, 1).cpu()

            correct_preds += (labs == preds).sum()

            test_tp += torch.logical_and(preds==1, labs==1).sum()
            test_tn += torch.logical_and(preds==0, labs==0).sum()
            test_fp += torch.logical_and(preds==1, labs==0).sum()
            test_fn += torch.logical_and(preds==0, labs==1).sum()
        
            # display statistics
            pbar.set_description(f"Test Loss: {loss.item() / len(labels):.4}")

    test_loss /= test_objs
    test_acc = correct_preds / test_objs
    test_f1 = test_tp / (test_tp + 0.5*(test_fp + test_fn))

    print(f'Test Loss: {test_loss:.4}')
    print(f'Test Accuracy: {test_acc:.4}')
    # print(f'TP rate: {test_tp / test_objs:.4}')
    # print(f'TN rate: {test_tn / test_objs:.4}')
    # print(f'FP rate: {test_fp / test_objs:.4}')
    # print(f'FN rate: {test_fn / test_objs:.4}')
    print(f'F1 score: {test_f1:.4}')


In [None]:
testing_cnn(model=model, criterion=criterion,
        test_iter=test_iter, device=device)

Test Loss: 0.3655
Test Accuracy: 0.8558
F1 score: 0.8576


Compute the f1-score of your classifier.

**Answer**: 0.8576

## Interpretability

Let's see where our model is looking. It is enough to run the code below.
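
The attribution method used below is integrated gradients: gradients are averaged along a straight path from a reference input to the actual input and scaled by the difference between the two. The following pure-Python sketch shows the idea on a toy model (a sigmoid of a dot product) with a midpoint Riemann sum; the model, weights, and function names are illustrative assumptions and not Captum's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w):
    """Toy differentiable model: sigmoid of a dot product (a stand-in for the network)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def grad(x, w):
    """Analytic gradient of `model` with respect to the input x."""
    p = model(x, w)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, n_steps=200):
    """Midpoint Riemann approximation of integrated gradients along the straight path."""
    total = [0.0] * len(x)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point, w)):
            total[i] += g
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, total)]


w = [2.0, -1.0, 0.5]        # hypothetical weights
x = [1.0, 1.0, 1.0]         # input
baseline = [0.0, 0.0, 0.0]  # reference input, playing the role of the padding baseline

attr = integrated_gradients(x, baseline, w)
# Completeness: attributions should sum to model(x) - model(baseline).
delta = sum(attr) - (model(x, w) - model(baseline, w))
print(attr, delta)
```

The quantity `delta` plays the same role as the convergence delta printed by the cells below: the closer it is to zero, the better the numerical approximation of the path integral.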

In [None]:
!pip install -q captum

In [None]:
from captum.attr import LayerIntegratedGradients, TokenReferenceBase, visualization

PAD_IND = TEXT.vocab.stoi[TEXT.pad_token]  # index of the padding token '<pad>', not of the word 'pad'

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)

In [None]:
def forward_with_softmax(inp):
    logits = model(inp)
    return torch.softmax(logits, 0)[0][1]

def forward_with_sigmoid(input):
    return torch.sigmoid(model(input))


# accumulate a couple of samples in this array for visualization purposes
vis_data_records_ig = []

def interpret_sentence(model, sentence, min_len = 7, label = 0):
    model.eval()
    # TEXT.preprocess tokenizes and lowercases, matching how the vocabulary was built
    text = TEXT.preprocess(sentence)
    if len(text) < min_len:
        text += [TEXT.pad_token] * (min_len - len(text))
    indexed = [TEXT.vocab.stoi[t] for t in text]

    model.zero_grad()

    input_indices = torch.tensor(indexed, device=device)
    input_indices = input_indices.unsqueeze(0)
    
    # input_indices dim: [1, sequence_length]
    seq_length = input_indices.shape[1]

    # predict
    pred = forward_with_sigmoid(input_indices).item()
    pred_ind = round(pred)

    # generate reference indices for each sample
    reference_indices = token_reference.generate_reference(seq_length, device=device).unsqueeze(0)

    # compute attributions and approximation delta using layer integrated gradients
    attributions_ig, delta = lig.attribute(input_indices, reference_indices, \
                                           n_steps=5000, return_convergence_delta=True)

    print('pred: ', LABEL.vocab.itos[pred_ind], '(', '%.2f'%pred, ')', ', delta: ', abs(delta))

    add_attributions_to_visualizer(attributions_ig, text, pred, pred_ind, label, delta, vis_data_records_ig)
    
def add_attributions_to_visualizer(attributions, text, pred, pred_ind, label, delta, vis_data_records):
    attributions = attributions.sum(dim=2).squeeze(0)
    attributions = attributions / torch.norm(attributions)
    attributions = attributions.cpu().detach().numpy()

    # storing couple samples in an array for visualization purposes
    vis_data_records.append(visualization.VisualizationDataRecord(
                            attributions,
                            pred,
                            LABEL.vocab.itos[pred_ind],
                            LABEL.vocab.itos[label],
                            LABEL.vocab.itos[1],
                            attributions.sum(),       
                            text,
                            delta))

In [None]:
interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

pred:  pos ( 1.00 ) , delta:  tensor([0.0002], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.03 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.33 ) , delta:  tensor([3.9914e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([2.3354e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.05 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.09 ) , delta:  tensor([0.0002], device='cuda:0', dtype=torch.float64)


Try adding your own examples!

In [None]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| pos | pos (1.00) | pos | 1.5 | It was a fantastic performance ! pad |
| pos | neg (0.03) | pos | 1.61 | Best film ever pad pad pad pad |
| pos | neg (0.33) | pos | 1.39 | Such a great show! pad pad pad |
| neg | neg (0.00) | pos | -0.07 | It was a horrible movie pad pad |
| neg | neg (0.05) | pos | 0.73 | I've never watched something as bad pad |



## Word Embeddings

You haven't forgotten how we can apply what we know about word2vec and GloVe, have you? Let's try it!
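
Using pretrained vectors amounts to filling the rows of the embedding matrix with known vectors, one row per vocabulary entry. The sketch below does this in plain Python with made-up 3-dimensional vectors; the helper `build_embedding_matrix` is hypothetical, and initializing out-of-vocabulary rows with zeros is one common choice assumed here.

```python
def build_embedding_matrix(itos, pretrained, dim):
    """Row i holds the pretrained vector for itos[i]; words without one get zeros."""
    matrix = []
    for token in itos:
        vector = pretrained.get(token)
        matrix.append(list(vector) if vector is not None else [0.0] * dim)
    return matrix


# Hypothetical 3-dimensional vectors; the GloVe vectors used here are 300-dimensional.
pretrained = {"good": [0.9, 0.1, 0.0], "bad": [-0.8, 0.2, 0.1]}
itos = ["<unk>", "<pad>", "good", "bad", "zzyzx"]

matrix = build_embedding_matrix(itos, pretrained, dim=3)
print(len(matrix), len(matrix[0]))
```

The resulting matrix must have exactly the shape of the model's embedding weight, `[vocab_size, emb_dim]`, before it is copied in.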

In [None]:
TEXT.build_vocab(trn, vectors=GloVe(name='6B', dim=300))# YOUR CODE GOES HERE
# hint: one of the imports has not been used yet; perhaps it is needed in the line above :)
LABEL.build_vocab(trn)

word_embeddings = TEXT.vocab.vectors

kernel_sizes = [3, 4, 5]
vocab_size = len(TEXT.vocab)
dropout = 0.5
dim = 300

In [None]:
train, tst = datasets.IMDB.splits(TEXT, LABEL)
trn, vld = train.split(random_state=random.seed(SEED))

device = "cuda" if torch.cuda.is_available() else "cpu"

train_iter, valid_iter, test_iter = BucketIterator.splits(
        (trn, vld, tst),
        batch_sizes=(128, 256, 256),
        sort=False,
        sort_key=lambda x: len(x.text),
        sort_within_batch=False,
        device=device,
        repeat=False,
)

In [None]:
model = CNN(vocab_size=vocab_size, emb_dim=dim, out_channels=64,
            kernel_sizes=kernel_sizes, dropout=dropout)

word_embeddings = TEXT.vocab.vectors

prev_shape = model.embedding.weight.shape

model.embedding.weight.data.copy_(word_embeddings)

assert prev_shape == model.embedding.weight.shape
model.to(device)

optimizer = torch.optim.Adam(model.parameters())

You know what to do.

In [None]:
# training hyperparameters as arguments
training_cnn(model=model, criterion=criterion, optimizer=optimizer, max_grad_norm=None,
         max_epochs=20, patience=3,
         train_iter=train_iter, valid_iter=valid_iter, device=device)

Epoch: 1, Training Loss: 0.5128663722446987, Validation Loss: 0.36645290374755857
Epoch: 2, Training Loss: 0.31415374559674947, Validation Loss: 0.33394276733398437
Epoch: 3, Training Loss: 0.18000206538609095, Validation Loss: 0.3024916882832845
Epoch: 4, Training Loss: 0.0775527012688773, Validation Loss: 0.3282683836619059
Epoch: 5, Training Loss: 0.028532534878594536, Validation Loss: 0.36055744857788086
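
In the log above the validation loss is lowest at epoch 3 and rises afterwards while the training loss keeps falling, which is the usual sign of overfitting. `training_cnn` is not shown in this section, so how it uses `patience` is an assumption; one common reading is "stop after `patience` consecutive epochs without a new best validation loss". A minimal sketch of that rule, with the helper name `epochs_until_stop` and the sixth loss value both hypothetical:

```python
def epochs_until_stop(valid_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, under a patience rule."""
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return len(valid_losses), best_epoch


# First five values are rounded from the log; the sixth is hypothetical.
losses = [0.366, 0.334, 0.302, 0.328, 0.361, 0.400]
stop_epoch, best_epoch = epochs_until_stop(losses, patience=3)
print(stop_epoch, best_epoch)
```

Under this rule the weights worth keeping are those from `best_epoch`, not those from the last epoch that ran.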

In [None]:
testing_cnn(model=model, criterion=criterion,
        test_iter=test_iter, device=device)

Test Loss: 0.3972
Test Accuracy: 0.8619
F1 score: 0.8606


Compute the F1 score of your classifier.

**Answer**: 0.861
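
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for binary labels in plain Python; the function name `f1_score_binary` and the toy labels are illustrative and are not taken from `testing_cnn`, whose implementation is not shown here.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 score for binary labels, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: three positive and two negative reviews, one positive missed.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0]
f1 = f1_score_binary(y_true, y_pred)
print(round(f1, 3))
```

Here precision is 1 (no false positives) and recall is 2/3, so the harmonic mean is below the plain average of the two.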

Let's check how well it all works!

In [None]:
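# NOTE: this looks up the literal word 'pad', which matches the 'pad' filler shown in the
# output below; the field's own padding token is available as TEXT.pad_token.
PAD_IND = TEXT.vocab.stoi['pad']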

token_reference = TokenReferenceBase(reference_token_idx=PAD_IND)
lig = LayerIntegratedGradients(model, model.embedding)
vis_data_records_ig = []

interpret_sentence(model, 'It was a fantastic performance !', label=1)
interpret_sentence(model, 'Best film ever', label=1)
interpret_sentence(model, 'Such a great show!', label=1)
interpret_sentence(model, 'It was a horrible movie', label=0)
interpret_sentence(model, 'I\'ve never watched something as bad', label=0)
interpret_sentence(model, 'It is a disgusting movie!', label=0)

pred:  pos ( 0.97 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.46 ) , delta:  tensor([8.9547e-05], device='cuda:0', dtype=torch.float64)
pred:  pos ( 0.84 ) , delta:  tensor([0.0001], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([0.0003], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.23 ) , delta:  tensor([8.3432e-05], device='cuda:0', dtype=torch.float64)
pred:  neg ( 0.00 ) , delta:  tensor([4.7821e-06], device='cuda:0', dtype=torch.float64)
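
The `delta` printed with each prediction is the convergence error of Integrated Gradients: the attributions should sum to the difference between the model output on the input and on the baseline, and `delta` measures how far the numerical approximation is from that. The sketch below reproduces the idea on a toy logistic model in plain Python. It is not Captum's implementation and not the CNN from this notebook; the weights, inputs, and function names are all illustrative assumptions.

```python
import math


def predict(x, weights, bias):
    """Toy model: sigmoid of a linear score."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to its input."""
    p = predict(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, n_steps=50):
    """Midpoint Riemann approximation of the path integral from baseline to x."""
    totals = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        grads = gradient(point, weights, bias)
        totals = [t + g for t, g in zip(totals, grads)]
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b0) * t / n_steps for xi, b0, t in zip(x, baseline, totals)]


weights, bias = [2.0, -1.0, 0.5], -0.2  # hypothetical model parameters
x, baseline = [1.0, 0.5, 2.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, weights, bias)
delta = sum(attributions) - (predict(x, weights, bias) - predict(baseline, weights, bias))
print(attributions, delta)
```

A small `delta`, like the values around 1e-4 above, indicates that the number of integration steps was sufficient; the sign of each attribution shows whether that feature pushed the output up or down relative to the baseline.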


In [None]:
print('Visualize attributions based on Integrated Gradients')
visualization.visualize_text(vis_data_records_ig)

Visualize attributions based on Integrated Gradients


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
pos,pos (0.97),pos,1.69,It was a fantastic performance ! pad
pos,neg (0.46),pos,1.25,Best film ever pad pad pad pad
pos,pos (0.84),pos,1.54,Such a great show! pad pad pad
neg,neg (0.00),pos,-1.0,It was a horrible movie pad pad
neg,neg (0.23),pos,-0.32,I've never watched something as bad pad

