Notebook link: https://colab.research.google.com/drive/1VWEtKfx8RTOgwfoGqwWrGM2GTtR_wiP1?usp=sharing

---



<h1>ZMGSN List 5: Transformer Network Architecture</h1>

<h2>Task description</h2>

The assignment consists of the following tasks:
<ol>
<li>Study the experiment below, which uses a Transformer network.</li>
<li>Analyze how the hyperparameters of the experiment, e.g. learning rate, <i>batch size</i>, and number of epochs, affect the results of the Transformer network (10 pts).</li>
<li>Use other pretrained Transformer architectures (e.g. RoBERTa, XLM-RoBERTa, DistilBERT, ALBERT, DeBERTa, XLNet, MPNet, LaBSE) to justify how their architecture, training procedure, and training corpus may have influenced the results (30 pts).</li>
<li>Use pretrained Transformer networks of other sizes to justify how their architecture, training procedure, and training corpus may have influenced the results; compare models with the same architecture but different sizes, e.g. XLM-RoBERTa-base and XLM-RoBERTa-large (20 pts).</li>
<li>Modify the extension of the pretrained Transformer network to examine how the architecture affects the results (5 pts).</li>
<li>Implement your own variants of extensions to the pretrained Transformer architecture to examine their effect on the results (15 pts).</li>
<li>Examine the effect of the maximum text length and of different padding strategies for each pretrained model used (20 pts).</li>
<li>Evaluate the different pretrained Transformer models (different sizes and types), the modifications of the original extension, and the newly developed extensions in accordance with items 2, 5, 6, and 7.</li>
<li>Carry out a systematic comparative evaluation of all Transformer models, their variants, and the extensions used, in order to examine their differences, similarities, and analogous characteristics.</li>
<li>Develop a procedure for evaluating the quality of Transformer models that takes into account different visualization methods (e.g. plots, metrics, classes), clustering, dimensionality reduction (e.g. t-SNE), cross-validation, the effect of training-set characteristics on model behaviour, sensitivity to the semantics of the texts in the training and test sets, etc.</li>
</ol>

A report must be prepared in LaTeX. It should contain a description of the Transformer architecture, a description of the selected architectures, a description of the experiments performed, a description of the evaluation procedure, the evaluation results, and conclusions. The report accounts for 50% of the grade for this list. The maximum number of points for each task is awarded for the implementation together with a complete description in the report. Points for tasks 2-7 are awarded for the implementation and the evaluation.

The quality of the work will be assessed, including:
<ol>
<li>Correct completion of the tasks</li>
<li>Careful treatment of the results, including qualitative and quantitative analysis</li>
<li>Conclusions that explain the phenomena studied and the results obtained</li>
<li>Preparation and explanation of the source code</li>
<li>The report</li>
</ol>


<h2>Importing the libraries used</h2>

In [None]:
import os

import gdown
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import transformers

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.utils.class_weight import compute_class_weight
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
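# Note: the AdamW class exported by transformers is deprecated in recent releases;
# torch.optim.AdamW is the maintained alternative (its default weight_decay differs).
from transformers import AdamW, AutoModel, AutoTokenizer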

<h2>Initializing the seed of the pseudorandom number generator</h2>

In [None]:
torch.manual_seed(0)

<torch._C.Generator at 0x7be0583555b0>

<h2>Selecting the default device based on GPU availability</h2>

In [None]:
device = torch.device("cpu") if not torch.cuda.is_available() else torch.device("cuda:0")
print("Using device", device)

Using device cuda:0


<h2>Downloading the dataset</h2>

In [None]:
if os.path.exists('data.csv'):
  os.remove('data.csv')

In [None]:
url = 'https://drive.google.com/uc?id=1HSnB-D0dKDI2bE9iOsp-Vr8tumihdvbH'
output = 'data.csv'

gdown.download(url, output, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1HSnB-D0dKDI2bE9iOsp-Vr8tumihdvbH
To: /content/data.csv
100%|██████████| 467k/467k [00:00<00:00, 107MB/s]


'data.csv'

<h2>Loading the dataset</h2>

In [None]:
df = pd.read_csv('data.csv')
df.head()

Unnamed: 0,label,text
0,0,"Go until jurong point, crazy.. Available only ..."
1,0,Ok lar... Joking wif u oni...
2,1,Free entry in 2 a wkly comp to win FA Cup fina...
3,0,U dun say so early hor... U c already then say...
4,0,"Nah I don't think he goes to usf, he lives aro..."


<h2>Splitting the dataset into training, validation, and test subsets</h2>

In [None]:
train_text, temp_text, train_labels, temp_labels = train_test_split(df['text'], df['label'],
                                                                    random_state=2018,
                                                                    test_size=0.3,
                                                                    stratify=df['label'])


val_text, test_text, val_labels, test_labels = train_test_split(temp_text, temp_labels,
                                                                random_state=2018,
                                                                test_size=0.5,
                                                                stratify=temp_labels)
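
The two-stage split above yields roughly 70% / 15% / 15% of the data. A minimal sketch of the resulting subset sizes follows, assuming the held-out part is rounded up and the remainder goes to training (which is how `train_test_split` treats a fractional `test_size`). The row count of 5572 is an assumption that is consistent with the test support of 836 and the 122 training batches reported later; it is not stated in the notebook.

```python
import math


def split_sizes(n_rows, temp_fraction=0.3, test_fraction_of_temp=0.5):
    """Return (train, val, test) sizes for a two-stage hold-out split."""
    # Stage 1: hold out `temp_fraction` of all rows (rounded up).
    n_temp = math.ceil(n_rows * temp_fraction)
    n_train = n_rows - n_temp
    # Stage 2: split the held-out rows into validation and test parts.
    n_test = math.ceil(n_temp * test_fraction_of_temp)
    n_val = n_temp - n_test
    return n_train, n_val, n_test


def batches_per_epoch(n_samples, batch_size):
    """Number of mini-batches when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)


n_rows = 5572  # assumed dataset size, see the note above
train_n, val_n, test_n = split_sizes(n_rows)
print(train_n, val_n, test_n)
print(batches_per_epoch(train_n, 32))
```

The same helper shows how the number of optimizer steps per epoch changes with the batch size, which matters when comparing runs with batch sizes 16, 32, and 64.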

<h2>Downloading the pretrained Transformer model</h2>

In [None]:
# Download and load the pretrained Transformer model
pretrained_model = AutoModel.from_pretrained('bert-base-uncased')

# Download and load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

<h2>Tokenizing the texts and encoding them as token-ID sequences</h2>

In [None]:
# Tokenize the training texts into fixed-length token-ID sequences with attention masks
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

# Tokenize the validation texts in the same way
tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

# Tokenize the test texts in the same way
tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)
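
With `padding='max_length'` and `truncation=True`, every text becomes a sequence of exactly `max_length` token IDs plus an attention mask that marks real tokens with 1 and padding with 0. The plain-Python sketch below illustrates that behaviour for a single sequence. The function name is hypothetical, and the default IDs (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`) are those of `bert-base-uncased`; other tokenizers use different special tokens.

```python
def encode_fixed_length(body_ids, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Truncate or pad one token-ID sequence to exactly `max_length` entries."""
    # Reserve two positions for the special tokens, then truncate the body.
    body = list(body_ids)[: max_length - 2]
    ids = [cls_id] + body + [sep_id]
    # Real tokens get mask value 1, padding gets 0.
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


short_ids, short_mask = encode_fixed_length([7, 8, 9], max_length=6)
long_ids, long_mask = encode_fixed_length(range(10, 20), max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Short texts waste computation on padding, while long texts lose everything past the cut-off, which is the trade-off explored when varying `max_length`.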

<h2>Converting the lists to tensors</h2>

In [None]:
train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

<h2>Preparing the DataLoader instances</h2>

In [None]:
# Set the batch size
batch_size = 32

# Wrap the training tensors in a TensorDataset
train_data = TensorDataset(train_seq, train_mask, train_y)

# Sample training examples in random order
train_sampler = RandomSampler(train_data)

# Build the DataLoader for the training set
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

# Wrap the validation tensors in a TensorDataset
val_data = TensorDataset(val_seq, val_mask, val_y)

# Iterate over the validation set in its original order
val_sampler = SequentialSampler(val_data)

# Build the DataLoader for the validation set
val_dataloader = DataLoader(val_data, sampler=val_sampler, batch_size=batch_size)

<h2>Preparing the extension of the pretrained Transformer architecture</h2>

In [None]:
# Freeze all parameters of the pretrained network; only the added layers are trained
for param in pretrained_model.parameters():
    param.requires_grad = False

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

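        # With return_dict=False, BERT returns (last_hidden_state, pooler_output);
        # the pooled [CLS] representation is used as the sentence embedding
        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)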
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

<h2>Experiment configuration</h2>

In [None]:
# Build the extended architecture around the pretrained model
model = Ext_Arch(pretrained_model)

# Move the model to the default device
model = model.to(device)

In [None]:
# Initialize the optimizer
optimizer = AdamW(model.parameters(), lr=1e-3)



In [None]:
# Compute class weights that compensate for the class imbalance
class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)

print("Class Weights:", class_weights)

Class Weights: [0.57743559 3.72848948]
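
The `'balanced'` setting weights each class by `n_samples / (n_classes * class_count)`, so the rarer class receives the larger weight. The sketch below reproduces that formula in plain Python. The class counts (3377 and 523) are an assumption chosen to be consistent with the printed weights; they are not taken from the notebook output.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Assumed training-set composition: 3377 examples of class 0, 523 of class 1.
labels = [0] * 3377 + [1] * 523
weights = balanced_class_weights(labels)
print(weights)
```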


In [None]:
# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function (the model outputs log-probabilities, hence NLLLoss)
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 10
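
Because the model ends in `LogSoftmax`, combining it with `NLLLoss` is equivalent to cross-entropy on the raw logits. With class weights and the default mean reduction, the loss of a batch is the weighted sum of the negative log-probabilities of the true classes divided by the sum of the weights of those classes. The following plain-Python sketch illustrates that arithmetic; the function names and the example values are illustrative only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def weighted_nll(log_probs, targets, class_weights):
    """Weighted negative log-likelihood with weighted-mean reduction."""
    num = sum(-class_weights[t] * lp[t] for lp, t in zip(log_probs, targets))
    den = sum(class_weights[t] for t in targets)
    return num / den


# Two examples with uninformative logits: each class gets probability 0.5.
log_probs = [log_softmax([0.0, 0.0]), log_softmax([0.0, 0.0])]
loss = weighted_nll(log_probs, targets=[0, 1], class_weights=[0.5774, 3.7285])
print(loss)
```

For a model that predicts both classes with equal probability the loss equals ln 2 regardless of the weights, which gives a reference point when reading the first-epoch losses.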

<h2>Model fine-tuning</h2>

In [None]:
def train():

    model.train()
    total_loss, total_accuracy = 0, 0

    # List that collects the model predictions
    total_preds=[]

    for step,batch in enumerate(train_dataloader):

        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))

        batch = [r.to(device) for r in batch]

        sent_id, mask, labels = batch

        model.zero_grad()

        preds = model(sent_id, mask)

        loss = cross_entropy(preds, labels)

        total_loss = total_loss + loss.item()

        loss.backward()

        # Clip the global gradient norm to 1.0 to guard against exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        optimizer.step()

        preds = preds.detach().cpu().numpy()

        # Collect the predictions of every batch, not only those of the last one
        total_preds.append(preds)

    avg_loss = total_loss / len(train_dataloader)

    # The collected predictions form a list of arrays of shape (batch size, number of classes).
    # Concatenate them into a single array of shape (number of samples, number of classes)
    total_preds  = np.concatenate(total_preds, axis=0)

    return avg_loss, total_preds
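
The call to `clip_grad_norm_` in `train()` does not normalize each gradient separately. It computes one norm over all gradients and, only if that norm exceeds the limit, scales every gradient by the same factor. A minimal plain-Python sketch of that rule follows, with the gradients flattened into one list of floats; the function name is hypothetical and the small `eps` term mirrors the stabilizer used by PyTorch.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by a common factor so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The factor is capped at 1.0, so small gradients are left unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


large, large_norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
small, small_norm = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
print(large, large_norm)
print(small, small_norm)
```

Since the direction of the update is preserved, clipping limits the step size without changing which way the parameters move.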

In [None]:
def evaluate():

    print("\nEvaluating...")

    model.eval()

    total_loss, total_accuracy = 0, 0

    total_preds = []

    for step,batch in enumerate(val_dataloader):

        # Report progress every 50 batches
        if step % 50 == 0 and not step == 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(val_dataloader)))

        batch = [t.to(device) for t in batch]

        sent_id, mask, labels = batch

        with torch.no_grad():

            preds = model(sent_id, mask)

            loss = cross_entropy(preds,labels)

            total_loss = total_loss + loss.item()

            preds = preds.detach().cpu().numpy()

            total_preds.append(preds)

    avg_loss = total_loss / len(val_dataloader)

    total_preds  = np.concatenate(total_preds, axis=0)

    return avg_loss, total_preds

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')
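
The loop above keeps the weights from the epoch with the lowest validation loss, so the model evaluated later is not necessarily the one from the final epoch. The selection rule can be sketched on a plain list of losses; the values below are hypothetical and serve only to illustrate the logic.

```python
def best_epoch(valid_losses):
    """Return (index, loss) of the first epoch with the lowest validation loss."""
    best_loss = float("inf")
    best_idx = None
    for idx, loss in enumerate(valid_losses):
        # Strict comparison: on a tie, the earlier checkpoint is kept.
        if loss < best_loss:
            best_loss, best_idx = loss, idx
    return best_idx, best_loss


# Hypothetical validation losses for five epochs.
history = [0.61, 0.52, 0.55, 0.48, 0.50]
print(best_epoch(history))
```

When the best epoch is the last one, as tends to happen with short runs, the loss may still be falling and a longer run could be worth trying.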

In [None]:
# Load the weights of the best model
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

<h2>Generating predictions with the fine-tuned model and evaluating their quality</h2>

In [None]:
# Generate predictions for the test set
with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

In [None]:
# Evaluate the quality of the model predictions
preds = np.argmax(preds, axis = 1)
print(classification_report(test_y, preds))

              precision    recall  f1-score   support

           0       0.97      0.90      0.94       724
           1       0.57      0.81      0.67       112

    accuracy                           0.89       836
   macro avg       0.77      0.86      0.80       836
weighted avg       0.92      0.89      0.90       836
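
The per-class figures in the report follow from the four confusion-matrix counts. The sketch below computes them for the positive class. The counts are hypothetical: they were chosen so that the rounded results agree with the report above (support 112 for class 1 and 724 for class 0), and they are not read from the notebook.

```python
def binary_report(tp, fp, fn, tn):
    """Precision, recall, and F1 of the positive class, plus overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Hypothetical counts consistent with the rounded report values.
p, r, f1, acc = binary_report(tp=91, fp=69, fn=21, tn=655)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f} accuracy={acc:.2f}")
```

With an imbalanced test set, a high accuracy can coexist with a low precision for the minority class, which is why the per-class rows are more informative than the accuracy row alone.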



#**Experiments**

## 2. Analyze how the hyperparameters of the experiment, e.g. learning rate, batch size, and number of epochs, affect the results of the Transformer network (10 pts)
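
For the runs in this section to be comparable, each configuration should start from freshly initialized trainable layers, a new optimizer, and new data loaders; otherwise a run continues from the weights left by the previous one. The sketch below shows one way to organize that as a loop. The helper `run_grid` and the callables `build` and `fit` are hypothetical, and the demo stand-ins only count steps instead of training a network.

```python
def run_grid(configs, build, fit):
    """Run every configuration on freshly built state and collect the results."""
    results = {}
    for name, cfg in configs.items():
        # A new state per run: nothing is carried over from the previous run.
        state = build(cfg)
        results[name] = fit(state, cfg)
    return results


configs = {
    "lr=0.0001": {"lr": 1e-4, "batch_size": 32, "epochs": 10},
    "lr=0.001": {"lr": 1e-3, "batch_size": 32, "epochs": 10},
}


def demo_build(cfg):
    # Stand-in for creating the model head, optimizer, and data loaders.
    return {"steps": 0, "lr": cfg["lr"]}


def demo_fit(state, cfg):
    # Stand-in for the training loop; returns the number of epochs run.
    state["steps"] += cfg["epochs"]
    return state["steps"]


results = run_grid(configs, demo_build, demo_fit)
print(results)
```

In the notebook, `build` would correspond to re-creating `Ext_Arch`, the optimizer, and the `DataLoader` objects, and `fit` to the epoch loop followed by the test-set evaluation.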

###Learning rate

####0.0001

In [None]:
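# Note: the learning rate is an argument of the optimizer (AdamW(..., lr=...)),
# which is created in an earlier cell; this cell only rebuilds the loss function
# and sets the number of epochs.

# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 10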

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('lr=0.0001')
print(classification_report(test_y, preds))

lr=0.0001
              precision    recall  f1-score   support

           0       1.00      0.92      0.96       724
           1       0.66      0.98      0.79       112

    accuracy                           0.93       836
   macro avg       0.83      0.95      0.87       836
weighted avg       0.95      0.93      0.93       836



####0.001

In [None]:
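# Note: the learning rate is an argument of the optimizer (AdamW(..., lr=...)),
# which is created in an earlier cell; this cell only rebuilds the loss function
# and sets the number of epochs.

# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 10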

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('lr=0.001')
print(classification_report(test_y, preds))

lr=0.001
              precision    recall  f1-score   support

           0       0.99      0.97      0.98       724
           1       0.83      0.92      0.87       112

    accuracy                           0.96       836
   macro avg       0.91      0.95      0.93       836
weighted avg       0.97      0.96      0.96       836



###Batch size

####16

In [None]:
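# Note: the batch size is an argument of the DataLoader objects, which are
# created in an earlier cell; this cell only rebuilds the loss function and
# sets the number of epochs.

# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 10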

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bs=16')
print(classification_report(test_y, preds))

bs=16
              precision    recall  f1-score   support

           0       0.99      0.96      0.98       724
           1       0.80      0.94      0.86       112

    accuracy                           0.96       836
   macro avg       0.89      0.95      0.92       836
weighted avg       0.96      0.96      0.96       836



####64

In [None]:
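# Note: the batch size is an argument of the DataLoader objects, which are
# created in an earlier cell; this cell only rebuilds the loss function and
# sets the number of epochs.

# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 10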

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bs=64')
print(classification_report(test_y, preds))

bs=64
              precision    recall  f1-score   support

           0       0.99      0.96      0.97       724
           1       0.77      0.96      0.86       112

    accuracy                           0.96       836
   macro avg       0.88      0.96      0.92       836
weighted avg       0.96      0.96      0.96       836



###Epochs

####5

In [None]:
# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 5

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('epochs=5')
print(classification_report(test_y, preds))

epochs=5
              precision    recall  f1-score   support

           0       1.00      0.88      0.94       724
           1       0.57      0.98      0.72       112

    accuracy                           0.90       836
   macro avg       0.78      0.93      0.83       836
weighted avg       0.94      0.90      0.91       836



####20

In [None]:
# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 20

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('epochs=20')
print(classification_report(test_y, preds))

epochs=20
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       724
           1       0.88      0.95      0.91       112

    accuracy                           0.98       836
   macro avg       0.94      0.96      0.95       836
weighted avg       0.98      0.98      0.98       836



####40

In [None]:
# Convert the class-weight array to a tensor
weights = torch.tensor(class_weights, dtype=torch.float)

# Move the weights to the default device
weights = weights.to(device)

# Define the loss function
cross_entropy = nn.NLLLoss(weight=weights)

# Set the number of epochs
epochs = 40

In [None]:
# Initialize the best validation loss seen so far
best_valid_loss = float('inf')

# Lists for the training and validation loss of each epoch
train_losses = []
valid_losses = []

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('epochs=40')
print(classification_report(test_y, preds))

epochs=40
              precision    recall  f1-score   support

           0       0.99      0.98      0.98       724
           1       0.87      0.95      0.91       112

    accuracy                           0.97       836
   macro avg       0.93      0.96      0.95       836
weighted avg       0.98      0.97      0.97       836



##3. Use other pretrained Transformer architectures (e.g. RoBERTa, XLM-RoBERTa, DistilBERT, ALBERT, DeBERTa, XLNet, MPNet, LaBSE) to justify how their architecture, training procedure, and training corpus may have influenced the results (30 pts)

###DistilBERT

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
pretrained_model = AutoModel.from_pretrained("distilbert-base-uncased")


In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

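        # DistilBERT has no pooler output, so the last-layer hidden state of the
        # first token ([CLS]) is used as the sentence embedding
        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        cls_hs = cls[0][:, 0, :]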

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 30

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # Save the weights whenever the validation loss improves
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.659
Validation Loss: 0.612

 Epoch 2 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.574
Validation Loss: 0.524

 Epoch 3 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.487
Validation Loss: 0.439

 Epoch 4 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.411
Validation Loss: 0.367

 Epoch 5 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.346
Validation Loss: 0.309

 Epoch 6 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.299
Validation Loss: 0.266

 Epoch 7 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.265
Validation Loss: 0.237

 Epoch 8 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.234
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('distilbert')
print(classification_report(test_y, preds))

distilbert
              precision    recall  f1-score   support

           0       0.99      0.97      0.98       724
           1       0.81      0.95      0.87       112

    accuracy                           0.96       836
   macro avg       0.90      0.96      0.93       836
weighted avg       0.97      0.96      0.96       836



###RoBERTa

In [None]:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
pretrained_model = AutoModel.from_pretrained("roberta-base")


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())
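The tokenizer call above combines `max_length = 25`, `padding='max_length'` and `truncation=True`, so every sequence ends up exactly 25 tokens long. The following is a simplified pure-Python sketch of that behaviour, not the Hugging Face implementation: the function name, token ids and `pad_id` are hypothetical, and a real tokenizer also keeps the closing special token when it truncates.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                    # effect of truncation=True
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad              # effect of padding='max_length'
    attention_mask = [1] * len(kept) + [0] * n_pad   # 0 marks padding positions
    return input_ids, attention_mask

# Hypothetical token ids for a short and a long text
short_ids, short_mask = pad_or_truncate([0, 713, 16, 2], max_length=8, pad_id=1)
long_ids, long_mask = pad_or_truncate(list(range(20)), max_length=8, pad_id=1)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

A shorter `max_length` cuts more of the long texts, while a longer one adds padding positions that the attention mask has to hide.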

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)
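The training logs report 122 batches per epoch at `batch_size = 32`. Assuming the `DataLoader` default of keeping the last, smaller batch, this bounds the training-set size and shows how the number of optimizer steps per epoch changes with batch size. The sketch below is arithmetic only, and the candidate batch sizes are illustrative.

```python
import math

batch_size = 32
n_batches = 122  # taken from the "Batch ... of 122" log lines

# Smallest and largest dataset sizes that yield exactly 122 batches
smallest = (n_batches - 1) * batch_size + 1
largest = n_batches * batch_size
print(f"training set size is between {smallest} and {largest} samples")

# Steps per epoch for other (hypothetical) batch sizes, using the lower bound
steps = {bs: math.ceil(smallest / bs) for bs in (8, 16, 32, 64, 128)}
for bs, n in steps.items():
    print(f"batch_size={bs:>3}: {n} steps per epoch")
```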

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

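        # cls[0] is the last hidden state, shape (batch, seq_len, hidden)
        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        # keep only the first token (<s>) as the sentence representation
        cls_hs = cls[0][:, 0, :]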

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)
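With `'balanced'`, scikit-learn weights each class by `n_samples / (n_classes * count_of_class)`, so the rarer class receives the larger weight. The sketch below reproduces that formula in plain Python. The label counts are hypothetical, because the training-set class distribution is not printed in this notebook.

```python
from collections import Counter

def balanced_class_weights(labels):
    """n_samples / (n_classes * count) for every class, as in the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}

# Hypothetical imbalanced label vector: 87 negatives, 13 positives
labels = [0] * 87 + [1] * 13
weights = balanced_class_weights(labels)
for cls_id, w in weights.items():
    print(f"class {cls_id}: weight {w:.3f}")
```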



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 30
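`nn.NLLLoss` applied to `LogSoftmax` outputs is a cross-entropy loss. With class weights and the default mean reduction, the weighted per-sample losses are divided by the sum of the weights of the targets. The sketch below is a plain-Python version of that computation with hypothetical weights. It also shows why a head that predicts both classes as equally likely gives a loss of ln 2 ≈ 0.693, which is close to the first-epoch losses logged for this run.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def weighted_nll(log_probs, targets, class_weights):
    """Weighted negative log-likelihood with mean reduction."""
    num = sum(-class_weights[t] * lp[t] for lp, t in zip(log_probs, targets))
    den = sum(class_weights[t] for t in targets)
    return num / den

class_weights = [0.5, 3.0]            # hypothetical weights for classes 0 and 1
targets = [0, 1]
uniform = [log_softmax([0.0, 0.0]) for _ in targets]
uniform_loss = weighted_nll(uniform, targets, class_weights)
print(f"loss of an uninformative classifier: {uniform_loss:.3f}")
```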

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.699
Validation Loss: 0.690

 Epoch 2 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.686
Validation Loss: 0.682

 Epoch 3 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.678
Validation Loss: 0.674

 Epoch 4 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.670
Validation Loss: 0.666

 Epoch 5 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.663
Validation Loss: 0.659

 Epoch 6 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.655
Validation Loss: 0.651

 Epoch 7 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.648
Validation Loss: 0.643

 Epoch 8 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.641
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('roberta')
print(classification_report(test_y, preds))

roberta
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       724
           1       0.92      0.88      0.90       112

    accuracy                           0.97       836
   macro avg       0.95      0.93      0.94       836
weighted avg       0.97      0.97      0.97       836



##4. Use other sizes of pretrained Transformer networks to justify how their architecture, training procedure and training set may have influenced the results; compare models with the same architecture but different sizes, e.g. XLM-RoBERTa-base and XLM-RoBERTa-large (20 pts)



###xlm-roberta-base

In [None]:
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
pretrained_model = AutoModel.from_pretrained("xlm-roberta-base")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

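        # cls[0] is the last hidden state, shape (batch, seq_len, hidden)
        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        # keep only the first token (<s>) as the sentence representation
        cls_hs = cls[0][:, 0, :]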

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 30

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.691
Validation Loss: 0.685

 Epoch 2 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.683
Validation Loss: 0.679

 Epoch 3 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.676
Validation Loss: 0.671

 Epoch 4 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.669
Validation Loss: 0.666

 Epoch 5 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.660
Validation Loss: 0.661

 Epoch 6 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.655
Validation Loss: 0.654

 Epoch 7 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.648
Validation Loss: 0.648

 Epoch 8 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.639
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('xlm-roberta-base')
print(classification_report(test_y, preds))

xlm-roberta-base
              precision    recall  f1-score   support

           0       0.99      0.96      0.98       724
           1       0.81      0.96      0.88       112

    accuracy                           0.96       836
   macro avg       0.90      0.96      0.93       836
weighted avg       0.97      0.96      0.97       836



###bert-large-uncased

In [None]:
#bert-base-uncased
pretrained_model

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
  

In [None]:
#bert-large-uncased
pretrained_model

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 1024, padding_idx=0)
    (position_embeddings): Embedding(512, 1024)
    (token_type_embeddings): Embedding(2, 1024)
    (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-23): 24 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=1024, out_features=1024, bias=True)
            (key): Linear(in_features=1024, out_features=1024, bias=True)
            (value): Linear(in_features=1024, out_features=1024, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=1024, out_features=1024, bias=True)
            (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inpl
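The two printouts differ in hidden size (768 vs 1024) and in the number of encoder layers (12 vs 24). The sketch below estimates the resulting parameter counts from those numbers. It assumes a feed-forward width of four times the hidden size, which is not visible in the truncated printouts above, and the function name is illustrative.

```python
def bert_encoder_params(hidden, layers, vocab=30522, max_pos=512, type_vocab=2):
    """Parameter count of a BERT-style encoder: embeddings + layers + pooler."""
    ffn = 4 * hidden  # assumed feed-forward width (3072 for base, 4096 for large)
    # three embedding tables plus the embedding LayerNorm (weight and bias)
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # query, key, value and output projections (with biases) plus LayerNorm
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # two feed-forward projections (with biases) plus LayerNorm
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + 2 * hidden
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler

base = bert_encoder_params(hidden=768, layers=12)
large = bert_encoder_params(hidden=1024, layers=24)
print(f"bert-base-uncased:  {base:,} parameters")
print(f"bert-large-uncased: {large:,} parameters")
print(f"ratio: {large / base:.2f}")
```

Since the backbone is frozen in these experiments, the extra parameters affect only the quality of the extracted features and the inference cost, not the number of trained weights.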

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
pretrained_model = AutoModel.from_pretrained("bert-large-uncased")

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(1024,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

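        # cls[0] is the last hidden state, shape (batch, seq_len, hidden)
        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        # keep only the first token ([CLS]) as the sentence representation
        cls_hs = cls[0][:, 0, :]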

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 30

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.597
Validation Loss: 0.517

 Epoch 2 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.466
Validation Loss: 0.402

 Epoch 3 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.371
Validation Loss: 0.332

 Epoch 4 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.309
Validation Loss: 0.279

 Epoch 5 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.269
Validation Loss: 0.243

 Epoch 6 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.238
Validation Loss: 0.222

 Epoch 7 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.222
Validation Loss: 0.196

 Epoch 8 / 30
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.207
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert-large-uncased')
print(classification_report(test_y, preds))

bert-large-uncased
              precision    recall  f1-score   support

           0       0.99      0.98      0.98       724
           1       0.86      0.96      0.91       112

    accuracy                           0.97       836
   macro avg       0.93      0.97      0.95       836
weighted avg       0.98      0.97      0.97       836



##5. Modify the extension of the pretrained Transformer network to examine the effect of its architecture on the results (5 pts)


###bert - without dropout

In [None]:
pretrained_model = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        # self.dropout = nn.Dropout(0.1)  # dropout removed in this variant
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        # x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)
model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()
    valid_loss, _ = evaluate()

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.619
Validation Loss: 0.596

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.592
Validation Loss: 0.573

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.568
Validation Loss: 0.542

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.549
Validation Loss: 0.521

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.527
Validation Loss: 0.500

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.512
Validation Loss: 0.480

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.490
Validation Loss: 0.472

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.480
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert without dropout')
print(classification_report(test_y, preds))

bert without dropout
              precision    recall  f1-score   support

           0       0.98      0.92      0.95       724
           1       0.62      0.87      0.72       112

    accuracy                           0.91       836
   macro avg       0.80      0.89      0.84       836
weighted avg       0.93      0.91      0.92       836



###bert - more layers

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,256)
        self.fc3 = nn.Linear(256, 2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        # note: there is no activation between fc2 and fc3, so together they act as one linear map
        x = self.fc3(x)
        x = self.softmax(x)

        return x
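In the class above, `fc2` feeds directly into `fc3` with no activation in between. Two stacked linear layers without a nonlinearity are equivalent to a single linear layer, so this variant adds parameters but no extra expressive power beyond a `512 -> 2` layer. The sketch below checks this identity with small integer matrices. All values are made up for illustration.

```python
def matvec(W, x):
    """Matrix-vector product for nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def linear(W, b, x):
    """Affine map W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]

# Stand-ins for fc2 (3 -> 2) and fc3 (2 -> 2)
W_a, b_a = [[1, 2, 0], [0, -1, 3]], [1, -2]
W_b, b_b = [[2, 1], [-1, 4]], [0, 5]
x = [3, -1, 2]

stacked = linear(W_b, b_b, linear(W_a, b_a, x))

# Single equivalent layer: W = W_b W_a, b = W_b b_a + b_b
W = matmul(W_b, W_a)
b = [v + bi for v, bi in zip(matvec(W_b, b_a), b_b)]
merged = linear(W, b, x)

print(stacked, merged)
```

Placing a ReLU between the two layers would make the deeper head genuinely different from the original one.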

In [None]:
model = Ext_Arch(pretrained_model)
model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()
    valid_loss, _ = evaluate()

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.679
Validation Loss: 0.663

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.647
Validation Loss: 0.625

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.617
Validation Loss: 0.589

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.581
Validation Loss: 0.542

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.543
Validation Loss: 0.502

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.501
Validation Loss: 0.464

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.475
Validation Loss: 0.428

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.441
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert more layers')
print(classification_report(test_y, preds))

bert more layers
              precision    recall  f1-score   support

           0       0.98      0.95      0.96       724
           1       0.72      0.88      0.79       112

    accuracy                           0.94       836
   macro avg       0.85      0.92      0.88       836
weighted avg       0.95      0.94      0.94       836



##6. Implement custom variants of extensions to the pretrained Transformer architecture to examine their effect on the results (15 pts)


###bert - conv extension

In [None]:
pretrained_model = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()

        self.conv1 = nn.Conv1d(768,512, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(512,256, kernel_size=3, padding=1)

        self.lin1 = nn.Linear(256, 2)

        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)

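        # (batch, 768) -> (batch, 768, 1): the pooled vector becomes a length-1 sequence
        cls_hs = cls_hs.unsqueeze(2)

        x = self.conv1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.conv2(x)
        x = self.relu(x)
        x = self.dropout(x)

        # (batch, 256, 1) -> (batch, 256)
        x = x.squeeze(2)

        x = self.lin1(x)
        # note: ReLU and dropout are applied to the 2 output logits before LogSoftmax
        x = self.relu(x)
        x = self.dropout(x)

        x = self.softmax(x)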

        return x
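The convolutions above operate on a sequence of length 1, because the pooled BERT output is unsqueezed to shape `(batch, 768, 1)`. With `kernel_size=3` and `padding=1`, the two outer kernel taps only ever see zero padding, so each convolution behaves like a linear layer that uses only the centre tap. The single-channel pure-Python sketch below illustrates this. It is not the PyTorch implementation, and all values are made up.

```python
def conv1d_single_channel(x, kernel, bias, padding):
    """1D cross-correlation with zero padding, one input and one output channel."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    k = len(kernel)
    return [
        sum(kernel[j] * padded[i + j] for j in range(k)) + bias
        for i in range(len(padded) - k + 1)
    ]

kernel = [5.0, 3.0, 7.0]   # left, centre and right taps
bias = 1.0

# Length-1 input, as in the model above: only the centre tap contributes
length_one = conv1d_single_channel([2.0], kernel, bias, padding=1)

# Changing the outer taps leaves the output unchanged
other_taps = conv1d_single_channel([2.0], [-9.0, 3.0, 42.0], bias, padding=1)

print(length_one, other_taps)
```

Applying the convolutions to the full token sequence (the first element of the model output) instead of the pooled vector would let all three taps contribute.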

In [None]:
model = Ext_Arch(pretrained_model)
model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()
    valid_loss, _ = evaluate()

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.690
Validation Loss: 0.685

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.681
Validation Loss: 0.673

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.667
Validation Loss: 0.651

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.643
Validation Loss: 0.622

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.621
Validation Loss: 0.587

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.586
Validation Loss: 0.551

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.557
Validation Loss: 0.515

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.528
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert conv1d')
print(classification_report(test_y, preds))

bert conv1d
              precision    recall  f1-score   support

           0       0.99      0.93      0.95       724
           1       0.65      0.91      0.76       112

    accuracy                           0.92       836
   macro avg       0.82      0.92      0.86       836
weighted avg       0.94      0.92      0.93       836



###bert - LSTM extension

In [None]:
pretrained_model = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()

        self.lstm1 = nn.LSTM(768, 512, batch_first=True)
        self.lstm2 = nn.LSTM(512, 256, batch_first=True)

        self.lin1 = nn.Linear(256, 2)

        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

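        # cls_hs is BERT's pooled output for the first token, shape (batch, 768)
        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)

        # nn.LSTM reads a 2-D input as one unbatched sequence,
        # so the batch axis plays the role of the time axis here
        x, _ = self.lstm1(cls_hs)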
        x = self.relu(x)
        x = self.dropout(x)

        x, _ = self.lstm2(x)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.lin1(x)
        # ReLU and dropout act on the class logits here, so negative logits are clamped to 0
        x = self.relu(x)
        x = self.dropout(x)

        x = self.softmax(x)

        return x
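
As written, the LSTM receives the pooled vector of shape `(batch, 768)`. A possible alternative, sketched here under assumptions and not the variant that produced the results in this section, feeds the per-token hidden states of shape `(batch, seq_len, 768)` to the LSTM and classifies from its final hidden state. `LSTMHead` and the random input tensor are hypothetical stand-ins, so the block runs without downloading BERT:

```python
import torch
import torch.nn as nn

class LSTMHead(nn.Module):
    """Hypothetical head: token states -> LSTM -> linear -> log-probabilities."""

    def __init__(self, hidden_size=768, lstm_size=256, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, lstm_size, batch_first=True)
        self.dropout = nn.Dropout(0.1)
        self.lin = nn.Linear(lstm_size, num_classes)
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden_size)
        _, (h_n, _) = self.lstm(token_states)
        x = self.dropout(h_n[-1])          # final hidden state, (batch, lstm_size)
        return self.log_softmax(self.lin(x))

head = LSTMHead().eval()
fake_states = torch.randn(4, 25, 768)      # stand-in for the model's first output
with torch.no_grad():
    log_probs = head(fake_states)
print(log_probs.shape)
```

The sketch does not mask padding positions; with padded inputs, `torch.nn.utils.rnn.pack_padded_sequence` would be one way to keep the padding out of the final state.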

In [None]:
model = Ext_Arch(pretrained_model)
model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()
    valid_loss, _ = evaluate()

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.693
Validation Loss: 0.693

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.693
Validation Loss: 0.693

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.693
Validation Loss: 0.692

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.692
Validation Loss: 0.690

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.690
Validation Loss: 0.688

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.689
Validation Loss: 0.687

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.688
Validation Loss: 0.686

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.686
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert lstm')
print(classification_report(test_y, preds))

bert lstm
              precision    recall  f1-score   support

           0       0.98      0.76      0.85       724
           1       0.36      0.88      0.51       112

    accuracy                           0.77       836
   macro avg       0.67      0.82      0.68       836
weighted avg       0.89      0.77      0.81       836
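
The losses logged for this run start at 0.693, which is ln 2, the negative log-likelihood of a two-class predictor that outputs 0.5 for both classes. One way such an output can arise (an illustration, not a diagnosis of this run) is when both logits are negative and the ReLU before the log-softmax clamps them to zero. The weights and logits below are made-up values:

```python
import math
import torch
import torch.nn as nn

logits = torch.tensor([[-1.3, -0.2], [-0.7, -2.1]])   # hypothetical negative logits
labels = torch.tensor([0, 1])
weights = torch.tensor([0.6, 3.2])                     # hypothetical class weights

# ReLU turns both rows into [0, 0], so log-softmax yields log(0.5) everywhere
log_probs = nn.LogSoftmax(dim=1)(torch.relu(logits))
loss = nn.NLLLoss(weight=weights)(log_probs, labels)

print(round(loss.item(), 3), round(math.log(2), 3))
```

The class weights do not change this value, because the weighted mean in `nn.NLLLoss` divides by the sum of the weights of the targets.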



##7. Examine the effect of the maximum text length and of different padding strategies for each of the pretrained models used (20 pts)


###bert

####max text

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained_model = AutoModel.from_pretrained("bert-base-uncased")

In [None]:
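# Longest training text in WordPiece tokens; tokenize() does not add [CLS]/[SEP],
# so with truncation=True the longest texts lose up to two tokens
max_length = max(len(tokenizer.tokenize(text)) for text in train_text.tolist())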
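
Because `[CLS]` and `[SEP]` count toward `max_length`, only `max_length - 2` positions remain for the text itself in a single-sentence BERT input. The sketch below checks what share of texts would be truncated for a given limit; `token_counts` is a hypothetical list of per-text token counts standing in for `len(tokenizer.tokenize(text))`:

```python
def truncated_share(token_counts, max_length, num_special_tokens=2):
    """Fraction of texts that lose tokens once special tokens are added."""
    budget = max(max_length - num_special_tokens, 0)
    too_long = sum(1 for n in token_counts if n > budget)
    return too_long / len(token_counts)

token_counts = [4, 8, 9, 12, 23, 40]       # hypothetical lengths
for limit in (10, 25, 40, 42):
    print(limit, truncated_share(token_counts, limit))
```

In this toy list the longest text has 40 tokens, yet a limit of 40 still truncates it; only a limit of 42 keeps every text whole.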

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 10

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 10


KeyboardInterrupt: 

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert max len')
print(classification_report(test_y, preds))

bert max len
              precision    recall  f1-score   support

           0       0.99      0.88      0.93       724
           1       0.54      0.94      0.69       112

    accuracy                           0.89       836
   macro avg       0.77      0.91      0.81       836
weighted avg       0.93      0.89      0.90       836



####max=10

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained_model = AutoModel.from_pretrained("bert-base-uncased")

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.689
Validation Loss: 0.674

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.671
Validation Loss: 0.656

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.661
Validation Loss: 0.646

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.650
Validation Loss: 0.637

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.642
Validation Loss: 0.623

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.633
Validation Loss: 0.613

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.626
Validation Loss: 0.603

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.614
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert max=10')
print(classification_report(test_y, preds))

bert max=10
              precision    recall  f1-score   support

           0       0.96      0.80      0.87       724
           1       0.37      0.79      0.51       112

    accuracy                           0.79       836
   macro avg       0.67      0.79      0.69       836
weighted avg       0.88      0.79      0.82       836



####max=50

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained_model = AutoModel.from_pretrained("bert-base-uncased")

In [None]:
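# padding='longest' pads only to the longest sequence within each call (capped at 50
# by truncation), so train, val and test can end up with different widths
tokens_train = tokenizer.batch_encode_plus(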
    train_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.658
Validation Loss: 0.621

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.612
Validation Loss: 0.577

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.567
Validation Loss: 0.534

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.531
Validation Loss: 0.497

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.495
Validation Loss: 0.456

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.465
Validation Loss: 0.426

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.438
Validation Loss: 0.404

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.408
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert max=50')
print(classification_report(test_y, preds))

bert max=50
              precision    recall  f1-score   support

           0       0.98      0.92      0.95       724
           1       0.65      0.90      0.75       112

    accuracy                           0.92       836
   macro avg       0.82      0.91      0.85       836
weighted avg       0.94      0.92      0.93       836
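
With `padding='longest'` applied to a whole split, every batch still has the width of that split's longest text. A different strategy, not used in the experiments above, pads each batch only to its own longest sequence through a `collate_fn`. A minimal sketch, assuming the examples are kept as unpadded lists of token ids; `pad_batch` and the toy ids are hypothetical, and `pad_id=0` matches the `[PAD]` id of `bert-base-uncased`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_batch(batch, pad_id=0):
    """Pad the sequences of one batch to the longest sequence in that batch."""
    ids = [torch.tensor(seq, dtype=torch.long) for seq, _ in batch]
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    input_ids = pad_sequence(ids, batch_first=True, padding_value=pad_id)
    lengths = torch.tensor([len(seq) for seq in ids])
    # 1 for real tokens, 0 for padding positions
    positions = torch.arange(input_ids.size(1))[None, :]
    attention_mask = (positions < lengths[:, None]).long()
    return input_ids, attention_mask, labels

# toy (token ids, label) pairs of different lengths
toy_data = [([101, 7592, 102], 0), ([101, 2023, 2003, 1037, 3231, 102], 1),
            ([101, 102], 0), ([101, 2748, 2053, 102], 1)]
loader = DataLoader(toy_data, batch_size=2, shuffle=False, collate_fn=pad_batch)
shapes = [tuple(ids.shape) for ids, _, _ in loader]
print(shapes)
```

Each batch gets its own width, which reduces the number of padding tokens the model has to process when text lengths vary a lot.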



####longest padding

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained_model = AutoModel.from_pretrained("bert-base-uncased")

In [None]:
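# padding='longest' pads only to the longest sequence within each call,
# so train, val and test can end up with different widths
tokens_train = tokenizer.batch_encode_plus(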
    train_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.659
Validation Loss: 0.620

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.598
Validation Loss: 0.569

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.548
Validation Loss: 0.514

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.508
Validation Loss: 0.469

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.468
Validation Loss: 0.426

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.437
Validation Loss: 0.401

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.411
Validation Loss: 0.376

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.392
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('bert longest padding')
print(classification_report(test_y, preds))

bert longest padding
              precision    recall  f1-score   support

           0       0.99      0.92      0.95       724
           1       0.64      0.92      0.75       112

    accuracy                           0.92       836
   macro avg       0.81      0.92      0.85       836
weighted avg       0.94      0.92      0.92       836
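
The four BERT runs above can be set side by side. The numbers below are copied from the classification reports in this section (F1 of the minority class 1, macro-averaged F1, accuracy); only the small `rows` structure and the print layout are added for illustration:

```python
rows = [
    # (setting, F1 class 1, macro F1, accuracy)
    ("max text", 0.69, 0.81, 0.89),
    ("max=10", 0.51, 0.69, 0.79),
    ("max=50", 0.75, 0.85, 0.92),
    ("longest padding", 0.75, 0.85, 0.92),
]

print(f"{'setting':<18}{'F1 (1)':>8}{'macro F1':>10}{'acc':>6}")
for name, f1_pos, macro_f1, acc in rows:
    print(f"{name:<18}{f1_pos:>8.2f}{macro_f1:>10.2f}{acc:>6.2f}")

best = max(rows, key=lambda r: r[2])    # first of the tied best runs
worst = min(rows, key=lambda r: r[2])
```

In these runs, truncating to 10 tokens costs the most, while the two runs that use `padding='longest'` reach the same macro F1 and accuracy.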



###distilbert

####max text

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
pretrained_model = AutoModel.from_pretrained("distilbert-base-uncased")

In [None]:
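# Longest training text in WordPiece tokens; tokenize() does not add [CLS]/[SEP],
# so with truncation=True the longest texts lose up to two tokens
max_length = max(len(tokenizer.tokenize(text)) for text in train_text.tolist())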

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        cls_hs = cls[0][:, 0, :]

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x
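
Unlike `BertModel`, `DistilBertModel` returns no pooled output, so the class above takes the hidden state of the first token from the sequence output. The indexing can be checked on a small stand-in tensor (the shape and values are arbitrary):

```python
import torch

# stand-in for the model's first output: (batch=2, seq_len=3, hidden=4)
last_hidden_state = torch.arange(24, dtype=torch.float).reshape(2, 3, 4)

cls_hs = last_hidden_state[:, 0, :]   # first token of every sequence
print(cls_hs.shape)
print(cls_hs)
```

The result keeps one vector per example, which is the shape that `fc1` expects once the hidden size is 768.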

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 10

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.659
Validation Loss: 0.596

 Epoch 2 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.547
Validation Loss: 0.479

 Epoch 3 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.440
Validation Loss: 0.376

 Epoch 4 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.348
Validation Loss: 0.294

 Epoch 5 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.280
Validation Loss: 0.232

 Epoch 6 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.224
Validation Loss: 0.188

 Epoch 7 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.196
Validation Loss: 0.161

 Epoch 8 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.162
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('distilbert max len')
print(classification_report(test_y, preds))

distilbert max len
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       724
           1       0.86      0.96      0.91       112

    accuracy                           0.97       836
   macro avg       0.93      0.97      0.95       836
weighted avg       0.98      0.97      0.98       836



####max=10

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
pretrained_model = AutoModel.from_pretrained("distilbert-base-uncased")

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.672
Validation Loss: 0.648

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.632
Validation Loss: 0.603

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.590
Validation Loss: 0.559

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.550
Validation Loss: 0.517

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.511
Validation Loss: 0.481

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.481
Validation Loss: 0.447

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.448
Validation Loss: 0.420

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.426
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('distilbert max=10')
print(classification_report(test_y, preds))

distilbert max=10
              precision    recall  f1-score   support

           0       0.97      0.91      0.94       724
           1       0.60      0.84      0.70       112

    accuracy                           0.90       836
   macro avg       0.79      0.88      0.82       836
weighted avg       0.92      0.90      0.91       836
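
The `macro avg` and `weighted avg` rows of the report can be recomputed from the per-class rows: the macro average is the plain mean over classes, while the weighted average weights each class by its support. The sketch below does this for the precision and recall values printed above. Because it starts from values already rounded to two decimals, the last digit may differ slightly from the report.

```python
def macro_and_weighted(values, supports):
    """Return (macro average, support-weighted average) of per-class values."""
    macro = sum(values) / len(values)
    weighted = sum(v * s for v, s in zip(values, supports)) / sum(supports)
    return macro, weighted


# Per-class rows copied from the 'distilbert max=10' report above.
supports = [724, 112]
precision = [0.97, 0.60]
recall = [0.91, 0.84]

macro_p, weighted_p = macro_and_weighted(precision, supports)
macro_r, weighted_r = macro_and_weighted(recall, supports)

print(f"precision: macro={macro_p:.3f} weighted={weighted_p:.3f}")
print(f"recall:    macro={macro_r:.3f} weighted={weighted_r:.3f}")
```

The support-weighted recall coincides with accuracy (0.90 here), which is why the weighted row looks much better than the minority-class row: 724 of the 836 test samples belong to class 0.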



####max=50

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
pretrained_model = AutoModel.from_pretrained("distilbert-base-uncased")

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.657
Validation Loss: 0.592

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.539
Validation Loss: 0.472

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.434
Validation Loss: 0.371

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.343
Validation Loss: 0.292

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.278
Validation Loss: 0.234

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.228
Validation Loss: 0.193

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.191
Validation Loss: 0.163

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.172
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('distilbert max=50')
print(classification_report(test_y, preds))

distilbert max=50
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       724
           1       0.86      0.96      0.91       112

    accuracy                           0.97       836
   macro avg       0.93      0.97      0.95       836
weighted avg       0.98      0.97      0.98       836



####longest padding

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
pretrained_model = AutoModel.from_pretrained("distilbert-base-uncased")

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())
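
The two padding strategies used in these experiments differ only in the target width. With `padding='max_length'` every sequence is padded to `max_length`. With `padding='longest'` it is padded to the longest sequence in the same call, after truncation. The pure-Python sketch below illustrates this on toy token ids. The helper `pad_batch` is hypothetical and simplified: it does not handle special tokens and is not the tokenizer's actual implementation.

```python
def pad_batch(sequences, strategy, max_length, pad_id=0):
    """Truncate to max_length, then pad according to the chosen strategy."""
    truncated = [seq[:max_length] for seq in sequences]
    if strategy == "max_length":
        target = max_length
    elif strategy == "longest":
        target = max(len(seq) for seq in truncated)
    else:
        raise ValueError(f"unknown padding strategy: {strategy}")
    input_ids = [seq + [pad_id] * (target - len(seq)) for seq in truncated]
    attention_mask = [[1] * len(seq) + [0] * (target - len(seq)) for seq in truncated]
    return input_ids, attention_mask


# Toy token ids standing in for three short texts.
toy_sequences = [[101, 7, 8, 102], [101, 7, 102], [101, 7, 8, 9, 10, 102]]

ids_longest, mask_longest = pad_batch(toy_sequences, "longest", max_length=50)
ids_fixed, mask_fixed = pad_batch(toy_sequences, "max_length", max_length=50)

print("longest    -> width", len(ids_longest[0]))
print("max_length -> width", len(ids_fixed[0]))
```

Because the training, validation and test texts are encoded in separate calls, `padding='longest'` can give each split a different width. The attention mask marks the same real tokens under both strategies, so the main difference is the amount of padding processed.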

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.639
Validation Loss: 0.577

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.526
Validation Loss: 0.460

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.420
Validation Loss: 0.359

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.333
Validation Loss: 0.278

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.264
Validation Loss: 0.222

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.218
Validation Loss: 0.184

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.182
Validation Loss: 0.155

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.162
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('distilbert longest padding')
print(classification_report(test_y, preds))

distilbert longest padding
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       724
           1       0.87      0.96      0.92       112

    accuracy                           0.98       836
   macro avg       0.93      0.97      0.95       836
weighted avg       0.98      0.98      0.98       836



###roberta

####max text

In [None]:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
pretrained_model = AutoModel.from_pretrained("roberta-base")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
max_length = max(len(tokenizer.tokenize(text)) for text in train_text.tolist())
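
One detail of the cell above: `tokenizer.tokenize` returns only content tokens, while `max_length` in the encoding call is a total budget that also covers the special tokens added around each text. The toy sketch below shows the consequence. It assumes two special tokens per single sequence, as in BERT-style and RoBERTa-style tokenizers, and the helper `encode_with_specials` is hypothetical, not the library's code.

```python
def encode_with_specials(tokens, max_length, bos="<s>", eos="</s>"):
    """Wrap tokens in two special tokens while keeping the total within max_length."""
    budget = max_length - 2  # room left for content tokens
    return [bos] + tokens[:budget] + [eos]


# Toy tokenized texts standing in for the training set.
toy_texts = [["a", "b", "c"], ["a", "b", "c", "d", "e"]]

# Same computation as in the cell above: longest text counted without special tokens.
max_len_plain = max(len(tokens) for tokens in toy_texts)
encoded_plain = encode_with_specials(toy_texts[1], max_len_plain)

# Adding 2 to the budget keeps every content token of the longest text.
max_len_safe = max_len_plain + 2
encoded_safe = encode_with_specials(toy_texts[1], max_len_safe)

print(encoded_plain)
print(encoded_safe)
```

Under this assumption, the longest training texts lose their last two content tokens to truncation. With only a few such texts the effect on the results is probably small.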

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = max_length,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        cls_hs = cls[0][:, 0, :]

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 10

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.690
Validation Loss: 0.682

 Epoch 2 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.679
Validation Loss: 0.671

 Epoch 3 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.667
Validation Loss: 0.661

 Epoch 4 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.655
Validation Loss: 0.650

 Epoch 5 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.643
Validation Loss: 0.639

 Epoch 6 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.631
Validation Loss: 0.627

 Epoch 7 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.618
Validation Loss: 0.615

 Epoch 8 / 10
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.603
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('roberta max len')
print(classification_report(test_y, preds))

roberta max len
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       724
           1       0.94      0.84      0.89       112

    accuracy                           0.97       836
   macro avg       0.96      0.92      0.94       836
weighted avg       0.97      0.97      0.97       836



####max=10

In [None]:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
pretrained_model = AutoModel.from_pretrained("roberta-base")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 10,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.693
Validation Loss: 0.689

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.690
Validation Loss: 0.687

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.688
Validation Loss: 0.685

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.685
Validation Loss: 0.683

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.681
Validation Loss: 0.680

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.680
Validation Loss: 0.677

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.677
Validation Loss: 0.674

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.673
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('roberta max=10')
print(classification_report(test_y, preds))

roberta max=10
              precision    recall  f1-score   support

           0       0.97      0.98      0.97       724
           1       0.85      0.79      0.82       112

    accuracy                           0.95       836
   macro avg       0.91      0.88      0.90       836
weighted avg       0.95      0.95      0.95       836



####max=50

In [None]:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
pretrained_model = AutoModel.from_pretrained("roberta-base")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 50,
    padding='longest',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.700
Validation Loss: 0.690

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.679
Validation Loss: 0.672

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.664
Validation Loss: 0.660

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.651
Validation Loss: 0.647

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.639
Validation Loss: 0.634

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.626
Validation Loss: 0.622

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.615
Validation Loss: 0.609

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.599
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('roberta max=50')
print(classification_report(test_y, preds))

roberta max=50
              precision    recall  f1-score   support

           0       0.98      0.99      0.99       724
           1       0.95      0.89      0.92       112

    accuracy                           0.98       836
   macro avg       0.97      0.94      0.95       836
weighted avg       0.98      0.98      0.98       836



####longest padding

In [None]:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
pretrained_model = AutoModel.from_pretrained("roberta-base")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = max_length,
    padding='longest',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)



In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.704
Validation Loss: 0.694

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.685
Validation Loss: 0.677

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.668
Validation Loss: 0.664

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.654
Validation Loss: 0.652

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.645
Validation Loss: 0.640

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.632
Validation Loss: 0.629

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.621
Validation Loss: 0.617

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.609
Validat

In [None]:
path = 'saved_weights.pt'
model.load_state_dict(torch.load(path))

with torch.no_grad():
    preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()

preds = np.argmax(preds, axis = 1)
print('roberta longest padding')
print(classification_report(test_y, preds))

roberta longest padding
              precision    recall  f1-score   support

           0       0.98      0.99      0.99       724
           1       0.93      0.89      0.91       112

    accuracy                           0.98       836
   macro avg       0.95      0.94      0.95       836
weighted avg       0.98      0.98      0.98       836
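
To compare the length and padding experiments side by side, the minority-class (label 1) rows of the reports above can be collected into one table. The sketch below copies those numbers by hand and ranks the runs by class-1 F1. It only rearranges values already printed in this notebook. Note that the `roberta max len` run used 10 epochs while all others used 20, so that row is not strictly comparable.

```python
# (model, setting, epochs, precision_1, recall_1, f1_1, accuracy), copied from the reports above
results = [
    ("distilbert", "max=10", 20, 0.60, 0.84, 0.70, 0.90),
    ("distilbert", "max=50", 20, 0.86, 0.96, 0.91, 0.97),
    ("distilbert", "longest padding", 20, 0.87, 0.96, 0.92, 0.98),
    ("roberta", "max len", 10, 0.94, 0.84, 0.89, 0.97),
    ("roberta", "max=10", 20, 0.85, 0.79, 0.82, 0.95),
    ("roberta", "max=50", 20, 0.95, 0.89, 0.92, 0.98),
    ("roberta", "longest padding", 20, 0.93, 0.89, 0.91, 0.98),
]

# Rank by class-1 F1, then by accuracy (both descending); ties keep their original order.
ranked = sorted(results, key=lambda row: (-row[5], -row[6]))

print(f"{'model':<11}{'setting':<17}{'epochs':>6}{'P(1)':>7}{'R(1)':>7}{'F1(1)':>7}{'acc':>6}")
for model, setting, epochs, p1, r1, f1, acc in ranked:
    print(f"{model:<11}{setting:<17}{epochs:>6}{p1:>7.2f}{r1:>7.2f}{f1:>7.2f}{acc:>6.2f}")
```

In these runs, truncating to 10 tokens gives the lowest class-1 F1 for both models. DistilBERT reaches higher class-1 recall, and RoBERTa higher class-1 precision.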



##9. Develop a procedure for evaluating the quality of Transformer models, covering various visualization methods (e.g. plots, metrics, classes), clustering, dimensionality reduction (e.g. t-SNE), cross-validation, the influence of training-set characteristics on model behavior, sensitivity to the semantics of the texts in the training and test sets, etc.

###Bert

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained_model = AutoModel.from_pretrained("bert-base-uncased")

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        _, cls_hs = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)





In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the best model (lowest validation loss so far)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.681
Validation Loss: 0.656

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.650
Validation Loss: 0.630

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.626
Validation Loss: 0.609

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.603
Validation Loss: 0.580

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.578
Validation Loss: 0.556

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.554
Validation Loss: 0.529

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.539
Validation Loss: 0.510

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.515
Validat

In [None]:
val_embeddings, labels = get_text_embeddings(model, val_dataloader)

pca = PCA(n_components=2)
val_pca = pca.fit_transform(val_embeddings)

pca_bert = pd.DataFrame(data={'PC 1': val_pca[:, 0],
                              'PC 2': val_pca[:, 1],
                              'Model': 'Bert',
                              'Label': labels})

fig = px.scatter(pca_bert, x='PC 1', y='PC 2', color='Label',
                 title='PCA projection of texts in BERT', width=800)

fig.show()

### DistilBERT

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
pretrained_model = AutoModel.from_pretrained("distilbert-base-uncased")

In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())
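
To make the effect of `padding='max_length'` with `truncation=True` concrete, the toy function below cuts or right-pads a list of token ids to a fixed length and builds the matching attention mask. It is a simplified sketch, not the tokenizer's implementation: the name `pad_or_truncate`, the token ids and `pad_id=0` are assumptions (the real padding id depends on the tokenizer), and a real tokenizer also keeps special tokens in place when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cut or right-pad a list of token ids to exactly max_length.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding positions.
    """
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

# Hypothetical token ids; they do not come from any real vocabulary.
short_ids, short_mask = pad_or_truncate([5, 6, 7, 8], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 31)), max_length=25)

print(short_ids, short_mask)         # [5, 6, 7, 8, 0, 0] [1, 1, 1, 1, 0, 0]
print(len(long_ids), sum(long_mask)) # 25 25
```

With `max_length = 25`, every text longer than 25 tokens loses its tail, while shorter texts carry padding positions that the attention mask tells the model to ignore.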

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)
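
The batch counter in the training output ("of 122") can be related to the training-set size. Assuming the `train()` helper iterates over `train_dataloader` and that the loader keeps the last, smaller batch (the `DataLoader` default, `drop_last=False`), the number of batches per epoch is `ceil(n_samples / batch_size)`. The sketch below inverts that relation; variable names are illustrative.

```python
import math

def num_batches(n_samples: int, batch_size: int) -> int:
    """Batches per epoch when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)

example_batch_size = 32  # value set in the cell above
logged_batches = 122     # "of 122" in the training output

# Smallest and largest training-set size that yields 122 batches of 32.
min_samples = (logged_batches - 1) * example_batch_size + 1
max_samples = logged_batches * example_batch_size
print(min_samples, max_samples)  # 3873 3904
```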

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        cls_hs = cls[0][:, 0, :]

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)





In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the weights of the best model so far (lowest validation loss)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.651
Validation Loss: 0.600

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.560
Validation Loss: 0.507

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.472
Validation Loss: 0.420

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.396
Validation Loss: 0.352

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.337
Validation Loss: 0.301

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.291
Validation Loss: 0.263

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.257
Validation Loss: 0.232

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.228
Validat

In [None]:
val_embeddings, labels = get_text_embeddings(model, val_dataloader)

pca = PCA(n_components=2)
val_pca = pca.fit_transform(val_embeddings)

pca_distilbert = pd.DataFrame(data={'PC 1': val_pca[:, 0],
                              'PC 2': val_pca[:, 1],
                              'Model': 'distilbert',
                              'Label': labels})

fig = px.scatter(pca_distilbert, x='PC 1', y='PC 2', color='Label',
                 title='PCA projection of texts in DistilBERT', width=800)

fig.show()

### RoBERTa

In [None]:
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
pretrained_model = AutoModel.from_pretrained("roberta-base")

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
tokens_train = tokenizer.batch_encode_plus(
    train_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_val = tokenizer.batch_encode_plus(
    val_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)

tokens_test = tokenizer.batch_encode_plus(
    test_text.tolist(),
    max_length = 25,
    padding='max_length',
    truncation=True
)


train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

val_seq = torch.tensor(tokens_val['input_ids'])
val_mask = torch.tensor(tokens_val['attention_mask'])
val_y = torch.tensor(val_labels.tolist())

test_seq = torch.tensor(tokens_test['input_ids'])
test_mask = torch.tensor(tokens_test['attention_mask'])
test_y = torch.tensor(test_labels.tolist())

In [None]:
batch_size = 32

train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

val_data = TensorDataset(val_seq, val_mask, val_y)
val_sampler = SequentialSampler(val_data)
val_dataloader = DataLoader(val_data, sampler = val_sampler, batch_size=batch_size)

In [None]:
class Ext_Arch(nn.Module):

    def __init__(self, pretrained_model):
        super(Ext_Arch, self).__init__()

        self.pretrained_model = pretrained_model
        self.dropout = nn.Dropout(0.1)
        self.relu =  nn.ReLU()
        self.fc1 = nn.Linear(768,512)
        self.fc2 = nn.Linear(512,2)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sent_id, mask):

        cls = self.pretrained_model(sent_id, attention_mask=mask, return_dict=False)
        cls_hs = cls[0][:, 0, :]

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)

        return x

In [None]:
model = Ext_Arch(pretrained_model)

model = model.to(device)

In [None]:
for param in pretrained_model.parameters():
    param.requires_grad = False

optimizer = AdamW(model.parameters(),lr = 1e-5)

class_weights = compute_class_weight('balanced', classes=np.unique(train_labels), y=train_labels)





In [None]:
weights= torch.tensor(class_weights,dtype=torch.float)

weights = weights.to(device)

cross_entropy  = nn.NLLLoss(weight=weights)

epochs = 20

In [None]:
best_valid_loss = float('inf')

train_losses=[]
valid_losses=[]

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()

    valid_loss, _ = evaluate()

    # save the weights of the best model so far (lowest validation loss)
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_weights.pt')

    train_losses.append(train_loss)
    valid_losses.append(valid_loss)

    print(f'\nTraining Loss: {train_loss:.3f}')
    print(f'Validation Loss: {valid_loss:.3f}')


 Epoch 1 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.688
Validation Loss: 0.684

 Epoch 2 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.681
Validation Loss: 0.677

 Epoch 3 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.674
Validation Loss: 0.670

 Epoch 4 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.666
Validation Loss: 0.663

 Epoch 5 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.662
Validation Loss: 0.657

 Epoch 6 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.653
Validation Loss: 0.649

 Epoch 7 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.645
Validation Loss: 0.641

 Epoch 8 / 20
  Batch    50  of    122.
  Batch   100  of    122.

Evaluating...

Training Loss: 0.637
Validat
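
The three training logs can be compared side by side. The sketch below uses only the validation losses printed above for epochs 1 to 7 (the epoch 8 values are cut off in the output) and reports how far each model's loss fell over those epochs. It is a descriptive summary of the logged numbers under the shared setup shown in the cells (frozen encoder, `lr = 1e-5`), not a statement about final accuracy.

```python
# Validation losses copied from the training logs above (epochs 1-7).
logged_val_loss = {
    "bert":       [0.656, 0.630, 0.609, 0.580, 0.556, 0.529, 0.510],
    "distilbert": [0.600, 0.507, 0.420, 0.352, 0.301, 0.263, 0.232],
    "roberta":    [0.684, 0.677, 0.670, 0.663, 0.657, 0.649, 0.641],
}

def total_drop(losses):
    """Decrease in loss between the first and last logged epoch."""
    return round(losses[0] - losses[-1], 3)

drops = {name: total_drop(losses) for name, losses in logged_val_loss.items()}
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {drop:.3f}")
```

Over these seven epochs the DistilBERT head shows the largest decrease and the RoBERTa head the smallest.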

In [None]:
val_embeddings, labels = get_text_embeddings(model, val_dataloader)

pca = PCA(n_components=2)
val_pca = pca.fit_transform(val_embeddings)

pca_roberta = pd.DataFrame(data={'PC 1': val_pca[:, 0],
                              'PC 2': val_pca[:, 1],
                              'Model': 'roberta',
                              'Label': labels})

fig = px.scatter(pca_roberta, x='PC 1', y='PC 2', color='Label',
                 title='PCA projection of texts in RoBERTa', width=800)

fig.show()

### Summary

In [None]:
all_pca = pd.concat([pca_bert, pca_distilbert, pca_roberta])

symbol_mapping = {0: 'circle', 1: 'diamond'}

all_pca['Symbol'] = all_pca['Label'].map(symbol_mapping)

In [None]:
size_mapping = {0: 2, 1: 5}
all_pca['Size'] = all_pca['Label'].map(size_mapping)

# Build an interactive scatter plot with custom marker symbols and sizes
fig = px.scatter(all_pca, x='PC 1', y='PC 2', color='Model', symbol='Symbol', size='Size',
                 title='PCA projection of texts', width=1000, size_max=10)

fig.show()