![picture](https://drive.google.com/uc?export=view&id=1eCsjNAtjXuXfqBLxeEnsBpOikUO06msr)

<br>

---
---

<div class="alert alert-block alert-warning">
<h1><span style="color:green"> Foundations of Artificial Intelligence<br> (AI701-Fall2022) </span></h1>

<h2><span style="color:green"> Lab-13 </span></h2>
</div>

---
---

# Sentiment Analysis Using BERT
#### **What is Sentiment Analysis?**
Sentiment analysis is a technique that detects the underlying sentiment in a piece of text. It is the process of classifying text as positive, negative, or neutral. Machine learning and deep learning techniques are used to evaluate the text and determine the sentiment behind it.
#### **Why is sentiment analysis useful?**
Sentiment analysis is essential for businesses to gauge customer response.

Imagine this: Your company has just released a new product that is being advertised on a number of different channels.

To gauge customers' response to this product, you can perform sentiment analysis. Customers often discuss products on social media and customer feedback forums; this data can be collected and analyzed to measure the overall response.

### **BERT**
Link to the [paper](https://arxiv.org/pdf/1810.04805.pdf).  
*Bidirectional Encoder Representations from Transformers* (BERT) is a 2018 paper published by researchers at Google AI Language. It caused a stir in the machine learning community by presenting state-of-the-art results on a wide variety of NLP tasks.
#### **How does BERT work?**
BERT makes use of the Transformer, an attention-based architecture that learns contextual relations between words (or sub-words) in a text. In its vanilla form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. Since BERT's goal is to generate a language model, only the encoder mechanism is necessary. The detailed workings of the Transformer are described in the Google paper *Attention Is All You Need* (Vaswani et al., 2017).
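
To make the attention idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration only: the function names and the toy vectors are assumptions made for this example, and real Transformer layers add learned query/key/value projections, multiple heads, and batching.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification (assumption): queries, keys and values are the input
    vectors themselves, with no learned projections.
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Similarity of this token to every token (left and right), scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors
        out = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "token embeddings" of dimension 2 (made-up numbers)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every token receives a non-zero weight for tokens on both sides of it, which is the "bidirectional" part of BERT's name.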

## **Lab Tasks**
In this lab we will continue looking into the Hugging Face Transformers Python library. We will implement a Multi-Class Sentiment Analysis model using the pretrained `BertTokenizer` and `BertForSequenceClassification`.

**Sections**
1. Install and Import Libraries
2. Download Dataset and Connect to Drive
3. Hyperparameters
4. Data Preprocessing
5. Train and Validation Split
6. Pretrained Tokenizer and Encoding Data
7. Initialize Pretrained Classification Model
8. DataLoaders, Optimizer and Scheduler
9. Training Loop
10. Evaluation


## 1. Install and Import Libraries
- Uncomment and run the cell below to install the `transformers` and `datasets` libraries.

In [2]:
# !pip install transformers
# !pip install datasets

In [3]:
import torch
import datasets
import transformers

print(transformers.__version__)
print(torch.__version__)
print(datasets.__version__)

4.24.0
1.12.1+cu113
2.7.0


In [4]:
import random
import pandas as pd
import numpy as np
from tqdm import tqdm

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

from transformers import BertTokenizer
from transformers import BertForSequenceClassification
from transformers import get_linear_schedule_with_warmup

from torch.optim import AdamW
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

## 2. Download Dataset and Connect to Drive
- Download dataset.zip from the following [link](www.google.com).
- Add it to your Google Drive or local drive.
- Set `dir` to the path of the folder containing the .zip file.

In [5]:
import os
import zipfile

GOOGLE_DRIVE = True # If running on Google Colab

if GOOGLE_DRIVE:
    # If using Google Colab, add dataset.zip to your My Drive
    from google.colab import drive
    drive.mount('drive')

    dir = "/content/drive/My Drive/"
    files = os.listdir(dir)
    print(files)

    zip_ref = zipfile.ZipFile(dir + 'dataset.zip', 'r')
    zip_ref.extractall("/tmp/")
    zip_ref.close()

    csv_dir = os.path.join('/tmp/', 'smile-annotations-final.csv')
    print(csv_dir)

else: # if running on local environment
    dir = os.getcwd()
    files = os.listdir(dir)
    print(files)
    
    zip_ref = zipfile.ZipFile(dir + '/' + 'dataset.zip', 'r')
    zip_ref.extractall(os.getcwd())
    zip_ref.close()
    
    csv_dir = os.path.join(dir, 'smile-annotations-final.csv')
    print(csv_dir)

Mounted at drive
['dataset.zip', 'Object_Orient_Programming.ipynb', 'MultiClass_Sentiment_Classification.ipynb']
/tmp/smile-annotations-final.csv


## 3. Hyperparameters

In [6]:
# Here are the hyperparameters you will use throughout the notebook
# Feel free to play around with them and see if you can increase the per_class accuracy and f1-score
max_seq_length = 256

train_batch_size = 4
val_batch_size = 32

learning_rate = 1e-5
epsilon = 1e-8

epochs = 10

seed_val = 42

test_size=0.15

## 4. Data Preprocessing
- Here we use the `pandas` library to do most of our preprocessing.
- The data preprocessing section is already done for you.
- However, it's advised to go over it and understand what's happening.

In [7]:
df = pd.read_csv(csv_dir, names=['id', 'text', 'category'])
df.set_index('id', inplace=True)
df

| id | text | category |
|---|---|---|
| 611857364396965889 | @aandraous @britishmuseum @AndrewsAntonio Merc... | nocode |
| 614484565059596288 | Dorian Gray with Rainbow Scarf #LoveWins (from... | happy |
| 614746522043973632 | @SelectShowcase @Tate_StIves ... Replace with ... | happy |
| 614877582664835073 | @Sofabsports thank you for following me back. ... | happy |
| 611932373039644672 | @britishmuseum @TudorHistory What a beautiful ... | happy |
| ... | ... | ... |
| 613678555935973376 | MT @AliHaggett: Looking forward to our public ... | happy |
| 613294681225621504 | @britishmuseum Upper arm guard? | nocode |
| 615246897670922240 | @MrStuchbery @britishmuseum Mesmerising. | happy |
| 613016084371914753 | @NationalGallery The 2nd GENOCIDE against #Bia... | not-relevant |


In [8]:
df.category.value_counts()

nocode               1572
happy                1137
not-relevant          214
angry                  57
surprise               35
sad                    32
happy|surprise         11
happy|sad               9
disgust|angry           7
disgust                 6
sad|disgust             2
sad|angry               2
sad|disgust|angry       1
Name: category, dtype: int64

In [9]:
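# We will only use the six single-emotion labels and ignore 'nocode' and the multi-label rows.
# .copy() gives an independent DataFrame, so later column assignments
# do not trigger pandas' SettingWithCopyWarning.
df = df[df.category.isin(['happy', 'not-relevant', 'angry', 'surprise', 'sad', 'disgust'])].copy()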
df.category.value_counts()

happy           1137
not-relevant     214
angry             57
surprise          35
sad               32
disgust            6
Name: category, dtype: int64

In [10]:
len(df)

1481

In [11]:
# We create a label dict so that we can use it to encode the labels
possible_labels = df.category.unique()
label_dict = {}
for index, possible_label in enumerate(possible_labels):
    label_dict[possible_label] = index
label_dict

{'happy': 0,
 'not-relevant': 1,
 'angry': 2,
 'disgust': 3,
 'sad': 4,
 'surprise': 5}
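
The same mapping can be built with a dict comprehension, and inverting it gives a way to turn predicted indices back into label names. This is a small sketch: the list below simply copies the label order shown in the output above (in the notebook that order comes from `df.category.unique()`), and the names `label_to_id` and `id_to_label` are chosen for this example.

```python
# Label order copied from the notebook output above (assumption: the order
# returned by df.category.unique() on this dataset)
labels_in_order = ['happy', 'not-relevant', 'angry', 'disgust', 'sad', 'surprise']

# Forward mapping: label name -> integer id
label_to_id = {label: index for index, label in enumerate(labels_in_order)}

# Inverse mapping: integer id -> label name, useful when reading predictions
id_to_label = {index: label for label, index in label_to_id.items()}

print(label_to_id['sad'])
print(id_to_label[5])
```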

In [12]:
df['category'] = df['category'].map(label_dict)
df



| id | text | category |
|---|---|---|
| 614484565059596288 | Dorian Gray with Rainbow Scarf #LoveWins (from... | 0 |
| 614746522043973632 | @SelectShowcase @Tate_StIves ... Replace with ... | 0 |
| 614877582664835073 | @Sofabsports thank you for following me back. ... | 0 |
| 611932373039644672 | @britishmuseum @TudorHistory What a beautiful ... | 0 |
| 611570404268883969 | @NationalGallery @ThePoldarkian I have always ... | 0 |
| ... | ... | ... |
| 611258135270060033 | @_TheWhitechapel @Campaignforwool @SlowTextile... | 1 |
| 612214539468279808 | “@britishmuseum: Thanks for ranking us #1 in @... | 0 |
| 613678555935973376 | MT @AliHaggett: Looking forward to our public ... | 0 |
| 615246897670922240 | @MrStuchbery @britishmuseum Mesmerising. | 0 |


In [13]:
df['category'].value_counts()

0    1137
1     214
2      57
5      35
4      32
3       6
Name: category, dtype: int64

## 5. Train and Validation Split
- Use sklearn's `train_test_split` to get the train and validation split.
- Use the `stratify` parameter to get a stratified split based on the 6 categories.
- To reproduce the results below, make sure to use `seed_val` as the `random_state`.  
[train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)

In [14]:
X_train, X_val, y_train, y_val = train_test_split(df.index.values, 
                                                  df.category.values, 
                                                  test_size= test_size, 
                                                  random_state=seed_val,
                                                  stratify=df.category.values) # Here we are splitting according to the ratio of class labels in our dataset

X_train.shape, X_val.shape, y_train.shape, y_val.shape

((1258,), (223,), (1258,), (223,))
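
A quick way to see what a stratified split means is to take 15% of each class separately. The sketch below is only a back-of-the-envelope check, not how `train_test_split` allocates samples internally; simple per-class rounding is an assumption made here that happens to give totals matching the shapes printed above.

```python
# Class counts copied from the value_counts() output earlier in the notebook
class_counts = {
    'happy': 1137, 'not-relevant': 214, 'angry': 57,
    'surprise': 35, 'sad': 32, 'disgust': 6,
}
test_size = 0.15

# Assumption: approximate the stratified allocation by rounding each class's share
val_counts = {label: round(n * test_size) for label, n in class_counts.items()}
train_counts = {label: n - val_counts[label] for label, n in class_counts.items()}

print(val_counts)
print(sum(train_counts.values()), sum(val_counts.values()))
```

Every class, including the six `disgust` tweets, ends up represented in both splits, which a purely random split would not guarantee.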

In [15]:
df['data_type'] = 'not_set'
df



| id | text | category | data_type |
|---|---|---|---|
| 614484565059596288 | Dorian Gray with Rainbow Scarf #LoveWins (from... | 0 | not_set |
| 614746522043973632 | @SelectShowcase @Tate_StIves ... Replace with ... | 0 | not_set |
| 614877582664835073 | @Sofabsports thank you for following me back. ... | 0 | not_set |
| 611932373039644672 | @britishmuseum @TudorHistory What a beautiful ... | 0 | not_set |
| 611570404268883969 | @NationalGallery @ThePoldarkian I have always ... | 0 | not_set |
| ... | ... | ... | ... |
| 611258135270060033 | @_TheWhitechapel @Campaignforwool @SlowTextile... | 1 | not_set |
| 612214539468279808 | “@britishmuseum: Thanks for ranking us #1 in @... | 0 | not_set |
| 613678555935973376 | MT @AliHaggett: Looking forward to our public ... | 0 | not_set |
| 615246897670922240 | @MrStuchbery @britishmuseum Mesmerising. | 0 | not_set |


In [16]:
df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'
df



| id | text | category | data_type |
|---|---|---|---|
| 614484565059596288 | Dorian Gray with Rainbow Scarf #LoveWins (from... | 0 | train |
| 614746522043973632 | @SelectShowcase @Tate_StIves ... Replace with ... | 0 | train |
| 614877582664835073 | @Sofabsports thank you for following me back. ... | 0 | train |
| 611932373039644672 | @britishmuseum @TudorHistory What a beautiful ... | 0 | train |
| 611570404268883969 | @NationalGallery @ThePoldarkian I have always ... | 0 | train |
| ... | ... | ... | ... |
| 611258135270060033 | @_TheWhitechapel @Campaignforwool @SlowTextile... | 1 | train |
| 612214539468279808 | “@britishmuseum: Thanks for ranking us #1 in @... | 0 | train |
| 613678555935973376 | MT @AliHaggett: Looking forward to our public ... | 0 | train |
| 615246897670922240 | @MrStuchbery @britishmuseum Mesmerising. | 0 | train |


In [17]:
df.groupby(['category', 'data_type']).count()

| category | data_type | text |
|---|---|---|
| 0 | train | 966 |
| 0 | val | 171 |
| 1 | train | 182 |
| 1 | val | 32 |
| 2 | train | 48 |
| 2 | val | 9 |
| 3 | train | 5 |
| 3 | val | 1 |
| 4 | train | 27 |
| 4 | val | 5 |


## 6. Pretrained Tokenizer and Encoding Data
- First we will instantiate our tokenizer from the pretrained `BertTokenizer`
- Then we will encode the training and validation data using `tokenizer.batch_encode_plus`. The training one is done for you.

[BertTokenizer](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer)

[batch_encode_plus](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.BatchEncoding)

In [18]:
tokenizer = BertTokenizer.from_pretrained(
    'bert-base-uncased',
    do_lower_case=True
)


In [20]:
# For pretty printing
def print_encoding(model_inputs, indent=4):
    indent_str = " " * indent
    print("{")
    for k, v in model_inputs.items():
        print(indent_str + k + ":")
        print(indent_str + indent_str + str(v))
    print("}")

In [21]:
random_output = tokenizer(["I'm excited to learn about Hugging Face Transformers!"])
print_encoding(random_output)


{
    input_ids:
        [[101, 1045, 1005, 1049, 7568, 2000, 4553, 2055, 17662, 2227, 19081, 999, 102]]
    token_type_ids:
        [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
    attention_mask:
        [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
}


### Last time our ids looked like this
This shows that the special tokens, the tokenization algorithm and the overall process differ from one pretrained tokenizer to another.

CLS token: `<s>` | CLS token id: 0

SEP token: `</s>` | SEP token id: 2

Pad token: `<pad>` | Pad token id: 1

In [22]:
print(f"CLS token: {tokenizer.cls_token} | CLS token id: {tokenizer.cls_token_id}")
print(f"SEP token: {tokenizer.sep_token} | SEP token id: {tokenizer.sep_token_id}")
print(f"Pad token: {tokenizer.pad_token} | Pad token id: {tokenizer.pad_token_id}")

CLS token: [CLS] | CLS token id: 101
SEP token: [SEP] | SEP token id: 102
Pad token: [PAD] | Pad token id: 0


In [24]:
# Looking at the first 10 examples
df[df.data_type=='val'].text.values[:10]

array(["@RAMMuseum Please vote for us as @sainsbury #sidwell's local charity PRT http://t.co/IguyWk5MJT http://t.co/5kPGLNeW4H",
       'Kudos, @FitzMuseum_UK, for a top-flight WC exhibition!',
       '@stiveshouse @MillenniumArt @Tate_StIves @Tremenheere ooh... Please let us know how it is!!!',
       '@tatteredstones @kettlesyard One of the best places in the country.',
       '@dr_shibley @britishmuseum Awwh - possum!',
       'A week writing reports about reports suddenly brightened by the thought of #DefiningBeauty at @britishmuseum tomorrow with @LittleMissMoo',
       '@britishmuseum we love the badge! Here she is visiting Peel Park in 1851 http://t.co/VONu7avCip',
       "This de Hooch 'Courtyard of a House in Delft' us quite simply my favorite painting in @NationalGallery Thanks for placing it center stage.",
       '@ladywhitepeace1 @britishmuseum actually the longest day this year will be June 30th when the next leap second will be added at 23:59:60',
       '@britishmuseum 

In [25]:
encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].text.values,
    add_special_tokens=True,
    return_attention_mask=True,
    padding='max_length',
    truncation=True,
    max_length=max_seq_length, # 256
    return_tensors='pt'
)

encoded_data_val = tokenizer.batch_encode_plus(
    df[df.data_type=='val'].text.values,
    add_special_tokens=True,
    return_attention_mask=True,
    padding='max_length',
    truncation=True,
    max_length=max_seq_length, # 256
    return_tensors='pt'
)

In [26]:
encoded_data_train

{'input_ids': tensor([[  101, 16092,  3897,  ...,     0,     0,     0],
        [  101,  1030, 27034,  ...,     0,     0,     0],
        [  101,  1030, 10682,  ...,     0,     0,     0],
        ...,
        [  101, 11047,  1030,  ...,     0,     0,     0],
        [  101,  1030,  3680,  ...,     0,     0,     0],
        [  101,  1030,  2120,  ...,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]])}
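
The trailing zeros in `input_ids` and `attention_mask` come from `padding='max_length'`. The sketch below imitates that step in plain Python for one sequence. It is a simplified illustration, not the tokenizer's actual implementation: `pad_and_mask` is a hypothetical helper, and the real tokenizer supports several truncation strategies that are not modelled here.

```python
# Special-token ids as printed by the BERT tokenizer above
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def pad_and_mask(token_ids, max_length):
    """Hypothetical helper: add [CLS]/[SEP], truncate, then pad to max_length."""
    body = token_ids[: max_length - 2]       # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)    # 1 marks a real token
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad            # 0 marks padding that attention ignores
    return input_ids, attention_mask

# Word-piece ids of the example sentence above, without [CLS] and [SEP]
sentence_ids = [1045, 1005, 1049, 7568, 2000, 4553, 2055, 17662, 2227, 19081, 999]

ids, mask = pad_and_mask(sentence_ids, max_length=16)
print(ids)
print(mask)
```

With `max_length` smaller than the sentence, the same helper drops the extra tokens instead of padding, which mirrors `truncation=True`.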

In [28]:
# Here we are separating all of our labels, input_ids and attention_mask into separate tensors
input_ids_train = encoded_data_train['input_ids']
attention_masks_train = encoded_data_train['attention_mask']
labels_train = torch.tensor(df[df.data_type=='train'].category.values)

input_ids_val = encoded_data_val['input_ids']
attention_masks_val = encoded_data_val['attention_mask']
labels_val = torch.tensor(df[df.data_type=='val'].category.values)

In [29]:
# Here we are converting our encoded data to a TensorDataset object which we can easily feed to our dataloader.
dataset_train = TensorDataset(input_ids_train, 
                              attention_masks_train,
                              labels_train)

dataset_val = TensorDataset(input_ids_val, 
                            attention_masks_val,
                            labels_val)

In [30]:
len(dataset_train)

1258

In [None]:
# Single example of what the dataset_val looks like
dataset_val[0]

(tensor([  101,  1030,  8223,  7606, 14820,  3531,  3789,  2005,  2149,  2004,
          1030, 18952,  3619,  4917,  1001, 15765,  4381,  1005,  1055,  2334,
          5952, 10975,  2102,  8299,  1024,  1013,  1013,  1056,  1012,  2522,
          1013,  1045, 12193,  2100, 26291,  2629,  2213,  3501,  2102,  8299,
          1024,  1013,  1013,  1056,  1012,  2522,  1013,  1019,  2243, 26952,
         19666,  7974,  2549,  2232,   102,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,  

## 7. Initialize Pretrained Classification Model
- Here we will initialize our model using the pretrained `BertForSequenceClassification`.
- Make sure to specify the number of labels as a parameter.  
[BertForSequenceClassification](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification)

In [31]:
model = BertForSequenceClassification.from_pretrained(
                                      'bert-base-uncased', 
                                      num_labels = len(label_dict),
                                      output_attentions = True,
                                      output_hidden_states = True
                                     )


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [40]:
# Taking batch of 2 samples to feed into the model
random_inputs = {'input_ids':      dataset_val[:2][0],
                 'attention_mask': dataset_val[:2][1],
                 'labels':         dataset_val[:2][2],
                 }
random_inputs

{'input_ids': tensor([[  101,  1030,  8223,  7606, 14820,  3531,  3789,  2005,  2149,  2004,
           1030, 18952,  3619,  4917,  1001, 15765,  4381,  1005,  1055,  2334,
           5952, 10975,  2102,  8299,  1024,  1013,  1013,  1056,  1012,  2522,
           1013,  1045, 12193,  2100, 26291,  2629,  2213,  3501,  2102,  8299,
           1024,  1013,  1013,  1056,  1012,  2522,  1013,  1019,  2243, 26952,
          19666,  7974,  2549,  2232,   102,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,    

In [44]:
# Loss
random_output = model(**random_inputs)
random_output[0]

tensor(1.5694, grad_fn=<NllLossBackward0>)

In [45]:
# Logits
random_output[1].shape

torch.Size([2, 6])
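
The `grad_fn=<NllLossBackward0>` on the loss is consistent with a cross-entropy loss computed from the `[2, 6]` logits and the integer labels. As a rough sketch of that computation in plain Python (the function names and the toy logits are assumptions for illustration; the model itself uses PyTorch's loss):

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the true class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def batch_loss(batch_logits, labels):
    """Mean cross-entropy over a batch, one row of logits per example."""
    losses = [cross_entropy(row, y) for row, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

# A classifier that has learned nothing gives equal logits to all 6 classes
uniform_logits = [0.0] * 6
print(cross_entropy(uniform_logits, 0), math.log(6))

# A confident, correct prediction drives the loss toward zero
print(cross_entropy([10.0, 0.0, 0.0, 0.0, 0.0, 0.0], 0))
```

For six classes, an uninformed classifier scores ln 6 (about 1.79), so a loss in the neighbourhood of that value before any fine-tuning is what one would expect from a freshly initialized classification head.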

In [51]:
# Hidden states: a tuple of 13 tensors, each of shape (batch_size, sequence_length, hidden_size):
# one for the output of the embedding layer plus one for the output of each of the 12 encoder layers.
# random_output[2][i].shape is the same for every i from 0 to 12
random_output[2][12].shape # output of the last encoder layer

torch.Size([2, 256, 768])

## 8. DataLoaders, Optimizer and Scheduler
- In this section we will create the train and val dataloaders.
- Use the `RandomSampler` for `dataloader_train` and the `SequentialSampler` for `dataloader_val`. Batch sizes are given in the Hyperparameters section (feel free to choose your own).

- We will also initialize an AdamW optimizer and a scheduler for this section.


[DataLoader](https://pytorch.org/docs/stable/data.html#module-torch.utils.data).  
[AdamW Optimizer](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html).   
[get_linear_schedule_with_warmup](https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.get_linear_schedule_with_warmup)

In [52]:
# Creating DataLoaders
dataloader_train = DataLoader(
    dataset_train,
    sampler=RandomSampler(dataset_train),
    batch_size=train_batch_size
)

dataloader_val = DataLoader(
    dataset_val,
    sampler=SequentialSampler(dataset_val),
    batch_size=val_batch_size
)

In [56]:
vars(dataloader_train)

{'dataset': <torch.utils.data.dataset.TensorDataset at 0x7f0cc1cb7ad0>,
 'num_workers': 0,
 'prefetch_factor': 2,
 'pin_memory': False,
 'pin_memory_device': '',
 'timeout': 0,
 'worker_init_fn': None,
 '_DataLoader__multiprocessing_context': None,
 '_dataset_kind': 0,
 'batch_size': 4,
 'drop_last': False,
 'sampler': <torch.utils.data.sampler.RandomSampler at 0x7f0cc145b3d0>,
 'batch_sampler': <torch.utils.data.sampler.BatchSampler at 0x7f0cc145b2d0>,
 'generator': None,
 'collate_fn': <function torch.utils.data._utils.collate.default_collate(batch)>,
 'persistent_workers': False,
 '_DataLoader__initialized': True,
 '_IterableDataset_len_called': None,
 '_iterator': None}

In [55]:
next(iter(dataloader_train))

[tensor([[  101,  1001,  6284,  ...,     0,     0,     0],
         [  101,  1030, 11350,  ...,     0,     0,     0],
         [  101,  2061,  7568,  ...,     0,     0,     0],
         [  101,  1030, 11503,  ...,     0,     0,     0]]),
 tensor([[1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0]]),
 tensor([0, 0, 0, 0])]
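
To see what the two samplers do to the order of examples, here is a plain-Python sketch of how dataset indices are grouped into batches. `batch_indices` is a hypothetical helper written for this illustration; it mimics shuffled versus sequential order with `drop_last=False`, and does not reproduce PyTorch's own random number stream.

```python
import random

def batch_indices(n_examples, batch_size, shuffle, seed=42):
    """Hypothetical helper: group dataset indices into batches.

    shuffle=True imitates a random sampler, shuffle=False a sequential one.
    The last batch may be smaller, as with drop_last=False.
    """
    indices = list(range(n_examples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_examples, batch_size)]

train_batches = batch_indices(1258, batch_size=4, shuffle=True)
val_batches = batch_indices(223, batch_size=32, shuffle=False)

print(len(train_batches), len(train_batches[-1]))
print(len(val_batches), len(val_batches[-1]))
```

Shuffling matters for training because consecutive batches should not share a systematic order; for validation the order is irrelevant to the metrics, so a sequential pass is enough.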

In [57]:
# Creating an optimizer and linear scheduler
optimizer = AdamW(
    model.parameters(),
    lr = learning_rate,
    eps = epsilon
)

# Here you can read up on different learning rate schedulers
# https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.SchedulerType
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps = len(dataloader_train)*epochs
)
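
The linear schedule scales the learning rate up during warmup and then down to zero over the remaining steps. The sketch below restates that rule in plain Python so the numbers can be checked by hand; `linear_schedule_lr` is a name chosen for this example, and the step count assumes the 1258 training examples and batch size of 4 used above.

```python
import math

def linear_schedule_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < num_warmup_steps:
        factor = step / max(1, num_warmup_steps)
    else:
        remaining = num_training_steps - step
        factor = max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))
    return base_lr * factor

# Assumption: 1258 training examples, batch size 4, 10 epochs
steps_per_epoch = math.ceil(1258 / 4)
total_steps = steps_per_epoch * 10

for step in (0, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step, 1e-5, 0, total_steps))
```

With `num_warmup_steps=0` the rate starts at its full value and simply decays, so the final epochs make much smaller updates than the first ones.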

## 9. Training Loop
- Here is where the magic happens
- In this section you will write the code for the evaluation step
- The training step is already written for you, and the evaluate function is called in the training step.

In [58]:
# Set the seed so results are reproducible
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
print(device)

cuda


In [None]:
def f1_score_func(preds, labels):
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average = 'weighted')
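
It helps to know what `average='weighted'` computes: a per-class F1 score, averaged with each class weighted by its number of true examples. The following is a hand-rolled sketch of that calculation on made-up labels, written for illustration; the notebook itself relies on sklearn's `f1_score`.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n * f1  # classes with more examples count for more
    return total / len(y_true)

# Toy labels and predictions (made up for this example)
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(weighted_f1(y_true, y_pred))
```

Because the weights follow class frequency, a large class such as `happy` dominates the score, and poor performance on rare classes barely moves it.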

In [None]:
def evaluate(dataloader_val):

    model.eval() # Make sure your model is in eval mode
    
    loss_val_total = 0
    predictions, true_vals = [], []
    
    for batch in dataloader_val: # Each batch holds up to val_batch_size (32) examples
        
        batch = tuple(b.to(device) for b in batch) # Move every tensor in the batch to the device
        
        inputs = {'input_ids':      batch[0],
                  'attention_mask': batch[1],
                  'labels':         batch[2],
                 }

        with torch.no_grad():        
            outputs = model(**inputs)
            
        loss = outputs[0]
        logits = outputs[1]
        loss_val_total += loss.item()

        logits = logits.detach().cpu().numpy()
        label_ids = inputs['labels'].cpu().numpy()
        predictions.append(logits)
        true_vals.append(label_ids)
    
    loss_val_avg = loss_val_total/len(dataloader_val) 
    
    predictions = np.concatenate(predictions, axis=0)
    true_vals = np.concatenate(true_vals, axis=0)
            
    return loss_val_avg, predictions, true_vals


In [None]:
for epoch in tqdm(range(1, epochs+1)):
    model.train()
    loss_train_total = 0
    
    for batch in dataloader_train:
        model.zero_grad()
        batch = tuple(b.to(device) for b in batch)
        inputs = {
            'input_ids': batch[0],
            'attention_mask': batch[1],
            'labels': batch[2]
        }
        
        outputs = model(**inputs)
        loss = outputs[0]
        loss_train_total +=loss.item()
        loss.backward()
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        
        optimizer.step()
        scheduler.step()
        
    
    loss_train_avg = loss_train_total/len(dataloader_train)
    val_loss, predictions, true_vals = evaluate(dataloader_val)
    val_f1 = f1_score_func(predictions, true_vals)
    
    print('\n')
    print(f'Training loss: {loss_train_avg}')
    print(f'Validation loss: {val_loss}')
    print(f'F1 Score (weighted): {val_f1}')

 10%|█         | 1/10 [01:11<10:45, 71.72s/it]

Training loss: 0.4160490727746889
Validation loss: 0.5217188979898181
F1 Score (weighted): 0.8448362856698388


 20%|██        | 2/10 [02:23<09:35, 71.89s/it]

Training loss: 0.2819987665558796
Validation loss: 0.5831213210310254
F1 Score (weighted): 0.8647375419796943


 30%|███       | 3/10 [03:35<08:21, 71.63s/it]

Training loss: 0.2003306888236058
Validation loss: 0.5241365730762482
F1 Score (weighted): 0.8698563647921927


 40%|████      | 4/10 [04:46<07:08, 71.50s/it]

Training loss: 0.1467235044879277
Validation loss: 0.5237758564097541
F1 Score (weighted): 0.8730602832449538


 50%|█████     | 5/10 [05:57<05:57, 71.49s/it]

Training loss: 0.09782353054347728
Validation loss: 0.5386862754821777
F1 Score (weighted): 0.8643619139780658


 60%|██████    | 6/10 [07:09<04:45, 71.46s/it]

Training loss: 0.07362441378072762
Validation loss: 0.5460513021264758
F1 Score (weighted): 0.8672862163834301


 70%|███████   | 7/10 [08:20<03:34, 71.40s/it]

Training loss: 0.060112493098639544
Validation loss: 0.5465028317911285
F1 Score (weighted): 0.868708782732374


 80%|████████  | 8/10 [09:31<02:22, 71.35s/it]

Training loss: 0.044929447308522726
Validation loss: 0.5559744558164051
F1 Score (weighted): 0.8697478304525038


 90%|█████████ | 9/10 [10:43<01:11, 71.33s/it]

Training loss: 0.038530970653814695
Validation loss: 0.5592575924737113
F1 Score (weighted): 0.868213456585959


100%|██████████| 10/10 [11:54<00:00, 71.44s/it]

Training loss: 0.03596847862973514
Validation loss: 0.559837458389146
F1 Score (weighted): 0.868213456585959
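
In the log above the training loss keeps falling while the validation loss stops improving early on, which suggests the model starts to overfit. One common response is to keep the checkpoint from the epoch with the best validation metric. The sketch below only picks that epoch from the logged numbers (rounded to four decimals); saving and reloading checkpoints is not shown, and the variable names are chosen for this example.

```python
# (epoch, train_loss, val_loss, val_f1), rounded from the training log above
history = [
    (1, 0.4160, 0.5217, 0.8448),
    (2, 0.2820, 0.5831, 0.8647),
    (3, 0.2003, 0.5241, 0.8699),
    (4, 0.1467, 0.5238, 0.8731),
    (5, 0.0978, 0.5387, 0.8644),
    (6, 0.0736, 0.5461, 0.8673),
    (7, 0.0601, 0.5465, 0.8687),
    (8, 0.0449, 0.5560, 0.8697),
    (9, 0.0385, 0.5593, 0.8682),
    (10, 0.0360, 0.5598, 0.8682),
]

# Epoch with the highest validation F1
best_f1_epoch = max(history, key=lambda row: row[3])[0]
# Epoch with the lowest validation loss
lowest_val_loss_epoch = min(history, key=lambda row: row[2])[0]

print(best_f1_epoch, lowest_val_loss_epoch)
```

The two criteria disagree here, so which epoch counts as "best" depends on whether the weighted F1 or the validation loss is the metric you care about.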





## 10. Evaluation

In [None]:
def accuracy_per_class(preds, labels):
    label_dict_inverse = {v: k for k, v in label_dict.items()}
    
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    
    for label in np.unique(labels_flat):
        y_preds = preds_flat[labels_flat==label]
        y_true = labels_flat[labels_flat==label]
        print(f'Class: {label_dict_inverse[label]}')
        print(f'Accuracy:{len(y_preds[y_preds==label])}/{len(y_true)}\n')

In [None]:
accuracy_per_class(predictions, true_vals)

Class: happy
Accuracy:163/171

Class: not-relevant
Accuracy:24/32

Class: angry
Accuracy:4/9

Class: disgust
Accuracy:0/1

Class: sad
Accuracy:0/5

Class: surprise
Accuracy:3/5
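
The per-class counts above can be combined in two ways that tell quite different stories. This short sketch computes the overall accuracy and the unweighted mean of the per-class accuracies from the printed numbers; the variable names are chosen for this example.

```python
# (correct, total) per class, copied from the output above
per_class = {
    'happy': (163, 171),
    'not-relevant': (24, 32),
    'angry': (4, 9),
    'disgust': (0, 1),
    'sad': (0, 5),
    'surprise': (3, 5),
}

correct = sum(c for c, _ in per_class.values())
total = sum(n for _, n in per_class.values())

# Overall accuracy: dominated by the large 'happy' class
overall_accuracy = correct / total

# Mean of per-class accuracies: every class counts equally
macro_accuracy = sum(c / n for c, n in per_class.values()) / len(per_class)

print(f'Overall accuracy: {overall_accuracy:.3f}')
print(f'Mean per-class accuracy: {macro_accuracy:.3f}')
```

The gap between the two numbers reflects the class imbalance: the model does well on `happy` and `not-relevant` but gets none of the `sad` or `disgust` validation tweets right, which the weighted F1 score alone does not reveal.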



## The End