# Fine Tuning DistilBERT for MultiLabel Text Classification

### Introduction

In this tutorial we will fine-tune a transformer model for the **multilabel text classification** problem.
This is one of the most common business problems: a given piece of text, sentence, or document needs to be classified into one or more categories from a given list. For example, a movie can be categorized into one or more genres.

#### Flow of the notebook

The notebook is divided into separate sections to provide an organized walkthrough of the process used. This process can be modified for individual use cases. The sections are:

1. [Importing Python Libraries and preparing the environment](#section01)
2. [Importing and Pre-Processing the domain data](#section02)
3. [Preparing the Dataset and Dataloader](#section03)
4. [Creating the Neural Network for Fine Tuning](#section04)
5. [Fine Tuning the Model](#section05)
6. [Validating the Model Performance](#section06)
7. [Saving the model and artifacts for Inference in Future](#section07)

#### Technical Details

This script relies on multiple tools designed by other teams. Details of the tools used are given below. Please ensure that these elements are present in your setup to run this script successfully.

 - Data:
	 - We are using the Jigsaw toxic comment data from [Kaggle](https://www.kaggle.com/)
	 - This competition provides the source dataset: [Toxic Comment Competition](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)
	 - We are referring only to the first csv file from the data dump: `train.csv`
	 - Each row of data has the following data points:
		 - Comment Text
		 - `toxic`
		 - `severe_toxic`
		 - `obscene`
		 - `threat`
		 - `insult`
		 - `identity_hate`

Each comment can be marked for multiple categories. If the comment is `toxic` and `obscene`, then for both those headers the value will be `1` and for the others it will be `0`.


 - Language Model Used:
	 - DistilBERT is a smaller transformer model than BERT or RoBERTa. It is created by applying the process of distillation to BERT.
	 - [Blog-Post](https://medium.com/huggingface/distilbert-8cf3380435b5)
	 - [Research Paper](https://arxiv.org/pdf/1910.01108)
	 - [Documentation for Python](https://huggingface.co/transformers/model_doc/distilbert.html)


 - Hardware and Software Requirements:
	 - Python 3.6 and above
	 - PyTorch, Transformers and the standard Python ML libraries
	 - GPU-enabled setup


 - Script Objective:
	 - The objective of this script is to fine-tune DistilBERT to label a comment with one or more of the following categories:
		 - `toxic`
		 - `severe_toxic`
		 - `obscene`
		 - `threat`
		 - `insult`
		 - `identity_hate`

---
***NOTE***
- *The overall mechanisms for multiclass and multilabel problems are similar, except for a few differences:*
	- *The loss function evaluates the probability of each category individually rather than relative to the other categories. Hence the use of `BCE` rather than `Cross Entropy` when defining the loss.*
	- *A sigmoid, rather than a softmax, is applied to the outputs, for the reason given in the previous point.*
	- *The [Hamming loss](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.hamming_loss.html) and **Hamming Score** are used for direct comparison of expected vs predicted labels.*
---
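
The difference between sigmoid and softmax described in the note can be seen on a small example. The sketch below uses only the standard library, and the logits are made-up values chosen for illustration, not outputs of the model trained in this notebook.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the outputs always sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    # Each logit is squashed independently; the outputs need not sum to 1.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Hypothetical logits for the six categories, in the order listed above.
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
logits = [3.0, -2.0, 2.5, -4.0, -1.0, -3.0]

soft = softmax(logits)
sig = sigmoid(logits)

# With a 0.5 threshold, sigmoid can switch on several labels at once.
multilabel = [name for name, p in zip(labels, sig) if p >= 0.5]
# Softmax makes the categories compete, so argmax yields a single label.
multiclass = labels[soft.index(max(soft))]

print(multilabel)
print(multiclass)
```

With these logits the sigmoid route selects both `toxic` and `obscene`, while the softmax route can only pick one category.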

<a id='section01'></a>
### Importing Python Libraries and preparing the environment

At this step we will be importing the libraries and modules needed to run our script. Libraries are:
* warnings
* Numpy
* Pandas
* tqdm
* scikit-learn metrics
* PyTorch
* PyTorch Utils for Dataset and Dataloader
* Transformers
* DistilBERT Model and Tokenizer
* logging

After that we will prepare the device for CUDA execution. This configuration is needed if you want to use the onboard GPU.

In [None]:
! pip install transformers==3.0.2


In [None]:
# Importing stock ml libraries
import warnings
warnings.simplefilter('ignore')
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn import metrics
import transformers
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
from transformers import DistilBertTokenizer, DistilBertModel
import logging
logging.basicConfig(level=logging.ERROR)

In [None]:
# Setting up the device for GPU usage

from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

In [None]:
def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    # Mean per-sample Jaccard similarity between the true and predicted label sets.
    # `normalize` and `sample_weight` are accepted but not used.
    acc_list = []
    for i in range(y_true.shape[0]):
        set_true = set(np.where(y_true[i])[0])
        set_pred = set(np.where(y_pred[i])[0])
        if len(set_true) == 0 and len(set_pred) == 0:
            # No labels expected and none predicted counts as a perfect match
            tmp_a = 1
        else:
            tmp_a = len(set_true & set_pred) / float(len(set_true | set_pred))
        acc_list.append(tmp_a)
    return np.mean(acc_list)
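
To make the two metrics concrete, here is a small worked example in plain Python. It is only a sketch: the label rows are invented, and the helper names `hamming_loss` and `jaccard_score_per_sample` are local to this example rather than the scikit-learn or notebook functions. The first counts wrong label slots; the second follows the same per-sample set logic as `hamming_score` above.

```python
# Two samples, three labels each (hypothetical values for illustration).
y_true = [[1, 0, 1],
          [0, 1, 0]]
y_pred = [[1, 0, 0],
          [0, 1, 0]]

def hamming_loss(y_true, y_pred):
    # Fraction of individual label slots that are predicted incorrectly.
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    total = sum(len(row) for row in y_true)
    return wrong / total

def jaccard_score_per_sample(y_true, y_pred):
    # Per-sample size of intersection over size of union, averaged over samples.
    scores = []
    for row_t, row_p in zip(y_true, y_pred):
        true_set = {i for i, v in enumerate(row_t) if v}
        pred_set = {i for i, v in enumerate(row_p) if v}
        if not true_set and not pred_set:
            scores.append(1.0)
        else:
            scores.append(len(true_set & pred_set) / len(true_set | pred_set))
    return sum(scores) / len(scores)

loss = hamming_loss(y_true, y_pred)
score = jaccard_score_per_sample(y_true, y_pred)
print(loss, score)
```

One of the six label slots is wrong, so the loss is 1/6. The first sample matches one of two labels (0.5) and the second matches exactly (1.0), so the score is 0.75. A lower loss and a higher score both indicate better predictions.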

<a id='section02'></a>
### Importing and Pre-Processing the domain data

We will now load the data and prepare it for fine-tuning.
*Assuming that `train.csv` is already downloaded, unzipped and saved in your `data` folder*

* The first step is to remove the **id** column from the data.
* A new dataframe is made and the input text is stored in the **text** column.
* The values of all the categories are collected and converted into a list.
* The list is appended as a new column named **labels**.

In [None]:
data = pd.read_csv('train.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'

In [None]:
import pandas as pd

splits = {'train': 'train.jsonl', 'validation': 'validation.jsonl', 'test': 'test.jsonl'}
df = pd.read_json("hf://datasets/SetFit/emotion/" + splits["train"], lines=True)

In [None]:
df

Unnamed: 0,text,label,label_text
0,i didnt feel humiliated,0,sadness
1,i can go from feeling so hopeless to so damned...,0,sadness
2,im grabbing a minute to post i feel greedy wrong,3,anger
3,i am ever feeling nostalgic about the fireplac...,2,love
4,i am feeling grouchy,3,anger
...,...,...,...
15995,i just had a very brief time in the beanbag an...,0,sadness
15996,i am now turning and i feel pathetic that i am...,0,sadness
15997,i feel strong and good overall,1,joy
15998,i feel like this was such a rude comment and i...,3,anger


In [None]:
# One-hot encode the 'label_text' column using get_dummies
dummies = pd.get_dummies(df['label_text'], dtype=int)  # dtype=int ensures 1 and 0 instead of True/False

# Concatenate the one-hot encoded columns back to the original DataFrame
df = pd.concat([df, dummies], axis='columns')


In [None]:
df

Unnamed: 0,text,label,label_text,anger,fear,joy,love,sadness,surprise
0,i didnt feel humiliated,0,sadness,0,0,0,0,1,0
1,i can go from feeling so hopeless to so damned...,0,sadness,0,0,0,0,1,0
2,im grabbing a minute to post i feel greedy wrong,3,anger,1,0,0,0,0,0
3,i am ever feeling nostalgic about the fireplac...,2,love,0,0,0,1,0,0
4,i am feeling grouchy,3,anger,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...
15995,i just had a very brief time in the beanbag an...,0,sadness,0,0,0,0,1,0
15996,i am now turning and i feel pathetic that i am...,0,sadness,0,0,0,0,1,0
15997,i feel strong and good overall,1,joy,0,0,1,0,0,0
15998,i feel like this was such a rude comment and i...,3,anger,1,0,0,0,0,0


In [None]:
df.drop(['label_text','label'], inplace=True, axis=1)

In [None]:
df

Unnamed: 0,text,anger,fear,joy,love,sadness,surprise
0,i didnt feel humiliated,0,0,0,0,1,0
1,i can go from feeling so hopeless to so damned...,0,0,0,0,1,0
2,im grabbing a minute to post i feel greedy wrong,1,0,0,0,0,0
3,i am ever feeling nostalgic about the fireplac...,0,0,0,1,0,0
4,i am feeling grouchy,1,0,0,0,0,0
...,...,...,...,...,...,...,...
15995,i just had a very brief time in the beanbag an...,0,0,0,0,1,0
15996,i am now turning and i feel pathetic that i am...,0,0,0,0,1,0
15997,i feel strong and good overall,0,0,1,0,0,0
15998,i feel like this was such a rude comment and i...,1,0,0,0,0,0


In [None]:
new_df = pd.DataFrame()
new_df['text'] = df['text']
new_df['labels'] = df.iloc[:, 1:].values.tolist()

In [None]:
new_df.head()

Unnamed: 0,text,labels
0,i didnt feel humiliated,"[0, 0, 0, 0, 1, 0]"
1,i can go from feeling so hopeless to so damned...,"[0, 0, 0, 0, 1, 0]"
2,im grabbing a minute to post i feel greedy wrong,"[1, 0, 0, 0, 0, 0]"
3,i am ever feeling nostalgic about the fireplac...,"[0, 0, 0, 1, 0, 0]"
4,i am feeling grouchy,"[1, 0, 0, 0, 0, 0]"


In [None]:
import pandas as pd

df = pd.read_csv("hf://datasets/valurank/Emotion_headline/Emotion_headline.csv")

In [None]:
new_df = pd.DataFrame()
new_df['text'] = df['Headline']
new_df['labels'] = df.iloc[:, 1:].values.tolist()

In [None]:
new_df.head()

Unnamed: 0,text,labels
0,Britt beats Brooks in Alabama Senate runoff,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]"
1,"2024 Watch: Trump, Pence praise Supreme Court ...","[0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]"
2,"Report dives into Latinos’ trust in science, r...","[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]"
3,Woman 'tries to pay hitman to kill Sports Dire...,"[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
4,"Four worst moves of the Cowboys’ offseason, in...","[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"

<a id='section03'></a>
### Preparing the Dataset and Dataloader


In [None]:
# Sections of config

# Defining some key variables that will be used later on in the training
MAX_LEN = 128
TRAIN_BATCH_SIZE = 4
VALID_BATCH_SIZE = 4
LEARNING_RATE = 1e-05
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', truncation=True, do_lower_case=True)

In [None]:
class MultiLabelDataset(Dataset):

    def __init__(self, dataframe, tokenizer, max_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.text = dataframe.text
        self.targets = self.data.labels
        self.max_len = max_len

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        text = str(self.text[index])
        text = " ".join(text.split())

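        inputs = self.tokenizer.encode_plus(
            text,
            None,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',  # replaces the deprecated pad_to_max_length=True
            truncation=True,       # explicitly truncate texts longer than max_len
            return_token_type_ids=True
        )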
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        token_type_ids = inputs["token_type_ids"]


        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
            'targets': torch.tensor(self.targets[index], dtype=torch.float)
        }
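
The dataset class returns fixed-length `ids` together with a `mask` marking which positions hold real tokens. The toy sketch below illustrates that relationship without the real tokenizer. The helper `pad_and_truncate` and the token ids are hypothetical, and the real tokenizer handles special tokens more carefully when truncating.

```python
def pad_and_truncate(token_ids, max_len, pad_id=0):
    # Keep at most max_len ids, then right-pad with pad_id up to max_len.
    ids = token_ids[:max_len]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding

# Made-up ids standing in for a short tokenized sentence; not real tokenizer output.
short_ids, short_mask = pad_and_truncate([101, 1045, 2514, 102], max_len=6)

# A sequence longer than max_len is cut down to max_len.
long_ids, long_mask = pad_and_truncate(list(range(10)), max_len=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Every sequence comes out with length `max_len`, which is what lets the `DataLoader` stack samples into a batch tensor. The mask tells the model to ignore the padded positions.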

In [None]:
# Creating the dataset and dataloader for the neural network

train_size = 0.8
train_data=new_df.sample(frac=train_size,random_state=200)
test_data=new_df.drop(train_data.index).reset_index(drop=True)
train_data = train_data.reset_index(drop=True)


print("FULL Dataset: {}".format(new_df.shape))
print("TRAIN Dataset: {}".format(train_data.shape))
print("TEST Dataset: {}".format(test_data.shape))

training_set = MultiLabelDataset(train_data, tokenizer, MAX_LEN)
testing_set = MultiLabelDataset(test_data, tokenizer, MAX_LEN)

FULL Dataset: (16000, 2)
TRAIN Dataset: (12800, 2)
TEST Dataset: (3200, 2)


In [None]:
train_params = {'batch_size': TRAIN_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

test_params = {'batch_size': VALID_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

training_loader = DataLoader(training_set, **train_params)
testing_loader = DataLoader(testing_set, **test_params)

<a id='section04'></a>
### Creating the Neural Network for Fine Tuning

#### Neural Network
 - We will be creating a neural network with the `DistilBERTClass`.
 - This network will have the `DistilBERT` model, followed by a `pre_classifier` linear layer, a `Dropout` layer and a final `Linear` layer. The dropout is added for **regularization** and the final linear layer for **classification**.
 - In the forward pass, the first element of the `DistilBERT` output, `output_1[0]`, is the last hidden state for every token.
 - The hidden state of the first token (`[CLS]`) is used as the pooled output. It is passed through the `pre_classifier` layer and a `Tanh` activation, then to the `Dropout` layer, and the result is given to the final `Linear` layer.
 - Note that the output dimension of the final `Linear` layer is **6**, because that is the total number of categories into which we are classifying the text.
 - The data will be fed to the `DistilBERTClass` as defined in the dataset.
 - The outputs of the final layer are used to calculate the loss and to determine the accuracy of the model's predictions.
 - We will create an instance of the network called `model`. This instance will be used for training and then to save the final trained model for future inference.
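
The tensor shapes in the classification head can be checked without downloading DistilBERT. This is only a sketch: a random tensor stands in for the transformer's last hidden state, and the batch size and sequence length are arbitrary.

```python
import torch

torch.manual_seed(0)

batch_size, seq_len, hidden_dim, num_labels = 2, 8, 768, 6

# Random tensor standing in for the DistilBERT last hidden state.
hidden_state = torch.randn(batch_size, seq_len, hidden_dim)

pre_classifier = torch.nn.Linear(hidden_dim, hidden_dim)
dropout = torch.nn.Dropout(0.1)
classifier = torch.nn.Linear(hidden_dim, num_labels)

first_token = hidden_state[:, 0]                   # (batch, 768)
pooled = torch.tanh(pre_classifier(first_token))   # (batch, 768)
pooled = dropout(pooled)                           # regularization, active in training mode
logits = classifier(pooled)                        # one raw score per label: (batch, 6)

print(logits.shape)
```

The head reduces each sequence to a single 768-dimensional vector and then to one logit per label. No sigmoid is applied here, because the loss function used later expects raw logits.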

#### Loss Function and Optimizer
 - The loss is defined below as `loss_fn`.
 - As noted above, the loss function used is Binary Cross Entropy, implemented in PyTorch as [BCEWithLogitsLoss](https://pytorch.org/docs/stable/nn.html#bcewithlogitsloss), which combines a sigmoid layer and the binary cross-entropy loss in a single class.
 - The `Optimizer` is defined in the cell after the loss function.
 - The `Optimizer` is used to update the weights of the neural network to improve its performance.
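
As a rough illustration of what a binary cross-entropy loss on logits computes, the sketch below evaluates it per label in plain Python. It compares the two-step form (sigmoid, then cross-entropy) with the usual numerically stable one-step form. The logits and targets are invented, and the helper names are local to this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(logit, target):
    # Sigmoid followed by binary cross-entropy, computed in two steps.
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def bce_with_logits(logit, target):
    # Numerically stable single-step form of the same quantity.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# Hypothetical logits and targets for one sample with three labels.
logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 1.0]

per_label = [bce_with_logits(x, z) for x, z in zip(logits, targets)]
naive = [bce_naive(x, z) for x, z in zip(logits, targets)]

# Averaging over all label slots gives a single scalar loss.
mean_loss = sum(per_label) / len(per_label)
print(per_label, mean_loss)
```

Each label contributes its own term, independent of the other labels, which is why this loss suits the multilabel setting.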

In [None]:
# Creating the customized model, by adding a drop out and a dense layer on top of distil bert to get the final output for the model.

class DistilBERTClass(torch.nn.Module):
    def __init__(self):
        super(DistilBERTClass, self).__init__()
        self.l1 = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.pre_classifier = torch.nn.Linear(768, 768)
        self.dropout = torch.nn.Dropout(0.1)
        self.classifier = torch.nn.Linear(768, 6) # change to (768, n), where n is the number of labels

    def forward(self, input_ids, attention_mask, token_type_ids):
        output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        hidden_state = output_1[0]
        pooler = hidden_state[:, 0]
        pooler = self.pre_classifier(pooler)
        pooler = torch.nn.Tanh()(pooler)
        pooler = self.dropout(pooler)
        output = self.classifier(pooler)
        return output

model = DistilBERTClass()
model.to(device)

DistilBERTClass(
  (l1): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in

In [None]:
def loss_fn(outputs, targets):
    return torch.nn.BCEWithLogitsLoss()(outputs, targets)

In [None]:
optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)

<a id='section05'></a>
### Fine Tuning the Model

In [None]:
EPOCHS = 3
def train(epoch):
    model.train()
    for step, data in tqdm(enumerate(training_loader, 0)):
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
        targets = data['targets'].to(device, dtype = torch.float)

        outputs = model(ids, mask, token_type_ids)

        optimizer.zero_grad()
        loss = loss_fn(outputs, targets)
        # Log the loss every 5000 batches
        if step % 5000 == 0:
            print(f'Epoch: {epoch}, Loss:  {loss.item()}')

        loss.backward()
        optimizer.step()

In [None]:
for epoch in range(EPOCHS):
    train(epoch)

2it [00:00, 17.68it/s]

Epoch: 0, Loss:  0.7071778774261475


3200it [03:06, 17.15it/s]
4it [00:00, 17.92it/s]

Epoch: 1, Loss:  0.01288855355232954


3200it [03:05, 17.22it/s]
4it [00:00, 17.83it/s]

Epoch: 2, Loss:  0.009594162926077843


3200it [03:05, 17.25it/s]


In [None]:
model.train()
for i in range(10):
    for step, data in tqdm(enumerate(training_loader, 0)):
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
        targets = data['targets'].to(device, dtype = torch.float)

        outputs = model(ids, mask, token_type_ids)

        optimizer.zero_grad()
        loss = loss_fn(outputs, targets)
        # Log the loss every 5000 batches
        if step % 5000 == 0:
            print(f'Epoch: {i}, Loss:  {loss.item()}')

        loss.backward()
        optimizer.step()

0it [00:00, ?it/s]Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
2it [00:00, 16.24it/s]

Epoch: 0, Loss:  0.7061051726341248


5004it [04:59, 17.12it/s]

Epoch: 0, Loss:  0.2233511358499527


5817it [05:46, 16.79it/s]
4it [00:00, 18.35it/s]

Epoch: 1, Loss:  0.18594244122505188


5003it [04:50, 17.06it/s]

Epoch: 1, Loss:  0.14594748616218567


5817it [05:37, 17.23it/s]
4it [00:00, 18.31it/s]

Epoch: 2, Loss:  0.2660398781299591


5003it [04:50, 16.97it/s]

Epoch: 2, Loss:  0.07577978074550629


5817it [05:37, 17.21it/s]
3it [00:00, 19.19it/s]

Epoch: 3, Loss:  0.06570392847061157


5004it [04:50, 17.30it/s]

Epoch: 3, Loss:  0.05684646591544151


5817it [05:37, 17.22it/s]
3it [00:00, 18.94it/s]

Epoch: 4, Loss:  0.042901381850242615


5003it [04:50, 17.37it/s]

Epoch: 4, Loss:  0.054703108966350555


5817it [05:37, 17.23it/s]
4it [00:00, 18.26it/s]

Epoch: 5, Loss:  0.02771526388823986


5004it [04:50, 17.36it/s]

Epoch: 5, Loss:  0.04838643968105316


5817it [05:37, 17.25it/s]
4it [00:00, 17.95it/s]

Epoch: 6, Loss:  0.0437968485057354


5004it [04:50, 16.71it/s]

Epoch: 6, Loss:  0.04676428437232971


5817it [05:37, 17.25it/s]
4it [00:00, 18.18it/s]

Epoch: 7, Loss:  0.030532604083418846


5003it [04:50, 17.19it/s]

Epoch: 7, Loss:  0.05094043165445328


5817it [05:37, 17.23it/s]
3it [00:00, 19.37it/s]

Epoch: 8, Loss:  0.0779891312122345


5003it [04:49, 17.10it/s]

Epoch: 8, Loss:  0.15734344720840454


5817it [05:37, 17.25it/s]
3it [00:00, 19.14it/s]

Epoch: 9, Loss:  0.007869182154536247


5003it [04:50, 17.43it/s]

Epoch: 9, Loss:  0.02529795467853546


5817it [05:37, 17.24it/s]


<a id='section06'></a>
### Validating the Model

During the validation stage we pass the unseen data (the testing dataset) to the model. This step determines how well the model performs on unseen data.

This unseen data is the 20% of the dataset which was separated during the dataset creation stage.
During the validation stage the weights of the model are not updated. Only the final output is compared to the actual value. This comparison is then used to calculate the accuracy of the model.

As described above, to measure our model's performance we are using the following metrics:
- Hamming Score
- Hamming Loss


In [None]:
def validation(testing_loader):
    model.eval()
    fin_targets=[]
    fin_outputs=[]
    with torch.no_grad():
        for _, data in tqdm(enumerate(testing_loader, 0)):
            ids = data['ids'].to(device, dtype = torch.long)
            mask = data['mask'].to(device, dtype = torch.long)
            token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
            targets = data['targets'].to(device, dtype = torch.float)
            outputs = model(ids, mask, token_type_ids)
            fin_targets.extend(targets.cpu().detach().numpy().tolist())
            fin_outputs.extend(torch.sigmoid(outputs).cpu().detach().numpy().tolist())
    return fin_outputs, fin_targets

In [None]:
outputs, targets = validation(testing_loader)

final_outputs = np.array(outputs) >=0.5

800it [00:15, 52.81it/s]


In [None]:

final_outputs

array([[False, False, False,  True, False, False],
       [False, False, False, False,  True, False],
       [False, False,  True, False, False, False],
       ...,
       [False, False,  True, False, False, False],
       [False,  True, False, False, False,  True],
       [False, False,  True, False, False, False]])

In [None]:
val_hamming_loss = metrics.hamming_loss(targets, final_outputs)
val_hamming_score = hamming_score(np.array(targets), np.array(final_outputs))

print(f"Hamming Score = {val_hamming_score}")
print(f"Hamming Loss = {val_hamming_loss}")

Hamming Score = 0.93484375
Hamming Loss = 0.020885416666666667


In [None]:
model.eval()


<a id='section07'></a>
### Saving the Trained Model for inference

This is the final step in the process of fine tuning the model.

The model and its vocabulary are saved locally. These files are then used in the future to run inference on new inputs, such as news headlines.

In [None]:
# Saving the files for inference

output_model_file = './models1/pytorch_distilbert_news.bin'
output_vocab_file = './models1/vocab_distilbert_news.bin'

torch.save(model, output_model_file)
tokenizer.save_vocabulary(output_vocab_file)

print('Saved')

Saved
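
The cell above pickles the whole model object with `torch.save(model, ...)`. A commonly used alternative, shown here only as a sketch, is to save just the `state_dict` and load it into a freshly constructed model. A small `torch.nn.Linear` stands in for `DistilBERTClass`, and the file name `weights.bin` is hypothetical.

```python
import os
import tempfile

import torch

# Small stand-in module; the tutorial's DistilBERTClass would be used the same way.
model = torch.nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "weights.bin")  # hypothetical file name

    # Save only the parameter tensors, not the pickled Python object.
    torch.save(model.state_dict(), weights_path)

    # Rebuild the architecture first, then load the saved weights into it.
    restored = torch.nn.Linear(4, 2)
    restored.load_state_dict(torch.load(weights_path, map_location="cpu"))

restored.eval()

# Check that every parameter tensor survived the round trip unchanged.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

The trade-off is that the class definition must be available at load time, but the saved file no longer depends on the exact module path of the pickled object.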


In [None]:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Paths to the saved files
output_model_file = './models1/pytorch_distilbert_news.bin'
output_vocab_file = './models1/vocab_distilbert_news.bin'

# Load the model
model = torch.load(output_model_file)

# Check if CUDA is available and set device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # Move model to the appropriate device
model.eval()  # Set the model to evaluation mode

# Load the tokenizer
tokenizer = DistilBertTokenizer(vocab_file=output_vocab_file, truncation=True, do_lower_case=True)
#tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', truncation=True, do_lower_case=True)

# Define the emotion labels (these should correspond to your classes)
emotion_labels = ['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']

# Function to perform multi-label classification inference
def predict(text, threshold=0.5):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Move inputs to the same device as the model
    for key in inputs.keys():
        inputs[key] = inputs[key].to(device)

    # Run inference (get logits)
    with torch.no_grad():  # No need for gradients during inference
        outputs = model(**inputs,token_type_ids=None)


    return outputs

# Example input text
text = "Yes mom I won in life"

# Get prediction
output = predict(text, threshold=0.5)
print(output)


Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


tensor([[-1.7893, -6.0976,  2.3578, -5.2209, -4.3939, -6.4538]],
       device='cuda:0')
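
The `predict` function returns raw logits and does not use its `threshold` argument. One way to turn those logits into label names is sketched below in plain Python, using the logits printed above. It assumes the label order matches the alphabetical column order produced by `get_dummies` earlier, and the helper `logits_to_labels` is hypothetical.

```python
import math

emotion_labels = ["anger", "fear", "joy", "love", "sadness", "surprise"]

# Logits copied from the output printed above.
logits = [-1.7893, -6.0976, 2.3578, -5.2209, -4.3939, -6.4538]

def logits_to_labels(logits, labels, threshold=0.5):
    # Sigmoid turns each logit into an independent probability.
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Keep every label whose probability reaches the threshold.
    chosen = [name for name, p in zip(labels, probs) if p >= threshold]
    return chosen, probs

predicted, probs = logits_to_labels(logits, emotion_labels)
print(predicted)
```

Only the `joy` logit is positive, so it is the only label whose probability exceeds 0.5 for the input "Yes mom I won in life".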


In [None]:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Paths to the saved files
output_model_file = './models/pytorch_distilbert_news.bin'
output_vocab_file = './models/vocab_distilbert_news.bin'

# Load the model
model = torch.load(output_model_file)

# Check if CUDA is available and set device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # Move model to the appropriate device
model.eval()  # Set the model to evaluation mode

# Load the tokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', truncation=True, do_lower_case=True)

# Define the emotion labels (these should correspond to your classes)
emotion_labels = ['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']

# Function to perform multi-label classification inference
def predict(text, threshold=0.5):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Move inputs to the same device as the model
    for key in inputs.keys():
        inputs[key] = inputs[key].to(device)

    # Run inference (get logits)
    with torch.no_grad():  # No need for gradients during inference
        outputs = model(**inputs,token_type_ids=None)


    return outputs

# Example input text
text = "Ukraine war: Soldiers hungry to learn from British Army as they sharpen skills on Salisbury Plain"

# Get prediction
output = predict(text, threshold=0.5)
print(output)


tensor([[-6.4372, -6.2906, -3.9113, -7.3869, -5.8981, -7.8235, -3.9546, -5.5527,
          6.9318,  1.0175, -5.3168, -5.2629, -4.6515]], device='cuda:0')


In [None]:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Paths to the saved files
output_model_file = './models/pytorch_distilbert_news.bin'
output_vocab_file = './models/vocab_distilbert_news.bin'

# Load the model
model = torch.load(output_model_file)

# Check if CUDA is available and set device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # Move model to the appropriate device
model.eval()  # Set the model to evaluation mode

# Load the tokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', truncation=True, do_lower_case=True)

# Define the emotion labels (these should correspond to your classes)
emotion_labels = ['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']

# Function to perform multi-label classification inference
def predict(text, threshold=0.5):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Move inputs to the same device as the model
    for key in inputs.keys():
        inputs[key] = inputs[key].to(device)

    # Run inference (get logits)
    with torch.no_grad():  # No need for gradients during inference
        outputs = model(**inputs,token_type_ids=None)


    return outputs

# Example input text
text = "I am feeling very mad"

# Get prediction
output = predict(text, threshold=0.5)
print(output)


tensor([[ 3.8319, -4.0123, -4.2141, -5.0378, -4.4937, -4.2525]],
       device='cuda:0')


In [None]:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Paths to the saved files
output_model_file = './models/pytorch_distilbert_news.bin'
output_vocab_file = './models/vocab_distilbert_news.bin'

# Load the model
model = torch.load(output_model_file)

# Check if CUDA is available and set device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # Move model to the appropriate device
model.eval()  # Set the model to evaluation mode

# Load the tokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', truncation=True, do_lower_case=True)

# Define the emotion labels (these should correspond to your classes)
emotion_labels = ['anger', 'fear', 'joy', 'love', 'sadness', 'surprise']

# Function to perform multi-label classification inference
def predict(text, threshold=0.5):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Move inputs to the same device as the model
    for key in inputs.keys():
        inputs[key] = inputs[key].to(device)

    # Run inference (get logits)
    with torch.no_grad():  # No need for gradients during inference
        outputs = model(**inputs,token_type_ids=None)


    return outputs

# Example input text
text = "Shocking news I did not expect that"

# Get prediction
output = predict(text, threshold=0.5)
print(output)


tensor([[-0.1613, -4.2545, -4.0753, -4.5225, -2.4708, -4.9678]],
       device='cuda:0')
