Loading and Preprocessing

In [None]:
!pip install sentencepiece
!pip install transformers
!pip install pytorch-lightning

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting SentencePiece
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: SentencePiece
Successfully installed SentencePiece-0.1.97
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installi

In [None]:
import pandas as pd
import re
from google.colab import drive
import sklearn
from sklearn.model_selection import train_test_split

from transformers import AlbertTokenizer
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

In [None]:
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
df = pd.read_csv('/content/gdrive/MyDrive/hate_speech_data.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,tweet,class
0,0,!!! RT @mayasolovely: As a woman you shouldn't...,0
1,1,""" momma said no pussy cats inside my doghouse """,0
2,2,"""@Addicted2Guys: -SimplyAddictedToGuys http://...",0
3,3,"""@AllAboutManFeet: http://t.co/3gzUpfuMev"" woo...",0
4,4,"""@Allyhaaaaa: Lemmie eat a Oreo &amp; do these...",0


In [None]:
# Function to lowercase a tweet and remove username mentions and punctuation

def clean_tweet(tweet):
    tweet = tweet.lower().strip()
    tweet = re.sub(r"@\w+", "", tweet)  # Removes @username mentions (including underscores)
    tweet = re.sub(r"[^0-9A-Za-z \t]", "", tweet)  # Removes every character that is not a letter, digit, space, or tab
    return tweet


In [None]:
df["tweet"] = df["tweet"].apply(clean_tweet)
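
To see what the cleaning step does to a raw tweet, here is a minimal, self-contained sketch. `clean_tweet_sketch` is a hypothetical stand-in that mirrors the steps of `clean_tweet` (lowercase, drop @mentions, drop punctuation), and the sample string is made up. It also shows a side effect worth knowing about: because punctuation is stripped, a URL collapses into one run of letters unless it is removed first. The `strip_urls` helper is an assumption for illustration and is not part of the notebook.

```python
import re

def clean_tweet_sketch(tweet):
    """Hypothetical stand-in mirroring the notebook's cleaning steps."""
    tweet = tweet.lower().strip()
    tweet = re.sub(r"@\w+", "", tweet)          # drop @username mentions
    tweet = re.sub(r"[^0-9a-z \t]", "", tweet)  # drop punctuation and symbols
    return tweet

def strip_urls(tweet):
    """Optional extra step (an assumption, not in the notebook): drop URLs."""
    return re.sub(r"https?://\S+", "", tweet)

sample = "RT @SomeUser: Hello, World! http://t.co/abc"  # made-up example

cleaned = clean_tweet_sketch(sample)
cleaned_no_url = clean_tweet_sketch(strip_urls(sample))

print(repr(cleaned))         # the URL survives as a single merged token
print(repr(cleaned_no_url))  # the URL is gone entirely
```

Note that removing the mention leaves a double space behind; `" ".join(text.split())` would collapse it if that matters for tokenization.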

In [None]:
x = df["tweet"].values
y = df["class"].values

# Split into training (75%) and validation (25%) sets; fix the seed for reproducibility

train_tweets, val_tweets, train_labels, val_labels = train_test_split(x, y, test_size=0.25, random_state=42)
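
Class labels in this kind of dataset can be imbalanced, so it may be worth checking the label distribution of each split. The following is a minimal sketch using `collections.Counter`. The toy label lists are made up and only stand in for `train_labels` and `val_labels`, and `class_fractions` is a hypothetical helper.

```python
from collections import Counter

def class_fractions(labels):
    """Return the fraction of examples per class, keyed by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}

# Made-up labels standing in for train_labels and val_labels
toy_train_labels = [0, 0, 0, 1, 0, 0, 1, 0]
toy_val_labels = [0, 1, 0, 0]

train_fracs = class_fractions(toy_train_labels)
val_fracs = class_fractions(toy_val_labels)

print(train_fracs)
print(val_fracs)
```

If the fractions differ noticeably between the two splits, passing `stratify=y` to `train_test_split` keeps the class proportions aligned.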

In [None]:
# Load pre-trained AlbertTokenizer 

tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')

In [None]:
# Tokenize tweets

train_tokens = tokenizer(list(train_tweets), return_tensors="pt", padding=True, truncation=True, max_length=64)
val_tokens = tokenizer(list(val_tweets), return_tensors="pt", padding=True, truncation=True, max_length=64)
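
The tokenizer call above uses `padding=True`, which pads every sequence to the longest one in the call, and `truncation=True` with `max_length=64`, which cuts longer sequences. The sketch below imitates that behaviour in plain Python to show how `input_ids` and `attention_mask` relate. The token IDs are made up, `pad_and_mask` is a hypothetical helper, and the real tokenizer differs in one detail: when it truncates, it keeps the closing special token, which this simplified version does not.

```python
def pad_and_mask(sequences, max_length, pad_id=0):
    """Truncate to max_length, then pad every sequence to the longest one.

    Simplification: the real tokenizer keeps the final special token when
    truncating; this sketch simply cuts the list.
    """
    truncated = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in truncated]
    return input_ids, attention_mask

# Made-up token IDs for three tweets of different lengths
toy_sequences = [[2, 11, 12, 3], [2, 21, 3], [2, 31, 32, 33, 34, 3]]

input_ids, attention_mask = pad_and_mask(toy_sequences, max_length=5)

for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

The padded positions hold the pad ID 0 and a mask value of 0, which matches the trailing zeros visible in the batch printed later in the notebook.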


In [None]:
# Build lists of tensors (input IDs, attention masks, token type IDs, labels) on the GPU

device = "cuda"
trn = [train_tokens["input_ids"].to(device), train_tokens["attention_mask"].to(device),
      train_tokens["token_type_ids"].to(device), torch.tensor(train_labels).to(device)]
val = [val_tokens["input_ids"].to(device), val_tokens["attention_mask"].to(device),
      val_tokens["token_type_ids"].to(device), torch.tensor(val_labels).to(device)]

In [None]:
# LightningDataModule that wraps the training and validation DataLoaders

BATCH_SIZE = 32
class ClassificationData(pl.LightningDataModule):
    def __init__(self, trn, val):
        super().__init__()

        # Shuffle the training data each epoch; keep the validation order fixed
        self.trn = DataLoader(TensorDataset(*trn), batch_size=BATCH_SIZE, shuffle=True)
        self.val = DataLoader(TensorDataset(*val), batch_size=BATCH_SIZE)

    def train_dataloader(self): return self.trn
    def val_dataloader(self): return self.val

dls = ClassificationData(trn, val)

In [None]:
# This should return a list of 4 tensors - input_ids, attention_masks, token_type_ids, and labels
next(iter(dls.trn))

[tensor([[    2,  3668,    29,  ...,     0,     0,     0],
         [    2, 24974,    87,  ...,     0,     0,     0],
         [    2,  8409,   107,  ...,     0,     0,     0],
         ...,
         [    2,  3398,    28,  ...,     0,     0,     0],
         [    2,   137,   396,  ...,     0,     0,     0],
         [    2,  2167,    93,  ...,     0,     0,     0]], device='cuda:0'),
 tensor([[1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         ...,
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0],
         [1, 1, 1,  ..., 0, 0, 0]], device='cuda:0'),
 tensor([[0, 0, 0,  ..., 0, 0, 0],
         [0, 0, 0,  ..., 0, 0, 0],
         [0, 0, 0,  ..., 0, 0, 0],
         ...,
         [0, 0, 0,  ..., 0, 0, 0],
         [0, 0, 0,  ..., 0, 0, 0],
         [0, 0, 0,  ..., 0, 0, 0]], device='cuda:0'),
 tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 1, 0, 1, 0, 1, 0, 0], device='cuda

Training a Text Classifier Using Pre-trained ALBERT

In [None]:
from transformers import AlbertModel
albert_model = AlbertModel.from_pretrained('albert-base-v2')

Some weights of the model checkpoint at albert-base-v2 were not used when initializing AlbertModel: ['predictions.LayerNorm.weight', 'predictions.decoder.bias', 'predictions.decoder.weight', 'predictions.dense.weight', 'predictions.dense.bias', 'predictions.bias', 'predictions.LayerNorm.bias']
- This IS expected if you are initializing AlbertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
class AlbertClassifier(pl.LightningModule):
    def __init__(self, dropout_p, hid_dim, output_dim):
        super().__init__()
        self.albert = albert_model  # pre-trained encoder loaded above
        self.dropout = torch.nn.Dropout(dropout_p)
        self.linear_1 = torch.nn.Linear(hid_dim, hid_dim)
        self.linear_2 = torch.nn.Linear(hid_dim, output_dim)
        self.loss = torch.nn.NLLLoss()  # expects log-probabilities as input

    def forward(self, input_ids, attention_mask, token_ids):
        # Index 0 of the model output is the last hidden state: (batch, seq_len, hid_dim)
        x1 = self.albert(input_ids, attention_mask=attention_mask, token_type_ids=token_ids)[0]
        x1 = x1[:, 0]  # hidden state of the first ([CLS]) token
        x1 = self.dropout(torch.relu(self.linear_1(x1)))
        output = torch.log_softmax(self.linear_2(x1), dim=1)
        return output

    def training_step(self, batch, ix):
        # batch = [input_ids, attention_mask, token_type_ids, labels]
        pred = self(batch[0], batch[1], batch[2])
        loss = self.loss(pred, batch[3].view(-1))
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, ix):
        pred = self(batch[0], batch[1], batch[2])
        loss = self.loss(pred, batch[3].view(-1))
        self.log("val_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-5)

# Dropout 0.5, hidden size 768 (ALBERT base), 2 output classes
m = AlbertClassifier(dropout_p=0.5, hid_dim=768, output_dim=2)
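
The classifier ends with `log_softmax` and is trained with `NLLLoss`. As a rough illustration of what that combination computes, here is a plain-Python sketch with made-up logits for two examples and two classes. The function names are hypothetical and the code is not how PyTorch implements these operations. It only reproduces the arithmetic: take log-probabilities, pick the one for the true class, negate, and average.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs_batch, targets):
    """Mean negative log-probability assigned to the true class."""
    picked = [-log_probs[t] for log_probs, t in zip(log_probs_batch, targets)]
    return sum(picked) / len(picked)

# Made-up logits for two examples and their true classes
toy_logits = [[2.0, 0.0], [0.5, 1.5]]
toy_targets = [0, 1]

log_probs = [log_softmax(row) for row in toy_logits]
loss = nll_loss(log_probs, toy_targets)

print(loss)
```

This is the cross-entropy loss. In PyTorch, `torch.nn.CrossEntropyLoss` applied to the raw logits computes the same quantity, so the log-softmax could equally be folded into the loss.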

In [None]:
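# Train the model

# dls is the LightningDataModule defined above.
# `accelerator`/`devices` replace the deprecated `gpus` argument;
# the Trainer moves the model to the GPU itself.
t = pl.Trainer(max_epochs=3, accelerator="gpu", devices=1)
t.fit(m, datamodule=dls)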

  rank_zero_deprecation(
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name     | Type        | Params
-----------------------------------------
0 | albert   | AlbertModel | 11.7 M
1 | dropout  | Dropout     | 0     
2 | linear_1 | Linear      | 590 K 
3 | linear_2 | Linear      | 1.5 K 
4 | loss     | NLLLoss     | 0     
-----------------------------------------
12.3 M    Trainable params
0         Non-trainable params
12.3 M    Total params
49.103    Total estimated model params size (MB)


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=3` reached.


In [None]:
print(m)

AlbertClassifier(
  (albert): AlbertModel(
    (embeddings): AlbertEmbeddings(
      (word_embeddings): Embedding(30000, 128, padding_idx=0)
      (position_embeddings): Embedding(512, 128)
      (token_type_embeddings): Embedding(2, 128)
      (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0, inplace=False)
    )
    (encoder): AlbertTransformer(
      (embedding_hidden_mapping_in): Linear(in_features=128, out_features=768, bias=True)
      (albert_layer_groups): ModuleList(
        (0): AlbertLayerGroup(
          (albert_layers): ModuleList(
            (0): AlbertLayer(
              (full_layer_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (attention): AlbertAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
   

Model Results and Validation

In [None]:
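val_batch = next(iter(dls.val))  # first validation batch only

device = "cuda"
m.to(device)
m.eval()  # switch off dropout for inference

with torch.no_grad():  # no gradients needed for evaluation
    val_pred = m(val_batch[0], val_batch[1], val_batch[2])  # log-probabilities from the trained model
val_label = val_pred.argmax(dim=1).cpu().numpy()  # predicted classes

val_true = val_batch[3].view(-1).cpu().numpy()  # true classes for the same batch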

In [None]:
# Calculate precision on this validation batch

from sklearn.metrics import precision_score

precision = precision_score(val_true, val_label)

print(precision)

0.7777777777777778
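
For reference, precision is the share of predicted positives that are truly positive, TP / (TP + FP). The sketch below computes it by hand on made-up labels. `precision_from_labels` is a hypothetical helper and not the scikit-learn implementation.

```python
def precision_from_labels(y_true, y_pred, positive=1):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    # With no positive predictions, precision is undefined; return 0.0 here
    return tp / (tp + fp) if (tp + fp) else 0.0

# Made-up labels: 4 predicted positives, 3 of them correct
toy_true = [0, 1, 1, 0, 1, 0, 0, 1]
toy_pred = [0, 1, 0, 0, 1, 1, 0, 1]

toy_precision = precision_from_labels(toy_true, toy_pred)
print(toy_precision)
```

Because the score above comes from a single batch of 32 tweets, it is a noisy estimate; averaging over all validation batches would give a more stable number.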


Conclusions

This project implemented a text classifier on top of a pre-trained ALBERT transformer. The pre-trained weights serve as the encoder, and a small classification head labels each tweet in a given Twitter dataset as hateful or non-hateful speech.

Hate speech detection is a practical, real-world problem. Companies such as Facebook and Twitter already use detection models, yet many instances of hateful speech still go undetected. The aim here is to build a classifier that keeps false positives low, that is, one with a high precision score.

The transformer is the state-of-the-art architecture in NLP. Unlike the earlier RNNs and LSTMs, it does not use recurrence; instead it uses multi-head self-attention to attend to all words of a sentence at once. Transformers have achieved breakthrough results on many NLP tasks, so a pre-trained transformer is a sensible starting point for practical NLP problems.
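
To make the idea of self-attention concrete, here is a minimal plain-Python sketch of scaled dot-product attention over three toy token vectors. It is a deliberate simplification: it uses a single head and feeds the token vectors directly as queries, keys, and values, whereas a real transformer such as ALBERT applies learned projection matrices and several heads in parallel. The vectors and function names are made up for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head self-attention with no learned projections (a simplification)."""
    d = len(x[0])
    # Score every token against every other token, scaled by sqrt(d)
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x] for q in x]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of all token vectors
    outputs = [[sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
               for w_row in weights]
    return weights, outputs

# Made-up 2-dimensional vectors for a three-token sentence
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights, outputs = self_attention(toy_tokens)

for w_row in weights:
    print([round(w, 3) for w in w_row])
```

Each row of `weights` sums to 1 and tells how much one token attends to every token in the sentence, which is what lets the model use the whole context without recurrence.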