# Hate speech classification using Twitter dataset

This notebook reports in-domain results on the Twitter dataset and domain adaptation results on the movies dataset.

The class labels depict the following:

0: Normal speech
1: Offensive speech
2: Hate speech
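
For readable reports, the integer labels can be mapped to names. A minimal sketch; `LABEL_NAMES` and `label_name` are illustrative names and are not used elsewhere in this notebook:

```python
# Hypothetical helper: maps the integer class ids listed above to short names.
LABEL_NAMES = {0: "normal", 1: "offensive", 2: "hate"}


def label_name(label_id):
    """Return the class name for an integer label, or raise on an unknown id."""
    try:
        return LABEL_NAMES[int(label_id)]
    except KeyError:
        raise ValueError(f"unknown label id: {label_id!r}") from None


print([label_name(i) for i in (0, 1, 2)])  # ['normal', 'offensive', 'hate']
```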

#### To work with this notebook, create the following folders in the notebook's directory:

classification_reports/ : contains all classification reports generated by the models

data/ : contains the twitter.csv annotation file

movies/ : contains the all_movies.csv file

movies/for_training/ : contains the 6 movie files used for cross-validation training and testing

training_checkpoints/in_domain/twitter/cp_twitter.ckpt : checkpoint path where the model weights are saved during training
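
The folder layout above could be created with a short script instead of by hand. This is a sketch under the assumption that the notebook is run from its own directory; `REQUIRED_DIRS` and `create_project_dirs` are illustrative names, and the CSV files still have to be copied into place manually.

```python
import os

# Folders listed above. The checkpoint entry is a file prefix,
# so only its parent folder is created here.
REQUIRED_DIRS = [
    "classification_reports",
    "data",
    "movies/for_training",
    "training_checkpoints/in_domain/twitter",
]


def create_project_dirs(root="."):
    """Create the expected folders under `root`; existing folders are left untouched."""
    created = []
    for rel in REQUIRED_DIRS:
        path = os.path.join(root, rel)
        os.makedirs(path, exist_ok=True)  # also creates missing parents such as movies/
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_project_dirs():
        print("ready:", path)
```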

In [1]:
! pip install transformers

Collecting transformers
  Downloading transformers-4.9.2-py3-none-any.whl (2.6 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.45-py3-none-any.whl (895 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Collecting huggingface-hub==0.0.12
  Downloading huggingface_hub-0.0.12-py3-none-any.whl (37 kB)
Installing collected packages: tokenizers, sacremoses, pyyaml, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstalling PyYAML-3.13:
      Successfully u

## Training on twitter dataset

In [2]:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import re
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import os
import glob

#### Initialize BERT classification model for 3 labels

In [3]:
from transformers import BertTokenizer, TFBertForSequenceClassification
from transformers import InputExample, InputFeatures

model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", 
                                                        trainable=True, 
                                                        num_labels=3)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")





All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.






Initialize checkpoints

In [4]:
checkpoint_path = "training_checkpoints/in_domain/twitter/cp_twitter.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

Read the hate speech dataset and split it into training and test sets

In [5]:
df = pd.read_csv("data/twitter.csv")
df = df.drop(columns=['Unnamed: 0'])
df['tweet'] = df['tweet'].str.strip()
df.count()

tweet    24472
label    24472
dtype: int64

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24472 entries, 0 to 24471
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   tweet   24472 non-null  object
 1   label   24472 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 382.5+ KB


In [7]:
def get_dataset(df, seed, test_size):
    return train_test_split(df, test_size=test_size, random_state=seed, shuffle=True)

In [8]:
train, test = get_dataset(df, 11, 0.2)
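
With a fractional `test_size`, scikit-learn rounds the test share up and gives the remaining rows to training, which would explain the 19,577 training rows and 4,895 test rows that appear in this notebook. A small arithmetic check of that assumption (the helper `split_sizes` is illustrative and not a scikit-learn function):

```python
import math


def split_sizes(n_samples, test_size):
    """Row counts for a fractional test_size: test is rounded up, train gets the rest."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(24472, 0.2)
print(n_train, n_test)  # 19577 4895
```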

In [9]:
train.head()

Unnamed: 0,tweet,label
11983,K.Michelle talking bout can't raise no man wel...,1
11244,I've got saltine crackers and red wine here. I...,0
14490,: Udonis talking some big trash to Lance. LeBr...,0
17351,: Your opinion is irrelevant because you are a...,1
16205,: Scott Walker investigate. Christie's Bridgeg...,0


In [10]:
train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19577 entries, 11983 to 10137
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   tweet   19577 non-null  object
 1   label   19577 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 458.8+ KB


In [11]:
train.columns = ['DATA_COLUMN', 'LABEL_COLUMN']
test.columns = ['DATA_COLUMN', 'LABEL_COLUMN']

In [12]:
def convert_data_to_examples(train, test, DATA_COLUMN, LABEL_COLUMN):
    """Wrap every row of the train and test frames in a transformers InputExample."""
    def to_example(row):
        return InputExample(guid=None,  # globally unique ID for bookkeeping, unused in this case
                            text_a=row[DATA_COLUMN],
                            text_b=None,
                            label=row[LABEL_COLUMN])

    train_InputExamples = train.apply(to_example, axis=1)
    validation_InputExamples = test.apply(to_example, axis=1)

    # The call that used to follow the return statement was unreachable;
    # the function is invoked in the next cell instead.
    return train_InputExamples, validation_InputExamples
  
def convert_examples_to_tf_dataset(examples, tokenizer, max_length=128):
    features = [] # -> will hold InputFeatures to be converted later

    for e in examples:
        # Tokenize one text; see the encode_plus documentation for the full list of options
        input_dict = tokenizer.encode_plus(
            e.text_a,
            add_special_tokens=True,
            max_length=max_length, # sequences longer than max_length are truncated
            return_token_type_ids=True,
            return_attention_mask=True,
            padding='max_length', # pads on the right up to max_length; replaces the deprecated pad_to_max_length=True
            truncation=True
        )

        input_ids, token_type_ids, attention_mask = (input_dict["input_ids"],
            input_dict["token_type_ids"], input_dict['attention_mask'])

        features.append(
            InputFeatures(
                input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, label=e.label
            )
        )

    def gen():
        for f in features:
            yield (
                {
                    "input_ids": f.input_ids,
                    "attention_mask": f.attention_mask,
                    "token_type_ids": f.token_type_ids,
                },
                f.label,
            )

    return tf.data.Dataset.from_generator(
        gen,
        ({"input_ids": tf.int32, "attention_mask": tf.int32, "token_type_ids": tf.int32}, tf.int64),
        (
            {
                "input_ids": tf.TensorShape([None]),
                "attention_mask": tf.TensorShape([None]),
                "token_type_ids": tf.TensorShape([None]),
            },
            tf.TensorShape([]),
        ),
    )


DATA_COLUMN = 'DATA_COLUMN'
LABEL_COLUMN = 'LABEL_COLUMN'
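
To make the effect of padding and truncation to `max_length` concrete, here is a plain-Python sketch of what happens to a single sequence of token ids. It is an illustration only, not the tokenizer's implementation: `pad_and_mask` is a hypothetical helper, the ids are placeholders, and a real tokenizer also keeps the closing special token when it truncates.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate or right-pad a list of token ids and build the matching attention mask."""
    ids = list(token_ids)[:max_length]        # drop anything beyond max_length
    n_real = len(ids)
    n_pad = max_length - n_real
    ids = ids + [pad_id] * n_pad              # pad on the right
    mask = [1] * n_real + [0] * n_pad         # 1 = real token, 0 = padding
    return ids, mask


ids, mask = pad_and_mask([11, 22, 33, 44], max_length=6)
print(ids)   # [11, 22, 33, 44, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

Every example therefore has the same length, which is what lets the dataset be batched into fixed-size tensors.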

In [13]:
train_InputExamples, validation_InputExamples = convert_data_to_examples(train, test, DATA_COLUMN, LABEL_COLUMN)

train_data = convert_examples_to_tf_dataset(list(train_InputExamples), tokenizer)
train_data = train_data.batch(32)

validation_data = convert_examples_to_tf_dataset(list(validation_InputExamples), tokenizer)
validation_data = validation_data.batch(32)
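
As a rough guide to how many steps one epoch takes, the number of batches follows from the dataset size and the batch size of 32, assuming the final, smaller batch is kept (the default for `batch`). The helper name `num_batches` is illustrative:

```python
import math


def num_batches(n_examples, batch_size=32):
    """Number of batches when the last, possibly smaller, batch is kept."""
    return math.ceil(n_examples / batch_size)


print(num_batches(19577))  # 612 training batches per epoch
print(num_batches(4895))   # 153 validation batches
```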



In [14]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-6, epsilon=1e-08, clipnorm=1.0), 
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), 
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])

In [15]:
hist = model.fit(train_data, epochs=4, validation_data=validation_data, callbacks=[cp_callback])

Epoch 1/4

Epoch 00001: saving model to training_checkpoints/in_domain/twitter/cp_twitter.ckpt
Epoch 2/4

Epoch 00002: saving model to training_checkpoints/in_domain/twitter/cp_twitter.ckpt
Epoch 3/4

Epoch 00003: sa

In [16]:
preds = model.predict(validation_data)



In [17]:
cr = classification_report(test['LABEL_COLUMN'],np.argmax(preds[0],axis=1),output_dict=True)
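
The `np.argmax(preds[0], axis=1)` call turns each row of logits into the index of the highest-scoring class. A plain-Python sketch of the same operation on made-up logits (`argmax_rows` is an illustrative helper, not part of NumPy):

```python
def argmax_rows(logits):
    """Index of the largest value in each row; ties go to the first maximum."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Made-up logits for three examples over the classes 0, 1 and 2.
example_logits = [
    [2.1, -0.3, -1.5],
    [-0.8, 3.2, 0.4],
    [-1.0, 0.2, 0.9],
]
print(argmax_rows(example_logits))  # [0, 1, 2]
```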

In [18]:
pd.DataFrame(cr).transpose().to_csv('classification_reports/classification_bert_twitter_indomain.csv')

#### In-domain classification report for twitter

In [19]:
pd.DataFrame(cr).transpose() #  0: Normal speech, 1: Offensive speech, 2: Hate speech

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.884754 | 0.905405 | 0.894961 | 814.0 |
| 1 | 0.940467 | 0.965999 | 0.953062 | 3794.0 |
| 2 | 0.587879 | 0.337979 | 0.429204 | 287.0 |
| accuracy | 0.919101 | 0.919101 | 0.919101 | 0.919101 |
| macro avg | 0.804367 | 0.736461 | 0.759075 | 4895.0 |
| weighted avg | 0.91053 | 0.919101 | 0.912686 | 4895.0 |
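
Because the test set is dominated by class 1, accuracy is best read against a majority-class baseline. The sketch below only reuses the support counts from the in-domain report; the variable names are illustrative:

```python
# Test-set support per class, copied from the in-domain report.
support = {0: 814, 1: 3794, 2: 287}

total = sum(support.values())
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

print(total)                        # 4895
print(majority_class)               # 1
print(round(baseline_accuracy, 4))  # 0.7751
```

A classifier that always predicts "offensive" would already reach about 77.5% accuracy, so the per-class rows, in particular the low recall for class 2, say more about the model than the accuracy figure does.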




---





#### Domain adaptation: predicting on the movies dataset with the Twitter-trained model (3 labels)

In [20]:
def convert_data_to_examples_valid(data, DATA_COLUMN, LABEL_COLUMN): 
  train_InputExamples = data.apply(lambda x: InputExample(guid=None, # Globally unique ID for bookkeeping, unused in this case
                                                          text_a = x[DATA_COLUMN], 
                                                          text_b = None,
                                                          label = x[LABEL_COLUMN]), axis = 1)

  
  return train_InputExamples

In [21]:
df_movies = pd.read_csv('movies/all_movies.csv')

In [22]:
df_movies.head(2)

Unnamed: 0.1,Unnamed: 0,movie_id,batch_id,majority_answer,text,movie_name,Unnamed: 6,Unnamed: 7
0,0,AmericanHistoryX(1998)_1,1566624979,0,Derek.,AmerricanHistoryX,,
1,1,AmericanHistoryX(1998)_2,1566624979,1,What the fuck are you thinking?,AmerricanHistoryX,,


In [23]:
df_movies = df_movies.rename(columns={"text": "DATA_COLUMN", "majority_answer": "LABEL_COLUMN"})
df_movies.head()

Unnamed: 0.1,Unnamed: 0,movie_id,batch_id,LABEL_COLUMN,DATA_COLUMN,movie_name,Unnamed: 6,Unnamed: 7
0,0,AmericanHistoryX(1998)_1,1566624979,0,Derek.,AmerricanHistoryX,,
1,1,AmericanHistoryX(1998)_2,1566624979,1,What the fuck are you thinking?,AmerricanHistoryX,,
2,2,AmericanHistoryX(1998)_3,1566624979,0,There's a black guy outside breaking into your...,AmerricanHistoryX,,
3,3,AmericanHistoryX(1998)_4,1566624979,0,How long has he been there?,AmerricanHistoryX,,
4,4,AmericanHistoryX(1998)_5,1566624979,0,I don't know.,AmerricanHistoryX,,


In [24]:
movie_InputExamples = convert_data_to_examples_valid(df_movies, DATA_COLUMN, LABEL_COLUMN)

In [25]:
movie_data = convert_examples_to_tf_dataset(list(movie_InputExamples), tokenizer)
movie_data = movie_data.batch(32)



In [26]:
preds_movie = model.predict(movie_data)

In [27]:
cr_movies = classification_report(df_movies['LABEL_COLUMN'], np.argmax(preds_movie[0], axis=1), output_dict=True)

In [28]:
pd.DataFrame(cr_movies).transpose().to_csv('classification_reports/bert_twitter_domain_adap_movies.csv')

#### Domain adaptation classification report from twitter on the movies dataset

In [29]:
pd.DataFrame(cr_movies).transpose() # 0: Normal speech, 1: Offensive speech, 2: Hate speech

| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.982884 | 0.917351 | 0.948987 | 9014.0 |
| 1 | 0.628024 | 0.902899 | 0.740785 | 1380.0 |
| 2 | 0.632302 | 0.62585 | 0.62906 | 294.0 |
| accuracy | 0.907466 | 0.907466 | 0.907466 | 0.907466 |
| macro avg | 0.747737 | 0.815367 | 0.772944 | 10688.0 |
| weighted avg | 0.927422 | 0.907466 | 0.913304 | 10688.0 |
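
The f1-score column is the harmonic mean of precision and recall, so a high recall cannot compensate for a low precision. A small check against the class 1 row of the movies report (the helper `f1_score` here is a local illustration, not the scikit-learn function of the same name):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class 1 (offensive) on the movies dataset, values taken from the report.
print(round(f1_score(0.628024, 0.902899), 4))  # 0.7408
```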




---
### Cross validation


#### 6-fold cross-validation on movies, fine-tuning the model trained on the Twitter dataset above
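
Each fold holds out one movie for testing and trains on the other five. The splitting logic on its own can be sketched as follows; `leave_one_out_folds` is an illustrative helper, the movie names are spelled as they appear in the data, and the actual fold order in the notebook depends on the order in which `glob` returns the files.

```python
def leave_one_out_folds(groups):
    """Yield (train_groups, test_group) pairs, holding out one group at a time."""
    for i, held_out in enumerate(groups):
        yield groups[:i] + groups[i + 1:], held_out


movies = ["BlacKkKlansman", "Pulp_Fiction", "AmerricanHistoryX",
          "TheWolfofWallStreet", "Django_Unchained", "South_Park"]

for train_movies, test_movie in leave_one_out_folds(movies):
    print(test_movie, "<-", ",".join(train_movies))
```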

In [30]:
def convert_data_to_examples_cv(train, DATA_COLUMN, LABEL_COLUMN):
    train_InputExamples = train.apply(
        lambda x: InputExample(guid=None,  # Globally unique ID for bookkeeping, unused in this case
                               text_a=x[DATA_COLUMN],
                               text_b=None,
                               label=x[LABEL_COLUMN]), axis=1)

    return train_InputExamples


def convert_examples_to_tf_dataset_cv(examples, tokenizer, max_length=128):
    features = []  # -> will hold InputFeatures to be converted later

    for e in examples:
        # Tokenize one text; see the encode_plus documentation for the full list of options
        input_dict = tokenizer.encode_plus(
            e.text_a,
            add_special_tokens=True,
            max_length=max_length,  # sequences longer than max_length are truncated
            return_token_type_ids=True,
            return_attention_mask=True,
            padding='max_length',  # pads on the right up to max_length; replaces the deprecated pad_to_max_length=True
            truncation=True
        )

        input_ids, token_type_ids, attention_mask = (input_dict["input_ids"],
                                                     input_dict["token_type_ids"], input_dict['attention_mask'])

        features.append(
            InputFeatures(
                input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, label=e.label
            )
        )

    def gen():
        for f in features:
            yield (
                {
                    "input_ids": f.input_ids,
                    "attention_mask": f.attention_mask,
                    "token_type_ids": f.token_type_ids,
                },
                f.label,
            )

    return tf.data.Dataset.from_generator(
        gen,
        ({"input_ids": tf.int32, "attention_mask": tf.int32, "token_type_ids": tf.int32}, tf.int64),
        (
            {
                "input_ids": tf.TensorShape([None]),
                "attention_mask": tf.TensorShape([None]),
                "token_type_ids": tf.TensorShape([None]),
            },
            tf.TensorShape([]),
        ),
    )


def train_bert(df_train, df_test, load_training = False):
    model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                        trainable=True,
                                                        num_labels=3)
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    if load_training:
        model.load_weights('training_checkpoints/in_domain/twitter/cp_twitter.ckpt')
    train = df_train[['text', 'majority_answer']]
    train.columns = ['DATA_COLUMN', 'LABEL_COLUMN']

    test = df_test[['text', 'majority_answer']]
    test.columns = ['DATA_COLUMN', 'LABEL_COLUMN']

    DATA_COLUMN = 'DATA_COLUMN'
    LABEL_COLUMN = 'LABEL_COLUMN'

    train_InputExamples = convert_data_to_examples_cv(train, DATA_COLUMN, LABEL_COLUMN)
    test_InputExamples = convert_data_to_examples_cv(test, DATA_COLUMN, LABEL_COLUMN)

    train_data = convert_examples_to_tf_dataset_cv(list(train_InputExamples), tokenizer)
    train_data = train_data.batch(32)

    test_data = convert_examples_to_tf_dataset_cv(list(test_InputExamples), tokenizer)
    test_data = test_data.batch(32)

    # compile and fit
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-6, epsilon=1e-08, clipnorm=1.0),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy('accuracy')])

    model.fit(train_data, epochs=6)

    print('predicting')
    preds = model.predict(test_data)

    # classification
    return classification_report(test['LABEL_COLUMN'], np.argmax(preds[0], axis=1), output_dict=True)

In [31]:
def load_movies_to_df(path):
    df_movies = []

    for filename in glob.glob(path + '*.csv'):
        df_movies.append(pd.read_csv(filename))

    return df_movies

In [32]:
df_movies = load_movies_to_df('movies/for_training/')
classification_reports = []
df_main = pd.DataFrame()

In [33]:
# perform cross folding
for i in range(len(df_movies)):
    df_train = pd.concat(df_movies[0:i] + df_movies[i + 1:])
    df_test = df_movies[i]

    train_movies = df_train['movie_name'].unique()
    test_movie = df_test['movie_name'].unique()
    print(','.join(train_movies))
    print(test_movie[0])
    classification_reports.append(train_bert(df_train, df_test, True))
    
    print('Train movies: ', str(','.join(train_movies)))
    print('Test movie: ', str(test_movie[0]))
    print('Classification report: \n', classification_reports[i])
    print('------------------------------------------------')

    df_cr = pd.DataFrame(classification_reports[i]).transpose()
    df_cr['movie_train'] =  str(','.join(train_movies))
    df_cr['movie_test'] = str(test_movie[0])
    df_cr.to_csv('classification_reports/'+'bert_twitter_cv_finetune_testmovie_'+str(test_movie[0])+'.csv')
    df_main = pd.concat([df_main, df_cr])  # DataFrame.append is deprecated and was removed in pandas 2.0

Pulp_Fiction,AmerricanHistoryX,TheWolfofWallStreet,Django_Unchained,South_Park
BlacKkKlansman


All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
predicting
Train movies:  Pulp_Fiction,AmerricanHistoryX,TheWolfofWallStreet,Django_Unchained,South_Park
Test movie:  BlacKkKlansman
Classification report: 
 {'0': {'precision': 0.9695035460992908, 'recall': 0.9350205198358413, 'f1-score': 0.9519498607242339, 'support': 1462}, '1': {'precision': 0.5042735042735043, 'recall': 0.6082474226804123, 'f1-score': 0.5514018691588785, 'support': 97}, '2': {'precision': 0.5084745762711864, 'recall': 0.6976744186046512, 'f1-score': 0.5882352941176471, 'support': 86}, 'accuracy': 0.9033434650455927, 'macro avg': {'precision': 0.6607505422146605, 'recall': 0.7469807870403016, 'f1-score': 0.6971956746669199, 'support': 1645}, 'weighted avg': {'precision': 0.9179681020492494, 'recall': 0.9033434650455927, 'f1-score': 0.9093160565236225, 'support': 1645}}
------------------------------------------------
BlacKkKlansman,AmerricanHistoryX,TheWolfofWallStreet,Django_Unchained,South_Park
Pulp_Fict


Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
predicting
Train movies:  BlacKkKlansman,AmerricanHistoryX,TheWolfofWallStreet,Django_Unchained,South_Park
Test movie:  Pulp_Fiction
Classification report: 
 {'0': {'precision': 0.9676691729323308, 'recall': 0.9654913728432108, 'f1-score': 0.9665790461885092, 'support': 1333}, '1': {'precision': 0.8270676691729323, 'recall': 0.8301886792452831, 'f1-score': 0.8286252354048963, 'support': 265}, '2': {'precision': 0.7692307692307693, 'recall': 0.8333333333333334, 'f1-score': 0.8, 'support': 24}, 'accuracy': 0.9414303329223181, 'macro avg': {'precision': 0.8546558704453441, 'recall': 0.8763377951406092, 'f1-score': 0.8650680938644685, 'support': 1622}, 'weighted avg': {'precision': 0.9417617005617526, 'recall': 0.9414303329223181, 'f1-score': 0.9415755585398151, 'support': 1622}}
------------------------------------------------
BlacKkKlansman,Pulp_Fiction,TheWolfofWallStreet,Django_Unchained,South_Park
AmerricanHistoryX



Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
predicting
Train movies:  BlacKkKlansman,Pulp_Fiction,TheWolfofWallStreet,Django_Unchained,South_Park
Test movie:  AmerricanHistoryX
Classification report: 
 {'0': {'precision': 0.969488939740656, 'recall': 0.9746932515337423, 'f1-score': 0.9720841300191205, 'support': 1304}, '1': {'precision': 0.7867298578199052, 'recall': 0.8019323671497585, 'f1-score': 0.7942583732057417, 'support': 207}, '2': {'precision': 0.7674418604651163, 'recall': 0.6111111111111112, 'f1-score': 0.6804123711340206, 'support': 54}, 'accuracy': 0.939297124600639, 'macro avg': {'precision': 0.8412202193418925, 'recall': 0.7959122432648705, 'f1-score': 0.8155849581196275, 'support': 1565}, 'weighted avg': {'precision': 0.9383441012496179, 'recall': 0.939297124600639, 'f1-score': 0.9384993334439354, 'support': 1565}}
------------------------------------------------
BlacKkKlansman,Pulp_Fiction,AmerricanHistoryX,Django_Unchained,South_Park
TheWolfofWallStree


Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
predicting
Train movies:  BlacKkKlansman,Pulp_Fiction,AmerricanHistoryX,Django_Unchained,South_Park
Test movie:  TheWolfofWallStreet
Classification report: 
 {'0': {'precision': 0.9834142394822006, 'recall': 0.9826192400970089, 'f1-score': 0.9830165790537807, 'support': 2474}, '1': {'precision': 0.9304347826086956, 'recall': 0.9114139693356048, 'f1-score': 0.9208261617900172, 'support': 587}, '2': {'precision': 0.125, 'recall': 1.0, 'f1-score': 0.2222222222222222, 'support': 2}, 'accuracy': 0.9689846555664381, 'macro avg': {'precision': 0.6796163406969654, 'recall': 0.964677736477538, 'f1-score': 0.7086883210220067, 'support': 3063}, 'weighted avg': {'precision': 0.9727006352824906, 'recall': 0.9689846555664381, 'f1-score': 0.9706015076703357, 'support': 3063}}
------------------------------------------------
BlacKkKlansman,Pulp_Fiction,AmerricanHistoryX,TheWolfofWallStreet,South_Park
Django_Unchained



Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
predicting
Train movies:  BlacKkKlansman,Pulp_Fiction,AmerricanHistoryX,TheWolfofWallStreet,South_Park
Test movie:  Django_Unchained
Classification report: 
 {'0': {'precision': 0.9838082901554405, 'recall': 0.9793681495809156, 'f1-score': 0.981583198707593, 'support': 1551}, '1': {'precision': 0.632183908045977, 'recall': 0.6962025316455697, 'f1-score': 0.6626506024096386, 'support': 79}, '2': {'precision': 0.9568965517241379, 'recall': 0.9487179487179487, 'f1-score': 0.9527896995708154, 'support': 117}, 'accuracy': 0.9645105895821409, 'macro avg': {'precision': 0.8576295833085185, 'recall': 0.8747628766481447, 'f1-score': 0.8656745002293489, 'support': 1747}, 'weighted avg': {'precision': 0.9661053711038606, 'recall': 0.9645105895821409, 'f1-score': 0.9652325893735683, 'support': 1747}}
------------------------------------------------
BlacKkKlansman,Pulp_Fiction,AmerricanHistoryX,TheWolfofWallStreet,Django_Unchained
South_Pa


Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
predicting
Train movies:  BlacKkKlansman,Pulp_Fiction,AmerricanHistoryX,TheWolfofWallStreet,Django_Unchained
Test movie:  South_Park
Classification report: 
 {'0': {'precision': 0.9488574537540805, 'recall': 0.9797752808988764, 'f1-score': 0.9640685461580983, 'support': 890}, '1': {'precision': 0.8547008547008547, 'recall': 0.6896551724137931, 'f1-score': 0.7633587786259542, 'support': 145}, '2': {'precision': 0.3, 'recall': 0.2727272727272727, 'f1-score': 0.28571428571428564, 'support': 11}, 'accuracy': 0.9321223709369025, 'macro avg': {'precision': 0.7011861028183116, 'recall': 0.6473859086799808, 'f1-score': 0.671047203499446, 'support': 1046}, 'weighted avg': {'precision': 0.9289816039892499, 'recall': 0.9321223709369025, 'f1-score': 0.9291117458167572, 'support': 1046}}
------------------------------------------------


In [34]:
df_main.to_csv('classification_reports/bert_crossvalid_finetune_twitter.csv')

In [35]:
print(df_main)

              precision  ...           movie_test
0              0.969504  ...       BlacKkKlansman
1              0.504274  ...       BlacKkKlansman
2              0.508475  ...       BlacKkKlansman
accuracy       0.903343  ...       BlacKkKlansman
macro avg      0.660751  ...       BlacKkKlansman
weighted avg   0.917968  ...       BlacKkKlansman
0              0.967669  ...         Pulp_Fiction
1              0.827068  ...         Pulp_Fiction
2              0.769231  ...         Pulp_Fiction
accuracy       0.941430  ...         Pulp_Fiction
macro avg      0.854656  ...         Pulp_Fiction
weighted avg   0.941762  ...         Pulp_Fiction
0              0.969489  ...    AmerricanHistoryX
1              0.786730  ...    AmerricanHistoryX
2              0.767442  ...    AmerricanHistoryX
accuracy       0.939297  ...    AmerricanHistoryX
macro avg      0.841220  ...    AmerricanHistoryX
weighted avg   0.938344  ...    AmerricanHistoryX
0              0.983414  ...  TheWolfofWallStreet


In [36]:
len(classification_reports[0])

6

In [37]:
df_main.head()

Unnamed: 0,precision,recall,f1-score,support,movie_train,movie_test
0,0.969504,0.935021,0.95195,1462.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
1,0.504274,0.608247,0.551402,97.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
2,0.508475,0.697674,0.588235,86.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
accuracy,0.903343,0.903343,0.903343,0.903343,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
macro avg,0.660751,0.746981,0.697196,1645.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman


In [38]:
def get_precision_recall_f1(category, result_df):
    """Average one class's precision, recall and F1 over all folds."""
    rows = result_df[result_df.label == category]  # one row per fold for this class

    return {'label': category,
            'precision': rows.precision.mean(),
            'recall': rows.recall.mean(),
            'f1': rows['f1-score'].mean()}

In [39]:
df_cv= pd.read_csv('classification_reports/bert_crossvalid_finetune_twitter.csv')

In [40]:
df_cv = df_cv.rename(columns={'Unnamed: 0': 'label'})
df_cv.head()

Unnamed: 0,label,precision,recall,f1-score,support,movie_train,movie_test
0,0,0.969504,0.935021,0.95195,1462.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
1,1,0.504274,0.608247,0.551402,97.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
2,2,0.508475,0.697674,0.588235,86.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
3,accuracy,0.903343,0.903343,0.903343,0.903343,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman
4,macro avg,0.660751,0.746981,0.697196,1645.0,"Pulp_Fiction,AmerricanHistoryX,TheWolfofWallSt...",BlacKkKlansman


In [41]:
normal_dict = get_precision_recall_f1('0', df_cv)
offensive_dict = get_precision_recall_f1('1',df_cv)
hate_dict = get_precision_recall_f1('2',df_cv)

#### Aggregated results of all 6 folds (unweighted mean of the per-fold metrics)

In [42]:
df_result = pd.DataFrame([normal_dict, offensive_dict, hate_dict])
df_result

|   | label | precision | recall | f1 |
|---|---|---|---|---|
| 0 | 0 | 0.970457 | 0.969495 | 0.96988 |
| 1 | 1 | 0.755898 | 0.756273 | 0.75352 |
| 2 | 2 | 0.571174 | 0.727261 | 0.588229 |
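
The aggregate gives every fold the same weight, regardless of how many examples of a class the held-out movie contains. For class 2 this matters, since the supports range from 2 to 117. The sketch below contrasts the unweighted mean with a support-weighted mean of recall, using the per-fold values printed in the fold reports; the support-weighted variant is shown only for comparison and is not computed in the notebook.

```python
# Per-fold (recall, support) for class 2 (hate), copied from the fold reports.
folds = [
    (0.6976744186046512, 86),   # BlacKkKlansman
    (0.8333333333333334, 24),   # Pulp_Fiction
    (0.6111111111111112, 54),   # AmerricanHistoryX
    (1.0, 2),                   # TheWolfofWallStreet
    (0.9487179487179487, 117),  # Django_Unchained
    (0.2727272727272727, 11),   # South_Park
]

# Unweighted mean: what the aggregated table reports.
unweighted = sum(r for r, _ in folds) / len(folds)

# Support-weighted mean: equals the recall over all class 2 examples pooled together.
weighted = sum(r * n for r, n in folds) / sum(n for _, n in folds)

print(round(unweighted, 6))  # 0.727261
print(round(weighted, 4))    # 0.7789
```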


In [43]:
for cr in classification_reports:
  print(cr)

{'0': {'precision': 0.9695035460992908, 'recall': 0.9350205198358413, 'f1-score': 0.9519498607242339, 'support': 1462}, '1': {'precision': 0.5042735042735043, 'recall': 0.6082474226804123, 'f1-score': 0.5514018691588785, 'support': 97}, '2': {'precision': 0.5084745762711864, 'recall': 0.6976744186046512, 'f1-score': 0.5882352941176471, 'support': 86}, 'accuracy': 0.9033434650455927, 'macro avg': {'precision': 0.6607505422146605, 'recall': 0.7469807870403016, 'f1-score': 0.6971956746669199, 'support': 1645}, 'weighted avg': {'precision': 0.9179681020492494, 'recall': 0.9033434650455927, 'f1-score': 0.9093160565236225, 'support': 1645}}
{'0': {'precision': 0.9676691729323308, 'recall': 0.9654913728432108, 'f1-score': 0.9665790461885092, 'support': 1333}, '1': {'precision': 0.8270676691729323, 'recall': 0.8301886792452831, 'f1-score': 0.8286252354048963, 'support': 265}, '2': {'precision': 0.7692307692307693, 'recall': 0.8333333333333334, 'f1-score': 0.8, 'support': 24}, 'accuracy': 0.941