# Binary Classification with RoBERTa: argumentative vs. non-argumentative sentence classification
* Following the methodology of Haddadan et al. (2019).

## Requirements

### Pip Installs

In [None]:
!pip install torch transformers datasets scikit-learn



### General Requirements

In [None]:
import torch
from datasets import load_dataset, Dataset
from transformers import (
    RobertaTokenizer,
    RobertaTokenizerFast,
    RobertaForSequenceClassification,
    TrainingArguments,
    Trainer,
    AutoConfig,
)
from huggingface_hub import HfFolder, notebook_login
import pandas as pd
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

In [None]:
# Log in to Hugging Face
notebook_login()


## Data

### Loading *USElecDeb60To16 v.01* Dataset
* Dataset from Haddadan et al. (2019)

In [None]:
# Loading dataset with US debates
df = pd.read_csv('sentence_db_candidate.csv')

In [None]:
# Printing the dataset's shape
df.shape

(29621, 18)

In [None]:
df.head()

Unnamed: 0,Text,Part,Document,Order,Sentence,Start,End,Annotator,Tag,Component,Speech,Speaker,SpeakerType,Set,Date,Year,Name,MainTag
0,"CHENEY: Gwen, I want to thank you, and I want ...",1,30_2004,0,0,2101,2221,,"{""O"": 27}",O,"Gwen, I want to thank you, and I want to than...",CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,O
1,"It's a very important event, and they've done ...",1,30_2004,1,1,2221,2304,,"{""O"": 19}",O,"It's a very important event, and they've done ...",CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,O
2,It's important to look at all of our developme...,1,30_2004,2,2,2304,2418,,"{""O"": 23}",O,It's important to look at all of our developme...,CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,O
3,"And, after 9/11, it became clear that we had t...",1,30_2004,3,3,2418,2744,,"{""O"": 16, ""Claim"": 50}",Claim,"And, after 9/11, it became clear that we had t...",CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,Claim
4,And we also then finally had to stand up democ...,1,30_2004,4,4,2744,2974,,"{""O"": 4, ""Claim"": 13, ""Premise"": 25}",Premise,And we also then finally had to stand up democ...,CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,Mixed


In [None]:
# Removing rows with Component "O"
df_claim_premise = df[df['Component'] != 'O']

Unnamed: 0,Text,Part,Document,Order,Sentence,Start,End,Annotator,Tag,Component,Speech,Speaker,SpeakerType,Set,Date,Year,Name,MainTag
3,"And, after 9/11, it became clear that we had t...",1,30_2004,3,3,2418,2744,,"{""O"": 16, ""Claim"": 50}",Claim,"And, after 9/11, it became clear that we had t...",CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,Claim
4,And we also then finally had to stand up democ...,1,30_2004,4,4,2744,2974,,"{""O"": 4, ""Claim"": 13, ""Premise"": 25}",Premise,And we also then finally had to stand up democ...,CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,Mixed
9,What we did in Iraq was exactly the right thin...,1,30_2004,9,9,3861,3916,,"{""Claim"": 12, ""O"": 1}",Claim,What we did in Iraq was exactly the right thin...,CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,Claim
10,"If I had it to recommend all over again, I wou...",1,30_2004,10,10,3916,4010,,"{""Premise"": 19, ""O"": 1}",Premise,"If I had it to recommend all over again, I wou...",CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,Premise
11,The world is far safer today because Saddam Hu...,1,30_2004,11,11,4010,4112,,"{""Claim"": 6, ""O"": 2, ""Premise"": 13}",Premise,The world is far safer today because Saddam Hu...,CHENEY,Candidate,TRAIN,05 Oct 2004,2004,Richard(Dick) B. Cheney,Mixed
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29615,But our long-term security depends on our deep...,162,31_2004,21,21,91332,91397,,"{""O"": 2, ""Claim"": 10}",Claim,But our long-term security depends on our deep...,BUSH,Candidate,VALIDATION,08 Oct 2004,2004,George W. Bush,Claim
29616,And we'll continue to promote freedom around t...,162,31_2004,22,22,91397,91453,,"{""O"": 2, ""Claim"": 9}",Claim,And we'll continue to promote freedom around t...,BUSH,Candidate,VALIDATION,08 Oct 2004,2004,George W. Bush,Claim
29617,Freedom is on the march.,162,31_2004,23,23,91454,91479,,"{""Claim"": 5, ""O"": 1}",Claim,Freedom is on the march.,BUSH,Candidate,VALIDATION,08 Oct 2004,2004,George W. Bush,Claim
29618,"Tomorrow, Afghanistan will be voting for a pre...",162,31_2004,24,24,91479,91533,,"{""Premise"": 9, ""O"": 1}",Premise,"Tomorrow, Afghanistan will be voting for a pre...",BUSH,Candidate,VALIDATION,08 Oct 2004,2004,George W. Bush,Premise


In [None]:
# Printing the Component column with claim and premise labels
df_claim_premise['Component']

Unnamed: 0,Component
3,Claim
4,Premise
9,Claim
10,Premise
11,Premise
...,...
29615,Claim
29616,Claim
29617,Claim
29618,Premise


In [None]:
# Getting counts of labels
df_claim_premise['Component'].value_counts()

Unnamed: 0_level_0,count
Component,Unnamed: 1_level_1
Claim,11964
Premise,10316


In [None]:
# Replacing labels for binary classification of arguments
# Using Arg for Argumentative
# Non_arg for Non Argumentative
# Whole-value replacement avoids substring matches (e.g. any capital "O" inside a label);
# missing values are left untouched
df['Component'] = df['Component'].replace({'Claim': 'Arg', 'Premise': 'Arg', 'O': 'Non_arg'})

In [None]:
# Getting counts of binary labels
df['Component'].value_counts()

Unnamed: 0_level_0,count
Component,Unnamed: 1_level_1
Arg,22280
Non_arg,7252


In [None]:
df.shape

(29621, 18)

In [None]:
# Dropping rows with missing values in column Component
df = df.dropna(subset=['Component'])

In [None]:
# Printing resulting shape
df.shape

(29532, 18)

* Splitting the dataset according to the sets used by Haddadan et al. (2019)

In [None]:
# Splitting US Dataset
df_train = df[df['Set'] == 'TRAIN']
df_val = df[df['Set'] == 'VALIDATION']
df_test = df[df['Set'] == 'TEST']

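# Keeping only text and label; .copy() avoids SettingWithCopyWarning on later column assignments
df_train = df_train[['Speech', 'Component']].copy()
df_val = df_val[['Speech', 'Component']].copy()
df_test = df_test[['Speech', 'Component']].copy()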

In [None]:
# Printing sizes of training, validation and testing sets
print(df_train.shape, df_val.shape, df_test.shape)

(14044, 2) (7033, 2) (8455, 2)


In [None]:
# Printing label count for test set
df_test['Component'].value_counts()

Unnamed: 0_level_0,count
Component,Unnamed: 1_level_1
Arg,6575
Non_arg,1880


* Mapping labels to numeric values

In [None]:
# Mapping label values to numeric value
label_mapping = {"Arg": 1, "Non_arg": 0}


df_train['Component'] = df_train['Component'].map(label_mapping)
df_val['Component'] = df_val['Component'].map(label_mapping)
df_test['Component'] = df_test['Component'].map(label_mapping)

In [None]:
# Printing test set label count
df_test['Component'].value_counts()

Unnamed: 0_level_0,count
Component,Unnamed: 1_level_1
1,6575
0,1880
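
The test set is imbalanced, so a majority-class baseline is a useful reference point when reading the accuracy reported later. The sketch below is a minimal illustration that hard-codes the counts from the output above; the helper name `majority_baseline` is hypothetical and not part of the original notebook.

```python
# Test-set label counts copied from the value_counts() output above
test_counts = {1: 6575, 0: 1880}


def majority_baseline(counts):
    """Return the most frequent label and the accuracy of always predicting it."""
    total = sum(counts.values())
    label, n = max(counts.items(), key=lambda kv: kv[1])
    return label, n / total


majority_label, baseline_accuracy = majority_baseline(test_counts)
print(majority_label, round(baseline_accuracy, 4))
```

A classifier that always predicts `1` (Arg) would therefore already be right on roughly 78% of the test sentences, which is why per-class precision and recall matter for this task.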


In [None]:
# Rename 'Component' to 'label' for Hugging Face Trainer to recognize labels
df_train = df_train.rename(columns={"Component": "label"})
df_val = df_val.rename(columns={"Component": "label"})
df_test = df_test.rename(columns={"Component": "label"})

In [None]:
# Converting datasets into Hugging Face Dataset format
train_set = Dataset.from_pandas(df_train)
val_set = Dataset.from_pandas(df_val)
test_set = Dataset.from_pandas(df_test)

In [None]:
test_set

Dataset({
    features: ['Speech', 'label', '__index_level_0__'],
    num_rows: 8455
})

## Loading RoBERTa Model

In [None]:
# Setting up the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
# Not necessary to run when model has already been fine-tuned and saved
# Loading the tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')



In [None]:
# Function for tokenization
def tokenize_function(examples):
    return tokenizer(examples['Speech'], padding="max_length", truncation=True)
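
With `padding="max_length"`, every sentence is padded to the tokenizer's maximum length (512 tokens for `roberta-base`), even though most debate sentences are far shorter. The pure-Python sketch below only illustrates the cost difference against padding each batch to its own longest sequence; the token lengths are hypothetical, and `padded_token_count` is an illustrative helper rather than a library function. In `transformers`, per-batch padding is typically obtained with `DataCollatorWithPadding`.

```python
def padded_token_count(lengths, batch_size, max_length=None):
    """Total number of token positions processed after padding.

    If max_length is given, every sequence is padded to max_length
    (like padding="max_length"); otherwise each batch is padded to
    its own longest sequence (dynamic padding).
    """
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        target = max_length if max_length is not None else max(batch)
        total += target * len(batch)
    return total


# Hypothetical token lengths for eight short sentences
example_lengths = [12, 30, 8, 25, 40, 18, 22, 15]

fixed = padded_token_count(example_lengths, batch_size=4, max_length=512)
dynamic = padded_token_count(example_lengths, batch_size=4)
print(fixed, dynamic)
```

On this toy input the fixed-length variant processes many times more positions than the dynamic one, which suggests where training time could be saved if the run above is repeated.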

In [None]:
# Tokenizing all sets (train, validation, and test)
tokenized_train_set = train_set.map(tokenize_function, batched=True)
tokenized_val_set = val_set.map(tokenize_function, batched=True)
tokenized_test_set = test_set.map(tokenize_function, batched=True)

Map:   0%|          | 0/14044 [00:00<?, ? examples/s]

Map:   0%|          | 0/7033 [00:00<?, ? examples/s]

Map:   0%|          | 0/8455 [00:00<?, ? examples/s]

In [None]:
# Loading the pre-trained model for sequence classification
model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=2).to(device)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
# Defining the training arguments
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    evaluation_strategy="epoch",     # evaluate at the end of each epoch
    learning_rate=2e-5,              # learning rate
    per_device_train_batch_size=4,  # batch size for training
    per_device_eval_batch_size=4,   # batch size for evaluation
    num_train_epochs=3,              # number of training epochs
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)



In [None]:
# Creating the Trainer
trainer = Trainer(
    model=model,                         # the pre-trained model
    args=training_args,                  # training arguments
    train_dataset=tokenized_train_set,   # tokenized training dataset
    eval_dataset=tokenized_val_set,      # tokenized validation dataset
    tokenizer=tokenizer,
)

In [None]:
# Fine-tuning the model
trainer.train()

Epoch,Training Loss,Validation Loss
1,0.6216,0.655516
2,0.3684,0.6087
3,0.4288,0.762293


TrainOutput(global_step=10533, training_loss=0.4983056514074487, metrics={'train_runtime': 5250.3008, 'train_samples_per_second': 8.025, 'train_steps_per_second': 2.006, 'total_flos': 1.108539498442752e+16, 'train_loss': 0.4983056514074487, 'epoch': 3.0})
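
As a sanity check, the reported `global_step=10533` can be reproduced from the training-set size shown in the tokenization progress output (14044 examples), the batch size (4), and the number of epochs (3). The sketch assumes a single device and no gradient accumulation, which matches the defaults but is not stated in the notebook; `total_optimizer_steps` is a hypothetical helper.

```python
import math


def total_optimizer_steps(num_examples, batch_size, num_epochs):
    """Steps per epoch (rounded up) times the number of epochs.

    Assumes one device and gradient_accumulation_steps=1.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


steps = total_optimizer_steps(num_examples=14044, batch_size=4, num_epochs=3)
print(steps)
```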

In [None]:
# Save the fine-tuned model
model.save_pretrained("./fine_tuned_roberta")
tokenizer.save_pretrained("./fine_tuned_roberta")

('./fine_tuned_roberta/tokenizer_config.json',
 './fine_tuned_roberta/special_tokens_map.json',
 './fine_tuned_roberta/vocab.json',
 './fine_tuned_roberta/merges.txt',
 './fine_tuned_roberta/added_tokens.json')

In [None]:
# Mounting Google Drive to store the fine-tuned model
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Copying the fine-tuned model to Google Drive
!mkdir -p /content/drive/MyDrive/fine_tuned_roberta # Replace with your path
!cp -r ./fine_tuned_roberta/* /content/drive/MyDrive/fine_tuned_roberta/ # Replace with your path

## *USElecDeb60To16 v.01* Classification

In [None]:
# Returning the model inputs as PyTorch tensors before building the loader
tokenized_test_set.set_format(type='torch', columns=['input_ids', 'attention_mask'])

In [None]:
# Creating data loader for test set
test_loader = DataLoader(tokenized_test_set, batch_size=16)

In [None]:
# Function for predictions
def predict(model, dataloader):
    model.eval()  # switch off dropout so predictions are deterministic
    predictions = []
    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)

            outputs = model(input_ids=input_ids, attention_mask=attention_mask)
            logits = outputs.logits

            predictions.append(logits.argmax(dim=-1).cpu().numpy())

    return predictions
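
The `argmax` over the logits picks the predicted class directly; applying a softmax first would give class probabilities but never change which class wins, because softmax preserves the ordering of its inputs. The pure-Python sketch below illustrates this with hypothetical logits for a single sentence (index 0 = Non_arg, index 1 = Arg); the helper functions are illustrative and not part of the notebook.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits for one sentence: index 0 = Non_arg, index 1 = Arg
example_logits = [-0.4, 1.3]
probs = softmax(example_logits)

predicted_from_logits = argmax(example_logits)
predicted_from_probs = argmax(probs)
print(predicted_from_logits, predicted_from_probs)
```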


In [None]:
# Getting predictions
predictions = predict(model, test_loader)

In [None]:
# Converting predictions to flat list
predictions = [item for sublist in predictions for item in sublist]

In [None]:
# Mapping numerical values back to the original label values
id_to_label = {1: "Arg", 0: "Non_arg"}
predicted_labels = [id_to_label[pred] for pred in predictions]

In [None]:
# Adding numeric predictions to the test set DataFrame
df_test['Predicted_Label'] = predictions

In [None]:
df_test

Unnamed: 0,Speech,label,Predicted_Label
1413,"Thank you very much, Jim.",0,0
1414,Let me first give you a sports update.,0,0
1415,"The Braves, one; the Cardinals, nothing, early...",0,1
1416,I want to thank you and I want to thank everyb...,0,0
1417,I want to give a special thanks to my wife Eli...,0,0
...,...,...,...
27406,So I ask you to look at that.,1,0
27407,And you have to vote for somebody with a plan.,1,1
27408,That's what you have elections for.,1,1
27409,"If people would say, well, he got elected to d...",1,1


In [None]:
# Saving predictions into csv file
df_test.to_csv("test_predictions.csv", index=False)

### Evaluation on USElecDeb60To16 v.01 dataset

In [None]:
# Getting true labels and predicted labels from the test DataFrame
y_true = df_test['label']
y_pred = df_test['Predicted_Label']

In [None]:
# Calculating metrics
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, pos_label=1)  # assuming 'Arg'=1 is the positive class
recall = recall_score(y_true, y_pred, pos_label=1)
f1 = f1_score(y_true, y_pred, pos_label=1)
conf_matrix = confusion_matrix(y_true, y_pred)

In [None]:
# Printing the metrics
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", classification_report(y_true, y_pred))

Accuracy: 0.8567711413364872
Precision: 0.883252357816519
Recall: 0.9400760456273765
F1 Score: 0.9107787519339866
Confusion Matrix:
 [[1063  817]
 [ 394 6181]]

Classification Report:
               precision    recall  f1-score   support

           0       0.73      0.57      0.64      1880
           1       0.88      0.94      0.91      6575

    accuracy                           0.86      8455
   macro avg       0.81      0.75      0.77      8455
weighted avg       0.85      0.86      0.85      8455
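
The headline metrics follow directly from the confusion matrix, whose rows are true labels and whose columns are predicted labels (in the order 0, 1). The short check below recomputes them by hand from the four counts printed above; it is a worked verification, not an additional experiment.

```python
# Confusion matrix copied from the output above:
# rows = true label (0, 1), columns = predicted label (0, 1)
tn, fp = 1063, 817
fn, tp = 394, 6181

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)          # of sentences predicted Arg, how many are Arg
recall = tp / (tp + fn)             # of true Arg sentences, how many were found
f1 = 2 * precision * recall / (precision + recall)

# Same quantities for the minority class (Non_arg = 0)
precision_non_arg = tn / (tn + fn)
recall_non_arg = tn / (tn + fp)

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
print(round(precision_non_arg, 2), round(recall_non_arg, 2))
```

The second line makes the weak spot explicit: only about 57% of the non-argumentative sentences are recovered, since 817 of the 1880 are labelled as argumentative.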

