### Fake News Detection
For our experiments on fine-tuning transformers on the FNC-1 task, we use the Simple Transformers (Rajapakse, 2019) wrapper around the Hugging Face Transformers library (Wolf et al., 2019b).

The model implementations provided in the library are tested to ensure they match the performance of the original authors’ implementations on various benchmarks. The architectures for which Transformers currently provides reference implementations and pre-trained weights include BERT, XLNet, and RoBERTa, as well as DistilBERT, GPT, and GPT-2.

The Simple Transformers (Rajapakse, 2019) library is built on top of Hugging Face Transformers. Its goal is to make working with these models as simple as possible by abstracting away many of the implementation details.
Thus, with Simple Transformers on the shoulders of Hugging Face Transformers, we can access pre-trained BERT, XLNet, and RoBERTa in a unified way without writing much pre-processing code.

Let us first install the dependencies and prepare the data to feed into the transformers:

In [2]:
!pip install wandb
!pip install tensorboardx
!pip install simpletransformers

Collecting wandb
  Downloading wandb-0.12.9-py2.py3-none-any.whl (1.7 MB)
Collecting docker-pycreds>=0.4.0
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting sentry-sdk>=1.0.0
  Downloading sentry_sdk-1.5.2-py2.py3-none-any.whl (142 kB)
Collecting pathtools
  Downloading pathtools-0.1.2.tar.gz (11 kB)
Collecting shortuuid>=0.5.0
  Downloading shortuuid-1.0.8-py3-none-any.whl (9.5 kB)
Collecting yaspin>=1.0.0
  Downloading yaspin-2.1.0-py3-none-any.whl (18 kB)
Collecting subprocess32>=3.5.3
  Downloading subprocess32-3.5.4.tar.gz (97 kB)
Collecting GitPython>=1.0.0
  Downloading GitPython-3.1.26-py3-none-any.whl (180 kB)
Collecting configparser>=3.8.1
  Downloading configparser-5.2.0-py3-none-any.whl (19 kB)


In [1]:
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.



In [3]:
import os
import csv
import pandas as pd
from tqdm import tqdm
import wandb
import logging

from sklearn.model_selection import train_test_split

def fnc(path_headlines, path_bodies):
    """Load FNC-1 stances and bodies; return parallel lists of headlines, bodies, and label IDs."""
    label_map = {'agree': 0, 'disagree': 1, 'discuss': 2, 'unrelated': 3}

    with open(path_bodies, encoding='utf_8') as fb:  # Body ID,articleBody
        body_dict = {}
        lines_b = csv.reader(fb)
        for i, line in enumerate(tqdm(list(lines_b), ncols=80, leave=False)):
            if i > 0:  # skip the header row
                body_id = int(line[0].strip())
                body_dict[body_id] = line[1]

    with open(path_headlines, encoding='utf_8') as fh:  # Headline,Body ID,Stance
        lines_h = csv.reader(fh)
        h = []
        b = []
        l = []
        for i, line in enumerate(tqdm(list(lines_h), ncols=80, leave=False)):
            if i > 0:  # skip the header row
                body_id = int(line[1].strip())
                label = line[2].strip()
                # Keep only rows with a known stance and an available body
                if label in label_map and body_id in body_dict:
                    h.append(line[0])
                    l.append(label_map[label])  # look up the stripped label
                    b.append(body_dict[body_id])
    return h, b, l

data_dir = '/content/drive/MyDrive/fnc-1'
headlines, bodies, labels = fnc(
    os.path.join(data_dir, 'train_stances.csv'),
    os.path.join(data_dir, 'train_bodies.csv')
)

list_of_tuples = list(zip(headlines, bodies, labels))
df = pd.DataFrame(list_of_tuples, columns=['text_a', 'text_b', 'labels'])
train_df, val_df = train_test_split(df)
labels_val = pd.Series(val_df['labels']).to_numpy()

headlines, bodies, labels = fnc(
    os.path.join(data_dir, 'competition_test_stances.csv'),
    os.path.join(data_dir, 'competition_test_bodies.csv')
)

list_of_tuples = list(zip(headlines, bodies, labels))
test_df = pd.DataFrame(list_of_tuples, columns=['text_a', 'text_b', 'labels'])
labels_test = pd.Series(test_df['labels']).to_numpy()
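
To make the join performed by `fnc` concrete, here is a minimal, standard-library-only sketch on two tiny in-memory CSV snippets. The sample rows and the helper name `pair_stances_with_bodies` are invented for illustration; the column names follow the headers noted in the comments of the cell above (`Headline,Body ID,Stance` and `Body ID,articleBody`).

```python
import csv
import io

# Same label encoding as in the cell above.
LABEL_MAP = {'agree': 0, 'disagree': 1, 'discuss': 2, 'unrelated': 3}

# Toy data, invented for illustration only.
BODIES_CSV = """Body ID,articleBody
0,The city council approved the new budget.
1,Scientists found water on a distant moon.
"""

STANCES_CSV = """Headline,Body ID,Stance
Council passes budget,0,agree
Budget rejected by council,0,disagree
Local team wins the cup,1,unrelated
Orphan headline,7,discuss
"""

def pair_stances_with_bodies(stances_text, bodies_text):
    """Return (headline, body, label_id) triples, skipping rows whose
    body ID or stance is unknown."""
    bodies = {
        int(row['Body ID']): row['articleBody']
        for row in csv.DictReader(io.StringIO(bodies_text))
    }
    triples = []
    for row in csv.DictReader(io.StringIO(stances_text)):
        body_id = int(row['Body ID'])
        stance = row['Stance'].strip()
        if stance in LABEL_MAP and body_id in bodies:
            triples.append((row['Headline'], bodies[body_id], LABEL_MAP[stance]))
    return triples

triples = pair_stances_with_bodies(STANCES_CSV, BODIES_CSV)
for headline, body, label in triples:
    print(label, '|', headline, '|', body)
```

The last stance row refers to a body ID that does not exist in the bodies file, so it is dropped, mirroring the `body_id in body_dict` check above.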



Then we create an instance of the Transformer model with Simple Transformers and train it. The `TransformerModel` constructor takes two parameters: model type and model name. All available model types and model names are listed on the Simple Transformers GitHub page: https://github.com/ThilinaRajapakse/simpletransformers.

We use the `bert/bert-base-uncased`, `xlnet/xlnet-base-cased`, and `roberta/roberta-base` models. We set the learning rate to 3e-5 for BERT and 1e-5 for XLNet and RoBERTa. (Use the validation set to search for the best hyper-parameters.)

Let’s set the maximum sequence length to 512 tokens, the largest value the pre-trained models allow, and the number of fine-tuning epochs to 5.
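
The settings described above can be collected in one place. The sketch below is only an illustration: the dictionary `MODEL_CHOICES` and the helper `build_args` are hypothetical names. The only values taken from the text are the three model identifiers, the two learning rates, the 512-token limit, and the 5 epochs. The keys `learning_rate`, `num_train_epochs`, and `max_seq_length` are the ones used in the `args` dictionaries of this notebook.

```python
# Model names and learning rates quoted in the text, keyed by model type.
MODEL_CHOICES = {
    'bert': {'model_name': 'bert-base-uncased', 'learning_rate': 3e-5},
    'xlnet': {'model_name': 'xlnet-base-cased', 'learning_rate': 1e-5},
    'roberta': {'model_name': 'roberta-base', 'learning_rate': 1e-5},
}

def build_args(model_type, max_seq_length=512, num_train_epochs=5):
    """Return (model_type, model_name, args) for one of the three models.

    Hypothetical helper: it only assembles a plain dictionary and does not
    touch any library.
    """
    choice = MODEL_CHOICES[model_type]
    args = {
        'learning_rate': choice['learning_rate'],
        'num_train_epochs': num_train_epochs,
        'max_seq_length': max_seq_length,
    }
    return model_type, choice['model_name'], args

for name in MODEL_CHOICES:
    print(build_args(name))
```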

In [4]:
sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 1e-5, "max": 2e-5},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Create sweep with ID: b0cdtz3w
Sweep URL: https://wandb.ai/whyhugo/Simple%20Sweep/sweeps/b0cdtz3w
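
To see what kind of configurations such a sweep can propose, the following sketch draws random samples from a search space with the same shape as `sweep_config["parameters"]`. It is a plain-Python illustration of a random search only, under the assumption that `values` entries are picked uniformly and `min`/`max` ranges are sampled uniformly. It does not reproduce the Bayesian search that Weights & Biases performs, and `sample_config` is a hypothetical helper, not part of the `wandb` API.

```python
import random

# Same search space as in sweep_config["parameters"] above.
parameters = {
    "num_train_epochs": {"values": [2, 3, 5]},
    "learning_rate": {"min": 1e-5, "max": 2e-5},
}

def sample_config(parameters, rng):
    """Draw one configuration: a discrete choice for "values" entries,
    a uniform draw for "min"/"max" ranges (an assumption for this sketch)."""
    config = {}
    for name, spec in parameters.items():
        if "values" in spec:
            config[name] = rng.choice(spec["values"])
        else:
            config[name] = rng.uniform(spec["min"], spec["max"])
    return config

rng = random.Random(4)  # fixed seed so the sketch is repeatable
samples = [sample_config(parameters, rng) for _ in range(3)]
for sample in samples:
    print(sample)
```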


In [6]:
from simpletransformers.model import TransformerModel
from simpletransformers.classification import ClassificationModel, ClassificationArgs

def train():
    # Initialize a new wandb run; the sweep agent supplies wandb.config
    wandb.init()

    # Create a TransformerModel; hyperparameters chosen by the sweep are passed via sweep_config
    model = TransformerModel('roberta', 'roberta-base', num_labels=4, sweep_config=wandb.config, args={
        'learning_rate':1e-5,
        'num_train_epochs': 1,
        'reprocess_input_data': True,
        'overwrite_output_dir': True,
        'process_count': 10,
        'train_batch_size': 1,
        'eval_batch_size': 1,
        'max_seq_length': 32,
        'fp16': True,
        'gradient_accumulation_steps': 1,
        'tensorboard_dir': '/content/drive/MyDrive/train_',
        'wandb_project': 'fnc_roberta',
        'evaluate_during_training': True,
        'manual_seed': 4,
        'use_multiprocessing': True
    })

    # Train the model, evaluating on the validation split during training
    model.train_model(train_df, eval_df=val_df)

    # Evaluate the model on the held-out test set
    model.eval_model(test_df)

    # Mark the wandb run as finished
    wandb.finish()


wandb.agent(sweep_id, train)

INFO:wandb.agents.pyagent:Starting sweep agent: entity=None, project=None, count=None
wandb: Agent Starting Run: zqj3c70i with config:
wandb: 	learning_rate: 1.7996801123921787e-05
wandb: 	num_train_epochs: 5
wandb: Currently logged in as: whyhugo (use `wandb login --relogin` to force relogin)

wandb: ERROR Run zqj3c70i errored: ValueError("'use_cuda' set to True when cuda is unavailable. Make sure CUDA is available or set use_cuda=False.")
wandb: Agent Starting Run: jl6ghy7d with config:
wandb: 	learning_rate: 1.889735715550123e-05
wandb: 	num_train_epochs: 2

wandb: ERROR Run jl6ghy7d errored: ValueError("'use_cuda' set to True when cuda is unavailable. Make sure CUDA is available or set use_cuda=False.")
wandb: Agent Starting Run: qvibr8w0 with config:
wandb: 	learning_rate: 1.1870865164962468e-05
wandb: 	num_train_epochs: 2

wandb: ERROR Run qvibr8w0 errored: ValueError("'use_cuda' set to True when cuda is unavailable. Make sure CUDA is available or set use_cuda=False.")
wandb: ERROR Detected 3 failed runs in the first 60 seconds, killing sweep.
wandb: To disable this check set WANDB_AGENT_DISABLE_FLAPPING=true

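The runs above fail because no GPU is visible to the runtime (see the `nvidia-smi` output earlier) while the model is created with CUDA enabled, as the error message states. One way to guard against this, sketched below, is to derive `use_cuda` and `fp16` from what the runtime actually offers. The helpers `cuda_available` and `device_settings` are hypothetical names; `torch.cuda.is_available()` is the standard PyTorch check, and `use_cuda` is the constructor argument named in the error message. Turning `fp16` off on CPU is a cautious assumption made for this sketch.

```python
def cuda_available():
    """Return True only if PyTorch is installed and sees a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()

def device_settings(has_cuda):
    """Hypothetical helper: choose GPU-dependent settings in one place."""
    return {
        'use_cuda': has_cuda,  # constructor argument named in the error message
        'fp16': has_cuda,      # assumption: mixed precision only when a GPU is present
    }

settings = device_settings(cuda_available())
print(settings)
```

The `use_cuda` value would then be passed to the model constructor and the `fp16` value placed in its `args` dictionary. On Colab, the alternative is to switch the runtime type to GPU before running the sweep.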

In [10]:
from simpletransformers.model import TransformerModel
from simpletransformers.classification import ClassificationModel, ClassificationArgs

# Train a single model outside the sweep, with fixed hyperparameters
model = TransformerModel('roberta', 'roberta-base', num_labels=4, args={
    'learning_rate':1e-5,
    'num_train_epochs': 1,
    'reprocess_input_data': True,
    'overwrite_output_dir': True,
    'process_count': 10,
    'train_batch_size': 1,
    'eval_batch_size': 1,
    'max_seq_length': 128,
    'fp16': True,
    'gradient_accumulation_steps': 4,
    'tensorboard_dir': '/content/drive/MyDrive/train_',
    'wandb_project': 'fnc_roberta',
    'evaluate_during_training': True,
    'manual_seed': 4,
    'use_multiprocessing': True
})

# evaluate_during_training needs an evaluation set, so pass the validation split
model.train_model(train_df, eval_df=val_df)

Once the model is fine-tuned, we get predictions on the test set and evaluate the fine-tuning results.

In [None]:
import numpy as np
_, model_outputs_test, _ = model.eval_model(test_df)

preds_test = np.argmax(model_outputs_test, axis=1)
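
`model_outputs_test` holds one row of raw scores per test example, and `np.argmax(..., axis=1)` picks the index of the largest score in each row. The toy example below shows the same step in plain Python and maps the indices back to stance names. The score values are invented, and `ID_TO_STANCE` simply inverts the label mapping defined in the data-preparation cell.

```python
# Inverse of the label mapping used when loading the data.
ID_TO_STANCE = {0: 'agree', 1: 'disagree', 2: 'discuss', 3: 'unrelated'}

# Invented raw scores: one row per example, one column per class.
toy_outputs = [
    [2.1, -0.3, 0.4, -1.0],
    [-1.2, -0.8, 0.1, 3.5],
    [0.2, 0.1, 1.7, 0.0],
]

def argmax_rows(rows):
    """Return, for each row, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

toy_preds = argmax_rows(toy_outputs)
toy_stances = [ID_TO_STANCE[i] for i in toy_preds]
print(toy_preds)    # [0, 3, 2]
print(toy_stances)  # ['agree', 'unrelated', 'discuss']
```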

In [None]:
wandb.finish()

Then we calculate macro-averaged and class-wise F1 scores:

In [None]:
from sklearn.metrics import f1_score

def calculate_f1_scores(y_true, y_predicted):
    f1_macro = f1_score(y_true, y_predicted, average='macro')
    f1_classwise = f1_score(y_true, y_predicted, average=None, labels=[0, 1, 2, 3])

    resultstring = "F1 macro: {:.3f}".format(f1_macro * 100) + "% \n"
    resultstring += "F1 agree: {:.3f}".format(f1_classwise[0] * 100) + "% \n"
    resultstring += "F1 disagree: {:.3f}".format(f1_classwise[1] * 100) + "% \n"
    resultstring += "F1 discuss: {:.3f}".format(f1_classwise[2] * 100) + "% \n"
    resultstring += "F1 unrelated: {:.3f}".format(f1_classwise[3] * 100) + "% \n"

    return resultstring

# Pass the true labels first, matching the (y_true, y_predicted) signature
print(calculate_f1_scores(labels_test, preds_test))
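
As a reminder of what these numbers mean, the per-class F1 score is the harmonic mean of precision and recall for that class, and the macro average is the unweighted mean over classes. The sketch below computes both from scratch on a small invented example; `f1_per_class` is an illustrative helper, not part of scikit-learn.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return the F1 score of each class, using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Invented labels: 0 = agree, 1 = disagree, 2 = discuss, 3 = unrelated.
y_true = [0, 0, 1, 2, 2, 3, 3, 3]
y_pred = [0, 2, 1, 2, 2, 3, 3, 0]

class_f1 = f1_per_class(y_true, y_pred, labels=[0, 1, 2, 3])
macro_f1 = sum(class_f1) / len(class_f1)
print(class_f1)
print(macro_f1)
```

For this toy data the class-wise scores are 0.5, 1.0, 0.8, and 0.8, giving a macro F1 of 0.775.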

After that, we can calculate the FNC-1 score (the metric proposed by the FNC-1 organizers) and print the confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib
import matplotlib.pyplot as plt

LABELS = [0, 1, 2, 3]
RELATED = [0, 1, 2]

def print_confusion_matrix(cm):
    lines = ['CONFUSION MATRIX:']
    header = "|{:^11}|{:^11}|{:^11}|{:^11}|{:^11}|".format('', *LABELS)
    line_len = len(header)
    lines.append("-"*line_len)
    lines.append(header)
    lines.append("-"*line_len)
    hit = 0
    total = 0
    for i, row in enumerate(cm):
        hit += row[i]
        total += sum(row)
        lines.append("|{:^11}|{:^11}|{:^11}|{:^11}|{:^11}|".format(LABELS[i], *row))
        lines.append("-"*line_len)
    lines.append("ACCURACY: {:.3f}".format((hit / total)*100) + "%")
    print('\n'.join(lines))

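def fnc_score_cm(predicted_labels, target):
    """Return the FNC-1 score and a confusion matrix indexed as cm[predicted][true]."""
    score = 0.0
    cm = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
    for g, t in zip(predicted_labels, target):
        if g == t:
            score += 0.25
            if g != 3:  # extra credit for a correct related stance
                score += 0.50
        if g in RELATED and t in RELATED:
            score += 0.25

        cm[g][t] += 1
    return score, cm

fnc_score, cm_test = fnc_score_cm(preds_test, labels_test)
# Best achievable score: what a perfect prediction would earn on this test set
max_score, _ = fnc_score_cm(labels_test, labels_test)
print("\nRelative FNC Score: {:.3f}".format(100 * fnc_score / max_score) + "% \n")
print_confusion_matrix(cm_test)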
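
The scoring rule implemented in `fnc_score_cm` gives 0.25 points for an exact match, a further 0.50 points if that match is on a related class (agree, disagree, or discuss), and 0.25 points whenever both labels fall in the related group. A small worked example, with invented labels and an illustrative helper named `fnc_points`, shows how a relative score follows from dividing by the score of a perfect prediction:

```python
RELATED = {0, 1, 2}   # agree, disagree, discuss
UNRELATED = 3

def fnc_points(y_true, y_pred):
    """Sum the FNC-1 points earned by y_pred against y_true."""
    score = 0.0
    for t, p in zip(y_true, y_pred):
        if t == p:
            score += 0.25
            if t != UNRELATED:
                score += 0.50
        if t in RELATED and p in RELATED:
            score += 0.25
    return score

# Invented example with five headline/body pairs.
y_true = [0, 1, 2, 3, 3]
y_pred = [0, 2, 2, 3, 0]

earned = fnc_points(y_true, y_pred)   # 1.0 + 0.25 + 1.0 + 0.25 + 0.0
best = fnc_points(y_true, y_true)     # 1.0 + 1.0 + 1.0 + 0.25 + 0.25
print(earned, best)                   # 2.5 3.5
print("{:.2f}%".format(100 * earned / best))
```

Confusing one related stance with another still earns 0.25 points, while a correct "unrelated" prediction earns only 0.25, which is why the metric rewards getting the related classes right.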

In [None]:
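# cm_test is indexed as [predicted][true]; transpose so rows are true labels, as the display expects
disp = ConfusionMatrixDisplay(confusion_matrix=np.array(cm_test).T, display_labels=LABELS)
disp.plot()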

Let’s calculate class-wise precision and recall.

In [None]:
from sklearn.metrics import classification_report

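# target_names must be strings; list them in the same order as the integer labels
eval_report = classification_report(
    labels_test, preds_test,
    labels=LABELS,
    target_names=['agree', 'disagree', 'discuss', 'unrelated']
)
print('Test report\n', eval_report)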
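
Precision and recall for a class can also be read off a confusion matrix. In the sketch below the matrix has true labels on the rows and predictions on the columns (note that this is the transpose of `cm_test` as built above), so precision divides the diagonal entry by its column sum and recall divides it by its row sum. The matrix is invented for illustration, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(cm):
    """Return a (precision, recall) pair per class for a square matrix
    with true labels on rows and predicted labels on columns."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        col_sum = sum(cm[r][c] for r in range(n))  # everything predicted as c
        row_sum = sum(cm[c])                       # everything that truly is c
        precision = tp / col_sum if col_sum else 0.0
        recall = tp / row_sum if row_sum else 0.0
        results.append((precision, recall))
    return results

# Invented counts for the four stances (agree, disagree, discuss, unrelated).
toy_cm = [
    [8, 1, 1, 0],
    [2, 3, 0, 0],
    [1, 0, 9, 0],
    [0, 0, 2, 18],
]

for name, (p, r) in zip(['agree', 'disagree', 'discuss', 'unrelated'],
                        precision_recall(toy_cm)):
    print("{:<10} precision={:.3f} recall={:.3f}".format(name, p, r))
```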

In [None]:
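# Each input is a [headline, body] pair, matching the text_a/text_b columns used for training
predictions, raw_outputs = model.predict([
    ["Sample headline 1", "Sample body 1"],
    ["Sample headline 2", "Sample body 2"],
])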