# Fake News Detection Powered with BERT and Friends

- https://medium.com/@vslovik/fake-news-detection-empowered-with-bert-and-friends-20397f7e1675<br>

>- https://github.com/UKPLab/coling2018_fake-news-challenge/blob/master/data/fnc-1/corpora/FNC_ARC/combined_bodies_train.csv<br>
>- https://github.com/FakeNewsChallenge/fnc-1<br>

### BERT
BERT (Devlin et al., 2018), which stands for <b>B</b>idirectional <b>E</b>ncoder <b>R</b>epresentations from <b>T</b>ransformers, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. BERT uses the "masked language model" (MLM) pre-training objective, inspired by the Cloze task (Taylor, 1953). The masked language model randomly masks some of the tokens from the input, and the objective is to predict each masked token from its context. In addition to the masked language model, BERT also uses a "next sentence prediction" (NSP) task that jointly pre-trains text-pair representations.
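
To make the masking objective concrete, here is a minimal sketch in plain Python. It is not BERT's preprocessing code: the function name `mask_tokens`, the whitespace tokenization, and the plain replacement of every selected token with `[MASK]` are simplifying assumptions made for illustration.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, targets); targets maps position -> original token."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            targets[i] = tok      # the model is trained to recover this token
        else:
            masked.append(tok)
    return masked, targets

tokens = "the man went to the store to buy a gallon of milk".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(targets)
```

The training loss would be computed only at the positions stored in `targets`; every other position serves purely as context.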

For the pre-training corpus, Devlin et al. (2018) use the BooksCorpus (800M words) and English Wikipedia (2,500M words).

The self-attention mechanism in the Transformer allows BERT to model many downstream tasks, whether they involve a single text or text pairs. For each task, the steps are: (1) plug the task-specific inputs and outputs into BERT and (2) fine-tune all the parameters end-to-end.

### XLNet
BERT predicts all masked positions independently, meaning that during the training, it does not learn to handle dependencies between predicted masked tokens. This reduces the number of dependencies BERT learns at once, making the learning signal weaker than it could be.
Another problem with BERT is that the [MASK] token, which is central to pre-training, never appears when fine-tuning BERT on downstream tasks. The [MASK] token is therefore a source of mismatch between pre-training and fine-tuning.
XLNet (Yang et al., 2019) incorporates a bidirectional context while avoiding the [MASK] tokens and independent predictions. It does this by introducing "permutation language modeling": instead of predicting the tokens in sequential order, it predicts tokens in some random order.
Aside from using permutation language modeling, XLNet improves upon BERT by using the Transformer XL as its base architecture.
Both BERT and XLNet can take the pair of text sequences as an input. To enable the model to distinguish between words in two different segments, BERT learns a segment embedding. In contrast, XLNet learns an embedding that represents whether two words are from the same segment. This embedding is used during attention computation between any two words.
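
The idea of permutation language modeling described above can be sketched in a few lines of plain Python. This is only an illustration of which tokens are visible at each prediction step; the function name `permutation_contexts` is hypothetical, and the real XLNet realizes this through attention masks inside the Transformer rather than by reordering the input.

```python
import random

def permutation_contexts(tokens, seed=0):
    """Sample a factorization order and list the visible positions at each step."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    steps = []
    for step, pos in enumerate(order):
        visible = sorted(order[:step])  # positions that come earlier in the sampled order
        steps.append((pos, visible))
    return order, steps

tokens = ["New", "York", "is", "a", "city"]
order, steps = permutation_contexts(tokens, seed=3)
for pos, visible in steps:
    context = [tokens[j] for j in visible]
    print(f"predict {tokens[pos]!r} at position {pos} given {context}")
```

Because the order is random, a token can be predicted from words on both its left and its right, and no [MASK] symbol is needed.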

### RoBERTa
Liu et al. (2019) found that BERT was significantly under-trained and proposed an improved recipe for training BERT models, which they call RoBERTa (Robustly optimized BERT approach), that can match or exceed the performance of all of the post-BERT methods. The recipe includes: (1) training the model longer, with bigger batches, over more data; (2) removing the "next sentence prediction" objective; (3) training on longer sequences; and (4) dynamically changing the masking pattern applied to the training data.
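
Item (4), dynamic masking, can be illustrated with a small sketch. The helper `sample_mask_positions` and the 30% masking rate are assumptions chosen to keep the toy example readable; they do not reproduce the RoBERTa training code.

```python
import random

def sample_mask_positions(num_tokens, mask_prob, rng):
    """Pick the positions to mask for one pass over a sequence."""
    return [i for i in range(num_tokens) if rng.random() < mask_prob]

tokens = "fake news detection is framed as stance classification".split()
rng = random.Random(0)

# Static masking: positions are chosen once during preprocessing and then reused.
static_positions = sample_mask_positions(len(tokens), 0.3, rng)

# Dynamic masking: positions are re-sampled every time the sequence is fed to the model.
epochs = []
for epoch in range(3):
    dynamic_positions = sample_mask_positions(len(tokens), 0.3, rng)
    epochs.append((static_positions, dynamic_positions))
    print(f"epoch {epoch}: static={static_positions} dynamic={dynamic_positions}")
```

With static masking the model sees the same corrupted sequence in every epoch, whereas dynamic masking exposes it to a different masking pattern on each pass.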

To train RoBERTa, Liu et al. (2019) use five English-language corpora of varying sizes and domains, totaling over 160GB of uncompressed text. They also demonstrate that removing the "next sentence prediction" loss, together with the segment-pair input format, matches or slightly improves downstream task performance.

### Fake News Detection
For our experiments on fine-tuning transformers on the FNC-1 task, we use the Simple Transformers (Rajapakse, 2019) wrapper around the Hugging Face Transformers library (Wolf et al., 2019b).

The model implementations provided in the library are tested to ensure they match the original author implementations’ performances on various benchmarks. A list of architectures for which reference implementations and pre-trained weights are currently provided in Transformers includes BERT, XLNet, and RoBERTa, as well as DistilBERT, GPT and GPT2.

The Simple Transformers (Rajapakse, 2019) library is built on top of the Hugging Face Transformers. The idea behind it was to make it as simple as possible, abstracting a lot of the implementation details.
Thus, with Simple Transformers on the shoulders of Hugging Face Transformers, we could access pre-trained BERT, XLNet, and RoBERTa in a unified way without a lot of pre-processing coding.

Let us first prepare the data to feed into transformers:

In [2]:
import os
import csv
import pandas as pd
from tqdm import tqdm
import wandb
import logging

from sklearn.model_selection import train_test_split

def fnc(path_headlines, path_bodies):
    """Load FNC-1 stances and bodies; return parallel lists of headlines, bodies, labels."""
    # Stance name -> integer class id used by the classifier
    # (named label_map so it does not shadow the built-in map)
    label_map = {'agree': 0, 'disagree': 1, 'discuss': 2, 'unrelated': 3}

    # Bodies file columns: Body ID, articleBody
    with open(path_bodies, encoding='utf_8') as fb:
        body_dict = {}
        lines_b = csv.reader(fb)
        for i, line in enumerate(tqdm(list(lines_b), ncols=80, leave=False)):
            if i > 0:  # skip the header row
                body_id = int(line[0].strip())
                body_dict[body_id] = line[1]

    # Stances file columns: Headline, Body ID, Stance
    with open(path_headlines, encoding='utf_8') as fh:
        lines_h = csv.reader(fh)
        h = []
        b = []
        l = []
        for i, line in enumerate(tqdm(list(lines_h), ncols=80, leave=False)):
            if i > 0:  # skip the header row
                body_id = int(line[1].strip())
                label = line[2].strip()
                if label in label_map and body_id in body_dict:
                    h.append(line[0])
                    l.append(label_map[label])  # look up the stripped label
                    b.append(body_dict[body_id])
    return h, b, l

data_dir = '/programing/programing_created-acer/fnc-1/fnc-1'
headlines, bodies, labels = fnc(
    os.path.join(data_dir, 'train_stances.csv'),
    os.path.join(data_dir, 'train_bodies.csv')
)

list_of_tuples = list(zip(headlines, bodies, labels))
df = pd.DataFrame(list_of_tuples, columns=['text_a', 'text_b', 'labels'])
train_df, val_df = train_test_split(df)
labels_val = pd.Series(val_df['labels']).to_numpy()

headlines, bodies, labels = fnc(
    os.path.join(data_dir, 'competition_test_stances.csv'),
    os.path.join(data_dir, 'competition_test_bodies.csv')
)

list_of_tuples = list(zip(headlines, bodies, labels))
test_df = pd.DataFrame(list_of_tuples, columns=['text_a', 'text_b', 'labels'])
labels_test = pd.Series(test_df['labels']).to_numpy()

                                                                                

Then we create an instance of the Transformer model with Simple Transformers and train it. The `TransformerModel` constructor takes two parameters: model type and model name. All available model types and model names are listed on the Simple Transformers GitHub page: https://github.com/ThilinaRajapakse/simpletransformers.

We use the `bert/bert-base-uncased`, `xlnet/xlnet-base-cased`, and `roberta/roberta-base` models. We set the learning rate to 3e-5 for BERT and 1e-5 for XLNet and RoBERTa. (Use the validation set to search for the best hyper-parameters.)

Let’s set the maximum sequence length to 512 tokens, the largest value the pre-trained models allow, and the number of fine-tuning epochs to 5. (The cells below use shorter sequences: 32 tokens in the sweep and 256 in the final run.)
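
Since a headline and an article body together often exceed the sequence limit, the pair has to be truncated before it reaches the model. The sketch below shows one common strategy, trimming the longer sequence first. The function `truncate_pair` and the budget of three special tokens (a BERT-style `[CLS] a [SEP] b [SEP]` layout) are assumptions for illustration; the library's own preprocessing may differ in detail.

```python
def truncate_pair(tokens_a, tokens_b, max_seq_length, num_special=3):
    """Trim the longer sequence one token at a time until the pair fits."""
    budget = max_seq_length - num_special  # room left after the special tokens
    if budget < 0:
        raise ValueError("max_seq_length is smaller than the number of special tokens")
    a, b = list(tokens_a), list(tokens_b)
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    return a, b

headline = ["tok"] * 20   # a short headline
body = ["tok"] * 900      # a long article body
a, b = truncate_pair(headline, body, max_seq_length=512)
print(len(a), len(b))  # 20 489
```

With this strategy the short headline survives intact and only the tail of the body is dropped, which is why a small `max_seq_length` discards most of the article text.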

In [None]:
sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 2e-5},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

In [None]:
from simpletransformers.model import TransformerModel
from simpletransformers.classification import ClassificationModel, ClassificationArgs

def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = TransformerModel('roberta', 'roberta-base', num_labels=4, sweep_config=wandb.config, args={
        'learning_rate':1e-5,
        'num_train_epochs': 1,
        'reprocess_input_data': True,
        'overwrite_output_dir': True,
        'process_count': 10,
        'train_batch_size': 1,
        'eval_batch_size': 1,
        'max_seq_length': 32,
        'fp16': True,
        'gradient_accumulation_steps': 1,
        'tensorboard_dir': '/programing/programing_created-acer/fnc-1/train_',
        'wandb_project': 'fnc_roberta',
        'evaluate_during_training': True,
        'manual_seed': 4,
        'use_multiprocessing': True
    })

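    # Train the model, evaluating on the held-out validation split during training
    model.train_model(train_df, eval_df=val_df)

    # Evaluate on the validation split so the test set stays out of the hyper-parameter search
    model.eval_model(val_df)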

    # Sync wandb
    wandb.join()


wandb.agent(sweep_id, train)

In [None]:
from simpletransformers.model import TransformerModel
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import wandb

wandb.init()
model = TransformerModel('roberta', 'roberta-base', num_labels=4, args={
    'learning_rate':1e-5,
    'num_train_epochs': 5,
    'reprocess_input_data': True,
    'overwrite_output_dir': True,
    'process_count': 10,
    'train_batch_size': 1,
    'eval_batch_size': 4,
    'max_seq_length': 256,
    'fp16': True,
    'gradient_accumulation_steps': 4,
    'tensorboard_dir': '/programing/programing_created-acer/fnc-1/train_',
    'wandb_project': 'fnc_roberta'
})

model.train_model(train_df)

With the model fine-tuned (RoBERTa in the cell above), we now get predictions on the test set and evaluate the fine-tuning results.

In [None]:
import numpy as np
_, model_outputs_test, _ = model.eval_model(test_df)

preds_test = np.argmax(model_outputs_test, axis=1)

Then we calculate averaged and class-wise F1 scores:

In [None]:
from sklearn.metrics import f1_score

def calculate_f1_scores(y_true, y_predicted):
    f1_macro = f1_score(y_true, y_predicted, average='macro')
    f1_classwise = f1_score(y_true, y_predicted, average=None, labels=[0, 1, 2, 3])

    resultstring = "F1 macro: {:.3f}".format(f1_macro * 100) + "% \n"
    resultstring += "F1 agree: {:.3f}".format(f1_classwise[0] * 100) + "% \n"
    resultstring += "F1 disagree: {:.3f}".format(f1_classwise[1] * 100) + "% \n"
    resultstring += "F1 discuss: {:.3f}".format(f1_classwise[2] * 100) + "% \n"
    resultstring += "F1 unrelated: {:.3f}".format(f1_classwise[3] * 100) + "% \n"

    return resultstring

# Pass the gold labels first to match the (y_true, y_predicted) signature
print(calculate_f1_scores(labels_test, preds_test))
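
To show what the macro and class-wise F1 numbers mean, here is a hand-rolled version on a toy example. The helper `per_class_f1` and the six toy labels are made up for illustration; in the notebook itself `sklearn.metrics.f1_score` does this work.

```python
def per_class_f1(y_true, y_pred, labels):
    """F1 for each class: 2*TP / (2*TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy data: 0 = agree, 1 = disagree, 2 = discuss, 3 = unrelated
y_true = [0, 0, 1, 2, 3, 3]
y_pred = [0, 1, 1, 2, 3, 0]

classwise = per_class_f1(y_true, y_pred, labels=[0, 1, 2, 3])
macro = sum(classwise) / len(classwise)  # unweighted mean over classes
print(classwise, macro)
```

Macro averaging gives every class the same weight, so a rare class such as "disagree" influences the score as much as the dominant "unrelated" class.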

After that we can calculate FNC-1 (the metric proposed by FNC-1 organizers) and print the confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib
import matplotlib.pyplot as plt

LABELS = [0, 1, 2, 3]
RELATED = [0, 1, 2]

def print_confusion_matrix(cm):
    lines = ['CONFUSION MATRIX:']
    header = "|{:^11}|{:^11}|{:^11}|{:^11}|{:^11}|".format('', *LABELS)
    line_len = len(header)
    lines.append("-"*line_len)
    lines.append(header)
    lines.append("-"*line_len)
    hit = 0
    total = 0
    for i, row in enumerate(cm):
        hit += row[i]
        total += sum(row)
        lines.append("|{:^11}|{:^11}|{:^11}|{:^11}|{:^11}|".format(LABELS[i], *row))
        lines.append("-"*line_len)
    lines.append("ACCURACY: {:.3f}".format((hit / total)*100) + "%")
    print('\n'.join(lines))

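def fnc_score_cm(predicted_labels, target):
    """Return the FNC-1 score and a confusion matrix (rows: predicted, columns: gold)."""
    score = 0.0
    cm = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
    for g, t in zip(predicted_labels, target):
        if g == t:
            score += 0.25       # correct label
            if g != 3:
                score += 0.50   # bonus for a correct related stance
        if g in RELATED and t in RELATED:
            score += 0.25       # related/unrelated decision is right

        cm[g][t] += 1
    return score, cm

fnc_score, cm_test = fnc_score_cm(preds_test, labels_test)
# Best achievable score for these gold labels: 1.0 per related pair, 0.25 per unrelated pair
max_score = sum(1.0 if t in RELATED else 0.25 for t in labels_test)
print("\nRelative FNC Score: {:.3f}".format(100 / max_score * fnc_score) + "% \n")
print_confusion_matrix(cm_test)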
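
A small worked example may help in reading the FNC-1 metric: 0.25 points for getting the related/unrelated decision right, plus 0.75 more for the exact stance of a related pair. The function names and the five toy pairs below are hypothetical and only restate the scoring rule used in the cell above.

```python
UNRELATED = 3  # class ids: 0 = agree, 1 = disagree, 2 = discuss, 3 = unrelated

def fnc_score(gold, predicted):
    """FNC-1 weighted score for parallel lists of gold and predicted class ids."""
    score = 0.0
    for g, p in zip(gold, predicted):
        if g == p:
            score += 0.25
            if g != UNRELATED:
                score += 0.50
        if g != UNRELATED and p != UNRELATED:
            score += 0.25
    return score

def max_fnc_score(gold):
    """Score of a perfect prediction: 1.0 per related pair, 0.25 per unrelated pair."""
    return sum(0.25 if g == UNRELATED else 1.0 for g in gold)

gold      = [0, 1, 2, 3, 3]
predicted = [0, 2, 3, 3, 0]

score = fnc_score(gold, predicted)   # 1.0 + 0.25 + 0.0 + 0.25 + 0.0
best = max_fnc_score(gold)           # 1.0 + 1.0 + 1.0 + 0.25 + 0.25
print(score, best, 100 * score / best)
```

Here the second pair earns partial credit (related, but the wrong stance), while the third and fifth pairs earn nothing because the related/unrelated decision itself is wrong.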

Let’s calculate class-wise precision and recall.

In [None]:
from sklearn.metrics import classification_report

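# target_names must be strings, so spell out the stance names instead of passing the integer LABELS
eval_report = classification_report(labels_test, preds_test, labels=LABELS, target_names=['agree', 'disagree', 'discuss', 'unrelated'])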
print('Test report', eval_report)

### Added by Me

In [None]:
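# Build the matrix as a NumPy array (rows: true labels, columns: predictions), as the display expects
cm = confusion_matrix(labels_test, preds_test, labels=LABELS)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=LABELS)
disp.plot()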

In [None]:
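import numpy as np
import matplotlib.pyplot as plt
from scipy.special import softmax
from sklearn.metrics import roc_curve, auc

# roc_curve handles binary targets only, so draw one one-vs-rest curve per class,
# scoring each class with its softmax probability from the raw model outputs.
class_names = ['agree', 'disagree', 'discuss', 'unrelated']
probs_test = softmax(model_outputs_test, axis=1)

plt.title('Receiver Operating Characteristic (one-vs-rest)')
for k, name in enumerate(class_names):
    fpr, tpr, _ = roc_curve((labels_test == k).astype(int), probs_test[:, k])
    plt.plot(fpr, tpr, label='%s (AUC = %0.2f)' % (name, auc(fpr, tpr)))
plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()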

In [None]:
import pickle

with open('train_roberta_0110.pkl', 'wb') as f:
    pickle.dump(model, f)

In [29]:
#predict_data = pd.read('')
predictions, raw_outputs = model.predict([["aaa", "bbb"]])

In [30]:
raw_outputs

array([[-0.65429688, -0.94433594, -0.08044434,  2.04296875]])
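
The raw outputs are unnormalized scores (logits), one per class. A minimal sketch of how they map to probabilities and a stance name, using the values printed above and the label order from the `fnc` loader; the `softmax` helper is written out by hand here purely for illustration.

```python
import math

STANCES = ["agree", "disagree", "discuss", "unrelated"]

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-0.65429688, -0.94433594, -0.08044434, 2.04296875]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(STANCES[best], [round(p, 3) for p in probs])
```

For the dummy pair `["aaa", "bbb"]` the largest logit belongs to class 3, so the prediction is "unrelated", with a probability of roughly 0.8.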

In [4]:
import pickle
with open('/programing/programing_created-acer/fnc-1/fnc-1/fnc_roberta_0116_general4.pkl', 'rb') as f:
    model = pickle.load(f)

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

In [2]:
model

<simpletransformers.classification.classification_model.ClassificationModel at 0x25c17a65070>

In [1]:
import csv
import pandas as pd

data = pd.read_csv('/programing/programing_created-acer/fnc-1/my_dataset/buzzfeed/buzzfeed_preds3.csv', index_col=0, encoding='utf-8')

# Build the [text_a, text_b] pairs in the format model.predict expects
preds = data[['post2', 'article']].values.tolist()

In [3]:
predictions, raw_outputs = model.predict(preds)

In [4]:
raw_outputs

NameError: name 'raw_outputs' is not defined

In [20]:
predictions

[0,
 2,
 0,
 0,
 0,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 2,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 0,
 0,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 0,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 2,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 2,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 2,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 2,
 3,
 3,
 3,
 0,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 1,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,


In [4]:
agree = 0
disagree = 0
discuss = 0
unrelated = 0
for i in predictions:
    if i == 0:
        agree += 1
    elif i == 1:
        disagree += 1
    elif i == 2:
        discuss += 1
    elif i == 3:
        unrelated += 1
    else:
        print('error')

print(f'Agree: {agree}/Disagree: {disagree}/Discuss: {discuss}/Unrelated: {unrelated}')

Agree: 304/Disagree: 27/Discuss: 86/Unrelated: 49


In [23]:
# Import the package
import tkinter as tk

# Create the main window and a Frame (a container that groups widgets)
window = tk.Tk()
top_frame = tk.Frame(window)

# Split the widgets into top/bottom groups and add them to the main window
top_frame.pack()
bottom_frame = tk.Frame(window)
bottom_frame.pack(side=tk.BOTTOM)

# Define an event handler, attached to a widget through its command parameter
def echo_hello():
    print('hello world :)')

# Top group
left_button = tk.Button(top_frame, text='Red', fg='red')
# Let the layout manager place the widget (default is top to bottom; here, left-aligned)
left_button.pack(side=tk.LEFT)

middle_button = tk.Button(top_frame, text='Green', fg='green')
middle_button.pack(side=tk.LEFT)

right_button = tk.Button(top_frame, text='Blue', fg='blue')
right_button.pack(side=tk.LEFT)

# Bottom group
# bottom_button is bound to echo_hello; clicking it prints hello world :)
bottom_button = tk.Button(bottom_frame, text='Black', fg='black', command=echo_hello)
# Let the layout manager place the widget (at the bottom)
bottom_button.pack(side=tk.BOTTOM)

# Run the main loop
window.mainloop()

hello world :)
hello world :)
hello world :)
hello world :)


In [1]:
import pickle
with open('/programing/programing_created-acer/fnc-1/train_/fnc_roberta_0115_self1.pkl', 'rb') as f:
    model = pickle.load(f)

In [10]:
import tkinter as tk
import math
from simpletransformers.model import TransformerModel
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pickle



window = tk.Tk()
window.title('2022TISF-190026')
window.geometry('1200x800')
window.configure(background='white')

# Load the fine-tuned model once, instead of on every button click
with open('D:/programing/programing_created-acer/fnc-1/train_/fnc_roberta_0115_self1.pkl', 'rb') as f:
    model = pickle.load(f)

# Class id -> stance name, in the order used for training
STANCES = ['Agree', 'Disagree', 'Discuss', 'Unrelated']

def classify_pair():
    """Classify the stance between the two entered sentences and display it."""
    sentence_a = sentence_a_entry.get()
    sentence_b = sentence_b_entry.get()
    predictions, raw_outputs = model.predict([[sentence_a, sentence_b]])
    label = int(predictions[0])
    result = STANCES[label] if 0 <= label < len(STANCES) else 'error'
    result_label.configure(text=result)

# Empty labels act as vertical spacing above the title
for _ in range(4):
    tk.Label(window, text='', bg='white').pack()
header_label = tk.Label(window, text='Preventing One-sided News on the Social Platform Using Deep Learning and Transfer Learning Methods')
header_label.pack()
for _ in range(2):
    tk.Label(window, text='', bg='white').pack()

# Input row for the first text (headline)
sentence_a_frame = tk.Frame(window, height=8)
sentence_a_frame.pack(side=tk.TOP)
sentence_a_label = tk.Label(sentence_a_frame, font=("Lucida Grande", 20), text='Sentence A')
sentence_a_label.pack(side=tk.LEFT)
sentence_a_entry = tk.Entry(sentence_a_frame, width=70)
sentence_a_entry.pack(side=tk.LEFT)

tk.Label(window, bg='white', text='').pack(side=tk.TOP)

# Input row for the second text (body)
sentence_b_frame = tk.Frame(window)
sentence_b_frame.pack(side=tk.TOP)
sentence_b_label = tk.Label(sentence_b_frame, font=("Lucida Grande", 20), text='Sentence B')
sentence_b_label.pack(side=tk.LEFT)
sentence_b_entry = tk.Entry(sentence_b_frame, width=70)
sentence_b_entry.pack(side=tk.LEFT)

# The predicted stance is written into this label
result_label = tk.Label(window, bg='white')
result_label.pack()

calculate_btn = tk.Button(window, text='RUN!!!', command=classify_pair)
calculate_btn.pack()

window.mainloop()

In [2]:
!nvidia-smi

Wed Jan 26 03:25:26 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 471.11       Driver Version: 471.11       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P8     4W /  N/A |    134MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [1]:
import torch
print(torch.__version__)
print(torch.cuda.is_available())

1.8.0
True
