In [None]:
! pip install --user datasets

# Fine-tuning FinBERT for sentiment analysis of financial news

FinBERT is a BERT model pre-trained on financial communication text. It has been shown to outperform traditional machine learning models on several financial NLP tasks [1]. The model is trained on a corpus totalling 4.9B tokens, and is available in the following flavours (all hosted on Hugging Face 🤗):

* FinBERT-Pretrained: The pretrained FinBERT model on large-scale financial text.
* FinBERT-Sentiment: for the sentiment classification task.
* FinBERT-ESG: for the ESG classification task.
* FinBERT-FLS: for the forward-looking statement (FLS) classification task.

This notebook uses code from [FinBERT.AI](https://finbert.ai/) to showcase the use of pre-trained models in Domino and to demonstrate the process of GPU-accelerated fine-tuning using Nvidia GPUs. We also use the [Sentiment Analysis for Financial News](https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news) dataset [2], which provides 4,837 samples of sentiments for financial news headlines from the perspective of a retail investor.

*[1] Yi Yang and Mark Christopher Siy UY and Allen Huang, FinBERT: A Pretrained Language Model for Financial Communications, 2020, [2006.08097](https://arxiv.org/abs/2006.08097)*

*[2] Malo, P., Sinha, A., Korhonen, P., Wallenius, J., & Takala, P. (2014). Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4), 782-796.*



## Simple demonstration of FinBERT

Let's start by loading the libraries needed for accessing and fine-tuning FinBERT.

In [None]:
import nvidia
import torch
import transformers

import numpy as np
import os
import pandas as pd 

from transformers import BertTokenizer, Trainer, BertForSequenceClassification, TrainingArguments, pipeline
from transformers import enable_full_determinism

from datasets import Dataset

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [None]:
cuda_install_dir = '/'.join(nvidia.__file__.split('/')[:-1]) + '/cuda_runtime/lib/'
os.environ['LD_LIBRARY_PATH'] =  cuda_install_dir

Let's make sure GPU acceleration is available.

In [None]:
if torch.cuda.is_available():
    print("GPU acceleration is available!")
else:
    print("GPU acceleration is NOT available! Training, fine-tuning, and inference speed will be adversely impacted.")
    
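# enable_full_determinism expects an integer seed (True would be read as seed 1);
# 42 matches the random_state used for the data splits in this notebook
enable_full_determinism(42)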

Let's now load FinBERT and classify a handful of test statements. The NLP pipeline produces a label and a prediction score for each statement.

In [None]:
model = BertForSequenceClassification.from_pretrained("yiyanghkust/finbert-tone",num_labels=3)
tokenizer = BertTokenizer.from_pretrained("yiyanghkust/finbert-tone")

nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = ["there is a shortage of capital, and we need extra financing",  
             "growth is strong and we have plenty of liquidity", 
             "there are doubts about our finances", 
             "profits are flat"]
results = nlp(sentences)

for sample in zip(sentences, results):
    print(sample)
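
To make the label and score more concrete, here is a minimal sketch of how a pipeline of this kind typically turns the model's raw outputs (logits) into a prediction: a softmax converts the logits into probabilities, and the most probable class becomes the label. The logit values are hypothetical, and the label order is assumed to match the encoding used in this notebook (Neutral, Positive, Negative); the actual pipeline reads the label names from the model configuration.

```python
import math

# Assumed label order, matching the 0/1/2 encoding used in this notebook
LABELS = ["Neutral", "Positive", "Negative"]

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}

# Hypothetical logits for a single sentence, not real model output
example_logits = [0.2, -1.5, 3.1]
prediction = to_prediction(example_logits)
print(prediction)
```

The score is therefore the model's probability for the winning class only, not a measure of how positive or negative the sentence is.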

## Financial news headlines dataset

Let's now load the Financial news dataset. The dataset has two attributes:

* **sentence** - the news headline
* **label** - sentiment, which we will encode as follows:
    * neutral  : 0
    * positive : 1
    * negative : 2
    
Let's process the dataset and show the first 5 samples.

In [None]:
# Load from CSV
df = pd.read_csv("all-data.csv", delimiter=",", encoding="latin-1", header=None).fillna("")
df = df.rename(columns=lambda x: ["label", "sentence"][x])

# Encode labels
df["label"] = df["label"].replace(["neutral","positive","negative"],[0,1,2]) 

# Print first 5
df.head()
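
The encoding step can also be written with an explicit dictionary, which fails loudly if the file contains a label we did not anticipate. The sketch below uses only the standard library and three hypothetical rows in the same `label,sentence` layout as `all-data.csv`; the sentences are made up for illustration.

```python
import csv
import io

LABEL_TO_ID = {"neutral": 0, "positive": 1, "negative": 2}

# Hypothetical rows, laid out like all-data.csv (no header, label first)
raw = io.StringIO(
    'neutral,"The company will publish its interim report in August."\n'
    'positive,"Operating profit rose compared with the previous year."\n'
    'negative,"Net sales fell, and the firm issued a profit warning."\n'
)

rows = []
for label, sentence in csv.reader(raw):
    # A KeyError here signals an unexpected label value in the file
    rows.append({"label": LABEL_TO_ID[label], "sentence": sentence})

for row in rows:
    print(row)
```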

Next, we check for missing values. Because `fillna("")` was applied when loading the CSV, this is only a sanity check.

In [None]:
df.isnull().values.any()

There are no missing values in the data. Before splitting it, let's check the length in characters of the longest headline, which serves as a guide when choosing the tokenizer's `max_length` later on.

In [None]:
df["sentence"].map(len).max()

### Preparing training, validation, and test subsets

Next, we split the dataset into training, validation, and test subsets. We first hold out 10% of the data for testing, and then take 10% of the remainder for validation.

In [None]:
df_train, df_test = train_test_split(df, stratify=df["label"], test_size=0.1, random_state=42)
df_train, df_val = train_test_split(df_train, stratify=df_train["label"], test_size=0.1, random_state=42)
print("Samples in train      : {:d}".format(df_train.shape[0]))
print("Samples in validation : {:d}".format(df_val.shape[0]))
print("Samples in test       : {:d}".format(df_test.shape[0]))
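
As a rough check on the numbers printed above, the two-stage split can be worked out by hand. The sketch below assumes the 4,837 samples quoted in the introduction and assumes that a fractional test share is rounded up, which is our understanding of how scikit-learn treats a float `test_size`; `split_sizes` is a hypothetical helper, and the counts will differ if your copy of the CSV has a different number of rows.

```python
import math

def split_sizes(n_samples, test_fraction=0.1, val_fraction=0.1):
    """Sizes of train/validation/test after holding out test, then validation."""
    n_test = math.ceil(test_fraction * n_samples)   # first split: hold out the test set
    remaining = n_samples - n_test
    n_val = math.ceil(val_fraction * remaining)     # second split: validation from the rest
    n_train = remaining - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(4837)  # 4,837 is the sample count assumed from the text
for name, n in [("train", n_train), ("validation", n_val), ("test", n_test)]:
    print("{:<10s}: {:d} ({:.1%})".format(name, n, n / 4837))
```

Note that taking 10% twice leaves about 81% of the data for training, not 80%.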

Now let's score the test set using only the pretrained model. This gives us a baseline to compare the fine-tuned model against.

In [None]:
sentences = df_test["sentence"].to_list()
results = nlp(sentences)

We can build a DataFrame with the ground truth and the predictions to see how well the pretrained model performs.

In [None]:
results_df = pd.DataFrame.from_dict(results)
results_df["label"] = results_df["label"].replace(["Neutral", "Positive", "Negative"],[0,1,2]) 
results_df.columns = ["pred", "score"]
results_df.reset_index(drop=True, inplace=True)

results_df = pd.concat([df_test[["sentence", "label"]].reset_index(drop=True), results_df], axis=1)

results_df["Correct"] = results_df["label"].eq(results_df["pred"])

results_df.head()

We can calculate the accuracy of the predictions:

In [None]:
# The mean of a boolean column is the fraction of True values
accuracy = results_df["Correct"].mean()

print("Accuracy : {:.2f}".format(accuracy))

We need to keep in mind that this is an imbalanced dataset, so it is good to look at the counts of the classes and the respective accuracy too: 

In [None]:
accuracy_df = pd.concat([results_df["label"].value_counts(), results_df.groupby("label")["Correct"].mean().mul(100).round(2)], axis=1)
accuracy_df = accuracy_df.reset_index()
accuracy_df.columns = ["Label", "Count", "Accuracy"]
accuracy_df.head()
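
To see why per-class accuracy matters on imbalanced data, consider the following minimal sketch. The labels and predictions are hypothetical and chosen so that the majority class dominates: the overall accuracy looks respectable even though the rarest class is never predicted correctly.

```python
from collections import Counter

# Hypothetical ground truth and predictions (0 = neutral, 1 = positive, 2 = negative)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

# Overall accuracy: fraction of samples predicted correctly
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class accuracy: correct predictions divided by the class count
counts = Counter(y_true)
correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
per_class = {label: correct[label] / counts[label] for label in sorted(counts)}

print("Overall accuracy : {:.2f}".format(overall))
for label, acc in per_class.items():
    print("Class {:d} (n={:d})  : {:.2f}".format(label, counts[label], acc))
```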

## Model fine-tuning

The fine-tuning process takes the pretrained model (FinBERT) and performs additional training, adapting it to a more specialized use case. Here, we'll use the training subset of the Sentiment Analysis for Financial News dataset. This transfer learning approach enables us to produce a more accurate model in less training time than training from scratch would require.

### Datasets preparation

First, we need to prepare the three datasets (training, validation, and test) by tokenizing them and by setting the dataset format to be compatible with PyTorch.

In [None]:
dataset_train = Dataset.from_pandas(df_train)
dataset_val = Dataset.from_pandas(df_val)
dataset_test = Dataset.from_pandas(df_test)

dataset_train = dataset_train.map(lambda e: tokenizer(e["sentence"], truncation=True, padding="max_length", max_length=315), batched=True)
dataset_val = dataset_val.map(lambda e: tokenizer(e["sentence"], truncation=True, padding="max_length", max_length=315), batched=True)
dataset_test = dataset_test.map(lambda e: tokenizer(e["sentence"], truncation=True, padding="max_length" , max_length=315), batched=True)

dataset_train.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", "label"])
dataset_val.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", "label"])
dataset_test.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", "label"])
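
The effect of `truncation=True` and `padding="max_length"` can be illustrated without the tokenizer. The sketch below is a simplification: `pad_or_truncate` is a hypothetical helper, the token IDs are made up, and the padding ID of 0 is an assumption (the real value comes from the tokenizer's vocabulary). A real BERT tokenizer also keeps the closing special token when it truncates, which this sketch ignores.

```python
PAD_ID = 0  # assumed padding ID; the real tokenizer defines its own

def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Force a token sequence to exactly max_length and build its attention mask."""
    kept = token_ids[:max_length]                     # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad               # padding: fill up to max_length
    attention_mask = [1] * len(kept) + [0] * n_pad    # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical token IDs for a short and a long headline
short = pad_or_truncate([3, 17, 42, 8, 4], max_length=8)
long = pad_or_truncate(list(range(1, 13)), max_length=8)

print(short)
print(long)
```

Every sample ends up with the same length, which is what allows the `Trainer` to stack samples into fixed-size tensors. The attention mask tells the model which positions to ignore.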


### Setting up and training

Next, we define the training metrics and some additional customization points like training epochs, size of batches etc.

In [None]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    # accuracy_score expects the ground truth first, then the predictions
    return {"accuracy": accuracy_score(labels, predictions)}

args = TrainingArguments(
        output_dir = "temp/",
        evaluation_strategy = "epoch",
        learning_rate=0.00001,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        num_train_epochs=1,
        weight_decay=0.01,
        metric_for_best_model="accuracy",
        save_total_limit = 2,
        save_strategy = "no",
        load_best_model_at_end=False,
        report_to = "none",
        optim="adamw_torch")

trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset_train,
        eval_dataset=dataset_val,
        compute_metrics=compute_metrics)
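
To get a feel for what these settings imply, the sketch below estimates the number of optimizer steps and the learning rate along the way. It rests on several assumptions: roughly 3,917 training samples (about 81% of the 4,837 headlines quoted in the introduction), a single device, no gradient accumulation, and the default linear learning-rate decay without warmup. Both helper functions are hypothetical and only mirror the arithmetic; they are not part of the `transformers` API.

```python
import math

def steps_per_epoch(n_train, batch_size):
    """Number of batches per epoch; the last, partial batch still counts."""
    return math.ceil(n_train / batch_size)

def linear_lr(step, total_steps, base_lr):
    """Learning rate under a linear decay from base_lr to 0, with no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

n_train = 3917          # assumed size of the training subset
batch_size = 32         # per_device_train_batch_size
epochs = 1              # num_train_epochs
base_lr = 0.00001       # learning_rate

total_steps = steps_per_epoch(n_train, batch_size) * epochs
print("Total optimizer steps:", total_steps)
for step in (0, total_steps // 2, total_steps):
    print("step {:3d}: lr = {:.2e}".format(step, linear_lr(step, total_steps, base_lr)))
```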

We can now perform the training.

**Note that you will need a hardware tier with sufficient memory and compute, ideally one that provides GPU acceleration. Otherwise, the training process can take a substantial amount of time or crash for lack of system memory.**

In [None]:
trainer.train()  

### Model evaluation

We can now test the accuracy of the model using the test set.

In [None]:
accuracy_test = trainer.predict(dataset_test).metrics["test_accuracy"]
print("Accuracy on test: {:.2f}".format(accuracy_test))

### Saving the fine-tuned model

Finally, we can save the fine-tuned model and use it for online predictions via a [Model API](https://docs.dominodatalab.com/en/latest/user_guide/8dbc91/host-models-as-rest-apis/).

In [None]:
# Change this location as needed, depending on whether you are using a Git-based
# or a DFS-based project and on how you intend to use the model.
trainer.save_model("/mnt/artifacts/finbert-sentiment/")
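
A prediction function for serving the saved model might look like the sketch below. The function names and the returned dictionary are hypothetical, not a prescribed Domino interface. Because the `Trainer` above was created without a tokenizer, we assume the tokenizer files were not written to the output directory, so the sketch loads the tokenizer from the original `yiyanghkust/finbert-tone` checkpoint. The `classifier` argument exists only so the function can be exercised without loading the model.

```python
MODEL_DIR = "/mnt/artifacts/finbert-sentiment/"   # same location as in the save cell
TOKENIZER_NAME = "yiyanghkust/finbert-tone"       # tokenizer of the original checkpoint

_classifier = None  # cached pipeline, built on first use

def get_classifier():
    """Load the fine-tuned model once and wrap it in a pipeline."""
    global _classifier
    if _classifier is None:
        # Imported lazily so that defining these functions does not require the model
        from transformers import BertForSequenceClassification, BertTokenizer, pipeline
        model = BertForSequenceClassification.from_pretrained(MODEL_DIR)
        tokenizer = BertTokenizer.from_pretrained(TOKENIZER_NAME)
        _classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
    return _classifier

def predict_sentiment(sentence, classifier=None):
    """Hypothetical entry point: classify one headline and return label and score."""
    classifier = classifier or get_classifier()
    result = classifier([sentence])[0]
    return {"label": result["label"], "score": float(result["score"])}
```

Calling `predict_sentiment("profits are flat")` would load the model on the first request and reuse it afterwards.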