FinBERT (Financial Bidirectional Encoder Representations from Transformers) is a pretrained transformer model trained on financial corpora. Typical uses of FinBERT include sentiment analysis, financial question answering, and financial document classification.

Typical training corpora are large collections of financial documents such as news articles, regulatory filings, and earnings reports.

The checkpoint used in this notebook, ProsusAI/finbert, was developed by Prosus AI (not NVIDIA) and is designed specifically to analyse financial text.

Reference: [link](https://medium.com/codex/stocks-news-sentiment-analysis-with-deep-learning-transformers-and-machine-learning-cdcdb827fc06#:~:text=Sentiment%20Analysis%20and%20Transformers,model%20trained%20on%20financial%20corpora.)

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

import torch
import pandas as pd

In [2]:
# Load a small sample dataframe of headlines to score for sentiment
df = pd.read_csv('../data/check_sentiment.csv', delimiter=';')

# create a tokenizer object
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

# fetch the pretrained model 
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

In [3]:
df.head()

Unnamed: 0,No,headline
0,1,Barclays Faces Criticism Over Financing of Fos...
1,2,Lloyds Banking Group to Invest £1 Billion in S...
2,3,NatWest Group Faces Shareholder Criticism Over...
3,4,Standard Chartered Publishes First ESG Report ...
4,5,Virgin Money UK Introduces Gender Pay Gap Targ...


In [4]:
# A headline to be used as input 
headline = "Microsoft fails to hit profit expectations"

# Pre-process the input phrase ('inputs' avoids shadowing the built-in input())
inputs = tokenizer(headline, padding=True, truncation=True, return_tensors='pt')
# Run inference on the tokenized phrase
output = model(**inputs)

# Pass model output logits through a softmax layer.
sentim_scores = torch.nn.functional.softmax(output.logits, dim=-1)
sentim_scores

tensor([[0.0341, 0.9329, 0.0330]], grad_fn=<SoftmaxBackward0>)
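
The three values in the tensor are class probabilities that sum to 1. As a minimal sketch of how to read them, the snippet below picks the most likely class from the scores printed above. The label order (positive, negative, neutral) is an assumption that matches how `sentim_analyzer` assigns its columns later in this notebook; it is worth confirming against `model.config.id2label` before relying on it.

```python
# Probabilities copied from the tensor printed above
# (in the notebook they could be obtained with sentim_scores[0].tolist()).
scores = [0.0341, 0.9329, 0.0330]

# Assumed label order; confirm it with model.config.id2label.
labels = ["positive", "negative", "neutral"]

# Index of the highest probability.
best = max(range(len(scores)), key=scores.__getitem__)
print(f"{labels[best]} ({scores[best]:.2%})")
```

Under that assumption, the headline "Microsoft fails to hit profit expectations" is classified as negative with high confidence.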

In [5]:
def sentim_analyzer(df, tokenizer, model):
    '''Run inference with the model (FinBERT) on every headline in df and insert the
    output sentiment features into the dataframe in the columns 'Positive', 'Negative' and 'Neutral'.

    Parameters:
        df: A dataframe that contains article headline texts in a column called 'headline'.
        tokenizer (AutoTokenizer object): A pre-processing tokenizer object from the Hugging Face library.
        model (AutoModelForSequenceClassification object): A Hugging Face transformer model.

    Returns:
        df: The initial dataframe with the 3 sentiment features as columns for each headline.
    '''
    # Fail early with a clear error instead of printing a message and returning None
    if 'headline' not in df.columns:
        raise KeyError("'headline' column is missing from the dataframe")

    for i in df.index:
        headline = df.loc[i, 'headline']
        # Pre-process the input phrase
        inputs = tokenizer(headline, padding=True, truncation=True, return_tensors='pt')
        # Estimate the output; gradients are not needed for inference
        with torch.no_grad():
            output = model(**inputs)
        # Pass the model output logits through a softmax layer
        predictions = torch.nn.functional.softmax(output.logits, dim=-1)
        # Class indices 0, 1, 2 are stored as Positive, Negative, Neutral
        df.loc[i, 'Positive'] = predictions[0][0].item()
        df.loc[i, 'Negative'] = predictions[0][1].item()
        df.loc[i, 'Neutral'] = predictions[0][2].item()
    # Rearrange the column order when the price columns are present; otherwise keep the current order
    try:
        df = df[['date', 'stock', 'Open', 'Close', 'Volume', 'headline', 'Positive', 'Negative', 'Neutral', 'Price_change']]
    except KeyError:
        pass
    return df
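
The function above runs one forward pass per headline, which is fine for a handful of rows but slow for larger dataframes. As a hedged sketch of an alternative, the hypothetical helper `score_headlines` below tokenizes several headlines at once and scores them in batches. The helper name and the `batch_size` parameter are illustrative assumptions, not part of the original notebook.

```python
import torch


def score_headlines(headlines, tokenizer, model, batch_size=16):
    """Return one [p0, p1, p2] probability list per headline.

    Hypothetical helper: the probabilities follow the model's own class order.
    """
    model.eval()  # disable dropout for inference
    all_probs = []
    with torch.no_grad():  # no gradients needed when only predicting
        for start in range(0, len(headlines), batch_size):
            batch = list(headlines[start:start + batch_size])
            # Padding makes every sequence in the batch the same length
            inputs = tokenizer(batch, padding=True, truncation=True,
                               return_tensors="pt")
            logits = model(**inputs).logits
            all_probs.append(torch.softmax(logits, dim=-1))
    if not all_probs:
        return []
    return torch.cat(all_probs).tolist()
```

With the objects from this notebook, a call might look like `probs = score_headlines(df['headline'].tolist(), tokenizer, model)`, after which the three values of each row can be written to the `Positive`, `Negative` and `Neutral` columns in the same order as in `sentim_analyzer`.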

In [6]:
sentim_analyzer(df, tokenizer, model)

Unnamed: 0,No,headline,Positive,Negative,Neutral
0,1,Barclays Faces Criticism Over Financing of Fos...,0.011071,0.965152,0.023777
1,2,Lloyds Banking Group to Invest £1 Billion in S...,0.579048,0.008594,0.412359
2,3,NatWest Group Faces Shareholder Criticism Over...,0.009035,0.96541,0.025555
3,4,Standard Chartered Publishes First ESG Report ...,0.087343,0.019687,0.89297
4,5,Virgin Money UK Introduces Gender Pay Gap Targ...,0.07424,0.037758,0.888003
5,6,Secure Trust Bank to Phase Out Funding for Fos...,0.16895,0.053644,0.777406
6,7,Barclays Sets Ambitious Climate Targets as Par...,0.083503,0.012481,0.904016
7,8,Lloyds Banking Group Commits to Net Zero by 20...,0.714302,0.009147,0.276551
8,9,NatWest Group Joins Net Zero Banking Alliance ...,0.091474,0.009597,0.898929
9,10,Standard Chartered Partners with UNICEF to Pro...,0.630756,0.007854,0.36139
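
The three probability columns can be collapsed into a single label per headline. The sketch below rebuilds three rows of the table above by hand (headlines kept truncated exactly as displayed) and uses `idxmax` to pick the dominant class. The `scored`, `sentiment` and `confidence` names are illustrative assumptions.

```python
import pandas as pd

# Three rows copied from the output table above
scored = pd.DataFrame({
    "headline": [
        "Barclays Faces Criticism Over Financing of Fos...",
        "Lloyds Banking Group to Invest £1 Billion in S...",
        "Standard Chartered Publishes First ESG Report ...",
    ],
    "Positive": [0.011071, 0.579048, 0.087343],
    "Negative": [0.965152, 0.008594, 0.019687],
    "Neutral": [0.023777, 0.412359, 0.892970],
})

prob_cols = ["Positive", "Negative", "Neutral"]
# Name of the column holding the highest probability in each row
scored["sentiment"] = scored[prob_cols].idxmax(axis=1).str.lower()
# The highest probability itself, as a rough confidence score
scored["confidence"] = scored[prob_cols].max(axis=1)
print(scored[["headline", "sentiment", "confidence"]])
```

In the notebook itself, the same two assignments could be applied directly to the dataframe returned by `sentim_analyzer`.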
