# Corporate FinTech Assignment 2.

Marton Nemeth - manem18@student.sdu.dk

Michal Minarcik - mimin18@student.sdu.dk

Nicolas Hamann Hiemstra von Arenstorff - niare21@student.sdu.dk

Toheed Adeola Musa - amusa18@student.sdu.dk

Zalan Taller - zatal21@student.sdu.dk

## Exercise 1.

Textual analysis, or natural language processing (NLP), is a type of qualitative analysis that is increasingly applied in finance research. In behavioral finance, researchers have been studying how sentiment affects individual investors, institutions, and markets. Two types of sentiment have been identified: investor sentiment and text-based (textual) sentiment. Our focus falls solely on the latter, which measures the degree of positivity or negativity in texts. An interesting use case for the qualitative information obtained from textual analysis is its inclusion in equity asset pricing models, as it offers another perspective on, and potentially complements, quantitative information in the price formation process (Kearney and Liu, 2014).

The qualitative information used by textual sentiment researchers in finance comes predominantly from media articles, public corporate disclosures, and social media platforms.
To measure the tone of these documents, researchers commonly use either the dictionary-based approach or machine learning.

The dictionary-based approach uses a mapping algorithm in which a computer program reads text and classifies words, phrases, or sentences into groups based on pre-defined dictionary categories (Li, 2010). One difficulty of this approach is that English words often have many meanings, so a word classification developed for one discipline might not produce good results in another. Loughran and McDonald (2011) show that the commonly used source for word classifications, the Harvard-IV-4 TagNeg (H4N) list, misclassifies words when assessing tone in financial applications. The other common issue is how each word in the word list should be weighted.
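
The mapping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two word lists are hypothetical placeholders (not the H4N or Loughran and McDonald lists), every word is weighted equally, and `dictionary_tone` is a name introduced here.

```python
import re

# Hypothetical mini word lists for illustration only; real studies use
# curated dictionaries such as the Loughran and McDonald (2011) lists.
POSITIVE_WORDS = {"gain", "improve", "strong", "profit"}
NEGATIVE_WORDS = {"loss", "decline", "weak", "litigation"}


def dictionary_tone(text):
    """Return (positive count - negative count) / total words, equally weighted."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    return (positive - negative) / len(tokens)


# Two positive words and one negative word out of seven tokens.
tone = dictionary_tone("Strong profit growth offsets a weak quarter")
print(round(tone, 3))
```

Equal weighting is the simplest choice; the weighting issue mentioned above would be addressed by replacing the plain counts with weighted sums.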

The machine learning approach relies on statistical techniques to infer the content of documents and to classify them based on statistical inference (Li, 2010). This method works as follows. A part of the text to be analyzed is chosen as the “training set”. Each word from this set is manually classified as ‘positive’, ‘negative’, or some other dimension of sentiment. A sentiment analysis algorithm, such as the Naïve Bayes algorithm, is then trained on this training set. The algorithm learns the sentiment classification rules from the pre-classified data set and applies these rules out-of-sample to the whole text to derive textual sentiment scores (Kearney and Liu, 2014).
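
The train-then-apply procedure described above can be illustrated with a minimal multinomial Naive Bayes sketch using only the standard library. The assumptions are ours: whole sentences (rather than single words) are labelled, add-one smoothing is used, and the training sentences and the names `train_naive_bayes` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_texts):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_texts:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no information
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training sentences, labelled by hand for illustration.
training_set = [
    ("earnings are expected to grow", "positive"),
    ("we expect strong demand", "positive"),
    ("we expect losses to continue", "negative"),
    ("demand is expected to decline", "negative"),
]
model = train_naive_bayes(training_set)
print(classify("strong demand should grow", model))
```

The out-of-sample step is the call to `classify` on text that was not part of `training_set`.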

The Naive Bayes approach has several advantages. First, it is one of the oldest and most established methods for analyzing text. Second, because it is a machine learning algorithm, large corpora can easily be included in the textual analysis. Third, once the rules for gauging the text are specified, no further researcher subjectivity affects the measurement of tone in a business communication document. Its weakness is that the rules for measuring the content of a document can be established in many, often complex, ways, which can make it difficult for other researchers to replicate the results.

An early use of the Naive Bayes approach dates back to 2004, when Antweiler and Frank examined 1.5 million stock messages posted on Yahoo Finance. They find that the number of posted messages is linked with subsequent stock return volatility (Antweiler and Frank, 2004).
Similarly, Das and Chen (2007) use NLP to measure sentiment in message postings for 24 high-tech stocks. They find that stock message board postings are related to stock market levels, trading volume, and volatility.

A further application of the Naïve Bayes method is Li (2010b), who examines the content of the forward-looking statements (FLS) in the MD&A section of the 10-K. The Naive Bayes learning algorithm was trained on 30,000 randomly selected, manually coded sentences, and the study finds that the average tone of the FLS is positively linked with subsequent earnings.

## Exercise 2.

In [198]:
import pandas as pd
import json
from textblob import TextBlob
from textblob.classifiers import NaiveBayesClassifier
import random

We keep every non-missing rating review in the ICO dataset “icoData_19092018.json” that belongs to an ICO that has ended (ICO end date < 19-September-2018), and we delete all ICO observations that have no ICO end date.

In [199]:
filename = 'icoData_19092018.json'

with open(filename) as json_data:
    icoData = json.load(json_data)

# Deleting the rows without any information.
icoData = [i for i in icoData if not len(i) == 1]

In [200]:
d_dates = [d['dates'] for d in icoData]
d_ratings = [d['ratings'][0] for d in icoData]

df_d_dates = pd.DataFrame.from_dict(d_dates)
df_d_ratings = pd.DataFrame.from_dict(d_ratings)

In [201]:
df = pd.concat([df_d_dates, df_d_ratings], axis = 1)

df_icoData = pd.DataFrame(icoData)
df['overall_rating'] = df_icoData['rating']

df['icoEnd'] = pd.to_datetime(df['icoEnd'], errors = 'coerce')
df = df[(df['icoEnd'] < '2018-09-19')]
nan_value = float('NaN')
df.replace('', nan_value, inplace = True)
df.dropna(subset = ['review'], inplace = True)
df.dropna(subset = ['icoEnd'], inplace = True)

In [202]:
reviews = df['review']

In [203]:
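# Note: str() on a pandas Series returns its (truncated) printed representation,
# not the concatenated text of all reviews.
reviews_str = str(reviews)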

### 2.a.)
We calculate the polarity scores using the pre-trained TextBlob classifier. We map polarity scores in [-1, -0.25] to class “negative”, (-0.25, 0.25] to class “neutral”, and (0.25, 1] to class “positive”. We report summary statistics for the resulting classes (mean, 25%-quartile, median, 75%-quartile, minimum, maximum, and the total number of observations) and then interpret the results.

In [204]:
overall_sentiment = TextBlob(reviews_str).sentiment
overall_sentiment

Sentiment(polarity=0.28750000000000003, subjectivity=0.6433333333333333)

In [205]:
# Function to get the polarity scores.

def getPolarity(reviews):
    return TextBlob(reviews).sentiment.polarity

In [206]:
# Creating a new column called 'Polarity'.

df['Polarity'] = df['review'].apply(getPolarity)

In [207]:
# Function to label each review based on the polarity score.

def getAnalysis(score):
    if score <= -0.25:
        return 'Negative'
    elif score <= 0.25:
        return 'Neutral'
    else:
        return 'Positive'

In [208]:
# Creating a new column called 'Sentiment'.

df['Sentiment'] = df['Polarity'].apply(getAnalysis)

In [209]:
sentiment_grouped = df.groupby('Sentiment')

In [210]:
# Custom function to report summary statistics for the resulting classes.

def describe(df):
    return pd.concat([df.describe(), df.median().rename('median')], axis = 1)

In [211]:
summary_statistics = describe(sentiment_grouped['Polarity'])
summary_statistics

| Sentiment | count | mean | std | min | 25% | 50% | 75% | max | median |
|---|---|---|---|---|---|---|---|---|---|
| Negative | 12.0 | -0.40447 | 0.105419 | -0.5 | -0.5 | -0.436198 | -0.295312 | -0.25 | -0.436198 |
| Neutral | 284.0 | 0.081064 | 0.11213 | -0.24375 | 0.0 | 0.1 | 0.166167 | 0.25 | 0.1 |
| Positive | 234.0 | 0.501137 | 0.191846 | 0.250213 | 0.360625 | 0.435 | 0.616667 | 1.0 | 0.435 |


The pre-trained TextBlob classifier assigned polarity scores to 530 reviews. We classified each review as negative, neutral, or positive based on its polarity score. The descriptive statistics for each resulting class are shown above.

The negative class ranges from -0.5 to -0.25, and its mean and median are similar, although only 12 reviews were classified as negative.
The neutral class ranges from -0.24 to 0.25, and its mean and median are also similar. The majority of the reviews (284) were classified as neutral.
The positive class ranges from 0.25 to 1, with the mean (0.50) somewhat above the median (0.44). 234 reviews were classified as positive.

Assuming that the pre-trained TextBlob classifier is trustworthy, we could argue that people generally do not write negative reviews: only 12 reviews were classified as negative, the majority were classified as neutral, and there are about 19.5 times as many positive reviews as negative ones.

### 2.b.)
We randomly draw 300 non-missing rating reviews and label them into three classes: “positive”, “neutral”, and “negative”. In the code below, the labels are taken from the `Sentiment` column created in a.). We use the first two thirds of the observations as the training dataset and the remaining third as the test dataset. We train a Naive Bayes classifier on the training dataset using TextBlob, use it to classify the rating reviews in the test dataset, and calculate the accuracy, precision, recall, and F1 metrics for the test dataset. Finally, we interpret the results.

In [212]:
reduced = df.loc[:, ['Sentiment', 'review']].copy()
reduced.rename(columns = {'Sentiment' : 'target'}, inplace = True)

Creating the train and test datasets from 300 reviews in total. The TextBlob classifier expects a list of (text, label) tuples.

In [213]:
reviews_zipped = [(s, t) for s, t in zip(reduced.review, reduced.target)]

In [214]:
# Randomly sampling 200 reviews for the train dataset.
random.seed(5)

train = random.sample(reviews_zipped, 200)

In [215]:
reviews_without_train = [review for review in reviews_zipped if review not in train]

In [216]:
# Randomly sampling 100 reviews for the test dataset.
random.seed(5)

test = random.sample(reviews_without_train, 100)
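
The cells above remove training items by value (`review not in train`), which also drops any other (review, label) pair that happens to be identical to a training item. A possible alternative, sketched here under our own assumptions, is to split by position instead. The helper `split_by_index` and the sample data are hypothetical and are not part of the notebook.

```python
import random


def split_by_index(items, n_train, n_test, seed=5):
    """Shuffle positions once, then slice disjoint train, test, and rest sets."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:n_train + n_test]
    rest_idx = indices[n_train + n_test:]

    def pick(idx):
        return [items[i] for i in idx]

    return pick(train_idx), pick(test_idx), pick(rest_idx)


# Hypothetical stand-in for reviews_zipped, including a repeated pair.
sample = [("good team", "Positive")] * 2 + [(f"review {i}", "Neutral") for i in range(8)]
train_part, test_part, rest_part = split_by_index(sample, n_train=4, n_test=3)
print(len(train_part), len(test_part), len(rest_part))
```

Because the split works on positions, the three parts always add up to the original number of observations, even when some review texts are repeated.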

In [217]:
# Training the Naive Bayes Classifier on 200 reviews.

classifier = NaiveBayesClassifier(train)
classifier

<NaiveBayesClassifier trained on 200 instances>

In [218]:
# Calculating the accuracy metric for the test dataset.

accuracy_test = classifier.accuracy(test)
accuracy_test

0.53

In [219]:
predicted_classifications = []

def make_predictions():
    for i in range(len(test)):
        classification = classifier.classify(test[i][0])
        predicted_classifications.append(classification)

make_predictions()

def accuracy():
    countCorrect = 0
    for i in range(len(test)):
        if predicted_classifications[i] == test[i][1]:
            countCorrect += 1
    accuracy = int((countCorrect / len(predicted_classifications)) * 100)
    print('Classifier is correct', accuracy, 'of the time.')

accuracy()

Classifier is correct 53 of the time.


In [220]:
def return_incorrect():
    incorrectIndexes = []
    for i in range(len(test)):
        if predicted_classifications[i] != test[i][1]:
            incorrectIndexes.append(i)
    print('Classifier is incorrect', len(incorrectIndexes), 'of the time.' )

return_incorrect()


Classifier is incorrect 47 of the time.


In [221]:
predicted = pd.Series(predicted_classifications)

In [222]:
actual = pd.Series([x[1] for x in test])
crosstab = pd.crosstab(actual, predicted, rownames = ['Actual'], colnames = ['Predicted'], margins = True)
crosstab

| Actual \ Predicted | Negative | Neutral | Positive | All |
|---|---|---|---|---|
| Negative | 0 | 0 | 2 | 2 |
| Neutral | 1 | 14 | 44 | 59 |
| Positive | 0 | 0 | 39 | 39 |
| All | 1 | 14 | 85 | 100 |


In [223]:
# Precision, recall, and F1 for the 'Positive' class.
# TP + FP is the 'Positive' column total; TP + FN is the 'Positive' row total.
precision = crosstab.loc['Positive', 'Positive'] / crosstab.loc['All', 'Positive']
recall = crosstab.loc['Positive', 'Positive'] / crosstab.loc['Positive', 'All']
f1 = 2 * ((precision * recall) / (precision + recall))

In [224]:
evaluation = pd.DataFrame(data = [precision, recall, f1]).T
evaluation.columns = ['Precision', 'Recall', 'F1']
evaluation

| | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 0.458824 | 1.0 | 0.629032 |


The metrics are calculated for the “Positive” class. False negatives are the actual positive reviews predicted as negative or neutral (0 + 0).

Precision = TP / (TP + FP) = 39 / (39 + 44 + 2) = 39 / 85 = 0.459 = 45.9%

Recall = TP / (TP + FN) = 39 / (39 + 0 + 0) = 39 / 39 = 1.000 = 100%

F1 = 2 * ((Precision * Recall) / (Precision + Recall)) = 2 * ((0.459 * 1.000) / (0.459 + 1.000)) = 2 * (0.459 / 1.459) = 2 * 0.3146 = 0.629 = 62.9%
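
As a cross-check, precision, recall, and F1 can be derived for every class directly from the confusion matrix of the test dataset, together with the overall accuracy. The sketch below copies the counts from the crosstab by hand. The function name `per_class_metrics` is ours, and returning 0.0 when a denominator is zero is an assumed convention.

```python
# Confusion matrix of the test dataset: outer keys are actual classes,
# inner keys are predicted classes (counts copied from the crosstab).
labels = ["Negative", "Neutral", "Positive"]
confusion = {
    "Negative": {"Negative": 0, "Neutral": 0, "Positive": 2},
    "Neutral": {"Negative": 1, "Neutral": 14, "Positive": 44},
    "Positive": {"Negative": 0, "Neutral": 0, "Positive": 39},
}


def per_class_metrics(confusion, labels):
    """Return {label: (precision, recall, f1)} for each class."""
    metrics = {}
    for label in labels:
        tp = confusion[label][label]
        predicted_total = sum(confusion[actual][label] for actual in labels)  # TP + FP
        actual_total = sum(confusion[label].values())  # TP + FN
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


metrics = per_class_metrics(confusion, labels)
total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[label][label] for label in labels) / total

for label, (precision, recall, f1) in metrics.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy:.2f}")
```

Looking at all three classes makes it visible that the classifier assigns most reviews to the positive class, which a single-class view can hide.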

### 2.c.)
We used our classifier from b.) to classify all non-missing rating reviews (that are not used as training and test data) and reported the same summary statistics as in a.).
We interpreted the results.

In [225]:
remaining_reviews = [review for review in reviews_without_train if review not in test]

In [226]:
# Calculating the accuracy metric for the rest of the reviews.

accuracy_remaining_reviews = classifier.accuracy(remaining_reviews)
accuracy_remaining_reviews

0.5963302752293578

In [227]:
df_remaining_reviews = pd.DataFrame(remaining_reviews, columns = ['review', 'target'])

In [228]:
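# Note: str() on a pandas Series returns its (truncated) printed representation,
# not the concatenated text of all remaining reviews.
remaining_reviews_str = str(df_remaining_reviews['review'])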

In [229]:
sentiment = TextBlob(remaining_reviews_str).sentiment
sentiment

Sentiment(polarity=0.45173076923076927, subjectivity=0.5815384615384616)

In [230]:
df_remaining_reviews['Polarity'] = df_remaining_reviews['review'].apply(getPolarity)

In [231]:
df_remaining_reviews['Sentiment'] = df_remaining_reviews['Polarity'].apply(getAnalysis)

In [232]:
new_sentiment_grouped = df_remaining_reviews.groupby('Sentiment')

In [233]:
new_summary_statistics = describe(new_sentiment_grouped['Polarity'])
new_summary_statistics

| Sentiment | count | mean | std | min | 25% | 50% | 75% | max | median |
|---|---|---|---|---|---|---|---|---|---|
| Negative | 5.0 | -0.45625 | 0.097828 | -0.5 | -0.5 | -0.5 | -0.5 | -0.28125 | -0.5 |
| Neutral | 112.0 | 0.090806 | 0.103465 | -0.17 | 0.0 | 0.095402 | 0.173256 | 0.25 | 0.095402 |
| Positive | 101.0 | 0.494325 | 0.18747 | 0.26 | 0.35 | 0.436667 | 0.55 | 1.0 | 0.436667 |


The pre-trained TextBlob classifier assigned polarity scores to the 218 reviews that are in neither the training nor the test dataset. We classified each review as negative, neutral, or positive based on its polarity score. The descriptive statistics for each resulting class are shown above.

The negative class ranges from -0.5 to -0.28, and its mean and median are similar, although only 5 reviews were classified as negative.
The neutral class ranges from -0.17 to 0.25, and its mean and median are almost the same. The majority of the reviews (112) were classified as neutral.
The positive class ranges from 0.26 to 1, and its mean and median are more or less similar. 101 reviews were classified as positive.

Once again assuming that the pre-trained TextBlob classifier is trustworthy, we could argue that people generally do not write negative reviews: only 5 reviews were classified as negative, the majority were classified as neutral, and there are about 20 times as many positive reviews as negative ones.

### 2.d.)
We compared the pre-trained classification from a.) with our classification in b.) for the 300 observations in b.). We calculated the accuracy, precision, recall, and F1 metrics for the pre-trained classifier. We interpreted the results.

In [234]:
predicted_classifications_remaining_reviews = []

def make_predictions():
    for i in range(len(remaining_reviews)):
        classification = classifier.classify(remaining_reviews[i][0])
        predicted_classifications_remaining_reviews.append(classification)

make_predictions()

def accuracy():
    countCorrect = 0
    for i in range(len(remaining_reviews)):
        if predicted_classifications_remaining_reviews[i] == remaining_reviews[i][1]:
            countCorrect += 1
    accuracy = int((countCorrect / len(predicted_classifications_remaining_reviews)) * 100)
    print('Classifier is correct', accuracy, 'of the time.')

accuracy()

Classifier is correct 59 of the time.


In [235]:
predicted_remaining_reviews = pd.Series(predicted_classifications_remaining_reviews)

In [236]:
actual_remaining_reviews = pd.Series([x[1] for x in remaining_reviews])
crosstab_remaining_reviews = pd.crosstab(actual_remaining_reviews, predicted_remaining_reviews, rownames = ['Actual'], colnames = ['Predicted'], margins = True)
crosstab_remaining_reviews

| Actual \ Predicted | Negative | Neutral | Positive | All |
|---|---|---|---|---|
| Negative | 0 | 0 | 5 | 5 |
| Neutral | 2 | 37 | 73 | 112 |
| Positive | 0 | 8 | 93 | 101 |
| All | 2 | 45 | 171 | 218 |


In [237]:
# Precision, recall, and F1 for the 'Positive' class.
# TP + FP is the 'Positive' column total; TP + FN is the 'Positive' row total.
precision_remaining_reviews = crosstab_remaining_reviews.loc['Positive', 'Positive'] / crosstab_remaining_reviews.loc['All', 'Positive']
recall_remaining_reviews = crosstab_remaining_reviews.loc['Positive', 'Positive'] / crosstab_remaining_reviews.loc['Positive', 'All']
f1_remaining_reviews = 2 * ((precision_remaining_reviews * recall_remaining_reviews) / (precision_remaining_reviews + recall_remaining_reviews))

In [238]:
evaluation_remaining_reviews = pd.DataFrame(data = [precision_remaining_reviews, recall_remaining_reviews, f1_remaining_reviews]).T
evaluation_remaining_reviews.columns = ['Precision', 'Recall', 'F1']
evaluation_remaining_reviews

| | Precision | Recall | F1 |
|---|---|---|---|
| 0 | 0.54386 | 0.920792 | 0.683824 |


The metrics are calculated for the “Positive” class. False negatives are the actual positive reviews predicted as negative or neutral (0 + 8).

Precision = TP / (TP + FP) = 93 / (93 + 73 + 5) = 93 / 171 = 0.5439 = 54.39%

Recall = TP / (TP + FN) = 93 / (93 + 0 + 8) = 93 / 101 = 0.9208 = 92.08%

F1 = 2 * ((Precision * Recall) / (Precision + Recall)) = 2 * ((0.5439 * 0.9208) / (0.5439 + 0.9208)) = 2 * (0.5008 / 1.4647) = 2 * 0.3419 = 0.684 = 68.4%