Reference papers/sources

https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.html
https://www.codingninjas.com/studio/library/nrc-lexicon-in-python
https://github.com/metalcorebear/NRCLex
https://github.com/Priya22/EmotionDynamics

For mapping the emotions:
https://www.researchgate.net/publication/265596754_The_Emotionality_of_Sonic_Events_Testing_the_Geneva_Emotional_Music_Scale_GEMS_for_Popular_and_Electroacoustic_Music

Documentation:
https://pypi.org/project/NRCLex/

## Installing Dependencies

In [None]:
!pip3 install tensorflow
!pip3 install NRCLex
!pip3 install opencv-python

Collecting NRCLex
  Downloading NRCLex-4.0-py3-none-any.whl (4.4 kB)
INFO: pip is looking at multiple versions of nrclex to determine which version is compatible with other requirements. This could take a while.
  Downloading NRCLex-3.0.0.tar.gz (396 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 396.4/396.4 kB 4.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: NRCLex
  Building wheel for NRCLex (setup.py) ... done
  Created wheel for NRCLex: filename=NRCLex-3.0.0-py3-none-any.whl size=43310 sha256=910026477edf883fabf821a52e7cb6f10926cc9deb015200d4b048594499ee54
  Stored in directory: /root/.cache/pip/wheels/d2/10/44/6abfb1234298806a145fd6bcaec8cbc712e88dd1cd6cb242fa
Successfully built NRCLex
Installing collected packages: NRCLex
Successfully installed NRCLex-3.0.0


In [None]:
# Import the Required Packages

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import numpy as np

## Step 1 – Data Loading and Processing

### 1.1 Data Loading

In [None]:
# Load the song file
import pandas as pd
df = pd.read_csv('SingleLabel.csv')

In [None]:
df.head()

| Unnamed: 0 | artist | genre | title | album | year | lyrics | label |
|---|---|---|---|---|---|---|---|
| 0 | Nirvana | Rock | You Know You’re Right | Nirvana | 2002.0 | I will never bother you\nI will never promise ... | Sadness |
| 1 | Damian Marley | Reggae | Here We Go | Stony Hill | 2017.0 | Here we go\nMy big ego is gonna get me in trou... | Tension |
| 2 | The Mission UK | Rock | Jade | Another Fall from Grace | 2016.0 | She came as Lolita dressed as Venus\nAnd adorn... | Tenderness |
| 3 | UB40 | Reggae | Food For Thought | Signing Off | 1980.0 | Ivory Madonna, dying in the dust\nWaiting for ... | Sadness |
| 4 | Johnny Cash | Country | I’ve Been Everywhere | American II: Unchained | 1996.0 | I was totin' my pack along the dusty Winnemucc... | Sadness |


In [None]:
# Extract the lyrics column as a NumPy array of strings
lyrics = df.loc[:, 'lyrics'].values

In [None]:
df.loc[:,'label'].unique()

array(['Sadness', 'Tension', 'Tenderness'], dtype=object)

In [None]:
emotion_dictionary = {'Sadness': 0, 'Tension': 1, 'Tenderness': 2}

In [None]:
# Encode the text labels as integers (Sadness -> 0, Tension -> 1, Tenderness -> 2)
df['new_label'] = df['label'].map(emotion_dictionary)

In [None]:
labels = df.loc[:,'new_label'].values

In [None]:
labels

array([0, 1, 2, ..., 1, 2, 0])

In [None]:
lyrics

array(["I will never bother you\nI will never promise to\nI will never follow you\nI will never bother you\nNever speak a word again\nI will crawl away for good\nI will move away from here\nYou won't be afraid of fear\nNo thought was put into this\nAnd always knew it would come to this\nThings have never been so swell\nI have never failed to fail\n\nHe-eee-eee-eeey\nHe-eee-eee-eey\nHe-eee-eee-ey\nYou know you're right\nYou know you're right\nYou know you're right\n\nI'm so warm and calm inside\nI no longer have to hide\nLet's talk about someone else\nSteaming soup against her mouth\nNothing really bothers her\nShe just wants to love himself\nI will move away from here\nYou won't be afraid of fear\nNo thought was put into this\nAlways knew it'd come to this\nThings have never been so swell\nI have never failed to fail\n\nHe-eee-eee-eey\nHe-eee-eee-eey\nHe-eee-eee-eey\nHe-eee-eee-eey\nHe-eee-eee-eey\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're rig

In [None]:
np.unique(labels)

array([0, 1, 2])

In [None]:
# Check the average and maximum length (in whitespace-separated words) of the lyrics
max_len = -1
num_songs = 0
total_len = 0
for example in lyrics:
    n_words = len(example.split())  # split once and reuse the count
    num_songs += 1
    total_len += n_words
    max_len = max(max_len, n_words)

print('Average len of songs is ', total_len/num_songs)
print('the maximum length of the lyrics inputs is ', max_len)

Average len of songs is  347.5155172413793
the maximum length of the lyrics inputs is  2061


### 1.2 Data Processing


In [None]:
# Convert the lyrics into padded integer sequences

maxlen = 500        # Pad or truncate every song to 500 tokens
max_words = 50000   # Keep at most the 50,000 most frequent words in the vocabulary

tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(lyrics)
sequences = tokenizer.texts_to_sequences(lyrics)

word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))

data = pad_sequences(sequences, maxlen=maxlen)

# labels = np.asarray(labels)
# print('Shape of data tensor:', data.shape)
# print('Shape of label tensor:', labels.shape)
print()

Found 17125 unique tokens.
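`pad_sequences` is called here with only `maxlen`, so the Keras defaults `padding='pre'` and `truncating='pre'` apply: shorter songs are left-padded with 0 (the index `Tokenizer` never assigns to a word) and longer songs keep only their last 500 tokens, so the longest song (2,061 words) loses its opening. The pure-Python sketch below mimics that default behaviour for illustration only; `pad_pre` is a hypothetical helper, not a Keras function.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences(..., padding='pre', truncating='pre') for lists of ints."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # 'pre' truncation: drop items from the front, keep the last maxlen
            seq = seq[len(seq) - maxlen:]
        # 'pre' padding: the padding value goes in front of the sequence
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


# Usage: one short and one long sequence, maxlen = 4
print(pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 0, 5, 6], [3, 4, 5, 6]]
```

Passing `truncating='post'` to `pad_sequences` would instead keep the opening of each song, if that part is considered more informative.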



## Step 2 – Detecting emotion using NRCLex


### Importing and downloading further dependencies for NRCLex


In [None]:
import pandas as pd
import gzip
import json
import nltk
from nrclex import NRCLex
from nltk.tokenize import sent_tokenize, word_tokenize
import csv
from nltk.stem import WordNetLemmatizer
import copy
import math
import sys
import matplotlib.pyplot as plt
import numpy as np
import cv2

In [None]:
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

### Categorisation: Weighting the NRCLex Emotions for Each Music Emotion Category
- "amazement" - surprise 0.5, joy 0.3, trust 0.1, positive 0.1
- "calmness" - trust 0.6, anticip 0.2, joy 0.1, positive 0.1
- "joyful activation" - joy 0.5, trust 0.3, surprise 0.1, positive 0.1
- "solemnity" - sadness 0.6, trust 0.2, fear 0.1, negative 0.1
- "nostalgia" - sadness 0.3, joy 0.3, trust 0.3, positive 0.1
- "power" - fear 0.4, anticip 0.3, trust 0.2, positive 0.1
- "tenderness" - trust 0.6, anticip 0.2, joy 0.1, positive 0.1
- "tension" - anger 0.5, anticip 0.3, disgust 0.1, negative 0.1
- "sadness" - sadness 0.5, disgust 0.3, fear 0.1, negative 0.1

Here `anticip` is the key NRCLex uses for anticipation. The ten NRCLex affect keys, stored in `EMOTIONS` below, are: fear, anger, anticip, trust, surprise, positive, negative, sadness, disgust and joy.

In [None]:
EMOTIONS=["fear", "anger", "anticip", "trust", "surprise", "positive", "negative", "sadness", "disgust", "joy"]
# Weights were assigned by intuition and then matched against the categories
# in Steph's reference paper on the Geneva Emotional Music Scale (GEMS):
# https://www.researchgate.net/publication/265596754_The_Emotionality_of_Sonic_Events_Testing_the_Geneva_Emotional_Music_Scale_GEMS_for_Popular_and_Electroacoustic_Music
converting_emotions = {
    "amazement": {
        "surprise": 0.5,
        "joy": 0.3,
        "trust": 0.1,
        "positive": 0.1,
    },
    "calmness": {
        "trust": 0.6,
        "anticip": 0.2,
        "joy": 0.1,
        "positive": 0.1,
    },
    "joyful activation": {
        "joy": 0.5,
        "trust": 0.3,
        "surprise": 0.1,
        "positive": 0.1,
    },
    "solemnity": {
        "sadness": 0.6,
        "trust": 0.2,
        "fear": 0.1,
        "negative": 0.1,
    },
    "nostalgia": {
        "sadness": 0.3,
        "joy": 0.3,
        "trust": 0.3,
        "positive": 0.1,
    },
    "power": {
        "fear": 0.4,
        "anticip": 0.3,
        "trust": 0.2,
        "positive": 0.1,
    },
    "tenderness": {
        "trust": 0.6,
        "anticip": 0.2,
        "joy": 0.1,
        "positive": 0.1,
    },
    "tension": {
        "anger": 0.5,
        "anticip": 0.3,
        "disgust": 0.1,
        "negative": 0.1,
    },
    "sadness": {
        "sadness": 0.5,
        "disgust": 0.3,
        "fear": 0.1,
        "negative": 0.1,
    },
}

converted_emo = list(converting_emotions.keys())
converted_emo

['amazement',
 'calmness',
 'joyful activation',
 'solemnity',
 'nostalgia',
 'power',
 'tenderness',
 'tension',
 'sadness']
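Two properties of this mapping are easy to get wrong by hand: every key should be one of the ten NRCLex keys in `EMOTIONS`, and each category's weights should presumably sum to 1, as they do in the weightage list above. A small check such as the hypothetical `check_weights` below would flag both kinds of slip, for example a misspelled key like `digust`. It assumes only the `EMOTIONS` list and a dict shaped like `converting_emotions`.

```python
import math

EMOTIONS = ["fear", "anger", "anticip", "trust", "surprise",
            "positive", "negative", "sadness", "disgust", "joy"]


def check_weights(mapping, valid_keys=EMOTIONS, tol=1e-9):
    """Return a list of human-readable problems in a category -> weights mapping."""
    problems = []
    for category, weights in mapping.items():
        # Every key must be an emotion that NRCLex actually reports
        for key in weights:
            if key not in valid_keys:
                problems.append(f"{category}: unknown emotion key {key!r}")
        # Assumption: the weights of each category form a distribution summing to 1
        total = sum(weights.values())
        if not math.isclose(total, 1.0, abs_tol=tol):
            problems.append(f"{category}: weights sum to {total:.2f}, not 1")
    return problems


# Usage: one weight that is too small and one misspelled key
example = {
    "calmness": {"trust": 0.5, "anticip": 0.2, "joy": 0.1, "positive": 0.1},
    "tension": {"anger": 0.5, "anticip": 0.3, "digust": 0.1, "negative": 0.1},
}
for problem in check_weights(example):
    print(problem)
```

Running `check_weights(converting_emotions)` after editing the mapping should then return an empty list.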

### Function for getting emotion from text

In [None]:
def get_emotion(text):
    """Score `text` with NRCLex and return the ten affect values, sorted high to low.

    Returns a dict over fear, anger, anticipation ('anticip'), trust, surprise,
    positive, negative, sadness, disgust and joy. It also works around a bug in
    the library where 'anticip' may be 0.0 while the true value is reported under
    an extra 'anticipation' key at the end of the dict.
    """
    # Create the NRCLex object and get the relative frequency of each affect
    emotion = NRCLex(text)
    emotions_values_dic = emotion.affect_frequencies

    # Collect the ten scores in the order given by EMOTIONS
    new_emotion_dict = {emo: emotions_values_dic[emo] for emo in EMOTIONS}

    # Handle the bug: take the anticipation score from the extra key if present
    if "anticipation" in emotions_values_dic:
        new_emotion_dict["anticip"] = emotions_values_dic["anticipation"]

    # Sort the emotions from highest to lowest score
    sorted_emotions = sorted(new_emotion_dict.items(), key=lambda x: x[1], reverse=True)
    return dict(sorted_emotions)


e=get_emotion("I will never bother you\nI will never promise to\nI will never follow you\nI will never bother you\nNever speak a word again\nI will crawl away for good\nI will move away from here\nYou won't be afraid of fear\nNo thought was put into this\nAnd always knew it would come to this\nThings have never been so swell\nI have never failed to fail\n\nHe-eee-eee-eeey\nHe-eee-eee-eey\nHe-eee-eee-ey\nYou know you're right\nYou know you're right\nYou know you're right\n\nI'm so warm and calm inside\nI no longer have to hide\nLet's talk about someone else\nSteaming soup against her mouth\nNothing really bothers her\nShe just wants to love himself\nI will move away from here\nYou won't be afraid of fear\nNo thought was put into this\nAlways knew it'd come to this\nThings have never been so swell\nI have never failed to fail\n\nHe-eee-eee-eey\nHe-eee-eee-eey\nHe-eee-eee-eey\nHe-eee-eee-eey\nHe-eee-eee-eey\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\nYou know you're right\n(He-eee-eee-eey)\nYou know your rights\n(He-eee-eee-eey)\nYou know your rights\n(He-eee-eee-eey)\nYou know your rights\n(He-eee-eee-eey)")
print(EMOTIONS)
print(e)
len(e)

['fear', 'anger', 'anticip', 'trust', 'surprise', 'positive', 'negative', 'sadness', 'disgust', 'joy']
{'positive': 0.21212121212121213, 'negative': 0.21212121212121213, 'fear': 0.15151515151515152, 'anticip': 0.09090909090909091, 'trust': 0.09090909090909091, 'joy': 0.09090909090909091, 'anger': 0.06060606060606061, 'surprise': 0.06060606060606061, 'disgust': 0.030303030303030304, 'sadness': 0.0}


10
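The scores printed above are all multiples of 1/33 and sum to 1, which is consistent with `affect_frequencies` being each affect's count divided by the total number of affect tags found in the text. The toy sketch below reproduces that idea with a tiny made-up lexicon (`TOY_LEXICON` and `toy_affect_frequencies` are hypothetical, not the NRC data or NRCLex's code) and folds a stray `anticipation` key back into `anticip`, mirroring the workaround in `get_emotion`. Note that "not afraid" still contributes fear, because plain lexicon counting ignores negation.

```python
from collections import Counter

EMOTIONS = ["fear", "anger", "anticip", "trust", "surprise",
            "positive", "negative", "sadness", "disgust", "joy"]

# Hypothetical mini-lexicon for illustration only (NOT the NRC lexicon)
TOY_LEXICON = {
    "afraid": ["fear", "negative"],
    "calm": ["positive", "trust"],
    "hope": ["anticipation", "joy", "positive"],
}


def toy_affect_frequencies(text):
    """Relative frequency of each affect tag over all tags found in `text`."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(TOY_LEXICON.get(word, []))
    # Fold the long-form key into the short 'anticip' key used by EMOTIONS
    counts["anticip"] += counts.pop("anticipation", 0)
    total = sum(counts.values())
    return {emo: (counts[emo] / total if total else 0.0) for emo in EMOTIONS}


scores = toy_affect_frequencies("I am calm not afraid and full of hope")
print(scores["positive"], scores["anticip"])  # positive = 2/7, anticip = 1/7
```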

In [None]:
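import random

def get_top_emotions(emotion_dict):
  """Keep the stronger of 'positive'/'negative', then return the top 5 emotions.

  emotion_dict is expected to be sorted by score (as returned by get_emotion).
  """
  e_dict = dict(emotion_dict)  # copy, so the caller's dict is not modified
  if e_dict["positive"] > e_dict["negative"]:
    e_dict.pop("negative")
  elif e_dict["positive"] < e_dict["negative"]:
    e_dict.pop("positive")
  else:
    # Tie: drop one of the two at random
    r = random.randint(0, 1)
    if r == 0:
      e_dict.pop("positive")
    else:
      e_dict.pop("negative")

  # The dict is already sorted, so the first five items are the top five
  return dict(list(e_dict.items())[:5])

top_emotions = get_top_emotions(e)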

We will be randomising the positive and negative weightage if both are equal
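Because the tie is broken with the global `random` module, re-running the notebook can change `top_emotions` and therefore the predicted label; the example song above has exactly such a tie (positive and negative are both 0.2121). One way to keep the random choice but make runs reproducible is to pass a seeded `random.Random` instance. The helper name `drop_weaker_polarity` and the seed value are hypothetical choices for this sketch.

```python
import random


def drop_weaker_polarity(emotion_dict, rng):
    """Return a copy of emotion_dict without the weaker of 'positive'/'negative'.

    Ties are broken with the supplied random.Random instance, so a fixed seed
    gives the same choice on every run.
    """
    e_dict = dict(emotion_dict)  # copy: do not mutate the caller's dict
    pos, neg = e_dict["positive"], e_dict["negative"]
    if pos > neg:
        e_dict.pop("negative")
    elif pos < neg:
        e_dict.pop("positive")
    else:
        e_dict.pop(rng.choice(["positive", "negative"]))
    return e_dict


scores = {"positive": 0.21, "negative": 0.21, "fear": 0.15}
first = drop_weaker_polarity(scores, random.Random(42))
second = drop_weaker_polarity(scores, random.Random(42))
print(first == second)  # True: same seed, same tie-break
```

In the notebook, a single `rng = random.Random(<seed>)` created before the prediction loop would make the whole run repeatable.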

In [None]:
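def matching_emotions(top_emotions_dict):
  """Return the category sharing the most emotion keys with top_emotions_dict.

  Only the number of shared keys is counted (the weights are not used), and
  ties go to the category listed first in converting_emotions.
  """
  single_label = ""
  max_count = 0
  for label, weights in converting_emotions.items():
    count = sum(1 for emo in weights if emo in top_emotions_dict)
    if count > max_count:
      single_label = label
      max_count = count
  return single_label

print(top_emotions)
matching_emotions(top_emotions)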

{'positive': 0.21212121212121213, 'fear': 0.15151515151515152, 'anticip': 0.09090909090909091, 'trust': 0.09090909090909091, 'joy': 0.09090909090909091}


'calmness'
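`matching_emotions` counts only how many of a category's emotion keys appear among the top five, so the weights in `converting_emotions` are never used, and because calmness and tenderness share the same four keys (trust, anticip, joy, positive), tenderness can never beat calmness. One alternative, offered here as an assumption rather than the notebook's method, scores each category by the weighted sum of the NRCLex scores and can optionally restrict the candidates to the labels that exist in the dataset. `weighted_match` and its `allowed` parameter are hypothetical. In the weightage list the two categories also have identical weights, so a weighted score alone would still tie them; restricting the candidates avoids that particular clash.

```python
def weighted_match(scores, mapping, allowed=None):
    """Pick the category whose weights best agree with the emotion scores.

    scores  -- dict of NRC emotion -> score (e.g. the output of get_emotion)
    mapping -- dict of category -> {emotion: weight} (e.g. converting_emotions)
    allowed -- optional collection of category names to consider
    """
    best_label, best_score = "", float("-inf")
    for label, weights in mapping.items():
        if allowed is not None and label not in allowed:
            continue
        # Dot product between the category weights and the observed scores
        score = sum(w * scores.get(emo, 0.0) for emo, w in weights.items())
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Usage with a cut-down mapping and made-up scores
mapping = {
    "sadness": {"sadness": 0.5, "disgust": 0.3, "fear": 0.1, "negative": 0.1},
    "tension": {"anger": 0.5, "anticip": 0.3, "disgust": 0.1, "negative": 0.1},
    "tenderness": {"trust": 0.6, "anticip": 0.2, "joy": 0.1, "positive": 0.1},
}
scores = {"anger": 0.3, "negative": 0.3, "fear": 0.2, "sadness": 0.1, "trust": 0.1}
print(weighted_match(scores, mapping))  # tension
print(weighted_match(scores, mapping, allowed={"sadness", "tenderness"}))  # sadness
```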

## Predicting Emotions for All Songs


In [None]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

all_lyrics = list(df["lyrics"])
actual_emotion_listdf = list(df["label"])
actual_emotion_list = []
predicted_emotion_list = []
stop_words = set(stopwords.words('english'))

for lyr in all_lyrics:
  word_tokens = word_tokenize(lyr)
  # Lower-case each token and keep it only if it is not an English stop word
  filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]
  new_lyr = " ".join(filtered_sentence)

  # Score the filtered lyrics, keep the top emotions and map them to a category
  e_dict = get_emotion(new_lyr)
  top_emo = get_top_emotions(e_dict)
  pe = matching_emotions(top_emo)
  predicted_emotion_list.append(pe)

# Lower-case the true labels so they match the category names
for label in actual_emotion_listdf:
  actual_emotion_list.append(label.lower())
print(actual_emotion_list)
print(predicted_emotion_list)

['sadness', 'tension', 'tenderness', 'sadness', 'sadness', 'sadness', 'tenderness', 'sadness', 'tenderness', 'tenderness', 'sadness', 'sadness', 'sadness', 'tension', 'sadness', 'sadness', 'sadness', 'tenderness', 'sadness', 'sadness', 'tenderness', 'sadness', 'sadness', 'tension', 'tenderness', 'sadness', 'sadness', 'tension', 'sadness', 'tenderness', 'tension', 'sadness', 'tenderness', 'sadness', 'tenderness', 'tension', 'sadness', 'tension', 'sadness', 'tenderness', 'sadness', 'sadness', 'tension', 'tenderness', 'tension', 'sadness', 'sadness', 'sadness', 'tension', 'sadness', 'tension', 'tension', 'tenderness', 'sadness', 'tenderness', 'sadness', 'sadness', 'sadness', 'tenderness', 'tenderness', 'tension', 'tenderness', 'sadness', 'tension', 'sadness', 'tension', 'tenderness', 'tenderness', 'tension', 'tenderness', 'tension', 'sadness', 'sadness', 'tenderness', 'sadness', 'tension', 'sadness', 'tenderness', 'sadness', 'tenderness', 'tension', 'sadness', 'sadness', 'sadness', 'tende
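The filtering step keeps a token only if its lower-cased form is not in NLTK's English stop-word list. That list contains negators such as "not" and "no", so a phrase like "not happy" reaches NRCLex as just "happy". The sketch below reproduces the list comprehension with a small hypothetical stop-word set (a stand-in for `stopwords.words('english')`) and plain `str.split` instead of `word_tokenize`, so it runs without any NLTK data.

```python
# Hypothetical subset of an English stop-word list (stand-in for NLTK's list)
STOP_WORDS = {"i", "am", "not", "no", "the", "a", "and", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop tokens whose lower-cased form is a stop word; keep original casing."""
    tokens = text.split()  # simplification: whitespace split instead of word_tokenize
    return " ".join(t for t in tokens if t.lower() not in stop_words)


print(remove_stop_words("I am not happy and No one cares"))
# happy one cares
```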

### Evaluation Metric
Calculating accuracy, the confusion matrix, the weighted F1 score and a per-class classification report

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score

In [None]:
# pre stop word removal
accuracy = accuracy_score(actual_emotion_list, predicted_emotion_list)
accuracy

0.06637931034482758

In [None]:
# after stop word removal
accuracy = accuracy_score(actual_emotion_list, predicted_emotion_list)
accuracy

0.06637931034482758

In [None]:
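# Rows are the actual labels and columns the predicted labels, both in sorted order:
# amazement, calmness, nostalgia, power, sadness, solemnity, tenderness, tension
confusion = confusion_matrix(actual_emotion_list, predicted_emotion_list)
confusion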

array([[  0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0],
       [106, 134,  28,  41,  59, 173,   0,  28],
       [  0,   0,   0,   0,   0,   0,   0,   0],
       [108, 118,  22,  12,  14,  50,   0,   2],
       [ 30,  45,  11,  19,  61,  81,   0,  18]])
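By default `confusion_matrix` orders its rows (actual labels) and columns (predicted labels) by the sorted union of both label lists. That is why the matrix has eight rows although the dataset only has three labels: the five extra rows belong to categories that were predicted but never occur as true labels, so they are all zeros. A pure-Python sketch of that construction (`simple_confusion` is a hypothetical helper) makes the layout explicit.

```python
def simple_confusion(actual, predicted):
    """Return (labels, matrix) with rows = actual, columns = predicted, labels sorted."""
    labels = sorted(set(actual) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix


actual = ["sadness", "sadness", "tension", "tenderness"]
predicted = ["calmness", "sadness", "tension", "calmness"]
labels, matrix = simple_confusion(actual, predicted)
print(labels)  # ['calmness', 'sadness', 'tenderness', 'tension']
for label, row in zip(labels, matrix):
    print(f"{label:>10}", row)

# Accuracy is the diagonal over the total; predictions outside the true label set
# (here 'calmness') can only ever land off the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
print(accuracy)  # 0.5
```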

In [None]:
f1 = f1_score(actual_emotion_list, predicted_emotion_list, average="weighted")
f1

0.10860953175855503

In [None]:
from sklearn.metrics import classification_report
# Print the classification report
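# zero_division=0 reports 0.00 for undefined precision/recall without emitting warnings
eval_report = classification_report(actual_emotion_list, predicted_emotion_list, zero_division=0)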
print("\nEvaluation Report:")
print(eval_report)


Evaluation Report:
              precision    recall  f1-score   support

   amazement       0.00      0.00      0.00         0
    calmness       0.00      0.00      0.00         0
   nostalgia       0.00      0.00      0.00         0
       power       0.00      0.00      0.00         0
     sadness       0.44      0.10      0.17       569
   solemnity       0.00      0.00      0.00         0
  tenderness       0.00      0.00      0.00       326
     tension       0.38      0.07      0.12       265

    accuracy                           0.07      1160
   macro avg       0.10      0.02      0.04      1160
weighted avg       0.30      0.07      0.11      1160
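To put these numbers in context, a trivial baseline that always predicts the most frequent label would already reach 569 / 1160 ≈ 0.49 accuracy, using the support column above. The sketch below computes such a majority-class baseline from a list of true labels; `majority_baseline_accuracy` is a hypothetical helper built on `collections.Counter`, and the label counts in the usage line are copied from the report's support column. In the notebook it could be called directly as `majority_baseline_accuracy(actual_emotion_list)`.

```python
from collections import Counter


def majority_baseline_accuracy(actual_labels):
    """Accuracy of always predicting the most common label in actual_labels."""
    if not actual_labels:
        return 0.0
    _, count = Counter(actual_labels).most_common(1)[0]
    return count / len(actual_labels)


# Supports from the classification report: 569 sadness, 326 tenderness, 265 tension
actual = ["sadness"] * 569 + ["tenderness"] * 326 + ["tension"] * 265
print(round(majority_baseline_accuracy(actual), 4))  # 0.4905
```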





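As seen from the results, the accuracy of this NRCLex-based pipeline is only 6.6% (77 of 1,160 songs), well below the 49% that always predicting "sadness" would achieve. This may be due to:

* The matcher can output any of the nine categories, but the dataset only uses three (sadness, tension, tenderness). In the confusion matrix, 978 of the 1,160 predictions (about 84%) are categories such as calmness, solemnity and amazement that never occur as true labels, so they are wrong by construction.

* `matching_emotions` only counts shared emotion keys and ignores the weights. Calmness and tenderness have the same four keys and calmness is listed first, so tenderness is never predicted (its column in the confusion matrix is all zeros).

* NRCLex scores each word individually against a lexicon of approximately 27k words. It does not take the overall context of the sentence into account, so the same word always receives the same labels however it is used.

* Some of the words in the lexicon are not tagged with any specific emotion, so they contribute nothing to the scores.

* The categories provided by NRCLex differ from the categories derived from the emotion paper, and the hand-made mapping between them may make the emotion analysis incorrect.

* Negations are not handled directly, so "not happy" can still be scored as "happy"; in this pipeline, removing NLTK stop words also strips negators such as "not" and "no" before NRCLex sees the text. For the same reason, sarcasm and cultural variations in the lyrics cannot be identified.

* The analysis currently assigns a single emotion label per song. Multi-label emotion recognition might give a more accurate result.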
