# GoEmotions: Trial Using biLSTM

This implementation is based on TensorFlow. We use the tutorial released by the authors of GoEmotions as a reference for processing the data, and we adapt some of their helper methods. [Reference Link](https://github.com/tensorflow/models/blob/master/research/seq_flow_lite/demo/colab/emotion_colab.ipynb)

For the modeling part, we build our model with the Keras API. We also initialize the embedding layer with pre-trained GloVe weights, which are explained in detail in the corresponding section.

In [1]:
# Load libraries
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

Here, we load the GoEmotions dataset directly from TensorFlow Datasets. This is the dataset originally released by the authors. [Link to Description](https://www.tensorflow.org/datasets/catalog/goemotions)

In [None]:
train_ds = tfds.load('goemotions', split='train')
val_ds = tfds.load('goemotions', split='validation')
test_ds = tfds.load('goemotions', split='test')

In [3]:
# Check the format of the tensorflow dataset.
for element in train_ds.take(5):
  print(element)

{'admiration': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'amusement': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'anger': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'annoyance': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'approval': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'caring': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'comment_text': <tf.Tensor: shape=(), dtype=string, numpy=b"It's just wholesome content, from questionable sources">, 'confusion': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'curiosity': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'desire': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'disappointment': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'disapproval': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'disgust': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'embarrassment': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'excitement': <tf.Tensor: shape=(), dtype=bool, numpy=False>, 'fear': <tf.Tensor: shape=(

We consider the full taxonomy. There are 27 emotions plus a neutral class. 

In [4]:
# The 28 labels in our dataset (27 emotions plus neutral)
LABELS = [
    'admiration',
    'amusement',
    'anger',
    'annoyance',
    'approval',
    'caring',
    'confusion',
    'curiosity',
    'desire',
    'disappointment',
    'disapproval',
    'disgust',
    'embarrassment',
    'excitement',
    'fear',
    'gratitude',
    'grief',
    'joy',
    'love',
    'nervousness',
    'optimism',
    'pride',
    'realization',
    'relief',
    'remorse',
    'sadness',
    'surprise',
    'neutral',
]

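In the section below, we define a helper method that takes an original dataset entry and returns its text together with a multi-hot label vector (one entry per label; several entries can be 1 for the same comment).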
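Because a comment can carry several emotions at once, the label vector may have more than one entry set. As a minimal sketch in plain Python (the names `TOY_LABELS` and `to_multi_hot` and the example comment are made up for illustration and are not part of the notebook), the encoding looks like this:

```python
# Toy label subset for illustration; the notebook's LABELS list has 28 entries.
TOY_LABELS = ['admiration', 'anger', 'joy', 'neutral']

def to_multi_hot(example, labels):
    """Return one float per label: 1.0 if the label is set, else 0.0."""
    return [1.0 if example[name] else 0.0 for name in labels]

# Hypothetical dataset entry with two emotions set at the same time.
example = {
    'comment_text': 'So happy, and so impressed!',
    'admiration': True,
    'anger': False,
    'joy': True,
    'neutral': False,
}

vector = to_multi_hot(example, TOY_LABELS)
print(vector)  # [1.0, 0.0, 1.0, 0.0]
```

The position of each entry follows the order of the label list, so the same list must be used when interpreting predictions later.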

In [5]:
# Construct training dataset & validation dataset needed for the biLSTM model
def process_label(features):
  '''
  Preprocess function, used as the callback for the map function of tf.data.Dataset.
  It builds the multi-hot label vector from the per-emotion boolean features.
  input:
    features, each entry (a dictionary) in the dataset
  output:
    A (text, label) tuple, where label is a float32 vector of length len(LABELS)
  '''
  text = features['comment_text'] # Text's key in GoEmotions
  label = tf.stack([features[label] for label in LABELS], axis=-1)
  label = tf.cast(label, tf.float32)
  model_features = (text, label)
  return model_features

# This was used for debugging (to check that our process_label method is correct).
# The map function of tf.data.Dataset applies the callback to each element in the dataset.
trial_ds = train_ds.map(process_label, num_parallel_calls=tf.data.experimental.AUTOTUNE, deterministic=False)

# Create Vocabulary
# We use at most 50000 words
MAX_FEATURES = 50000
# Pad or truncate each sentence to 300 tokens
MAX_LENGTH = 300

# We use keras' text vectorization layer
# like the Vocab class we used in assignment 2
vectorized_layer = tf.keras.layers.TextVectorization(max_tokens=MAX_FEATURES, output_sequence_length=MAX_LENGTH)
# Build the vocabulary from the training texts
vectorized_layer.adapt(train_ds.map(lambda text: text['comment_text']))

In [6]:
# Sanity check: the most frequent vocabulary entries
glance_vocab = np.array(vectorized_layer.get_vocabulary())
glance_vocab[:10]

array(['', '[UNK]', 'the', 'i', 'to', 'a', 'you', 'and', 'is', 'that'],
      dtype='<U633')
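In the output above, index 0 is the padding token `''` and index 1 is the out-of-vocabulary token `[UNK]`; the remaining entries are ordered by frequency. The toy re-implementation below sketches that behavior in plain Python. It is a simplification for illustration only, not Keras' actual code: the helper names are hypothetical, and the punctuation handling is only an approximation of the layer's default standardization.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase, drop punctuation, split on whitespace (a simplification).
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def build_vocab(texts, max_tokens):
    """Reserved '' and '[UNK]' slots first, then the most frequent tokens."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    frequent = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return ['', '[UNK]'] + frequent

def vectorize(text, vocab, length):
    """Map tokens to ids, send unknown tokens to 1, pad with 0 up to `length`."""
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in tokenize(text)][:length]
    return ids + [0] * (length - len(ids))

toy_texts = ["the cat sat", "the dog sat on the mat"]
toy_vocab = build_vocab(toy_texts, max_tokens=4)
toy_ids = vectorize("The bird sat!", toy_vocab, length=5)
print(toy_vocab)  # ['', '[UNK]', 'the', 'sat']
print(toy_ids)    # [2, 1, 3, 0, 0]
```

With a small `max_tokens`, rarer words such as "cat" fall out of the vocabulary and would be mapped to `[UNK]`, which is the same trade-off that `MAX_FEATURES` controls in the notebook.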

In [7]:
# Create the training set, validation set
BUFFER_SIZE = 60000
BATCH_SIZE = 64
trial_train_ds = train_ds.map(process_label, num_parallel_calls=tf.data.experimental.AUTOTUNE, deterministic=False)
trial_val_ds = val_ds.map(process_label, num_parallel_calls=tf.data.experimental.AUTOTUNE, deterministic=False)
train_dataset = trial_train_ds.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
val_dataset = trial_val_ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

In [None]:
# sanity check
for example, label in val_dataset.take(1):
  print('texts: ', example)
  print()
  print('labels: ', label)

# Load Embeddings

We use the embeddings from GloVe. To be more specific, we choose glove.twitter.27B with embedding dimension $200$. In this section, we load the embeddings from Google Drive and prepare them for the embedding layer.

In [9]:
from google.colab import drive
drive.mount('/content/drive')
# Each row of the GloVe file is a token followed by its 200 vector components.
embedding_dict = {}
with open('/content/drive/MyDrive/glove.twitter.27B.200d.txt', encoding='utf-8') as embedding_file:
  for row in embedding_file:
    values = row.split()
    word = values[0]
    embedding_vec = np.asarray(values[1:], dtype='float32')
    embedding_dict[word] = embedding_vec
print(len(embedding_dict))

Mounted at /content/drive
1193514
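The loading loop above relies on the plain-text GloVe layout, in which each row holds a token followed by its vector components, separated by whitespace. A minimal sketch of that parsing on an in-memory sample (the two rows and the helper name `parse_glove` are made up for illustration; real rows have 200 components):

```python
import io

def parse_glove(handle):
    """Parse 'token v1 v2 ... vn' rows into a {token: [floats]} dictionary."""
    vectors = {}
    for row in handle:
        values = row.split()
        if len(values) < 2:
            continue  # skip blank or malformed rows
        vectors[values[0]] = [float(v) for v in values[1:]]
    return vectors

# Made-up sample with 3-dimensional vectors.
sample = io.StringIO("happy 0.1 -0.2 0.3\nsad -0.4 0.5 -0.6\n")
toy_embeddings = parse_glove(sample)
print(len(toy_embeddings))      # 2
print(toy_embeddings['happy'])  # [0.1, -0.2, 0.3]
```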


In [10]:
# Align the GloVe vectors with the vocabulary built earlier.
# Words that are missing from GloVe keep an all-zero row.
current_words = vectorized_layer.get_vocabulary()
EMBEDDING_DIM = 200
embedding_weights = np.zeros((len(current_words), EMBEDDING_DIM))
for i, words in enumerate(current_words):
  embedding_word = embedding_dict.get(words)
  if embedding_word is not None:
    embedding_weights[i] = embedding_word
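Vocabulary words that do not appear in GloVe keep their all-zero row, so it can be useful to know how much of the vocabulary is actually covered. The sketch below computes that fraction on toy data; `embedding_coverage` is a hypothetical helper, not part of the notebook, and the same function could be called with `current_words` and `embedding_dict`.

```python
def embedding_coverage(vocab, embedding_dict):
    """Fraction of real tokens (excluding '' and '[UNK]') that have a vector."""
    real_tokens = [tok for tok in vocab if tok not in ('', '[UNK]')]
    if not real_tokens:
        return 0.0
    hits = sum(1 for tok in real_tokens if tok in embedding_dict)
    return hits / len(real_tokens)

# Toy data: 'yeet' has no pre-trained vector.
toy_vocab = ['', '[UNK]', 'the', 'happy', 'yeet']
toy_vectors = {'the': [0.0], 'happy': [0.1]}
coverage = embedding_coverage(toy_vocab, toy_vectors)
print(round(coverage, 3))  # 0.667
```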


# Build the model

In the following section, we define the biLSTM architecture. The model starts with the text vectorization layer and an embedding layer. We set the embedding dimension to $200$. The weights are loaded from GloVe and remain trainable. Then, we stack two biLSTM layers with hidden dimension $256$ each.

The output of the biLSTM layers is fed to two fully connected layers. Between the fully connected layers, we add a dropout layer with dropout probability $0.7$.
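To get a feel for the size of this architecture, the arithmetic below estimates the number of trainable parameters in the recurrent and dense layers. It is a sketch that assumes the standard LSTM layout (four gates, one bias vector per gate) and the default concatenation of the forward and backward outputs, so each biLSTM layer emits $2 \times 256 = 512$ features. The embedding layer is left out because its size depends on the vocabulary. The helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units

EMBEDDING_DIM, HIDDEN, NUM_CLASSES = 200, 256, 28

# Each bidirectional layer holds a forward and a backward LSTM.
first_bilstm = 2 * lstm_params(EMBEDDING_DIM, HIDDEN)
second_bilstm = 2 * lstm_params(2 * HIDDEN, HIDDEN)
hidden_dense = dense_params(2 * HIDDEN, 512)
output_dense = dense_params(512, NUM_CLASSES)

print(first_bilstm)   # 935936
print(second_bilstm)  # 1574912
print(hidden_dense)   # 262656
print(output_dense)   # 14364
```

These figures can be compared against the output of `biLSTM_model.summary()` once the model has been built.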

In [44]:
NUM_OF_CLASSES = 28


In [45]:
biLSTM_model = tf.keras.Sequential([
                           vectorized_layer,
                           tf.keras.layers.Embedding(
                               input_dim=len(vectorized_layer.get_vocabulary()),
                               output_dim=200,
                               mask_zero = True,
                               weights = [embedding_weights],
                               trainable=True
                           ),
                           tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True)),
                           tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256)),
                           tf.keras.layers.Dense(512, activation='relu'),
                           tf.keras.layers.Dropout(0.7),
                           tf.keras.layers.Dense(NUM_OF_CLASSES)
                           ])

# Train the model

We choose Adam as the optimizer with a learning rate of $0.001$. Since this is a multi-label problem (each comment can carry several of the 28 labels), we use the binary cross-entropy loss, applied independently to each label.
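Since the last dense layer has no activation, the loss is computed from raw logits. As a sketch of what binary cross-entropy on logits computes for a single label, the function below uses the usual numerically stable form $\max(x, 0) - xz + \log(1 + e^{-|x|})$ for logit $x$ and target $z$. This is an illustration in plain Python with hypothetical helper names, not the Keras implementation itself.

```python
import math

def bce_with_logits(logit, target):
    """Stable binary cross-entropy for one logit and one 0/1 target."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def mean_bce(logits, targets):
    """Average the per-label losses of one example."""
    losses = [bce_with_logits(x, z) for x, z in zip(logits, targets)]
    return sum(losses) / len(losses)

# A logit of 0 means probability 0.5, so the loss is log(2) for either target.
print(round(bce_with_logits(0.0, 1.0), 4))  # 0.6931

# Confident and correct gives a small loss; confident and wrong a large one.
print(round(bce_with_logits(2.0, 1.0), 4))  # 0.1269
print(round(bce_with_logits(2.0, 0.0), 4))  # 2.1269
```

Each of the 28 outputs is treated as its own yes/no decision, which is why several labels can be active for the same comment.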

In [43]:
lr = 1e-3
loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)

In [46]:
biLSTM_model.compile(loss=loss,
              optimizer=optimizer,
              metrics=['accuracy'])

In [47]:
# Use a callback to store best models.
checkpoint_dir = '/content/drive/MyDrive/cs769-project/output'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_dir,
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

In [None]:
history = biLSTM_model.fit(train_dataset, epochs=15,
                    validation_data=val_dataset,
                    validation_steps=30,
                    callbacks=[model_checkpoint_callback])

# Evaluation

We compute per-class recall, precision, and F1 scores.

In [50]:
pred_result = biLSTM_model.predict(val_dataset)

We record the true labels and the predicted labels. This logic could be moved into a separate helper method; we will modularize the notebook later.

In [51]:
pred_y = []
actual_y = []

In [52]:
i = 0 
for _, labels in val_dataset:
  for label in labels.numpy():
    pred_label = tf.cast(tf.math.sigmoid(pred_result[i]) > 0.5, tf.float32).numpy()
    i += 1
    pred_y.append(pred_label)
    actual_y.append(label)
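The thresholding step above turns each logit into a probability with the sigmoid and predicts a label when that probability exceeds $0.5$. Because the sigmoid of $0$ is exactly $0.5$, this is equivalent to checking whether the logit is positive. A small sketch on made-up logits (the helper names are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Mark a label as predicted when its probability exceeds the threshold."""
    return [1.0 if sigmoid(x) > threshold else 0.0 for x in logits]

toy_logits = [2.3, -0.7, 0.0, 0.4]
by_probability = predict_labels(toy_logits)
by_logit = [1.0 if x > 0 else 0.0 for x in toy_logits]

print(by_probability)  # [1.0, 0.0, 0.0, 1.0]
print(by_logit)        # [1.0, 0.0, 0.0, 1.0]
```

Note that with a fixed threshold a comment can end up with no predicted label at all when every logit is non-positive.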

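Here, we show the macro-averaged F1 score on the validation set (the mean of the per-class F1 scores), followed by the per-class recall and the per-class precision, both in the order of `LABELS`.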

In [54]:
from sklearn.metrics import f1_score, recall_score, precision_score
f1_score_list = f1_score(actual_y, pred_y, average=None)
recall_score_list = recall_score(actual_y, pred_y, average=None)
precision_score_list = precision_score(actual_y, pred_y, average=None)
print(np.mean(f1_score_list))
print(recall_score_list)
print(precision_score_list)

0.4177191178798451
[0.64754098 0.78547855 0.34871795 0.28052805 0.26448363 0.33986928
 0.23684211 0.33064516 0.35064935 0.17177914 0.18493151 0.43298969
 0.45714286 0.21875    0.44444444 0.87150838 0.07692308 0.51744186
 0.79761905 0.23809524 0.50717703 0.26666667 0.21259843 0.05555556
 0.54411765 0.53146853 0.43410853 0.52718007]
[0.60652591 0.72340426 0.5112782  0.24781341 0.28767123 0.33548387
 0.31034483 0.31417625 0.52941176 0.19178082 0.32335329 0.43298969
 0.66666667 0.27272727 0.61538462 0.91764706 0.5        0.54938272
 0.71530249 0.33333333 0.56084656 0.66666667 0.16071429 0.2
 0.64912281 0.47798742 0.46280992 0.55949519]
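For reference, the sketch below shows how per-class precision, recall, and F1 can be computed by hand from multi-hot labels, and how the macro F1 is obtained as the mean of the per-class F1 scores. It is a plain-Python illustration on made-up data with a hypothetical helper name. A score is set to 0 when its denominator is zero, which we believe matches the default behavior of scikit-learn (apart from the warning it emits).

```python
def per_class_scores(y_true, y_pred):
    """Return a (precision, recall, f1) tuple for every label column."""
    scores = []
    for c in range(len(y_true[0])):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores

# Made-up labels for four comments and two classes.
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]

toy_scores = per_class_scores(y_true, y_pred)
macro_f1 = sum(f1 for _, _, f1 in toy_scores) / len(toy_scores)
print(toy_scores)  # [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
print(macro_f1)    # 0.75
```

Macro averaging weights every class equally, so rare classes with low scores pull the average down as much as frequent ones.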


In [55]:
# Store into Google Drive
with open("/content/drive/MyDrive/cs769-project/output/evaluation_val.txt", "w") as outfile:
  for scores in [f1_score_list, recall_score_list, precision_score_list]:
    for value in scores:
      outfile.write("%.4f\t" % value)
    outfile.write('\n')

In [60]:
trial_test_ds = test_ds.map(process_label, num_parallel_calls=tf.data.experimental.AUTOTUNE, deterministic=False)
test_dataset = trial_test_ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

Here, we show the macro-averaged F1 score on the test set, followed by the per-class recall and the per-class precision.

In [61]:
pred_test_result = biLSTM_model.predict(test_dataset)

In [None]:
pred_y = []
actual_y = []

In [63]:
i = 0 
for _, labels in test_dataset:
  for label in labels.numpy():
    pred_label = tf.cast(tf.math.sigmoid(pred_test_result[i]) > 0.5, tf.float32).numpy()
    i += 1
    pred_y.append(pred_label)
    actual_y.append(label)

In [64]:
f1_score_list = f1_score(actual_y, pred_y, average=None)
recall_score_list = recall_score(actual_y, pred_y, average=None)
precision_score_list = precision_score(actual_y, pred_y, average=None)
print(np.mean(f1_score_list))
print(recall_score_list)
print(precision_score_list)

0.4273184637262085
[0.59722222 0.76136364 0.28282828 0.2875     0.26780627 0.3037037
 0.29411765 0.35211268 0.28915663 0.24503311 0.19850187 0.37398374
 0.21621622 0.27184466 0.61538462 0.89204545 0.33333333 0.52795031
 0.77731092 0.26086957 0.41935484 0.3125     0.20689655 0.09090909
 0.57142857 0.46794872 0.41134752 0.55008394]
[0.57333333 0.73897059 0.448      0.24731183 0.28398792 0.29927007
 0.32142857 0.35335689 0.5        0.25694444 0.29120879 0.56097561
 0.42105263 0.34146341 0.6        0.91812865 1.         0.49418605
 0.7312253  0.28571429 0.53424658 0.71428571 0.19607843 0.25
 0.59259259 0.48993289 0.56862745 0.57687793]


In [65]:
with open("/content/drive/MyDrive/cs769-project/output/evaluation_test.txt", "w") as outfile:
  for scores in [f1_score_list, recall_score_list, precision_score_list]:
    for value in scores:
      outfile.write("%.4f\t" % value)
    outfile.write('\n')