# SemEval 2018 - Affect in Tweets
https://competitions.codalab.org/competitions/17751
http://saifmohammad.com/WebDocs/semeval2018-task1.pdf
#### Task E-c: Detecting Emotions (multi-label classification) -- This is a traditional Emotion Classification Task

**Given:**
*   a tweet

**Task:** classify the tweet as 'neutral or no emotion' or as one, or more, of eleven given emotions that best represent the mental state of the tweeter:

*   anger (also includes annoyance and rage) 
*   anticipation (also includes interest and vigilance) 
*   disgust (also includes disinterest, dislike and loathing)
*   fear (also includes apprehension, anxiety, concern, and terror) 
*   joy (also includes serenity and ecstasy) 
*   love (also includes affection) 
*   optimism (also includes hopefulness and confidence) 
*   pessimism (also includes cynicism and lack of confidence) 
*   sadness (also includes pensiveness and grief) 
*   surprise (also includes distraction and amazement) 
*   trust (also includes acceptance, liking, and admiration) 
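
Because a tweet can carry several emotions at once, each label is naturally represented as an 11-element 0/1 vector rather than a single class id. The sketch below illustrates that encoding; `to_multi_hot` is a hypothetical helper written for this illustration and is not part of the task's tooling.

```python
# Label order follows the emotion list above (anger ... trust).
EMOTIONS = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
            'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

def to_multi_hot(emotions):
    """Encode a collection of emotion names as an 11-element 0/1 vector."""
    present = set(emotions)
    unknown = present - set(EMOTIONS)
    if unknown:
        raise ValueError(f'unknown emotions: {sorted(unknown)}')
    return [1 if e in present else 0 for e in EMOTIONS]

# A tweet labeled with anticipation, optimism and trust
print(to_multi_hot(['anticipation', 'optimism', 'trust']))
# A 'neutral or no emotion' tweet has no active label
print(to_multi_hot([]))
```

A neutral tweet is simply the all-zero vector, so no twelfth "neutral" class is needed.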

**Data was downloaded into Google Drive from:**
https://competitions.codalab.org/competitions/17751#learn_the_details-datasets

In [1]:
from google.colab import drive
drive.mount('/content/gdrive',force_remount=True)

Mounted at /content/gdrive


In [2]:
!pip install -q transformers==3.5.1
!pip install -q tf-models-official==2.3.0
!pip install -q emojis
!pip install -q -U scikit-learn



In [3]:
import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import TFRobertaModel, RobertaTokenizer
from unicodedata import normalize
import emojis
import re
from sklearn import metrics 
from sklearn.utils import class_weight

print("TF Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print(tf.config.list_physical_devices('GPU'))

TF Version:  2.4.0
Eager mode:  True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [4]:
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
roberta_layer = TFRobertaModel.from_pretrained('roberta-base')

Some layers from the model checkpoint at roberta-base were not used when initializing TFRobertaModel: ['lm_head']
- This IS expected if you are initializing TFRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFRobertaModel were initialized from the model checkpoint at roberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaModel for predictions without further training.


### Read Data

In [5]:
trainDF = pd.read_csv('gdrive/My Drive/Sentiment Discovery/emotions_data/2018-E-c-En-train.txt',delimiter='\t')
devDF = pd.read_csv('gdrive/My Drive/Sentiment Discovery/emotions_data/2018-E-c-En-dev.txt',delimiter='\t')
testDF = pd.read_csv('gdrive/My Drive/Sentiment Discovery/emotions_data/2018-E-c-En-test-gold.txt',delimiter='\t')

trainDF.head()

Unnamed: 0,ID,Tweet,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,2017-En-21441,“Worry is a down payment on a problem you may ...,0,1,0,0,0,0,1,0,0,0,1
1,2017-En-31535,Whatever you decide to do make sure it makes y...,0,0,0,0,1,1,1,0,0,0,0
2,2017-En-21068,@Max_Kellerman it also helps that the majorit...,1,0,1,0,1,0,1,0,0,0,0
3,2017-En-31436,Accept the challenges so that you can literall...,0,0,0,0,1,0,1,0,0,0,0
4,2017-En-22195,My roommate: it's okay that we can't spell bec...,1,0,1,0,0,0,0,0,0,0,0


### Print a few examples

In [6]:
for a in trainDF['Tweet'][:5]:
  print(a)
  print('-'*50)

“Worry is a down payment on a problem you may never have'.  Joyce Meyer.  #motivation #leadership #worry
--------------------------------------------------
Whatever you decide to do make sure it makes you #happy.
--------------------------------------------------
@Max_Kellerman  it also helps that the majority of NFL coaching is inept. Some of Bill O'Brien's play calling was wow, ! #GOPATS
--------------------------------------------------
Accept the challenges so that you can literally even feel the exhilaration of victory.' -- George S. Patton 🐶
--------------------------------------------------
My roommate: it's okay that we can't spell because we have autocorrect. #terrible #firstworldprobs
--------------------------------------------------


### Clean Data

In [7]:
def clean_tweet(s):
    """ Accepts a tweet and cleans data for deep learning"""
    s = normalize('NFKD',s) # normalize unicode to its compatibility-decomposed form
    s = s.replace('#','') # remove the '#' symbol but keep the hashtag text
    s = emojis.decode(s) # convert emojis to their text aliases
    s = re.sub(r'@[^\s]+','',s) # remove usernames
    s = s.lower() # convert to lower case
    return s

def df_to_bert(df,max_seq_len,model=True):
    """ Input: 
        df - a dataframe with Tweet as the sentence column and anger through trust as the 11 emotion columns.
        max_seq_len - maximum number of tokens in the sentence. Pad if the Tweet is too short, otherwise truncate to max_seq_len
        model - True/False whether to create an output label or not
    """
    output_dict = tokenizer([clean_tweet(tweet) for tweet in df['Tweet']],padding='max_length',max_length=max_seq_len,truncation=True)
    ids = np.array(output_dict['input_ids'],dtype=np.int32)
    att = np.array(output_dict['attention_mask'],dtype=np.int32)
    #tok = np.array(output_dict['token_type_ids'],dtype=np.int32)

    if model:
      y = np.int32(df.loc[:,'anger':'trust'].values)
    else:
      y = []

    return [ids,att],y
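
To make the effect of the cleaning steps concrete, here is a minimal standard-library sketch of `clean_tweet`. It deliberately leaves out the emoji decoding step (which relies on the third-party `emojis` package), and `clean_tweet_sketch` is a name chosen for this illustration only.

```python
import re
from unicodedata import normalize

def clean_tweet_sketch(s):
    """Simplified version of clean_tweet without the emoji decoding step."""
    s = normalize('NFKD', s)      # unicode compatibility normalization
    s = s.replace('#', '')        # drop the '#' symbol, keep the hashtag word
    s = re.sub(r'@\S+', '', s)    # drop @username mentions
    return s.lower()              # lower-case everything

print(repr(clean_tweet_sketch('@Max_Kellerman it also helps! #GOPATS')))
```

The mention is removed but the space that followed it remains, and the hashtag text survives as an ordinary lower-case word.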

In [8]:
x_train,y_train = df_to_bert(trainDF,64)
x_dev,y_dev = df_to_bert(devDF,64)
x_test,y_test = df_to_bert(testDF,64)
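
What `padding='max_length'` together with `truncation=True` produces can be sketched without loading the tokenizer. The example assumes the `roberta-base` special token ids (`<s>`=0, `<pad>`=1, `</s>`=2); `encode_fixed_length` is a hypothetical helper and the body token ids are made up.

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # assumed roberta-base special token ids

def encode_fixed_length(body_ids, max_seq_len):
    """Wrap token ids in <s> ... </s>, then truncate or pad to max_seq_len.

    Returns (input_ids, attention_mask), both of length max_seq_len.
    """
    body = body_ids[:max_seq_len - 2]        # leave room for <s> and </s>
    ids = [BOS_ID] + body + [EOS_ID]
    mask = [1] * len(ids)                    # 1 = real token
    n_pad = max_seq_len - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad  # 0 = padding

short_ids, short_mask = encode_fixed_length([10, 11, 12], 8)
print(short_ids, short_mask)

long_ids, long_mask = encode_fixed_length(list(range(10, 20)), 6)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions, which is why `df_to_bert` returns it alongside the ids.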

## Model
This is a small dataset. We are therefore going to use the fine-tuning approach. This includes 2 steps:

1.   Freeze the roberta layer and let the other parameters train. The learning rate is higher in this step.
2.   Let all the parameters train, including the roberta parameters, but use a much smaller learning rate

In order to pool the roberta output we are using 3 different 1D convolution layers followed by global max pooling. Each of the 3 convolution layers has a different window size. We then concatenate the layers to get 96 features.
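
The shape arithmetic behind the pooling head can be checked by hand. The sketch below assumes Keras' default `'valid'` padding for `Conv1D`, a sequence length of 64, a hidden size of 768 and 32 filters per convolution; the helper names are illustrative only.

```python
def conv1d_output_length(seq_len, kernel_size):
    """Number of output positions of a 'valid' 1D convolution with stride 1."""
    return seq_len - kernel_size + 1

def conv1d_param_count(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return in_channels * filters * kernel_size + filters

SEQ_LEN, HIDDEN, N_CONV = 64, 768, 32

for k in (1, 2, 3):
    print(f'kernel={k}: positions={conv1d_output_length(SEQ_LEN, k)}, '
          f'params={conv1d_param_count(HIDDEN, N_CONV, k)}')

# Global max pooling keeps one value per filter, whatever the number of positions,
# so each branch contributes N_CONV features.
n_features = 3 * N_CONV
print('concatenated features:', n_features)
```

Since global max pooling collapses the position axis, the three branches can be concatenated even though their convolution outputs have different lengths (64, 63 and 62).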



In [24]:
n_conv = 32

# freeze the roberta layer
roberta_layer.trainable = False

ids = tf.keras.layers.Input((64,), dtype=tf.int32)
att = tf.keras.layers.Input((64,), dtype=tf.int32)

roberta_inputs = [ids, att]

sequence_output,pooled_output = roberta_layer.roberta(roberta_inputs)

# unigram
x1 = tf.keras.layers.Conv1D(n_conv,1,activation='relu')(sequence_output)
x1 = tf.keras.layers.GlobalMaxPool1D()(x1)

# bigram
x2 = tf.keras.layers.Conv1D(n_conv,2,activation='relu')(sequence_output)
x2 = tf.keras.layers.GlobalMaxPool1D()(x2)

# trigram
x3 = tf.keras.layers.Conv1D(n_conv,3,activation='relu')(sequence_output)
x3 = tf.keras.layers.GlobalMaxPool1D()(x3)

concat = tf.keras.layers.Concatenate()([x1,x2,x3])
concat = tf.keras.layers.Dropout(0.5)(concat)

outputs = tf.keras.layers.Dense(11, activation='sigmoid')(concat)

model = tf.keras.Model(inputs=roberta_inputs, outputs=outputs)

# use the default Adam learning rate (1e-3)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "model_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_11 (InputLayer)           [(None, 64)]         0                                            
__________________________________________________________________________________________________
input_12 (InputLayer)           [(None, 64)]         0                                            
__________________________________________________________________________________________________
roberta (TFRobertaMainLayer)    ((None, 64, 768), (N 124645632   input_11[0][0]                   
                                                                 input_12[0][0]                   
__________________________________________________________________________________________________
conv1d_15 (Conv1D)              (None, 64, 32)       24608       roberta[5][0]              

Because the classes are unbalanced, we are going to use class weights that are inversely proportional to each class's frequency. This helps ensure that the less common classes are not neglected during training.

In [25]:
class_weights = {i:v for i,v in enumerate(1/y_train.sum(axis=0)*len(y_train)/11)}
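
The weight formula above is `n_samples / (n_classes * positives_in_class)`, so a class that appears rarely receives a proportionally larger weight. A plain-Python sketch on a toy label matrix (the data and the helper name `balanced_class_weights` are made up for illustration):

```python
def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * number of positives).

    `labels` is a list of multi-hot rows. A class without any positive
    example would cause a division by zero, as in the one-line version.
    """
    n_samples = len(labels)
    n_classes = len(labels[0])
    counts = [sum(row[i] for row in labels) for i in range(n_classes)]
    return {i: n_samples / (n_classes * c) for i, c in enumerate(counts)}

# Toy data: class 0 is present in every row, class 1 in only one row.
toy_labels = [[1, 0], [1, 0], [1, 1], [1, 0]]
print(balanced_class_weights(toy_labels))
```

In the toy data the rare class ends up with four times the weight of the common one, mirroring the 4:1 ratio of their frequencies.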

### Train the model
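For step 2 (fine-tuning), we added a learning rate schedule that decreases the learning rate at each epoch. Because the scheduler scales the current learning rate rather than the initial one, the decay compounds and is faster than linear.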
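
Keras' `LearningRateScheduler` passes the optimizer's current learning rate to the schedule function, so a function that returns `lr*(1-epoch/10)` applies each factor on top of the earlier reductions. The sketch below simulates that behaviour in plain Python, starting from 1e-5, and compares it with a strictly linear decay computed from the initial rate; `simulate` is a helper written for this illustration.

```python
def scheduler(epoch, lr):
    """Same rule as in the notebook: scale the current rate by (1 - epoch/10)."""
    return lr * (1 - epoch / 10)

def simulate(initial_lr, epochs):
    """Feed each epoch's result back in, as the Keras callback does."""
    rates, lr = [], initial_lr
    for epoch in range(epochs):
        lr = scheduler(epoch, lr)
        rates.append(lr)
    return rates

compounded = simulate(1e-5, 10)
linear = [1e-5 * (1 - epoch / 10) for epoch in range(10)]

for epoch, (comp, lin) in enumerate(zip(compounded, linear)):
    print(f'epoch {epoch}: compounded={comp:.3e}  linear={lin:.3e}')
```

By the last epoch the compounded rate is orders of magnitude below the linear one, so the final epochs of fine-tuning change the weights very little.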

In [26]:
def scheduler(epoch,lr):
  " scale the current learning rate by (1-epoch/10); the decay compounds across epochs"
  return lr*(1-epoch/10)

# model step 1
model.fit(x_train,y_train,batch_size=32,epochs=10,validation_data=(x_dev,y_dev),class_weight=class_weights)

# model step 2
roberta_layer.trainable = True

# starting learning rate is smaller (1e-5)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

print('-'*50,'fine tune','-'*75)

my_callbacks = [tf.keras.callbacks.LearningRateScheduler(scheduler)]
model.fit(x_train,y_train,batch_size=32,epochs=10,validation_data=(x_dev,y_dev),callbacks=my_callbacks,class_weight=class_weights)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
-------------------------------------------------- fine tune ---------------------------------------------------------------------------
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fbd69ca3940>

### Check Performance (Dev dataset)



In [28]:
# dev dataset
x_dev_predict = np.round(model.predict(x_dev))
print('Jaccard index:',np.round(metrics.jaccard_score(y_dev,x_dev_predict, average='samples'),3))
print('f1-micro:',np.round(metrics.f1_score(y_dev,x_dev_predict,average='micro'),3))
print('f1-macro:',np.round(metrics.f1_score(y_dev,x_dev_predict,average='macro'),3))

Jaccard index: 0.569
f1-micro: 0.691
f1-macro: 0.561


  _warn_prf(average, modifier, msg_start, len(result))
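
For reference, the three metrics can be written out in plain Python. This is a sketch of the definitions rather than scikit-learn's implementation; it scores undefined cases (an empty union, or a label with no true or predicted positives) as 0, which is assumed here to match the library's default behaviour. The toy labels are made up.

```python
def confusion_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one label column."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] and p[label])
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t[label] and p[label])
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] and not p[label])
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def jaccard_samples(y_true, y_pred):
    """Mean over tweets of |true AND pred| / |true OR pred|."""
    scores = []
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        scores.append(inter / union if union else 0.0)
    return sum(scores) / len(scores)

def f1_micro(y_true, y_pred):
    """Pool tp/fp/fn over all labels, then compute a single F1."""
    counts = [confusion_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    tp, fp, fn = (sum(c[k] for c in counts) for k in range(3))
    return f1_from_counts(tp, fp, fn)

def f1_macro(y_true, y_pred):
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    per_label = [f1_from_counts(*confusion_counts(y_true, y_pred, j))
                 for j in range(n_labels)]
    return sum(per_label) / n_labels

toy_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
toy_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
print(round(jaccard_samples(toy_true, toy_pred), 3))
print(round(f1_micro(toy_true, toy_pred), 3))
print(round(f1_macro(toy_true, toy_pred), 3))
```

Macro F1 gives every label equal weight, so a single label that is never predicted correctly pulls it well below micro F1, which is the pattern seen in the dev results.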


### Retrain with both train + dev datasets
The model seems to do well. Before we check the test data, we can retrain the model with both the train and the dev data. This increases the amount of training data, and since we don't seem to overfit, it will likely help improve the model.

In [36]:
train_dev_DF = pd.concat([trainDF,devDF],axis=0)
train_data,train_labels = df_to_bert(train_dev_DF,64)

In [37]:
class_weights = {i:v for i,v in enumerate(1/train_labels.sum(axis=0)*len(train_labels)/11)}

n_conv = 32

# freeze the roberta layer
roberta_layer.trainable = False

ids = tf.keras.layers.Input((64,), dtype=tf.int32)
att = tf.keras.layers.Input((64,), dtype=tf.int32)

roberta_inputs = [ids, att]

sequence_output,pooled_output = roberta_layer.roberta(roberta_inputs)

# unigram
x1 = tf.keras.layers.Conv1D(n_conv,1,activation='relu')(sequence_output)
x1 = tf.keras.layers.GlobalMaxPool1D()(x1)

# bigram
x2 = tf.keras.layers.Conv1D(n_conv,2,activation='relu')(sequence_output)
x2 = tf.keras.layers.GlobalMaxPool1D()(x2)

# trigram
x3 = tf.keras.layers.Conv1D(n_conv,3,activation='relu')(sequence_output)
x3 = tf.keras.layers.GlobalMaxPool1D()(x3)

concat = tf.keras.layers.Concatenate()([x1,x2,x3])
concat = tf.keras.layers.Dropout(0.5)(concat)

outputs = tf.keras.layers.Dense(11, activation='sigmoid')(concat)

model = tf.keras.Model(inputs=roberta_inputs, outputs=outputs)

# use the default Adam learning rate (1e-3)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

def scheduler(epoch,lr):
  " scale the current learning rate by (1-epoch/10); the decay compounds across epochs"
  return lr*(1-epoch/10)

# model step 1
model.fit(train_data,train_labels,batch_size=32,epochs=10,class_weight=class_weights)

# model step 2
roberta_layer.trainable = True

# starting learning rate is smaller (1e-5)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

print('-'*50,'fine tune','-'*75)

my_callbacks = [tf.keras.callbacks.LearningRateScheduler(scheduler)]
model.fit(train_data,train_labels,batch_size=32,epochs=10,callbacks=my_callbacks,class_weight=class_weights)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
-------------------------------------------------- fine tune ---------------------------------------------------------------------------
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fbbd7f8ac50>

### Check Performance (Test dataset)

In [38]:
# get predictions for the test dataset
ypred = model.predict(x_test)

In [39]:
print('Jaccard index:',np.round(metrics.jaccard_score(y_test, np.round(ypred), average='samples'),3))
print('f1-micro:',np.round(metrics.f1_score(y_test,np.round(ypred),average='micro'),3))
print('f1-macro:',np.round(metrics.f1_score(y_test,np.round(ypred),average='macro'),3))

Jaccard index: 0.592
f1-micro: 0.711
f1-macro: 0.571


  _warn_prf(average, modifier, msg_start, len(result))


In [40]:
results = []
for i,c in enumerate(testDF.loc[:,'anger':'trust'].columns):
  results.append([c,metrics.f1_score(y_test[:,i], np.where(ypred[:,i]>0.5,1,0))])

pd.DataFrame(results,columns=['Emotion','f1_score'])

Unnamed: 0,Emotion,f1_score
0,anger,0.780676
1,anticipation,0.334728
2,disgust,0.746896
3,fear,0.750273
4,joy,0.86383
5,love,0.634398
6,optimism,0.739889
7,pessimism,0.396825
8,sadness,0.713214
9,surprise,0.189573


**Conclusions:**
 
*   If we look at the competition results (evaluation period), this model would place 1st.
*   Most classes have good performance, but a few are more difficult (trust, surprise, anticipation, pessimism). These classes could likely be improved with a larger dataset containing more examples of them.




### Save Model

In [41]:
model.save('gdrive/My Drive/Sentiment Discovery/sentiment_model_roberta')



INFO:tensorflow:Assets written to: gdrive/My Drive/Sentiment Discovery/sentiment_model_roberta/assets




In [42]:
# if the model is not in memory you can load it:
# model = tf.keras.models.load_model('gdrive/My Drive/Sentiment Discovery/sentiment_model_roberta')

### Generate Text and Predict

In [43]:
fake_data = pd.DataFrame([['I hate this company. Ugh!', 
                           'This is a terrible product. Do not buy!!',
                           'I had a great experience', 
                           "I must say I LOVE THEM. They work great the range on the headphone are amazing. I left my phone playing upstairs and I walked all the way to my kitchen and they stayed connected. I love how they look and feel. They stay in my ears very well too. I must say over all with these are amazing. Would buy again.",
                           "The biggest issue I have with these are quality and price point. The first thing I noticed was the awful sound quality very tinny and quiet, vocals are loud while all the instruments are drowned out. The second thing I noticed is build quality, it feels cheap and it is plastic, but for $80 I expected something that felt less easily breakable. I would not recommend these to anyone.",
                           "First day I had them they kept falling out and I could not keep them in my ear and I tried using all of the ear buds that were provided. I didn't have any luck with these even while on my plane they kept falling out. Not very well made but I guess you shouldn't expect much from the price."]
                          ],index=['Tweet']).T
# df_to_bert returns (inputs, labels); keep only the inputs for prediction
x, _ = df_to_bert(fake_data,64,model=False)
res = model.predict(x)

In [44]:
pd.DataFrame(res,columns=testDF.loc[:,'anger':'trust'].columns)

Unnamed: 0,anger,anticipation,disgust,fear,joy,love,optimism,pessimism,sadness,surprise,trust
0,0.999758,0.00089,0.995151,0.017635,0.00432,5.7e-05,0.000355,0.031227,0.270644,0.000691,1e-06
1,0.994033,0.005895,0.973455,0.067285,0.02461,0.001225,0.008981,0.097319,0.396507,0.004633,0.000113
2,0.001558,0.233124,0.002559,0.023387,0.998632,0.476778,0.829878,0.00096,0.007082,0.11135,0.222798
3,0.005064,0.021421,0.001985,0.000163,0.998357,0.856511,0.700919,0.000105,0.002432,0.01305,0.042719
4,0.986875,0.003697,0.969549,0.024113,0.001417,1.6e-05,0.000186,0.162355,0.839506,0.008616,3e-06
5,0.782565,0.040073,0.792175,0.045197,0.017082,0.001908,0.016059,0.225784,0.773814,0.026226,0.000435
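
To read these probabilities as labels, the same 0.5 threshold used for the test metrics can be applied per emotion. A minimal sketch using rows 0 and 2 of the table above (`predicted_emotions` is a hypothetical helper):

```python
EMOTIONS = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love',
            'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

def predicted_emotions(probabilities, threshold=0.5):
    """Return the emotions whose predicted probability exceeds the threshold."""
    return [e for e, p in zip(EMOTIONS, probabilities) if p > threshold]

# Row 0: 'I hate this company. Ugh!'
row_0 = [0.999758, 0.00089, 0.995151, 0.017635, 0.00432, 5.7e-05,
         0.000355, 0.031227, 0.270644, 0.000691, 1e-06]
# Row 2: 'I had a great experience'
row_2 = [0.001558, 0.233124, 0.002559, 0.023387, 0.998632, 0.476778,
         0.829878, 0.00096, 0.007082, 0.11135, 0.222798]

print(predicted_emotions(row_0))
print(predicted_emotions(row_2))
```

For row 2, love (0.477) falls just below the cut-off, which shows how sensitive borderline labels are to the choice of threshold.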
