Pre-CODE: Exploiting Unsupervised Data for Emotion Recognition in Conversations

Implementation of our paper "Exploiting Unsupervised Data for Emotion Recognition in Conversations" to appear in the Findings of EMNLP 2020. [paper] [old version]

Brief Introduction

Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations, which is essentially a text classification task. Unlike sentence-level text classification, the supervised data available for ERC is limited, which potentially prevents models from reaching their full potential. In this paper, we propose a novel approach to leverage unsupervised conversation data, which is more accessible. Specifically, we propose the Conversation Completion (ConvCom) task, which attempts to select the correct answer from candidate answers to fill a masked utterance in a conversation. Then, we Pre-train a basic COntext-Dependent Encoder (Pre-CODE) on the ConvCom task. Finally, we fine-tune the Pre-CODE on the ERC datasets. Experimental results demonstrate that pre-training on unsupervised data significantly improves performance on the ERC datasets, particularly on the minority emotion classes.

Figure 1: The framework of Pre-CODE.

Conversation Completion (ConvCom): We exploit the self-supervision signal in conversations to construct our pre-training task. Formally, given a conversation U = { u_1, u_2, ..., u_L }, we mask a target utterance u_l as U\u_l = { ..., u_{l-1}, [mask], u_{l+1}, ... } to create a question, and try to retrieve the correct utterance u_l from the whole training corpus. Since the mask could be filled by any possible utterance, and the set of utterances is effectively unbounded, formulating the task as a multi-class classification task with softmax is infeasible. We instead simplify the task into a response selection task using negative sampling, which is a variant of noise-contrastive estimation (NCE). To do so, we sample N-1 noise utterances elsewhere and combine them with the target utterance to form a set of N candidate answers. The goal is then to select the correct answer, i.e., u_l, from the candidate answers to fill the mask, conditioned on the context utterances. We term this task "Conversation Completion", abbreviated as ConvCom. Figure 2 shows an example, where the utterance u_4 is masked out from the original conversation and two noise utterances are sampled elsewhere together with u_4 to form the candidate answers.

Figure 2: An Example of the ConvCom Task.
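The construction of a ConvCom question described above can be sketched in a few lines of Python. This is a minimal illustration and not the code from this repository: the function name `make_convcom_instance`, the `noise_pool` argument, and the toy utterances are all assumptions made for the example, and the sketch simply draws the N-1 noise utterances uniformly from a pool of utterances taken from elsewhere.

```python
import random

MASK = "[mask]"

def make_convcom_instance(conversation, target_idx, noise_pool, num_candidates, rng):
    """Build one ConvCom question (hypothetical helper, not the repository's API).

    Returns the masked context, the N candidate answers, and the index of the
    correct answer within the candidates.
    """
    target = conversation[target_idx]
    # Replace the target utterance u_l with the mask token to form the question.
    context = list(conversation)
    context[target_idx] = MASK
    # Sample N-1 noise utterances from elsewhere, skipping exact copies of the target.
    eligible = [u for u in noise_pool if u != target]
    noise = rng.sample(eligible, num_candidates - 1)
    # Shuffle so the position of the correct answer carries no signal.
    candidates = noise + [target]
    rng.shuffle(candidates)
    return context, candidates, candidates.index(target)

# Toy example mirroring Figure 2: mask u_4 and use N = 3 candidates.
conversation = [
    "Hi, how are you?",
    "Not great.",
    "What happened?",
    "I failed my exam.",
    "Oh, I'm sorry.",
]
noise_pool = ["See you tomorrow.", "The train is late.", "Pass the salt, please."]
context, candidates, answer = make_convcom_instance(
    conversation, 3, noise_pool, 3, random.Random(0)
)
```

A model trained on such instances scores each candidate against the masked context and is rewarded for ranking `candidates[answer]` highest, so the cost per question depends on N and not on the size of the corpus.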

Code Base


Please find the datasets via the following links:

  • Friends: Friends comes from the transcripts of the Friends TV sitcom, where each dialogue in the dataset is a scene with multiple speakers.
  • EmotionPush: EmotionPush comes from private conversations between friends on Facebook Messenger, collected by an app called EmotionPush.
  • IEMOCAP: IEMOCAP contains approximately 12 hours of audiovisual data, including video, speech, facial motion capture, and text transcriptions.


  • Python v3.6
  • PyTorch v0.4.0-v0.4.1
  • Pickle

Data Preprocessing

Preprocess the OpenSubtitle dataset as:

python -datatype opsub -min_count 2 -max_seq_len 60

Preprocess one of the emotion datasets as:

python -datatype emo -emoset Friends -min_count 2 -max_seq_len 60

The arguments -datatype, -emoset, -min_count, and -max_seq_len represent the type of data (i.e., pre-training data or emotion data), the dataset name, the minimum frequency a word needs to be included in the vocabulary, and the maximum sequence length used for padding or truncating sentences.
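To make the roles of -min_count and -max_seq_len concrete, here is a small sketch of frequency-filtered vocabulary building followed by padding and truncation. It is an illustration under stated assumptions and not the preprocessing script itself: the helper names `build_vocab` and `encode`, and the special tokens `<pad>` and `<unk>`, are hypothetical.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens, not taken from the repository

def build_vocab(tokenized_utterances, min_count):
    """Keep only words that occur at least min_count times."""
    counts = Counter(tok for utt in tokenized_utterances for tok in utt)
    vocab = {PAD: 0, UNK: 1}
    for word, freq in counts.most_common():
        if freq >= min_count:
            vocab[word] = len(vocab)
    return vocab

def encode(tokens, vocab, max_seq_len):
    """Map tokens to ids, truncating or padding to exactly max_seq_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_seq_len]]
    return ids + [vocab[PAD]] * (max_seq_len - len(ids))

# Toy corpus: with min_count=2, "fine" and "late" fall back to <unk>.
utterances = [["i", "am", "fine"], ["i", "am", "late"]]
vocab = build_vocab(utterances, min_count=2)
padded = encode(["i", "am", "fine"], vocab, max_seq_len=5)
truncated = encode(["i", "am", "fine"], vocab, max_seq_len=2)
```

With -min_count 2 and -max_seq_len 60 as in the commands above, words seen only once would be mapped to the unknown token and every utterance would be stored as exactly 60 ids.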

PreCODE storage includes the raw data and preprocessed data of OpSub and Friends, and the pre-trained models with hidden sizes of 300.

Pre-trained Word Embeddings

To reproduce the results reported in the paper, please adopt the pre-trained word embeddings for initialization. You can download the 300-dimensional embeddings from below:

Decompress the file and rename it to glove300.txt.
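For readers unfamiliar with the format, a GloVe text file stores one word per line followed by its space-separated vector components. The sketch below shows one way such a file could be used to initialize an embedding matrix for a vocabulary. It is not the loader used in this repository: the function name `load_embeddings`, the uniform initialization range for words missing from the file, and the assumption that vocabulary ids run from 0 to len(vocab)-1 are all choices made for the example.

```python
import random

def load_embeddings(path, vocab, dim, rng):
    """Build a len(vocab) x dim matrix from a GloVe-style text file.

    Hypothetical helper: assumes vocab maps words to ids 0..len(vocab)-1.
    Words absent from the file keep a small random vector (assumed range).
    """
    matrix = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(len(vocab))]
    found = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            # Skip words outside the vocabulary and malformed lines.
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = [float(v) for v in values]
                found += 1
    return matrix, found
```

A call such as `load_embeddings("glove300.txt", vocab, 300, random.Random(0))` would return the matrix together with the number of vocabulary words covered by the file, which is a quick check that the file was decompressed and named correctly.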


  1. Pre-train the context-dependent encoder on the ConvCom task.
  • You can change the parameters in the script.
# Var assignment
echo ========= lr=$LR ==============
for iter in 1
echo --- $Enc - $Dec $iter ---
python \
-lr $LR \
-gpu $GPU \
-d_hidden_low 300 \
-d_hidden_up 300 \
-sentEnc gru2 \
-layers 1 \
-patience 3 \
-data_path \
-vocab_path \
-embedding \
-dataset OpSub
  2. Fine-tune the Pre-CODE on the emotion datasets.
  • You can change the parameters in the script.
# Var assignment
echo ========= lr=$LR ==============
for iter in 1 2 3 4 5
echo --- $Enc - $Dec $iter ---
python -load_model \
-lr $LR -gpu $GPU \
-d_hidden_low $du -d_hidden_up $dc \
-patience 6 -report_loss 720 \
-data_path \
-vocab_path \
-emodict_path \
-tr_emodict_path \
-dataset Friends \

Public Impact


Please kindly cite our paper if you find it useful or highly related to your research:

      title={Exploiting Unsupervised Data for Emotion Recognition in Conversations}, 
      author={Wenxiang Jiao and Michael R. Lyu and Irwin King},

