# Overview of CASME Dataset

In [1]:
import pandas as pd
from dotenv import load_dotenv
import os
from pathlib import Path
import numpy as np

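# Load environment variables from a local .env file;
# CASME_PATH should point to the root directory of the dataset
load_dotenv()
casme_path = os.getenv('CASME_PATH')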

## Previewing Encodings

In [2]:
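encodings_path = Path('CASME-coded-20190721.xls')
# Read the coding spreadsheet, drop the two unnamed columns,
# and treat backslash entries as missing values
encoding_df = (pd
               .read_excel(Path(casme_path).joinpath(encodings_path))
               .drop(columns=["Unnamed: 2", "Unnamed: 7"])
               .replace('\\', np.nan))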

In [3]:
encoding_df.head()

,Subject,Filename,OnsetF,ApexF1,ApexF2,OffsetF,Onset,Total,AU,Emotion
0,1,EP01_12,73,81,,91,150.0,316.666667,4,tense
1,1,EP01_13,63,69,,77,116.666667,250.0,4,tense
2,1,EP01_5,113,121,125.0,133,150.0,350.0,12,happiness
3,1,EP01_8,67,75,,81,150.0,250.0,14,repression
4,1,EP03_1,79,91,95.0,105,216.666667,450.0,4+17,repression


In [4]:
encoding_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 10 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Subject   193 non-null    int64  
 1   Filename  193 non-null    object 
 2   OnsetF    193 non-null    int64  
 3   ApexF1    193 non-null    int64  
 4   ApexF2    125 non-null    float64
 5   OffsetF   193 non-null    int64  
 6   Onset     193 non-null    float64
 7   Total     193 non-null    float64
 8   AU        193 non-null    object 
 9   Emotion   193 non-null    object 
dtypes: float64(3), int64(4), object(3)
memory usage: 15.2+ KB


## Encoding Feature Descriptions

### Subject

Number (an ID of sorts) assigned to each participant.

### Filename

The name of the video clip, or of the directory containing the clip's frames as images.

### OnsetF

The first frame for the micro-expression.

### ApexF1

The first frame of the apex phase of the micro-expression.

### ApexF2

The last frame of the apex phase of the micro-expression. This column is empty for some samples (125 of the 193 entries are non-null).

### OffsetF

The last frame of the micro-expression.

### Onset

The duration from the onset frame to ApexF1 (the first frame of the apex phase of the micro-expression).

### Total

The duration from onset to offset.
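The preview rows above hint at how `Onset` and `Total` relate to the frame columns. The following is a minimal sketch, assuming the durations are in milliseconds, the clips were recorded at 60 frames per second, and both endpoint frames are counted. All three assumptions are inferred from the five preview rows, not taken from the dataset documentation, and `frames_to_ms` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# The five rows shown by encoding_df.head() above.
preview = pd.DataFrame({
    "OnsetF": [73, 63, 113, 67, 79],
    "ApexF1": [81, 69, 121, 75, 91],
    "OffsetF": [91, 77, 133, 81, 105],
    "Onset": [150.0, 116.666667, 150.0, 150.0, 216.666667],
    "Total": [316.666667, 250.0, 350.0, 250.0, 450.0],
})

FPS = 60  # assumed frame rate, inferred from the preview rows
MS_PER_FRAME = 1000 / FPS


def frames_to_ms(start, end):
    """Duration in ms of the frame range [start, end], counting both endpoints."""
    return (end - start + 1) * MS_PER_FRAME


onset_ms = frames_to_ms(preview["OnsetF"], preview["ApexF1"])
total_ms = frames_to_ms(preview["OnsetF"], preview["OffsetF"])

# Compare the recomputed durations with the values stored in the spreadsheet.
onset_matches = np.allclose(onset_ms, preview["Onset"], atol=1e-3)
total_matches = np.allclose(total_ms, preview["Total"], atol=1e-3)
print(onset_matches, total_matches)
```

Both checks hold for the five preview rows. Running the same comparison on the full `encoding_df` would show whether the relationship holds for every sample.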

### AU

The action units (AUs) present in the video, used to assign an emotion label to each sample.

Emotion labelling is based only **partly** on the AUs because micro-expressions are typically partial and low in intensity. To account for this, the participants' self-reports and the content of the video episodes were also used in labelling.

The criteria for labelling with action units can be found in Table 4 of the `CASME.pdf` document.
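Because the `AU` column has dtype `object` and mixes single values (such as `4`) with combinations (such as `4+17`), it can help to split each entry into its individual action units. The following is a minimal sketch, assuming `+` is the only separator. `split_action_units` is a hypothetical helper, and tokens are kept as strings in case some entries are not purely numeric.

```python
def split_action_units(au):
    """Split an AU entry such as '4+17' into ['4', '17']."""
    # Entries read from Excel may be ints (e.g. 4) or strings (e.g. '4+17'),
    # so convert to str before splitting on the assumed '+' separator.
    return [part.strip() for part in str(au).split("+") if part.strip()]


# AU values taken from the preview rows above.
examples = [4, 12, 14, "4+17"]
parsed = [split_action_units(au) for au in examples]
print(parsed)
```

Applied to the full table, this would be `encoding_df["AU"].map(split_action_units)`.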

### Emotion

The estimated emotion.

NOTE:

1. Amusement (in the FG2013 paper) was replaced with happiness.
2. It is difficult to judge which emotion AU4 alone conveys. It may indicate disgust, anger, or attention/interest; we therefore label "AU4" as "tense" (for the moment). Some categories, such as fear, contain very few micro-expressions, so it is also reasonable to remove those categories from training and testing.
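Removing sparsely populated categories, as the note suggests, can be sketched as follows. `drop_rare_emotions` and the `min_count` threshold are hypothetical, and the toy frame is purely illustrative; it does not reflect the real class distribution in CASME.

```python
import pandas as pd


def drop_rare_emotions(df, min_count, column="Emotion"):
    """Keep only rows whose emotion label occurs at least min_count times."""
    counts = df[column].value_counts()
    keep = counts[counts >= min_count].index
    return df[df[column].isin(keep)].reset_index(drop=True)


# Toy data for illustration only; NOT the real CASME label counts.
toy = pd.DataFrame({
    "Emotion": ["tense", "tense", "tense", "repression", "repression", "fear"],
})

filtered = drop_rare_emotions(toy, min_count=2)
print(filtered["Emotion"].value_counts().to_dict())
```

On the real data, inspecting `encoding_df["Emotion"].value_counts()` first would give a basis for choosing the threshold.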