# A Priori Approach

#### INTRODUCTION

In this chapter we explain one possible approach to the problem of identifying emotions in a photo or in a video.

As soon as we received the project task, we started studying the domain in which we were going to work. During our research we came across an interesting article on emotion recognition: Ruiz, Van de Weijer and Binefa, “From Emotions to Action Units with Hidden and Semi-Hidden-Task Learning”, International Conference on Computer Vision (ICCV), 2015.

In this paper, the authors investigate how large databases labelled according to the 7 universal facial expressions (anger, disgust, fear, happiness, sadness, surprise, neutral) can increase the generalization ability of Action Unit classifiers. FACS (Facial Action Coding System) is a system that taxonomizes human facial movements by their appearance on the face; with it, nearly any anatomically possible facial expression can be coded by deconstructing it into the specific Action Units (AUs) that produced it. While reading the paper, we found the following table:
<img src="tab.png">

This table shows the activation probability of each Action Unit for each emotion, according to the paper:
P. Gosselin, G. Kirouac, and F. Y. Doré.
“Components and recognition of facial expression in the communication of emotion by actors.” Journal of Personality and Social Psychology, 1995.

We therefore decided to exploit the information in this table for our task of identifying emotions from videos and/or pictures, with a small variation on the concept. Our idea is that the statement “an AU has a certain probability of activating given an emotion” can be approximated by assuming that each basic emotion has a characteristic pattern, which we, as humans, recognize in another person by spotting some typical features of that pattern. The approximation lies in treating the most frequently activated features as the most recognizable ones, and consequently as the ones a person emphasizes most: intentionally or not, a person tends to stress the features that we associate with the emotion, i.e. the most activated ones, more than the others. For example, according to the table, AU6 (cheek raiser) and AU12 (lip corner puller) have the highest activation probability for happiness. These two movements are the ones that make an observer immediately think of happiness. We therefore assume that the intensity of these two AUs will generally be higher than that of the others, because they are the most recognizable features of the emotion and people tend to stress them more.
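As a small illustration of this reading of the table, the sketch below picks the most activated AUs from the happiness row used in the implementation section of this chapter. The AU names in `AU_NAMES` are an assumption (the 14 OpenFace intensity columns we believe correspond to the 14 values of each row), and `top_aus` is a hypothetical helper introduced only for this example.

```python
import numpy as np

# Assumption: order of the 14 AUs behind each row of the table.
AU_NAMES = ["AU01", "AU02", "AU04", "AU05", "AU06", "AU07", "AU09",
            "AU10", "AU12", "AU15", "AU17", "AU20", "AU25", "AU26"]

# Happiness row, copied from the implementation section.
happiness = np.array([0.07, 0.09, 0.01, 0.05, 0.94, 0.01, 0.05,
                      0.0, 0.92, 0.0, 0.0, 0.02, 0.34, 0.55])


def top_aus(row, k=2):
    """Return the names of the k AUs with the highest activation probability."""
    order = np.argsort(row)[::-1][:k]  # indices sorted by decreasing probability
    return [AU_NAMES[i] for i in order]


print(top_aus(happiness))  # the two most activated AUs for happiness
```

Under the assumed column order, the two AUs returned are AU06 and AU12, which agrees with the example given in the text.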


#### OPENFACE API

We used the OpenFace API to extract information about the AUs for each picture or video that we wanted to analyze.
AUs can be described in two ways:

* Presence - whether the AU is visible in the face (for example AU01_c)
* Intensity - how intense the AU is (minimal to maximal) on a 5-point scale (for example AU01_r)

OpenFace provides both of these scores. For the presence of AU 1, the column AU01_c in the output file encodes 0 as not present and 1 as present. For the intensity of AU 1, the column AU01_r in the output file ranges from 0 (not present) through 1 (present at minimum intensity) to 5 (present at maximum intensity), with continuous values in between.
Unfortunately, the intensity and presence predictors have been trained separately and on slightly different datasets, which means that their predictions are not always consistent.
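The possible disagreement between the two scores can be checked directly on the output file. The following is a minimal sketch on made-up values (the four rows are not real OpenFace output); it flags the frames where the presence column and the intensity column of AU 1 contradict each other.

```python
import pandas as pd

# Toy rows standing in for an OpenFace output file (values are made up).
frames = pd.DataFrame({
    "AU01_c": [0, 1, 1, 0],          # presence: 0 = absent, 1 = present
    "AU01_r": [0.0, 1.8, 0.0, 0.9],  # intensity: 0 to 5
})

present = frames["AU01_c"] == 1
intense = frames["AU01_r"] > 0

# Rows where one predictor says "active" and the other says "inactive".
inconsistent = frames[present != intense]
print(inconsistent)
```

In this toy table the last two rows are flagged: one is marked present with zero intensity, the other has a positive intensity but is marked absent.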

OpenFace also outputs a lot of other information, such as the position of the face or the direction in which the eyes are looking; a future development of this project could try to exploit that information as well. In our approach we use only the intensities of the action units that appear in the matrix mentioned above.


#### THE DATA 

The data we use come from running OpenFace on CK+, a well-known dataset used in many of the works we read.

This dataset contains many face images. In particular, it includes 123 subjects; for each subject there is at least one sequence, 592 in total, and each sequence consists of at least 4 consecutive images, where the first is labelled as neutral and the last shows the emotion at its maximum intensity.
This means that for every emotion we have the evolution of the face from the neutral state to the emotion, passing through different degrees of intensity.

However, we do not use all the images in the dataset, only the first one of each sequence, labelled as "neutral", and the last one.
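Selecting the first and last image of each sequence can be sketched as follows. The table layout is hypothetical: the column names `subject`, `sequence`, `frame` and `role`, as well as the values, are made up for illustration and do not come from our data files.

```python
import pandas as pd

# Hypothetical layout: one row per image, identified by subject, sequence and frame.
frames = pd.DataFrame({
    "subject":  ["S005"] * 4 + ["S010"] * 5,
    "sequence": ["001"] * 4 + ["002"] * 5,
    "frame":    [1, 2, 3, 4, 1, 2, 3, 4, 5],
})

frames = frames.sort_values(["subject", "sequence", "frame"])
groups = frames.groupby(["subject", "sequence"])

first = groups.head(1).assign(role="neutral")  # first image of each sequence
last = groups.tail(1).assign(role="peak")      # last image: maximum intensity

selected = pd.concat([first, last]).sort_values(["subject", "sequence", "frame"])
print(selected)
```

Each sequence contributes exactly two rows, regardless of its length.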

A detailed report on the dataset we use for our analysis is given in "Data_Analysis_Images_AUs".



#### OUR APPROACH

In the article on emotion recognition by Ruiz, Van de Weijer and Binefa, “From Emotions to Action Units with Hidden and Semi-Hidden-Task Learning”, from which the matrix above comes, there is no explanation of how the matrix could be used to go from AUs to emotions; it is presented only as a correlation matrix resulting from their studies.
Nevertheless, we tried to use it in the opposite direction: from the AUs to the emotions.

Our idea is to treat the matrix not as a correlation matrix but as a matrix of 7 rows, one per emotion, where each row is read not as the probabilities that the corresponding AUs are activated but as the actual state the AUs should be in to represent the emotion perfectly. We therefore consider each row as a vector, a point in $R^{14}$, which lets us compute the distance between this "theoretical emotion vector" and the AU vector of a frame.

We label each frame with the emotion whose vector is the least distant.
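In symbols, this rule can be sketched as follows, where $x \in R^{14}$ denotes the AU vector of a frame, $m_e \in R^{14}$ the row of the matrix for emotion $e$, $E$ the set of the 7 emotions, and $d$ a distance on $R^{14}$ (this notation is introduced here only for illustration):

$$
\hat{e}(x) = \arg\min_{e \in E} \, d(x, m_e)
$$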



#### ELABORATION PROCESS

An output frame from a video is of the form:

<img src="out.png">

For every frame we proceed as follows:

* We normalize it: every AU_r value ranges from 0 to 5, so we scale it to [0,1] in order to be on the same scale as the matrix vectors.

* We then compute the vector distance from the frame to the vector configuration of each emotion (for example, from the aforementioned table, sadness is associated with the vector [0.22, 0.01, 0.25, 0.00, 0.03, 0.39, 0.00, 0.05, 0.05, 0.00, 0.09, 0.17, 0.07, 0.00, 0.14, 0.20], and so on).

* The result is one number per emotion, i.e. the distance of the frame from that emotion. The larger it is, the "farther" that emotion is from the emotion of the person in the frame under consideration.

* We assume that the emotion in the frame is the one whose vector configuration in the matrix is the least distant.
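The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: `classify`, `EMOTIONS` and `PROTOTYPES` are hypothetical names, only two of the seven rows are included for brevity (with the 14 values used in the implementation section), and the Euclidean distance is used as a placeholder for "the distance".

```python
import numpy as np

EMOTIONS = ["sadness", "neutral"]  # only two rows, for brevity
PROTOTYPES = np.array([
    [0.22, 0.01, 0.25, 0.0, 0.03, 0.39, 0.0,
     0.05, 0.05, 0.09, 0.17, 0.07, 0.14, 0.20],  # sadness
    [0.0] * 14,                                   # neutral
])


def classify(frame_au_r, max_intensity=5.0):
    """Label one frame given its 14 AU intensities on the 0-5 scale."""
    x = np.asarray(frame_au_r, dtype=float) / max_intensity  # scale to [0, 1]
    dists = np.linalg.norm(PROTOTYPES - x, axis=1)           # one distance per emotion
    return EMOTIONS[int(np.argmin(dists))], dists            # least distant emotion


label, dists = classify([0.0] * 14)
print(label, dists)
```

A frame with all intensities at zero coincides with the neutral row, so its distance from "neutral" is 0 and that label is returned.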


#### IMPLEMENTATION

We implemented the algorithm using Python.


In [4]:
import pandas as pd
import numpy as np
import scipy.spatial.distance  # makes scipy.spatial.distance.minkowski available

--------------------------------

* **Data manipulation**

We start by defining the "a priori matrix" and importing the dataset.

In [5]:
#matrix data
anger=pd.Series([0.17,0.10,0.33,0.25,0.03,0.05,0.00,0.1,0,0.05,0.06,0.4,0.31,0.49])
disgust=pd.Series([0.01,0.01,0.35,0.01,0.06,0.36,0.06,0.21,0,0,0,0.25,0.32,0.4])
fear=pd.Series([0.12,0.01,0.33,0.55,0,0.29,0.0,0.03,0,0.04,0.04,0.25,0.20,0.75])
happyness=pd.Series([0.07,0.09,0.01,0.05,0.94,0.01,0.05,0.0,0.92,0.0,0.0,0.02,0.34,0.55])
sadness=pd.Series([0.22,0.01,0.25,0.0,0.03,0.39,0.0,0.05,0.05,0.09,0.17,0.07,0.14,0.20])
surprise=pd.Series([0.15,0.19,0.08,0.76,0.0,0.02,0.0,0.10,0.04,0.0,0.04,0.09,0.26,0.72])
neutral=pd.Series([0,0,0,0,0,0,0,0,0,0,0,0,0,0])


path="../data/labeled_fromvids.csv"  #path to the data 
dat=pd.read_csv(path)#opening the data


As one can see from the matrix, the emotions are encoded with only 14 AUs, so we need to discard some of our AUs.

In [3]:
# from the csv, extract the columns of interest
nam=dat.columns.values
ok=nam[2:18]
ok= ok[ok!="AU14_r"]
ok= ok[ok!="AU23_r"]
opend=dat.iloc[:,2:18]
del opend["AU14_r"]
del opend["AU23_r"]
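The cell above selects the intensity columns by position. As a hedged alternative, the sketch below selects them by name on a toy table (the columns and values are made up). Note that a real OpenFace file may contain further intensity columns (for instance AU45_r) that this pattern would also pick up, so the result is not guaranteed to match the positional slice.

```python
import pandas as pd

# Toy table with a few OpenFace-style columns (values are made up).
dat = pd.DataFrame({
    "frame":   [1, 2],
    "emotion": [0, 5],
    "AU01_r":  [0.0, 0.4],
    "AU14_r":  [0.1, 0.2],
    "AU23_r":  [0.0, 0.3],
    "AU26_r":  [0.5, 1.2],
    "AU01_c":  [0, 1],
})

# Keep the intensity columns (names of the form AUxx_r) ...
intensity = dat.filter(regex=r"^AU\d+_r$")
# ... and drop the two AUs that do not appear in the matrix.
opend = intensity.drop(columns=["AU14_r", "AU23_r"])
print(list(opend.columns))
```

Selecting by name makes the step independent of the column order in the file.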


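After exploring the data we noticed that some AUs never take values different from zero, while others take values different from zero only in a few cases.
To avoid distorting the results, we deleted the AUs that were always equal to zero or positive in less than 40% of the frames. Note that the mask in the next cell is written as a condition on the share of positive values (`< 0.60`), so it should be checked against this criterion before the code is reused.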

In [1]:
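# share of frames in which each AU is strictly positive; keep the columns where it is below 0.60
openda=opend.loc[:,((( opend>0).sum())/len(dat.iloc[:,0]) <0.60) ]
# drop the AUs that are always zero
opendat=openda.loc[:,openda.max()>0]
prova=np.array(opendat)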
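To see what this column mask does in practice, here is a minimal sketch on made-up data (the column names `AU_a`, `AU_b`, `AU_c` and their values are hypothetical). It uses `(opend > 0).mean()`, which equals the share of rows in which each column is strictly positive.

```python
import pandas as pd

opend = pd.DataFrame({
    "AU_a": [0.0, 0.0, 0.0, 0.0, 0.0],  # never active
    "AU_b": [0.5, 0.0, 0.0, 1.0, 0.0],  # active in 40% of the rows
    "AU_c": [0.2, 0.9, 1.1, 0.3, 0.0],  # active in 80% of the rows
})

share_positive = (opend > 0).mean()  # fraction of rows with a value > 0

# Same two conditions as the cell above: share below 0.60 and not always zero.
kept = opend.loc[:, (share_positive < 0.60) & (opend.max() > 0)]
print(share_positive)
print(list(kept.columns))
```

On this toy table the all-zero column and the column that is active in 80% of the rows are both dropped, and only `AU_b` is kept.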


To finish the data manipulation we need to scale all our data to [0,1]. We cap every observation larger than 3 at 3; this avoids outliers and "stretches" the data over the interval [0,1], since otherwise a few observations larger than 3 would badly affect our results.
Finally, we normalize each AU by its maximum, so that AUs that never reach 3 do not end up with only observations close to zero.

In [None]:
prova=np.array(opendat)
prova[(prova>3)]=3
dati=prova/prova.max(axis=0)
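The effect of the capping and of the per-column scaling can be seen on a small made-up array (two AUs, three frames). The sketch uses `np.minimum` instead of in-place assignment, which gives the same values without modifying the input.

```python
import numpy as np

# Toy intensities: rows are frames, columns are AUs (values are made up).
prova = np.array([[0.0, 4.2],
                  [1.0, 2.0],
                  [2.0, 0.5]])

capped = np.minimum(prova, 3)        # every value larger than 3 becomes 3
dati = capped / capped.max(axis=0)   # divide each column by its own maximum

print(dati)
```

The first column never reaches 3 but is still stretched to the full [0,1] range, while the outlier 4.2 in the second column is capped before scaling. The division assumes that no column has a maximum of zero, which is why the all-zero AUs have to be removed beforehand.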

We still have to adjust the matrix to the modifications we made.

In [None]:
mat=np.zeros((7,14))
mat[:]=[anger,disgust,fear,happyness,sadness,surprise,neutral]
matrix=pd.DataFrame(mat, columns=ok )

matrix=matrix.loc[:,((opend>0).sum())/len(dat.iloc[:,0]) <0.60]
matrix=matrix.loc[:,openda.max()>0]

---------------------------

* **Distance**

To understand which kind of distance works better in terms of accuracy, we ran some experiments testing 200 different distances.

We used the Minkowski distance as a reference: we predicted the emotion of each frame using every order of the distance from 1 to 200, and we repeated this over the whole dataset.

In [None]:
#This is given as a comment because it takes some time to run,
# but it is how we reached our conclusions.
'''
# distances from every frame to every emotion, for Minkowski orders p = 1..200
lis_ne=[]
for v in range(200):
    for i in range(len(dati[:,0])):
        photo=dati[i,:]
        lis_parz=[]
        for u in range(7):
            emo=matrix.loc[u,:]
            dist=scipy.spatial.distance.minkowski(emo,photo,p=v+1)
            lis_parz.append(dist)

        lis_ne.append(lis_parz)

ris_m=pd.DataFrame(lis_ne)

# row-normalised distances; the result is not stored, so this line has no effect
ris_m.T /ris_m.sum(axis=1)

ris_m.columns=["anger","disgust","fear","happyness","sadness","surprise","neutral"]

# numeric label of each column, in the same order as the columns above
temp=[1,3,4,5,6,7,0]
emotions=pd.DataFrame(temp)

# label each row with the emotion at minimal distance
clas=[]
for i in range(len(ris_m.iloc[:,0])):
    m=ris_m.loc[i,:]==min(ris_m.loc[i,:])
    rev=m.tolist()
    rr=emotions[rev][0].tolist()[0]
    clas.append(rr)

papa_m=pd.Series(clas)

# accuracy for each order p: one slice of len(dati) predictions per order
derr=[]
for i in range(200):
    temp=papa_m[i*len(dati[:,0]):len(dati[:,0])+i*len(dati[:,0])]
    temp.index=range(len(dati[:,0]))
    a=(dat["emotion"]==temp).sum() / len(dati[:,0])
    derr.append(a)
trend=pd.Series(derr)
trend
'''

Running this script, we found that the distance that works best is the Minkowski distance of order 2, i.e. the Euclidean distance.

So we use the Euclidean distance to assign a label to each frame.
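For reference, the Minkowski distance of order $p$ between two vectors $u$ and $v$ is $\left(\sum_i |u_i - v_i|^p\right)^{1/p}$. The sketch below implements this formula with NumPy only (the helper name `minkowski_distance` is ours, chosen for this example) and shows how the order changes the result on a simple pair of points.

```python
import numpy as np


def minkowski_distance(u, v, p):
    """Minkowski distance of order p between two vectors."""
    diff = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


u = [0.0, 0.0]
v = [3.0, 4.0]

d1 = minkowski_distance(u, v, p=1)      # order 1: sum of absolute differences
d2 = minkowski_distance(u, v, p=2)      # order 2: Euclidean distance
d200 = minkowski_distance(u, v, p=200)  # high order: close to the largest difference

print(d1, d2, d200)
```

For high orders the distance is dominated by the single largest coordinate difference, so most of the 200 orders tested behave very similarly to each other.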

In [None]:
# Euclidean distance

# create a dataframe with the Euclidean distances between each photo and each emotion
lis_fin=[]
for i in range(len(dati[:,0])):
    photo=dati[i,:]
    lis_parz=[]
    for u in range(7):
        emo=matrix.loc[u,:]
        dist=np.sqrt(sum((emo-photo)*(emo-photo)))
        lis_parz.append(dist)
        
    lis_fin.append(lis_parz)   
    
ris=pd.DataFrame(lis_fin)        



ris.columns=["anger","disgust","fear","happyness","sadness","surprise","neutral"]


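# numeric label of each column of ris, in column order
# (anger=1, disgust=3, fear=4, happyness=5, sadness=6, surprise=7, neutral=0)
temp=[1,3,4,5,6,7,0]
emotions=pd.DataFrame(temp)

# quick check on a single row: label of the closest emotion for row 325
a=ris.iloc[325,:].tolist()
a.sort()
c=np.array(temp)
c[a[0]==ris.iloc[325,:]][0]

# select, for each photo, the emotion at minimal distance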
clas=[]
for i in range(len(ris.iloc[:,0])):
    a=ris.iloc[i,:].tolist()
    a.sort()
    c=np.array(temp)
    q=c[a[0]==ris.iloc[i,:]][0]

    clas.append(q)

papa=pd.Series(clas)



Finally, with this approach we reached an accuracy of more than 42%.
Considering that assigning labels at random would give an accuracy of about 14%, we consider this a great result.

In [None]:
# fraction of correctly classified emotions
((dat["emotion"]==papa).sum())/327
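The same accuracy can be computed without hard-coding the number of rows. The sketch below uses made-up label arrays (they are not our predictions) and also computes the chance level for 7 equally likely classes, which is the 14% baseline mentioned above.

```python
import numpy as np

# Made-up true and predicted labels, using the numeric emotion codes.
y_true = np.array([0, 5, 6, 1, 7, 5, 3])
y_pred = np.array([0, 5, 1, 1, 7, 4, 3])

accuracy = (y_true == y_pred).mean()  # fraction of matching labels
chance = 1 / 7                        # uniform guessing over 7 emotions

print(round(accuracy, 4), round(chance, 4))
```

Taking the mean of the comparison divides by the number of compared rows automatically, so the result stays correct if the number of frames changes.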

#### RESULTS AND COMMENTS


We reached an accuracy of 42.51%, so we can assume that the table describes the activation of AUs with respect to the emotions quite well and that our approximation has some foundation. On the other hand, the accuracy reached is very sensitive to the dataset we used, because of the approach we took: as explained before, the implementation included some adjustments based on the structure of the dataset.

#### FUTURE POSSIBLE IMPROVEMENTS

Because the accuracy is sensitive to the dataset in our case, it would be advisable to apply this matrix approach to multiple datasets and see whether the matrix and the approximation idea can be considered good and useful general tools for emotion detection. Other possible improvements are trying further distances or other scaling approaches. Finally, OpenFace did not produce values for all AUs, so it could be useful to try the algorithm with other APIs that produce more AUs and check how the accuracy changes.
