# Affective Computing - Programming Assignment 3

### Objective

Your task is to combine facial expression features and audio features using feature-level fusion. You will build a multi-modal emotion recognition system that distinguishes happy from sad expressions (a binary classification problem) using a classifier training and testing structure.

The data comes from lab1 and lab2 and was recorded from ten actors performing happy and sad behaviors.
* Task 1: Subspace-based feature fusion with z-score normalization. Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and learn how to use the subspace-based feature fusion method in a multi-modal system.

* Task 2: Building on Task 1, use Canonical Correlation Analysis (CCA) to calculate the correlation coefficients between the facial expression and audio features, then use CCA to build a multi-modal emotion recognition system. The method is described in the conference paper “Feature fusion method based on canonical correlation analysis and handwritten character recognition”.
* Optional task: Estimate the performance of the feature-level method (Task 2) with 10-fold cross-validation.

Support Vector Machine (SVM) classifiers are trained on spatiotemporal features to perform the emotion recognition. 50 videos from 5 participants are used for training, and the remaining 50 videos are used to evaluate the performance of the trained recognition systems.

## Task 1. Subspace-based method
Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and apply its framework to this exercise. We use a Support Vector Machine (SVM) with a linear kernel for classification.
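
Task 1 calls for z-score normalization. A minimal sketch of that step follows, assuming the mean and standard deviation are estimated on the training set only and then applied to both sets. The helper name `zscore_fit_apply` and the random stand-in matrices are illustrative assumptions, not part of the lab data:

```python
import numpy as np

def zscore_fit_apply(train, test, eps=1e-12):
    """Z-score both sets using the mean and standard deviation of the training set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # avoid dividing by zero for constant features
    return (train - mean) / std, (test - mean) / std

# Stand-in data (assumption): 50 training and 50 testing samples with 8 features
rng = np.random.RandomState(0)
demo_train = rng.normal(loc=5.0, scale=3.0, size=(50, 8))
demo_test = rng.normal(loc=5.0, scale=3.0, size=(50, 8))

demo_train_z, demo_test_z = zscore_fit_apply(demo_train, demo_test)
print(demo_train_z.shape)
print(demo_test_z.shape)
```

After this step every training feature has zero mean and unit standard deviation, while the testing features are only approximately standardized because they reuse the training statistics.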


### Setting up the environment 

First, we import the basic modules for loading and processing the data.

In [8]:
import sys
sys.path.append('../')
import dlib
from skimage import io
from skimage import transform
from skimage import color
from skimage import img_as_ubyte
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import sklearn
import scipy.io as sio


### Loading data 

We load the facial expression data (training data, training class, testing data, testing class) and the audio data (training data, testing data).

In [9]:
mdata = sio.loadmat('lab3_data.mat')
#print mdata

#facial expression training and testing data, training and testing class
training_data = mdata['training_data']
testing_data =  mdata['testing_data']
training_class =  mdata['training_class']
testing_class =  mdata['testing_class']

#audio training and testing data
training_data_proso =  mdata['training_data_proso']
testing_data_proso =  mdata['testing_data_proso']


### Extract the subspaces for the facial expression and audio features
Extract the subspaces for the facial expression features and the audio features with principal component analysis, using the __[`sklearn.decomposition.PCA()`](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)__ function.
ReducedDim is the dimensionality of the reduced subspace.
Set ReducedDim to 20 for the facial expression features and to 15 for the audio features.

In [10]:
from sklearn.decomposition import PCA

# Set ReducedDim for the facial expression and audio features, respectively
reducedDim_v = 20
reducedDim_a = 15

# Extract the subspace for the facial expression features through PCA
# (n_components sets the dimensionality of the subspace)
pca_v = PCA(n_components=reducedDim_v, whiten=True)
pca_v.fit(training_data)

# Transform the training data and testing data, respectively
pca_face_train = pca_v.transform(training_data)
pca_face_test = pca_v.transform(testing_data)

print(len(pca_face_test[0]))
print(len(pca_face_train[0]))

# Extract the subspace for the audio features through PCA
pca_a = PCA(n_components=reducedDim_a, whiten=True)
pca_a.fit(training_data_proso)

# Transform the training data and testing data, respectively
pca_audio_train = pca_a.transform(training_data_proso)
pca_audio_test = pca_a.transform(testing_data_proso)


# Concatenate the video and audio training data into a new feature
# ‘combined_trainingData’ (stored in sample_train)
sample_train = np.concatenate((pca_face_train, pca_audio_train), axis=1)

# Concatenate the video and audio testing data into a new feature
# ‘combined_testingData’ (stored in sample_test)
sample_test = np.concatenate((pca_face_test, pca_audio_test), axis=1)



20
20
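
With `whiten=True`, each retained principal component is rescaled to unit variance on the data it was fitted on, so the 20 visual and 15 audio dimensions are on a comparable scale before concatenation. The sketch below checks this on random stand-in matrices. The feature counts 60 and 40 are arbitrary assumptions, not the dimensions of the lab data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): 50 samples, features with very different scales
rng = np.random.RandomState(0)
demo_face = rng.normal(size=(50, 60)) * rng.uniform(0.1, 10.0, size=60)
demo_audio = rng.normal(size=(50, 40)) * rng.uniform(0.1, 10.0, size=40)

# Whitened PCA projections for each modality
demo_face_pca = PCA(n_components=20, whiten=True).fit_transform(demo_face)
demo_audio_pca = PCA(n_components=15, whiten=True).fit_transform(demo_audio)

# Feature-level fusion: 20 + 15 = 35 dimensions per sample
demo_fused = np.concatenate((demo_face_pca, demo_audio_pca), axis=1)
print(demo_fused.shape)  # (50, 35)

# Every fused dimension has unit sample variance on the fitted data
unit_variance = np.allclose(demo_fused.std(axis=0, ddof=1), 1.0)
print(unit_variance)
```

Without whitening, the modality with the larger raw variance would tend to dominate the linear SVM after concatenation.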


### Feature classification
Use the __[`SVM`](http://scikit-learn.org/stable/modules/svm.html)__ module to train Support Vector Machine (SVM) classifiers.
Construct an SVM using the ‘combined_trainingData’ and a linear kernel. The ‘training_class’ group vector contains the class of each sample (1 = happy, 2 = sadness), corresponding to the rows of the training data matrices.

Then, calculate the average classification performance for both the training and testing data. The correct class labels for the rows of the training and testing data matrices are in the variables ‘training_class’ and ‘testing_class’, respectively.

In [11]:
from sklearn import svm

# Train the SVM classifier; ravel() flattens the 2-D label array
# returned by loadmat into the 1-D vector scikit-learn expects
clf = svm.SVC(kernel='linear', cache_size=5000)
clf.fit(sample_train, training_class.ravel())

# Predict the classes of the training data and testing data, respectively
prediction_train = clf.predict(sample_train)
prediction = clf.predict(sample_test)

# Calculate and print the training accuracy
correct_num = 0.
for i in range(len(training_class)):
    if prediction_train[i] == training_class[i]:
        correct_num += 1
Acc_train = correct_num / len(training_class)
print(Acc_train)

# Calculate and print the testing accuracy
correct_num = 0.
for i in range(len(testing_class)):
    if prediction[i] == testing_class[i]:
        correct_num += 1
Acc_test = correct_num / len(testing_class)
print(Acc_test)


1.0
0.98
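
The counting loops above are equivalent to averaging an element-wise comparison. A minimal sketch with made-up label vectors (the arrays below are illustrative, not the lab data), using `sklearn.metrics.accuracy_score` as a cross-check:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels (assumption): 1 = happy, 2 = sadness
demo_true = np.array([1, 1, 1, 2, 2, 2, 2, 1])
demo_pred = np.array([1, 1, 2, 2, 2, 2, 1, 1])

# Fraction of positions where the prediction matches the true label
acc_manual = np.mean(demo_pred == demo_true)
acc_sklearn = accuracy_score(demo_true, demo_pred)
print(acc_manual)   # 0.75
print(acc_sklearn)  # 0.75
```

Label arrays loaded with `loadmat` are two-dimensional, so flatten them with `.ravel()` before the vectorized comparison; otherwise NumPy broadcasting compares every prediction with every label.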


Compute the confusion matrix with the __[`sklearn.metrics.confusion_matrix()`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)__ function for the training data and testing data, respectively.

In [12]:
from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
confusion_train = confusion_matrix(training_class, prediction_train)
print(confusion_train)

confusion_test = confusion_matrix(testing_class, prediction)
print(confusion_test)

[[25  0]
 [ 0 25]]
[[25  0]
 [ 1 24]]
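
In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, in sorted label order (here 1 = happy, 2 = sadness). The sketch below rebuilds the test matrix printed above from hypothetical label vectors and derives the accuracy and per-class recall from it:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels reproducing the printed test matrix:
# all 25 happy samples correct, one sad sample predicted as happy
demo_true = np.array([1] * 25 + [2] * 25)
demo_pred = np.array([1] * 25 + [1] + [2] * 24)

cm = confusion_matrix(demo_true, demo_pred)
print(cm)

# Accuracy is the diagonal (correct predictions) over all samples
accuracy = np.trace(cm) / float(cm.sum())
print(accuracy)  # 0.98

# Recall per class is the diagonal over the row sums (true class counts)
recall = np.diag(cm) / cm.sum(axis=1).astype(float)
print(recall)    # happy: 1.0, sadness: 0.96
```

The accuracy of 49/50 agrees with the testing accuracy of 0.98 reported earlier.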


## Task 2. Canonical Correlation Analysis
Building on Task 1, use Canonical Correlation Analysis (CCA) to calculate the correlation coefficients between the facial expression and audio features. Finally, use CCA to build a multi-modal emotion recognition system.


Use the __[`sklearn.cross_decomposition.CCA()`](http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html)__ function to calculate the correlation coefficients between the facial expression and audio features.

In [13]:
from sklearn.cross_decomposition import CCA
import numpy as np

# Use CCA to construct the Canonical Projective Vectors (CPV)
cca = CCA(n_components=15)
cca.fit(training_data, training_data_proso)

#Construct Canonical Correlation Discriminant Features (CCDF) for training data and testing data
cca_face_train, cca_audio_train  = cca.transform(training_data, training_data_proso)
cca_face_test, cca_audio_test  = cca.transform(testing_data, testing_data_proso)


# Concatenate multiple feature for training data and testing data respectively
training_CCDF = np.concatenate((cca_face_train, cca_audio_train),axis =1)
testing_CCDF = np.concatenate((cca_face_test, cca_audio_test),axis =1)


Train an SVM classifier with a 'linear' kernel, print the training and testing accuracy, and compute the confusion matrices.

In [14]:
# Train the SVM classifier; ravel() flattens the 2-D label array
# returned by loadmat into the 1-D vector scikit-learn expects
clf = svm.SVC(kernel='linear', cache_size=5000)
clf.fit(training_CCDF, training_class.ravel())

# Predict the classes of the training data and testing data, respectively
prediction_train = clf.predict(training_CCDF)
prediction = clf.predict(testing_CCDF)

# Calculate and print the training accuracy
correct_num = 0.
for i in range(len(training_class)):
    if prediction_train[i] == training_class[i]:
        correct_num += 1
Acc_train = correct_num / len(training_class)
print(Acc_train)

# Calculate and print the testing accuracy
correct_num = 0.
for i in range(len(testing_class)):
    if prediction[i] == testing_class[i]:
        correct_num += 1
Acc_test = correct_num / len(testing_class)
print(Acc_test)

# Confusion matrices: rows are true classes, columns are predicted classes
confusion_train = confusion_matrix(training_class, prediction_train)
print(confusion_train)

confusion_test = confusion_matrix(testing_class, prediction)
print(confusion_test)

1.0
0.82
[[25  0]
 [ 0 25]]
[[25  0]
 [ 9 16]]


## Optional task
Estimate the performance of the feature-level method (Task 2) with 10-fold cross-validation (CV).
* Join the training/testing data matrices and the class vectors. Also combine the ‘training_data_personID’ and ‘testing_data_personID’ vectors, which are needed to make the CV folds.
* Construct the CV folds by training ten SVMs. For each SVM, nine persons’ data is used as the training set (i.e. 90 samples) and one person’s samples are held out as the test set (i.e. 10 samples), so that each SVM has a different person’s samples excluded from its training set. Test each of the ten trained SVMs on its held-out person’s samples and calculate the classification performance for each fold.
* Calculate the mean and SD of the ten CV fold performances to produce the final CV performance estimate of the emotion recognition system.