# Face PCA

This notebook computes the face PCA and exports the resulting database.

# Preliminaries

Set the folder paths, then load the face data, the reference scan, the trimming index, and the covariates.

In [2]:
%Folders
folder.codePath = 'C:\Users\tzarzar\Box Sync\Research\GeneralCode\Matlab Code';
addpath(genpath(folder.codePath));
folder.dataPath16 = 'R:\ShriverLab\Facial features input files and databases\PSU_KU_WFS2016';
folder.databases  = 'C:\Users\tzarzar\Box Sync\Research\FacialSD\DataBases';
folder.results    = 'C:\Users\tzarzar\Box Sync\Research\FacialSD\Results\FacePCA';
cd(folder.dataPath16);
load PENNDATA_AccumulatedShapeData; %loading faces (Data)
load RefScan; %loading AM (RefScan)
load NewMaskIndex; %loading AM trimming index (MaskIndex)
crop(RefScan, 'VertexIndex', MaskIndex); %Trimming the RefScan
cd(folder.databases);
Covariates = readtable('Covariates.csv');





Keep only the individuals that are present in both the face data and the covariates table.

In [4]:
[keep1, keep2] = GetIntersection(Data, Covariates);
Data           = reduceData(Data, keep1);
Covariates     = Covariates(keep2, :);
nFaces         = length(Data.Names)
nCovariates    = length(Covariates.ID)


nFaces =

        5939


nCovariates =

        5939
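
`GetIntersection` and `reduceData` are project functions whose code is not shown here. As a rough sketch of the same idea, assuming the matching is done on ID strings, the built-in `intersect` returns index vectors for both inputs in a common order. The variable names and ID values below are made up for illustration.

```matlab
% Toy stand-ins for Data.Names and Covariates.ID (hypothetical values)
faceNames = {'A03', 'A01', 'A07', 'A05'};
covIDs    = {'A05', 'A02', 'A01', 'A03'};

% ia indexes faceNames and ib indexes covIDs; both follow the same matched order
[common, ia, ib] = intersect(faceNames, covIDs, 'stable');

% After subsetting, position i in each list refers to the same individual
assert(isequal(faceNames(ia), covIDs(ib)));
disp(common)
```

Subsetting both sources with matched index vectors is what keeps row `i` of the covariates aligned with face `i` in the later cells.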





# PCA analysis and export

Align the original faces and their reflections together with a generalized Procrustes analysis (GPA).

In [5]:
TotalShape = [Data.Shape, Data.NormShape]; %Concatenation of original faces (Data.Shape) and their reflections (Data.NormShape)
model      = shapePCA; %Creating an empty shape space object
model.RefScan = clone(RefScan); %Defining the AM that was used to create the shape space
AlignedData   = LSGenProcrustes(model, TotalShape, true, 3, RefScan);

Starting parallel pool (parpool) using the 'local' profile ...
connected to 4 workers.
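
`LSGenProcrustes` is a project function, so its exact behavior and the meaning of its arguments are not documented here. The following is only a generic sketch of generalized Procrustes alignment on toy data: remove translation and scale, then repeatedly rotate every shape onto the current mean with an SVD-based (Kabsch) rotation. All sizes, noise levels, and the number of iterations are assumptions chosen for illustration.

```matlab
rng(5);
k = 10; n = 8;                               % landmarks per shape, number of shapes (toy sizes)
base   = randn(3, k);                        % made-up template shape
shapes = zeros(3, k, n);
for i = 1:n
    [Q, ~] = qr(randn(3));                   % random orthogonal matrix
    if det(Q) < 0, Q(:, 1) = -Q(:, 1); end   % turn it into a proper rotation
    shapes(:, :, i) = (0.5 + rand) * Q * (base + 0.01 * randn(3, k)) + randn(3, 1);
end

% Remove translation and scale from every shape
for i = 1:n
    S = shapes(:, :, i);
    S = S - mean(S, 2);
    shapes(:, :, i) = S / norm(S, 'fro');
end

% Iteratively rotate each shape onto the current reference, then update the reference
ref = shapes(:, :, 1);
for iter = 1:3
    for i = 1:n
        S = shapes(:, :, i);
        [U, ~, V] = svd(ref * S');           % rotation R minimizing ||ref - R*S||
        D = diag([1, 1, sign(det(U * V'))]); % guard against reflections
        shapes(:, :, i) = U * D * V' * S;
    end
    ref = mean(shapes, 3);
    ref = ref / norm(ref, 'fro');
end
```

After the loop, every aligned shape should lie close to `ref`, with the remaining differences reflecting shape variation rather than position, size, or orientation.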




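Decompose each face into a symmetric and an asymmetric component. The symmetric component is the average of a face and its reflection; the asymmetric component is their difference, re-centered on the mean symmetric face.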

In [6]:
OrigHead = AlignedData(:, 1:nFaces);
ReflHead = AlignedData(:, nFaces+1:end);
SymHead  = (OrigHead + ReflHead)/2; %facial symmetry component
AsymHead = (OrigHead - ReflHead) + mean(SymHead, 2); %facial asymmetry component
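
To make the decomposition concrete, here is a small check on made-up data (6 coordinates, 3 faces). It uses the same two formulas as the cell above and verifies that an original face equals its symmetric component plus half of the original-minus-reflection difference, and that `AsymHead` is the full difference shifted by the mean symmetric shape.

```matlab
% Toy aligned data: 6 coordinates (rows) for 3 faces (columns); values are made up
rng(1);
OrigHead = randn(6, 3);
ReflHead = OrigHead + 0.1 * randn(6, 3);                % stand-in for the aligned reflections

SymHead  = (OrigHead + ReflHead) / 2;                   % same formula as in the cell above
AsymHead = (OrigHead - ReflHead) + mean(SymHead, 2);    % same formula as in the cell above

% The original face is the symmetric part plus half the difference
recon = SymHead + (OrigHead - ReflHead) / 2;
assert(max(abs(recon(:) - OrigHead(:))) < 1e-12);

% Removing the added mean shape recovers the plain difference
asymDev = AsymHead - mean(SymHead, 2);
assert(max(abs(asymDev(:) - (OrigHead(:) - ReflHead(:)))) < 1e-12);
```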





Run the face PCA on the symmetric component. We retain the components that together explain 98% of the variance.

In [7]:
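getAverage(model, SymHead); %Compute the average head
getModel(model, SymHead); %Fit the PCA model on the symmetric component
means = mean(SymHead, 2); %Mean of each coordinate across faces (exported later)
stripPercVar(model, 98); %Keep the components that explain 98% of the variance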
model


model = 

  shapePCA with properties:

    AvgVertices: [3×6790 double]
         AvgVec: [20370×1 double]
        RefScan: [1×1 meshObj]
        Average: [1×1 meshObj]
         EigVal: [87×1 double]
         EigVec: [20370×87 double]
         Tcoeff: [5939×87 double]
       AvgCoeff: [87×1 double]
      Centering: 1
           nrEV: 87
         EigStd: [87×1 double]
      Explained: [87×1 double]
              n: 5939
              U: [5939×87 double]
              S: [87×1 double]
              V: [20370×87 double]
           Type: 'shapePCA'
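
The internals of `shapePCA` are not shown in this notebook. As a hedged sketch of what retaining 98% of the variance means, the toy example below runs a PCA through an SVD of centered data and keeps the smallest number of components whose cumulative explained variance reaches 98%. The data, the sizes, and the `n - 1` normalization are assumptions and may differ from what `getModel` and `stripPercVar` do.

```matlab
rng(2);
d = 30; n = 200;                              % coordinates and samples (toy sizes)
X = (linspace(5, 0.1, d)') .* randn(d, n);    % made-up data, one sample per column

avg = mean(X, 2);                             % average shape
Xc  = (X - avg)';                             % centered data, n x d

[U, S, V]     = svd(Xc, 'econ');
eigVal        = diag(S).^2 / (n - 1);         % variance along each component
explained     = 100 * eigVal / sum(eigVal);   % percentage of variance per component
cumExplained  = cumsum(explained);
k             = find(cumExplained >= 98, 1);  % smallest k reaching 98%

scores = U(:, 1:k) * S(1:k, 1:k);             % n x k, analogous to model.Tcoeff
eigVec = V(:, 1:k);                           % d x k, analogous to model.EigVec
fprintf('Retained %d of %d components\n', k, d);
```

In this sketch the standard deviation of each score column is `sqrt(eigVal)`, which is the role `model.EigStd` appears to play in the outlier step below.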





# Removing outliers

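We use the Mahalanobis distance from the origin of the PC space as a measure of how atypical each face is. Because the model is centered, the origin corresponds to the average face, and dividing each score by the standard deviation of its PC puts all components on the same scale.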

In [9]:
origin    = zeros(1, size(model.Tcoeff, 2)); 
mahaldist = sqrt(sum(( (model.Tcoeff ./ model.EigStd') - origin ) .^ 2, 2 ));
size(mahaldist)


ans =

        5939           1
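
The shortcut used above (divide each score by the standard deviation of its component, then take the Euclidean norm) relies on PC scores being uncorrelated. The toy check below, on made-up data, compares that shortcut with the textbook Mahalanobis distance computed from a full covariance matrix. It assumes the standard deviations use the `n - 1` normalization, which may not match `model.EigStd` exactly.

```matlab
rng(3);
n  = 500;
X  = randn(n, 3) * [2 1 0; 0 1 1; 0.5 0 1];    % made-up correlated data, one sample per row
Xc = X - mean(X);

[U, S, ~] = svd(Xc, 'econ');
scores = U * S;                                 % PC scores
stds   = diag(S) / sqrt(n - 1);                 % standard deviation of each PC

% Shortcut used in the notebook: standardized scores, then Euclidean norm
dShort = sqrt(sum((scores ./ stds') .^ 2, 2));

% Textbook definition on the original (correlated) variables
dFull  = sqrt(sum((Xc / cov(X)) .* Xc, 2));

assert(max(abs(dShort - dFull)) < 1e-8);
```

Keeping only the leading components, as the notebook does, gives the distance within the retained subspace rather than in the full coordinate space.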





We define as outliers the faces whose Mahalanobis distance lies more than three scaled median absolute deviations (MAD) from the median of the `mahaldist` distribution, which is the default rule of `isoutlier`.
The IDs of the individuals identified as outliers are listed below.

In [10]:
Covariates.ID(isoutlier(mahaldist))


ans =

  213×1 cell array

    '131203'
    '131239'
    '132046'
    '132047'
    '132067'
    '140056'
    '140103'
    '140219'
    '140258'
    '140478'
    '140490'
    '140518'
    '140697'
    '140713'
    '140739'
    '140909'
    '141181'
    '141183'
    '141188'
    '141204'
    '141211'
    '141248'
    '141309'
    '141358'
    '141378'
    '141399'
    '141469'
    '141502'
    '141551'
    '141574'
    '141956'
    '142007'
    '143026'
    '143293'
    '143470'
    '143534'
    '143551'
    '143552'
    '143561'
    '50238'
    '50239'
    '50243'
    '50248'
    '50250'
    '50259'
    '50275'
    '50286'
    '50310'
    '50313'
    '50324'
    '50326'
    '50347'
    '50380'
    '50606'
    '50630'
    '50656'
    '50657'
    '50670'
    '50692'
    '50759'
    '50791'
    '50838'
    '50841'
    '50910'
    '50920'
    '50942'
    '60032'
    '60068'
    '60081'
    '60141'
    '60177'
    '60178'
    '60190'
    '60198'
    '60251'
    '60252'
    '60261'
    '6026
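
For reference, the three-scaled-MAD rule can be reproduced by hand. The sketch below uses made-up values with two planted extremes and compares a manual computation with the default behavior of `isoutlier`; the scaling constant is the one given in the MATLAB documentation for the `'median'` method.

```matlab
rng(4);
x = [randn(200, 1); 8; -9];                    % toy values with two planted extremes

c         = -1 / (sqrt(2) * erfcinv(3/2));     % consistency constant, about 1.4826
scaledMAD = c * median(abs(x - median(x)));
manual    = abs(x - median(x)) > 3 * scaledMAD;

assert(isequal(manual, isoutlier(x)));         % default isoutlier uses the same rule
fprintf('%d values flagged\n', nnz(manual));
```

The rule is two-sided, so unusually small distances can be flagged as well as unusually large ones.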



# Creating database for export

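Export the database. Only `coeffs.csv` has the identified outliers removed; the eigenvalues, eigenvectors, and means come from the PCA fitted on all 5939 faces.
BMI is also computed here as weight divided by squared height. Dividing `Height` by 100 assumes that height is recorded in centimeters and weight in kilograms.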

In [19]:
PCnames        = strseq('PC', 1:model.nrEV);
Covariates.BMI = Covariates.Weight ./ ( (Covariates.Height ./ 100) .^2);
coeffs         = [Covariates, array2table(model.Tcoeff, 'VariableNames', PCnames)];
cd(folder.results)
csvwrite('eigenvalues.csv', model.EigVal);
csvwrite('eigenvectors.csv', model.EigVec);
csvwrite('means.csv', means);
csvwrite('facets.csv', model.Average.Faces');
writetable(coeffs(~isoutlier(mahaldist), :), 'coeffs.csv')
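
One caveat about the export: according to the MathWorks documentation, `csvwrite` writes at most five significant digits, which may matter for eigenvectors and means. The sketch below is one possible alternative rather than part of the original workflow. It assumes MATLAB R2019a or later, where `writematrix` and `readmatrix` are available, and compares the round-trip error of both functions on made-up values written to temporary files.

```matlab
vals = [pi; 1/3; 123456.789];                  % made-up values that need more than 5 digits

fileOld = [tempname '.csv'];
fileNew = [tempname '.csv'];
csvwrite(fileOld, vals);                       % same function as in the cell above
writematrix(vals, fileNew);                    % R2019a or later

errOld = max(abs(readmatrix(fileOld) - vals));
errNew = max(abs(readmatrix(fileNew) - vals));
fprintf('max round-trip error: csvwrite %.3g, writematrix %.3g\n', errOld, errNew);

delete(fileOld); delete(fileNew);
```

On older releases, `dlmwrite` with its `'precision'` option is another way to keep more digits.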



