# Abstract

##### Facial recognition has long remained one of the most difficult areas of data science to approach. Whereas other types of data have easily defined features and relative simplicity, facial data includes a great deal of hidden or noisy information. Because of this, facial recognition remains a daunting field with no single approach guaranteed to achieve the desired result. Although the human brain excels at instinctively deriving hard-to-define features at a glance, programs still struggle to extract something as basic as gender. Yet a model that performed as well as humans in all conditions would vastly increase efficiency in many fields. Basic examples include medical diagnoses based on facial features, removing the need for identification documents, and easier sign-in to one's favorite sites. In light of this, the task was to perform an exploratory analysis of several preprocessing techniques, identify the best-performing models, and find the best-performing hyperparameters for those models. Out of four preprocessing techniques (Label Balancing, SIFT, PCA, RFS), we determined that Label Balancing with oversampling was the best for generalization, while the other techniques lowered training time in exchange for a far greater error rate. Out of four models explored (GBC, CNN, RFS, SVM), GBC and CNN were chosen for their similarly high accuracy and their differing training methods. We then determined the best hyperparameters for each model and visualized how each model functioned at peak performance.

# Introduction

# Background

# Data

# Methods

## Preprocessing

### Balance via Oversampling

### SIFT - (Scale Invariant Feature Transform)

##### SIFT is a technique for reducing the complexity of an image by transforming it into a histogram of commonly found features. Within the SIFT algorithm, the features of an image are called keypoints. A keypoint is a local extremum found by comparing a pixel with its neighbors for drastic shifts in pixel values. Next, a descriptor is computed for the local area around each keypoint, in the form of a 128-bin feature vector. This vector describes the local area and a direction, which allows the keypoint to be matched despite rotation. The 128-bin descriptors are collected from the training images and clustered via K-means to produce a set of common descriptors. Each image can then be transformed into a histogram in which each bin counts the number of times a common feature was detected within the image.
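
##### The clustering and histogram steps described above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the descriptors here are randomly generated stand-ins (real ones might come from a library such as OpenCV), and `kmeans`, `bow_histogram`, and `fake_descriptor` are hypothetical names introduced for this example.

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the 'common descriptors')."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(descriptors, centroids):
    """Count how many of one image's descriptors fall nearest each centroid."""
    hist = [0] * len(centroids)
    for d in descriptors:
        nearest = min(range(len(centroids)), key=lambda c: dist(d, centroids[c]))
        hist[nearest] += 1
    return hist


# Stand-in data: 128-value descriptors scattered around two made-up centres.
_rng = random.Random(1)


def fake_descriptor(centre):
    return [centre + _rng.gauss(0, 1) for _ in range(128)]


training_descriptors = ([fake_descriptor(0) for _ in range(30)]
                        + [fake_descriptor(50) for _ in range(30)])
vocabulary = kmeans(training_descriptors, k=2)

# One "image" with 3 descriptors of the first kind and 5 of the second.
image_descriptors = ([fake_descriptor(0) for _ in range(3)]
                     + [fake_descriptor(50) for _ in range(5)])
histogram = bow_histogram(image_descriptors, vocabulary)
```

##### Whatever the number of keypoints in an image, its histogram has one bin per cluster, so every image ends up with a feature vector of the same fixed length.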

### PCA

### RFS

## Models

### Neural Network

##### Each neural network was trained for a maximum of 10 epochs with the Adam optimizer and sparse categorical cross-entropy loss. An early-stopping callback monitored the validation accuracy with a patience of 5, so that training would stop after 5 epochs without improvement and the model would return the weights from the epoch with the best validation accuracy. Testing was done on the age variable because it has the most unbalanced and varied classes.
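
##### The patience logic can be illustrated without a deep learning framework. The sketch below replays a list of per-epoch validation accuracies and reports which epoch's weights would be kept. The accuracy values are invented for the example, and `early_stopping_replay` is a hypothetical helper. In Keras, this behavior is typically obtained with the `EarlyStopping` callback, which is an assumption about the setup rather than something the report states.

```python
def early_stopping_replay(val_accuracies, patience=5, max_epochs=10):
    """Return (best_epoch, best_accuracy, epochs_run) for a sequence of
    per-epoch validation accuracies, using patience-based early stopping."""
    best_acc = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        epochs_run = epoch
        if acc > best_acc:
            # A real training loop would snapshot the model weights here.
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; the snapshot from best_epoch is restored
    return best_epoch, best_acc, epochs_run


# Invented validation accuracies, for illustration only.
history = [0.30, 0.35, 0.40, 0.39, 0.38, 0.40, 0.37, 0.36, 0.41, 0.40]
result = early_stopping_replay(history, patience=5, max_epochs=10)
```

##### In this made-up run the best accuracy is reached at epoch 3, the next five epochs do not beat it, and training stops at epoch 8 with the epoch 3 weights retained.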
##### The metrics used are plain accuracy, macro F1-score, macro recall, and macro precision. Together they show how the model performs on the entire validation dataset and whether it performs equally well across all classes.
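
##### To show why the macro metrics matter on unbalanced classes, here is a small sketch that computes all four metrics from scratch. The labels are toy values, `classification_metrics` is a hypothetical helper, and classes with no predictions are scored as 0, which is an assumption about how undefined precision is handled.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "macro_precision": sum(precisions) / n,  # every class weighs the same
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy example: 8 samples of class 0, 2 of class 1, and a model that
# always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
metrics = classification_metrics(y_true, y_pred)
```

##### The always-majority model scores an accuracy of 0.8 but a macro recall of only 0.5, because the minority class is never predicted. This is the gap the macro metrics are meant to expose.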

#### Preprocessing Results

<table>
<tr width = "200">
    <td>
        <figure>
            <img src="../Reports/figures/NeuralNetwork/PreProcessingResults.PNG"/>
            <figcaption>Fig.14 Results matrix for preprocessing on validation</figcaption>
        </figure>
    </td>
</tr>
</table>

#### Balanced versus Unbalanced
##### While the unbalanced dataset had greater accuracy than the balanced dataset, the balanced dataset had superior macro precision, recall, and f1-score. Balanced datasets were therefore used from then on.
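
##### As a rough illustration of balancing by oversampling, the sketch below duplicates randomly chosen minority-class samples until every class matches the largest one. This is plain random oversampling, a simpler stand-in and not necessarily the method used in the project (the attribution section mentions SMOTE, which synthesizes new samples instead of copying). The function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are equally large."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(members) for members in by_class.values())

    out_samples, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: integers stand in for images, with class sizes 6, 3, and 1.
toy_samples = list(range(10))
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
balanced_samples, balanced_labels = random_oversample(toy_samples, toy_labels)
class_counts = Counter(balanced_labels)
```

##### After balancing, each class contributes the same number of training examples, so the model can no longer gain accuracy simply by favoring the largest class.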
#### Normalized versus Raw data
##### The normalized dataset had worse results and also required additional memory to store float64 values instead of int8. Raw data was therefore used from then on.
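
##### The memory cost is simple arithmetic: a float64 value takes 8 bytes while an 8-bit integer takes 1. The sketch below assumes 5000 grayscale images of 48x48 pixels. The image size and channel count are placeholders, since the report does not state them.

```python
def dataset_bytes(n_images, height, width, channels, bytes_per_value):
    """Size in bytes of a dense image array with the given shape and value width."""
    return n_images * height * width * channels * bytes_per_value


# Hypothetical shape: 5000 images, 48x48 pixels, 1 channel.
raw_bytes = dataset_bytes(5000, 48, 48, 1, bytes_per_value=1)         # 8-bit integers
normalized_bytes = dataset_bytes(5000, 48, 48, 1, bytes_per_value=8)  # float64
ratio = normalized_bytes // raw_bytes
```

##### Under these assumptions the raw array needs about 11.5 MB and the float64 array about 92 MB, an eightfold increase regardless of the actual image size.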
#### Balanced versus RFS, PCA, SIFT
##### All metrics resulting from these preprocessing techniques were worse than the corresponding metrics on the non-preprocessed balanced dataset. Thus, none of the three techniques was used.
#### Verdict:
##### The non-normalized, balanced, non-preprocessed dataset has the best performance out of all iterations with an accuracy of 0.42.

### Model Selection

##### Based on initial results, GBC was going to be chosen for its high accuracy. However, a better neural network structure was then discovered, which gave an accuracy of 0.48, 0.02 above the GBC accuracy of 0.46. Thus, neural networks were used for hyperparameter tuning. The selected structure is shown below.

<table>
<tr width = "2000">
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ModelSummary.PNG"/>
            <figcaption>Fig.x Neural Network Structure</figcaption>
        </figure>
    </td>
</tr>
</table>

# Evaluation 

## Neural Network

##### After concluding that a balanced dataset yielded the best performance and that the model itself was the 2nd most accurate, further hyperparameter tuning was carried out to push performance as high as possible. Five hyperparameters were tuned in the following runs: Dropout, L1 Regularization, L2 Regularization, Learning Rate, and Number of Samples. A hyperparameter value was only considered if its accuracy exceeded that of the base model by more than 0.01. This threshold ensures that an improvement reflects a genuinely better model rather than a change in sampling.
#### Base Model's Hyperparameters
##### - Dropout: 0
##### - L1: 0
##### - L2: 0
##### - Lr: 0.001
##### - Sampling Size: 5000
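
##### The acceptance rule can be sketched as follows, assuming that each hyperparameter was varied one at a time from the base configuration (the report does not state this explicitly). The candidate grid and the trial accuracies below are placeholders invented for illustration and are not measured results. Only the base configuration and the base accuracy of 0.485552 come from this report.

```python
BASE_CONFIG = {"dropout": 0.0, "l1": 0.0, "l2": 0.0, "lr": 0.001, "sample_size": 5000}
BASE_ACCURACY = 0.485552
MIN_GAIN = 0.01  # improvements at or below this are treated as sampling noise


def one_at_a_time(base, grid):
    """Yield (name, value, config) where config differs from base in one setting."""
    for name, values in grid.items():
        for value in values:
            if value != base[name]:
                yield name, value, {**base, name: value}


def accepted(trials, base_accuracy=BASE_ACCURACY, min_gain=MIN_GAIN):
    """Keep only trials whose accuracy beats the base model by more than min_gain."""
    return [(name, value) for name, value, acc in trials
            if acc - base_accuracy > min_gain]


# Hypothetical candidate grid.
grid = {"dropout": [0.0, 0.2], "lr": [0.001, 0.0001]}
configs = list(one_at_a_time(BASE_CONFIG, grid))

# Placeholder accuracies, NOT real results: one below and one above the threshold.
trials = [("dropout", 0.2, 0.49), ("lr", 0.0001, 0.50)]
candidates = accepted(trials)
```

##### With these placeholder numbers, a gain of about 0.004 is rejected as possible noise while a gain of about 0.014 is kept.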

### Hyperparameter tuning results

<table>
<tr width = "2000">
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/L2Report.PNG"/>
            <figcaption>Fig.15 L2 hyperparameter tuning values</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/L1Report.PNG" />
            <figcaption>Fig.16 L1 hyperparameter tuning values</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/DropoutReport.PNG" />
            <figcaption>Fig.17 Dropout hyperparameter tuning values</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/LrReport.PNG" />
            <figcaption>Fig.18 Learning Rate hyperparameter tuning values</figcaption>
        </figure>
    </td>
</tr>
</table>

##### Based on the hyperparameter tuning runs, L1: 0.001 and Lr: 0.0001 were the best candidates, since their accuracy exceeded the base accuracy of 0.485552 by more than 0.01.

### Sample Size Testing

<table>
<tr width = "2000">
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/SampleAcc.PNG"/>
            <figcaption>Fig.27 Sample Size hyperparameter tuning versus accuracy</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/SampleLoss.PNG" />
            <figcaption>Fig.28 Sample Size hyperparameter tuning versus loss</figcaption>
        </figure>
    </td>
</tr>
</table>

##### Sample sizes were tested from 500 to 8000 in intervals of 500, and 5000 had the highest accuracy. As before, the model's loss appears to correlate inversely with its accuracy. Although the model was most accurate on 5000 samples, we will use 8000, since using more data is best practice.
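
##### A sweep of this kind could be organized as in the sketch below. The `evaluate` callback is a hypothetical stand-in for "train on this subset and return the validation accuracy". Here it only returns the subset length so that the example runs without a model. Drawing each subset by simple random sampling is also an assumption.

```python
import random

# 500, 1000, ..., 8000: sixteen sample sizes in steps of 500.
sizes = list(range(500, 8001, 500))


def sweep(indices, sizes, evaluate, seed=0):
    """Evaluate a random subset of each size and return {size: score}."""
    rng = random.Random(seed)
    results = {}
    for size in sizes:
        subset = rng.sample(indices, size)  # sampled without replacement
        results[size] = evaluate(subset)
    return results


# Stand-in pool of 10000 image indices and a dummy evaluate function.
pool = list(range(10_000))
results = sweep(pool, sizes, evaluate=lambda subset: len(subset))
```

##### With a real `evaluate` function, the size with the highest score can be read off with `max(results, key=results.get)`.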

### Post Hyperparameter testing

<table>
<tr width = "2000">
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/8000TestRaceReport.PNG"/>
            <figcaption>Fig.30 Classification report for the race test set</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/8000TestReport.PNG" />
            <figcaption>Fig.31 Classification report on the age test set</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/8000TestGenderReport.PNG" />
            <figcaption>Fig.32 Classification report on the gender test set</figcaption>
        </figure>
    </td>
</tr>
</table>

##### After some basic runs, it was discovered that Lr: 0.0001 had the best validation accuracy at 0.61 and a test accuracy of 0.62. The classification report on age shows far better results than that of the control model. In addition, the race and gender models using the same neural network structure achieved better results than the dummy models. Thus, this model was considered satisfactory. The only thing left to do was to visualize the weak points of the model.

### Feature extraction by analyzing mislabeled classifications

#### Age

<table>
<tr width = "2000">
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Age/C15.PNG"/>
            <figcaption>Age Bracket: 5</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Age/C21.PNG" />
            <figcaption>Age Bracket: 1</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Age/C38.PNG" />
            <figcaption>Age Bracket: 8</figcaption>
        </figure>
    </td>
</tr>
<tr width = "2000">
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Age/IC14.PNG"/>
            <figcaption>Correct Label: 4 Predicted Label: 5</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Age/IC20.PNG" />
            <figcaption>Correct Label: 0 Predicted Label: 1</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Age/IC35.PNG" />
            <figcaption>Correct Label: 5  Predicted Label: 8</figcaption>
        </figure>
    </td>
</tr>    
</table>

#### Race

<table>
<tr width = "2000">
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Race/C13.PNG"/>
            <figcaption>Race: Indian</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Race/C22.PNG" />
            <figcaption>Race: Asian</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Race/C31.PNG" />
            <figcaption>Race: Black</figcaption>
        </figure>
    </td>
</tr>
<tr width = "2000">
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Race/IC10.PNG"/>
            <figcaption>Correct Label: White Predicted Label: Indian</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Race/IC24.PNG" />
            <figcaption>Correct Label: Other(All other races in this category) Predicted Label: Asian</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Race/IC33.PNG" />
            <figcaption>Correct Label: Indian  Predicted Label: Black</figcaption>
        </figure>
    </td>
</tr>    
</table>

#### Gender

<table>
<tr width = "2000">
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Gender/C11.PNG"/>
            <figcaption>Gender: Female</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Gender/C20.PNG" />
            <figcaption>Gender: Male</figcaption>
        </figure>
    </td>
</tr>
<tr width = "2000">
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Gender/IC10.PNG"/>
            <figcaption>Correct Label: Male Predicted Label: Female</figcaption>
        </figure>
    </td>
    <td width="200">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/ClassificationIssues/Gender/IC21.PNG" />
            <figcaption>Correct Label: Female Predicted Label: Male</figcaption>
        </figure>
    </td>
</tr>    
</table>

#### Features extracted

##### How the model performs feature extraction can be analyzed by comparing the misclassified images with the correctly classified images shown above them.
##### The age model tends to use features associated with facial edges or facial shape. This is evidenced by the similar wrinkles in the third pair, the smoothness of the faces in the first pair, and the round facial shape in the second pair.
##### The race model mostly seems to use facial shape when determining racial features. Interestingly, skin color does not seem to be a factor in determining race for the model.
##### For the gender model, it is difficult to tell what the model uses to determine gender. The second pair shows that facial shape is not the main feature. In addition, the images in both pairs have different tones and different wrinkle edges.
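
##### One way to assemble such image pairs is sketched below: each misclassified sample is matched with a correctly classified sample of the class the model predicted. This is an assumption about how the pairs were chosen, based on the figure layout. The label lists are toy values and `misclassification_pairs` is a hypothetical helper.

```python
def misclassification_pairs(y_true, y_pred):
    """Return (correct_index, misclassified_index) pairs, where the first
    index is a correctly classified sample of the class that the second,
    misclassified sample was wrongly assigned to."""
    correct_by_class = {}
    for i, (true, pred) in enumerate(zip(y_true, y_pred)):
        if true == pred:
            correct_by_class.setdefault(true, []).append(i)

    pairs = []
    for i, (true, pred) in enumerate(zip(y_true, y_pred)):
        if true != pred and pred in correct_by_class:
            # Use the first correct example of the predicted class.
            pairs.append((correct_by_class[pred][0], i))
    return pairs


# Toy age-bracket labels, for illustration only.
y_true = [5, 4, 1, 0, 8, 5]
y_pred = [5, 5, 1, 1, 8, 8]
pairs = misclassification_pairs(y_true, y_pred)
```

##### Viewing the two images of each pair side by side shows what the misclassified face has in common with a typical member of the predicted class.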

### Final Comparison between Neural Networks, Dummy Model, Simple Models, and Baseline

##### On the age label, the hyperparameter-tuned neural network with Lr: 0.0001 outperforms all other models, with a final accuracy of 0.62. Among the simple models, KNN has a testing accuracy of 0.27 and Logistic Regression has a testing accuracy of 0.46. The dummy model has an accuracy of 0.14.
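
##### For context on what a dummy model measures, the sketch below computes the accuracy of a baseline that always predicts the most frequent training label. The report does not say which dummy strategy was used, so the majority-class rule is an assumption, and the labels are toy values.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    hits = sum(label == majority_label for label in test_labels)
    return hits / len(test_labels)


# Toy labels: class 1 is the most frequent in training.
baseline = majority_baseline_accuracy([0, 1, 1, 2, 1], [1, 0, 1, 2])
```

##### A baseline like this ignores the images entirely, so any model that learns from the pixels should score above it.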

# Conclusion

##### In all metrics, the neural network with hyperparameter tuning outperformed all other models. There was enough data to train each model. However, the limited memory of the testing machine restricted training to 5000 images instead of the potentially 60000 images in the fully balanced dataset. Based on the image misclassifications, the model seems to mainly use facial shapes, wrinkles, and smoothness as features. Future work would be to find a way to simplify the dataset so that more samples can be used for training while retaining most of the information.

# Attribution

<table>
<tr width = "2000">
    <td colspan="4">
        <figure>
            <img src="../Reports/figures/GitCommitsandlines.PNG"/>
            <figcaption>Commits/additions/deletions per person</figcaption>
        </figure>
    </td>
</tr>
<tr width = "2000">
    <td width="500">
    ofirsov000 : Oxana Firsova
    </td>
    <td width="500">
    jvivar2383 : Jenifer Vivar
    </td>
    <td width="500">
    usersblock : Thomas Ly
    </td>
    <td width="500">
    yxiang001 : Yinzi Xiang
    </td>
</tr>    
</table>

#### Oxana Firsova

##### Balanced all labels by applying the SMOTE technique. Split the age label into 9 stages based on the human life cycle. Implemented the SVM algorithm. Applied masks with SVM and tested the accuracy score. Compared SVM accuracy with and without SIFT.

#### Jenifer Vivar

##### Added preprocessing techniques such as Haar Cascade and several image filters (later discarded) to apply to the images before passing them to the classifier. Also implemented a Random Forest algorithm and tested its accuracy scores on the processed and preprocessed data, to later compare with the other models my group members were working on.


#### Thomas Ly

##### Implemented the SIFT algorithm to extract image features and reduce dimensionality. Implemented the Convolutional Neural Network and preprocessed the dataset to find the best version of the dataset for the neural network. Tuned the neural network's hyperparameters and performed feature extraction on the images based on misclassified examples.

#### Yinzi Xiang

##### Explored PCA for feature extraction and Random Forest for feature selection. Explored, configured, and studied the Gradient Boosting Classifier model. Applied this model with feature extraction techniques such as PCA, SIFT, and RFS under balanced and unbalanced labels.

# Bibliography

# Appendix