# Abstract

##### Facial recognition has long been one of the most difficult areas of data science. Whereas other types of data have easily defined features and are relatively simple, facial data contains a great deal of hidden or noisy information. As a result, facial recognition remains a daunting field with no single approach guaranteed to achieve the desired result. Although the human brain excels at instinctively picking out hard-to-define features at a glance, programs still struggle to extract something as basic as gender. Yet a model that performed as well as humans in all conditions would greatly increase efficiency in many fields. Basic examples include medical diagnoses based on facial features, removing the need for identification documents, and easier sign-in to one's favorite sites. In light of this, the task was to perform an exploratory analysis of several preprocessing techniques, combined with an analysis of the best-performing models and the best-performing hyperparameters for those models. Out of four preprocessing techniques (Label Balancing, SIFT, PCA, RFS), we determined that Label Balancing with oversampling was the best for generalization, while the other techniques lowered training time in exchange for a far greater error rate. Out of four models explored (GBC, CNN, RFS, SVM), GBC and CNN were chosen for their similarly high accuracy and their differing training methods. We then determined the best hyperparameters for each model and visualized how each model functioned at peak performance.

# Introduction

Image classification techniques are widely applied in the machine learning community and well understood by many. Today's world relies more and more on advanced features to protect our identities online and to protect sensitive information such as Social Security numbers (SSNs) and bank accounts. While basic features, such as eye color or the nose, are easily recognizable to models, many others remain abstract and are not easily recognized by algorithms. These features include age, gender, and race. Many models attempt to solve this problem, but their criteria and actual performance are still inadequate. Adding these features to the basic features could potentially increase the level of security in face recognition applications. Combining unique physical features with other variables such as age, gender, and race could add an extra layer of protection to any application that needs it for sensitive online transactions and for our right to privacy while online.
<br>
<br>
Based on previous work done in the field, it is clear that there is no one way to solve this problem. A diversity of algorithms has been applied to it, along with a diversity of benchmarks to assess test performance. In this project, the focus was on two particular models, Gradient Boosting and Neural Networks. Moreover, emphasis was placed on data engineering and various preprocessing techniques.


<hr>




# Background

# Data

##### The dataset was taken from Kaggle and can be downloaded from this link. The Faces dataset contains 20,000+ cropped and aligned facial images with age, race, and gender labels. The age label was modified to 9 stages based on The Stages of Human Life Cycle. The project has 6,000 testing examples, 3,500 validation examples, and 10,500 training examples. After balancing the training examples, we took 5,000 samples proportionally. The pixel values are integers between 0 and 255. To normalize the data, we divide them by 255. Each image has a shape of (200, 200, 3). Features were extracted by PCA.
The UTKFace dataset consists of 20,000 labeled images. The image labels consist of the target variables age, gender, and race. The age in the set ranges from 0 to 116 years old. The gender is 0 for male and 1 for female, and race ranges from 0 to 4.
<ul>
    <li>[age] is an integer from 0 to 116, indicating the age</li>
    <li>[gender] is either 0 (male) or 1 (female)</li>
    <li>[race] is an integer from 0 to 4, denoting White, Black, Asian, Indian, and Others (like Hispanic, Latino, Middle Eastern).</li>
 </ul>
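
The label scheme above, together with the nine life stages listed under Balancing Labels, can be sketched in a few lines. This is only an illustration: it assumes the labels are encoded in each file name as `[age]_[gender]_[race]_...`, which the report does not state, and the names `parse_labels` and `age_to_stage` are hypothetical helpers, not part of the project code. Stage indices are zero-based, matching the numbering used after balancing.

```python
from bisect import bisect_left

# Inclusive upper age bound of the first eight stages; ages of 80+ fall in the ninth.
STAGE_UPPER_BOUNDS = [2, 5, 8, 11, 20, 35, 50, 79]
RACE_NAMES = ["White", "Black", "Asian", "Indian", "Others"]


def age_to_stage(age: int) -> int:
    """Map an age in years to a zero-based life-stage index (0-8)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the first stage whose upper bound is >= age.
    return bisect_left(STAGE_UPPER_BOUNDS, age)


def parse_labels(filename: str) -> dict:
    """Read age, gender and race from a name like '25_1_2_xxx.jpg' (assumed layout)."""
    age, gender, race = (int(part) for part in filename.split("_")[:3])
    return {
        "age": age,
        "stage": age_to_stage(age),
        "gender": "female" if gender == 1 else "male",
        "race": RACE_NAMES[race],
    }
```

For example, a hypothetical file named `25_1_2_example.jpg` would be read as a 25-year-old female labeled Asian, in stage 5 (Early Adulthood).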
 
The dataset consists of two main categories: the "in the wild" image set, which contains pictures of people with different backgrounds and settings, and the "cropped images" set, in which the face of each person was cropped to exclude as much of the background as possible. In this project, the latter was used to train and test the models.

[ADD IMAGE INBALANCE]



# Methods

#### Balancing Labels
Balanced labels provide more accurate predictions. Unbalanced labels can call for undersampling or oversampling, and incorrect classification of the minority class can cause serious problems.
Pic. 1 and 2 show how the labels look before and after balancing. Late Childhood (Ages 9-11) has the fewest samples, and Early Adulthood (Ages 21-35) has the most. To implement the balancing, we use the SMOTE technique. The main idea of the SMOTE algorithm is to analyze the minority samples and artificially synthesize new samples from them to add to the dataset. The flow of the algorithm is as follows:
 1. For each sample x in the minority class, use the Euclidean distance to calculate the distance from x to all other samples in the minority class and obtain its k nearest neighbors.

 2. Set the sampling rate N according to the sample imbalance ratio. For each minority sample x, randomly select several samples from its k nearest neighbors; call a chosen neighbor xn.

 3. For each randomly chosen neighbor xn, create a new sample from the original sample x according to the following formula, where rand(0, 1) is a random number between 0 and 1: $x_{new} = x + \mathrm{rand}(0, 1) \cdot (x_n - x)$.

During balancing, the class labels shift down by one stage (-1):


  - #1 Infancy (Ages 0-2) -> #0 Infancy (Ages 0-2)
  - #2 Early Childhood (Ages 3-5) -> #1 Early Childhood (Ages 3-5)
  - #3 Middle Childhood (Ages 6-8) -> #2 Middle Childhood (Ages 6-8)
  - #4 Late Childhood (Ages 9-11) -> #3 Late Childhood (Ages 9-11)
  - #5 Adolescence (Ages 12-20) -> #4 Adolescence (Ages 12-20)
  - #6 Early Adulthood (Ages 21-35) -> #5 Early Adulthood (Ages 21-35)
  - #7 Midlife (Ages 36-50) -> #6 Midlife (Ages 36-50)
  - #8 Mature Adulthood (Ages 51-79) -> #7 Mature Adulthood (Ages 51-79)
  - #9 Late Adulthood (Age 80+) -> #8 Late Adulthood (Age 80+)
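
The SMOTE steps described in this section can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: the report does not say which SMOTE implementation was used, and the function names, the default `k=5`, and the fixed seed are all hypothetical. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors.

```python
import math
import random


def k_nearest(x, minority, k):
    """Return the k minority samples closest to x by Euclidean distance, excluding x."""
    others = [s for s in minority if s is not x]
    return sorted(others, key=lambda s: math.dist(x, s))[:k]


def smote_sample(x, minority, k=5, rng=random):
    """Synthesize one sample between x and a randomly chosen near neighbor."""
    x_n = rng.choice(k_nearest(x, minority, k))
    gap = rng.random()  # rand(0, 1) in the interpolation formula
    return [xi + gap * (ni - xi) for xi, ni in zip(x, x_n)]


def oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (seeded for repeatability)."""
    rng = random.Random(seed)
    return [smote_sample(rng.choice(minority), minority, k, rng) for _ in range(n_new)]
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays within the range already spanned by that class. On image data each sample would be a flattened pixel vector, so a library implementation is likely a more practical choice than this sketch.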




### Neural Network

##### Each neural network was run for a maximum of 10 epochs with the Adam optimizer and sparse categorical cross-entropy loss. A callback with a patience of 5 monitored the validation accuracy, so that the model would return the weights with the best validation accuracy if it ran for 5 epochs without improvement. Testing was done on the age variable because it has the most unbalanced and varied classes.
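
The stopping rule can be illustrated without a deep learning framework. The sketch below takes a list of per-epoch validation accuracies as a stand-in for real training, which is an assumption made purely for illustration, and the function name is hypothetical. It reports which epoch's weights would be kept and at which epoch training would end. In Keras, comparable behavior is typically obtained with the `EarlyStopping` callback using `monitor="val_accuracy"`, `patience=5`, and `restore_best_weights=True`, though the report does not name the exact callback used.

```python
def run_with_patience(val_accuracies, max_epochs=10, patience=5):
    """Return (best_epoch, last_epoch), both 1-based, for a patience-based stop."""
    best_acc = float("-inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        last_epoch = epoch
        if acc > best_acc:
            # A real callback would snapshot the model weights here.
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            # `patience` epochs have passed without improvement: stop training.
            break
    return best_epoch, last_epoch
```

With a cap of 10 epochs and a patience of 5, the early stop can only trigger when the best epoch is one of the first five; otherwise training simply runs to the epoch limit.
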
##### The metrics used are base accuracy, macro F1-score, macro recall, and macro precision. These let us compare how the model performs on the entire validation dataset as well as whether its metrics are equal across all classes.
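
To make the difference between accuracy and the macro-averaged metrics concrete, here is a small standard-library sketch. The function name is hypothetical, and a class with no predictions or no true samples is scored as 0, which is one common convention rather than something stated in the report.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Compute accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

As a toy case, a model that always predicts class 0 on nine class-0 samples and one class-1 sample scores 0.9 accuracy but only 0.5 macro recall, because the minority class counts as much as the majority class in the macro average.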

#### Preprocessing Results

<table>
<tr width = "200">
    <td>
        <figure>
            <img src="../Reports/figures/NeuralNetwork/PreProcessingResults.PNG"/>
            <figcaption>Fig.14 Results matrix for preprocessing on validation</figcaption>
        </figure>
    </td>
</tr>
</table>

#### Balanced versus Unbalanced
##### While the unbalanced dataset had greater accuracy than the balanced dataset, the balanced dataset had superior macro precision, recall, and F1-score. The balanced dataset was therefore used from then on.
#### Normalized versus Raw data
##### The normalized dataset had worse results and also required additional memory to store float64 values instead of 8-bit integer values. Raw data was therefore used from then on.
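
The memory cost can be estimated with simple arithmetic. The sketch below assumes one byte per raw pixel value and eight bytes per float64 value, and uses the (200, 200, 3) image shape and the 5000-sample size reported elsewhere in this document; the function name is hypothetical.

```python
HEIGHT, WIDTH, CHANNELS = 200, 200, 3  # image shape given in the Data section


def dataset_bytes(n_images, bytes_per_value):
    """Bytes needed to hold n_images as one dense array."""
    return n_images * HEIGHT * WIDTH * CHANNELS * bytes_per_value


raw_bytes = dataset_bytes(5000, 1)         # 8-bit integer pixels
normalized_bytes = dataset_bytes(5000, 8)  # float64 pixels after dividing by 255

print(f"raw: {raw_bytes / 1e9:.1f} GB, normalized: {normalized_bytes / 1e9:.1f} GB")
```

Under these assumptions, 5000 images take about 0.6 GB as raw 8-bit values and about 4.8 GB as float64, an eightfold increase.
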
#### Balanced versus RFS, PCA, SIFT
##### All metrics resulting from the preprocessing techniques were worse than the corresponding metrics on the non-preprocessed balanced dataset. Thus, no preprocessing techniques were used.
#### Verdict:
##### The non-normalized, balanced, non-preprocessed dataset had the best performance of all iterations, with an accuracy of 0.42.

# Evaluation 

## Neural Network

##### After concluding that a balanced dataset yielded the best performance and that the model itself was the 2nd most accurate, further hyperparameter tuning was needed to achieve a global maximum of performance. Five hyperparameters were tuned in the following runs: Dropout, L1 Regularization, L2 Regularization, Learning Rate, and Number of Samples. A hyperparameter value would only be considered if its accuracy exceeded the base model's by more than 0.01. This threshold ensures that improvements are not due to changes in sampling and instead reflect a genuine improvement in the model.
#### Base Model's Hyperparameters
##### - Dropout: 0
##### - L1: 0
##### - L2: 0
##### - Lr: 0.001
##### - Sampling Size: 5000

### Hyperparameter tuning results

<table>
<tr width = "2000">
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/L2Report.PNG"/>
            <figcaption>Fig.15 L2 hyperparameter tuning values</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/L1Report.PNG" />
            <figcaption>Fig.16 L1 hyperparameter tuning values</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/DropoutReport.PNG" />
            <figcaption>Fig.17 Dropout hyperparameter tuning values</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/LrReport.PNG" />
            <figcaption>Fig.18 Learning Rate hyperparameter tuning values</figcaption>
        </figure>
    </td>
</tr>
</table>

##### Based on hyperparameter tuning on all models, L1: 0.001 and Lr: 0.0001 were the best candidates, since their accuracy exceeded the base accuracy of 0.485552 by more than 0.01.
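
The selection rule amounts to a simple filter. In the sketch below, the base accuracy and the 0.01 margin come from this report, while the configuration names and their accuracies are placeholder values invented for illustration and are not measured results.

```python
BASE_ACCURACY = 0.485552  # base model accuracy reported above
MARGIN = 0.01             # minimum improvement required over the base model


def beats_base(accuracy, base=BASE_ACCURACY, margin=MARGIN):
    """True when accuracy exceeds the base accuracy by more than the margin."""
    return accuracy - base > margin


def select_candidates(results, base=BASE_ACCURACY, margin=MARGIN):
    """Keep only the configurations that clear the threshold."""
    return {name: acc for name, acc in results.items() if beats_base(acc, base, margin)}


# Placeholder accuracies, NOT results from the report.
example = {"config_a": 0.49, "config_b": 0.50}
kept = select_candidates(example)
```

Here only `config_b` is kept: 0.49 improves on the base by less than 0.01, while 0.50 improves on it by more.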

### Sample Size Testing

<table>
<tr width = "2000">
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/SampleAcc.PNG"/>
            <figcaption>Fig.27 Sample Size hyperparameter tuning versus accuracy</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/SampleLoss.PNG" />
            <figcaption>Fig.28 Sample Size hyperparameter tuning versus loss</figcaption>
        </figure>
    </td>
</tr>
</table>

##### Sample sizes were tested from 500 to 8000 in intervals of 500. It was found that 5000 had the highest accuracy. As before, the model's loss appears to correlate inversely with its accuracy. Although the model was most accurate with 5000 samples, we will use 8000, since using more data is best practice.
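
The sweep grid and a proportional subsample can be sketched as follows. This is an assumption-laden illustration: the report says samples were taken "proportionally" but does not describe the procedure, so the per-class quota logic and the name `proportional_sample` are hypothetical.

```python
import random
from collections import defaultdict

# 500, 1000, ..., 8000: the sixteen sample sizes covered by the sweep.
SAMPLE_SIZES = list(range(500, 8001, 500))


def proportional_sample(labels, n_samples, seed=0):
    """Return indices of a subsample that keeps each class's share of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        # Rounding each quota means the total can differ slightly from n_samples.
        quota = round(n_samples * len(indices) / len(labels))
        chosen.extend(rng.sample(indices, min(quota, len(indices))))
    return chosen
```

Each size in `SAMPLE_SIZES` would then be passed as `n_samples`, with the model retrained and scored on the validation set for every subsample.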

### Post Hyperparameter testing

<table>
<tr width = "2000">
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/8000TestRaceReport.PNG"/>
            <figcaption>Fig.30 Classification report for the race test set</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/8000TestReport.PNG" />
            <figcaption>Fig.31 Classification report on the age test set</figcaption>
        </figure>
    </td>
    <td width="500">
        <figure>
            <img src="../Reports/figures/NeuralNetwork/HyperparameterTuning/8000TestGenderReport.PNG" />
            <figcaption>Fig.32 Classification report on the gender test set</figcaption>
        </figure>
    </td>
</tr>
</table>

##### After some basic runs, it was discovered that Lr: 0.0001 had the best validation accuracy at 0.61, with a test accuracy of 0.62. The classification report on age shows far better results than the control model. In addition, the race and gender models using the same neural network structure achieved better results than the dummy models. Thus, this model was considered satisfactory. The only thing left to do was to visualize the weak points of the model.

# Conclusion

# Attribution

# Bibliography

# Appendix