## ChatGPT Face Detector
Below are two face recognition programs obtained from ChatGPT, each using a different machine learning technique.

### Background Context
##### K-Nearest Neighbors (KNN)
A popular supervised machine learning algorithm used for regression and classification. To predict the label of an unseen sample, the model finds its k nearest neighbors (the k most similar samples) in the training set, then takes a majority vote of their labels for classification or the average of their values for regression.<br>
- **Advantage:** simple and easy to implement <br>
- **Disadvantage:** lazy learner: it stores the whole training set and defers all computation to prediction time, so predictions are slower and use more memory
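
The prediction rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn implementation: the function name `knn_predict` and the toy 2-D points are made up for this example, and Euclidean distance is assumed as the similarity measure.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and count how often each label appears
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data, invented for illustration only
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["happy", "happy", "happy", "sad", "sad", "sad"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))  # happy
print(knn_predict(train_points, train_labels, (5.1, 5.0)))  # sad
```

Note that there is no separate training step: all the work happens inside `knn_predict`, which is why KNN is called a lazy learner.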

##### Convolutional Neural Network (CNN)
A type of deep learning neural network architecture used for image and speech processing. Its stacked convolutional layers extract useful features from the input data, and the later layers use those features to make predictions.<br>
- **Advantage:** multiple layers let the network capture and recognize variations in the data<br>
- **Disadvantage:** high complexity (expensive to train and use)<br>
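
To make "extracting features" concrete, here is a minimal sketch of what a single convolutional filter does: it slides a small kernel over the image and records how strongly each patch matches it. The function name `conv2d_relu`, the toy image, and the hand-picked kernel are assumptions for illustration; in a real CNN the kernel values are learned during training.

```python
def conv2d_relu(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(max(total, 0))  # ReLU keeps only positive responses
        output.append(row)
    return output

# Toy 4x5 grayscale image: dark on the left, bright on the right
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Hand-picked kernel that responds where brightness increases left to right
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_relu(image, kernel)
print(feature_map)  # [[0, 3, 3], [0, 3, 3]]
```

The feature map is zero over the flat dark region and positive near the vertical edge, so this one filter acts as a simple edge detector. A CNN stacks many such filters, which is also where its training cost comes from.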

### About the Dataset
The dataset contains two sets of data, training and testing, in the `train` and `val` folders respectively. Each folder contains two sets of randomly chosen images, one per facial expression: happy and sad.

Data size (number of images):

| Category | train | test (`val`) |
| --- | --- | --- |
| happy men | 25 | 5 |
| happy women | 25 | 5 |
| sad men | 25 | 5 |
| sad women | 25 | 5 |

[Image Source](https://stock.adobe.com/)

### Your tasks: 
KNN Model:
- Explain what `accuracy` tells you.
- Compute `precision` and `recall` and explain what they mean.
<br>

CNN Model:
- Train the model with the given dataset. What could potentially improve the `accuracy` of the model?
- Explain what `loss` and `accuracy` tell you.
- Compute TP, TN, FP, and FN.
- Compute precision and recall.
- How many of the true positives were male? Female?
- How many of the true negatives were male? Female?
- Create a bar chart to show these proportions as percentages.
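
As a reminder of how these quantities relate, the sketch below counts TP, TN, FP, and FN on a small made-up set of labels and derives precision and recall from them. The helper name `confusion_counts`, the toy labels, and the choice of "happy" as the positive class are all assumptions for illustration; they are not results from either model.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn

# Made-up labels for illustration only
y_true = ["happy", "happy", "happy", "happy", "sad", "sad", "sad", "sad"]
y_pred = ["happy", "happy", "happy", "sad", "sad", "sad", "happy", "happy"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="happy")
precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found

print(tp, tn, fp, fn)     # 3 2 2 1
print(precision, recall)  # 0.6 0.75
```

Which class counts as "positive" is a choice you have to state; swapping it changes TP, TN, FP, FN, precision, and recall.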

### Format:
- For questions that require justification, include all your answers in a single Markdown cell after each program.
- For questions that require programming output, make sure it's clear what each output is. 

In [4]:
#use KNN
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Load the Labeled Faces in the Wild (LFW) dataset
# fetch_lfw_people downloads (on first use) and returns the face images and their labels
# min_faces_per_person=70 keeps only people with at least 70 images in the dataset
lfw = datasets.fetch_lfw_people(min_faces_per_person=70)

print(lfw)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(lfw.data, lfw.target, test_size=0.2, random_state=42)

# Train a K-Nearest Neighbors classifier
# n_neighbors: use the 5 nearest neighbors to make the predictions
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = knn.predict(X_test)

# Calculate the accuracy of the classifier
# comparing the predicted labels (y_pred) with the true labels (y_test)
accuracy = accuracy_score(y_test, y_pred)

# Print the accuracy of the classifier
print("\nAccuracy:", accuracy)

# accuracy is the proportion of correctly predicted samples out of the total number of samples
# the closer to 1, the better

{'data': array([[0.99477124, 0.99607843, 0.99477124, ..., 0.38169935, 0.38562092,
        0.38169935],
       [0.14771242, 0.16078432, 0.21437909, ..., 0.44836605, 0.45098042,
        0.58300656],
       [0.34117648, 0.3503268 , 0.4366013 , ..., 0.7176471 , 0.72156864,
        0.7163399 ],
       ...,
       [0.35816994, 0.3503268 , 0.31895426, ..., 0.21568628, 0.21568628,
        0.17777778],
       [0.19346406, 0.21045752, 0.29150328, ..., 0.6875817 , 0.6575164 ,
        0.5908497 ],
       [0.12418301, 0.09673203, 0.10849673, ..., 0.12941177, 0.16209151,
        0.29150328]], dtype=float32), 'images': array([[[0.99477124, 0.99607843, 0.99477124, ..., 0.26797387,
         0.23137255, 0.20130719],
        [0.9973857 , 0.9973857 , 0.99607843, ..., 0.275817  ,
         0.24052288, 0.20915033],
        [0.98692805, 0.9751634 , 0.96732026, ..., 0.27058825,
         0.24183007, 0.21960784],
        ...,
        [0.33594772, 0.2771242 , 0.20522876, ..., 0.41045752,
         0.39869282, 0.37

In [None]:
#use CNN
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define the directory paths for the training and validation datasets
train_dir = 'train'
val_dir = 'val'

# Define the number of classes (happy and sad)
num_classes = 2

# Define the input shape of the images
input_shape = (160, 160, 1)

# Define the batch size for the data generators
batch_size = 5

# Define the data generators for the training and validation datasets
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=input_shape[:2],
    color_mode='grayscale',
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)

val_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=input_shape[:2],
    color_mode='grayscale',
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)

# Define the model architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=input_shape),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples//batch_size,
    epochs=10,
    validation_data=val_generator,
    validation_steps=val_generator.samples//batch_size
)

# Save the model
model.save('face_classification_model.h5')

**Accuracy:** The proportion of correct predictions out of all predictions made on the training/testing data.
- The higher the better
<br>

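**Loss:** Measures how far the predicted outputs are from the true outputs on the training/testing data. For this model it is the categorical cross-entropy between the predicted class probabilities and the true labels.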
- The lower the better
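
The difference between the two metrics is easiest to see on a small example. The sketch below is a plain-Python approximation of categorical cross-entropy and accuracy, not the Keras implementation (which, among other things, clips probabilities to avoid `log(0)`). The function names and the two sets of made-up predictions are assumptions for illustration.

```python
import math

def categorical_crossentropy(y_true, y_prob):
    """Mean over samples of -sum(true * log(predicted probability))."""
    losses = [
        -sum(t * math.log(p) for t, p in zip(truth, probs) if t > 0)
        for truth, probs in zip(y_true, y_prob)
    ]
    return sum(losses) / len(losses)

def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class matches the true class."""
    correct = sum(
        truth.index(max(truth)) == probs.index(max(probs))
        for truth, probs in zip(y_true, y_prob)
    )
    return correct / len(y_true)

# One-hot true labels for three samples (two classes)
y_true = [[1, 0], [1, 0], [0, 1]]
# Two made-up sets of predicted probabilities; both pick the right class
confident = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
hesitant = [[0.6, 0.4], [0.55, 0.45], [0.45, 0.55]]

print(accuracy(y_true, confident), accuracy(y_true, hesitant))  # 1.0 1.0
print(categorical_crossentropy(y_true, confident))  # smaller loss
print(categorical_crossentropy(y_true, hesitant))   # larger loss
```

Both sets of predictions reach the same accuracy, but the hesitant one has a higher loss. Accuracy only checks whether the top class is right, while loss also reflects how confident the model was.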