In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("midterm2_coding.ipynb")

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, Model, Sequential, losses, optimizers, metrics

from helper import plot_df, resample, data_generator, data_generator2
from helper import decode_image, load_and_preprocess_image, load_tfrecord_dataset,\
                   parse_example

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## Introduction

In this coding exercise you will build a model that predicts 3 properties of an individual from a picture of their face: age, ethnicity, and gender.

You should run this notebook on the OnDemand Jupyter environment for the course. That environment has all required libraries, so you won't need the included requirements.txt file. The model you will build is very small, so this can run on either the CPU or GPU instance.

This DataFrame gives us an idea of what we are working with.

In [None]:
# Read the dataframe
df = pd.read_csv('data/face_data.csv')
df.head()

In [None]:
# map integer encodings to class labels
# label ordering based on prevalence and taken from the Kaggle dataset release:
# https://www.kaggle.com/datasets/nipunarora8/age-gender-and-ethnicity-face-data-csv
names_ethnicity = ['white', 'black', 'asian', 'indian', 'other']
name_genders = ['male', 'female']

In [None]:
plot_df(df)

We'll use the helper function `resample` to make sure the ethnicities are equally represented in a new DataFrame.\
Please don't change `OBSERVATIONS_PER_CLASS` from the default of 7500.
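
The implementation of `resample` lives in `helper.py` and is not shown here. As a rough idea of what class balancing by resampling can look like, here is a minimal pandas sketch. The function name `balance_by_column` is hypothetical, and sampling with replacement is an assumption; the course helper may behave differently (the plot of `df_resampled` is described as not perfectly balanced).

```python
import pandas as pd

def balance_by_column(frame, col, n_per_class, seed=0):
    """Hypothetical stand-in for the course helper, for illustration only."""
    parts = [
        # Sample with replacement so that small classes can reach n_per_class.
        group.sample(n=n_per_class, replace=True, random_state=seed)
        for _, group in frame.groupby(col)
    ]
    # Shuffle so the classes are interleaved rather than stacked.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy, unbalanced data: 8 rows of class 0, 3 of class 1, 1 of class 2.
toy = pd.DataFrame({"ethnicity": [0] * 8 + [1] * 3 + [2] * 1, "age": range(12)})
balanced = balance_by_column(toy, "ethnicity", n_per_class=4)
counts = balanced["ethnicity"].value_counts().to_dict()
print(counts)
```

Sampling with replacement duplicates rows of the rare classes, which is one reason to split the data carefully afterwards.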

In [None]:
OBSERVATIONS_PER_CLASS = 7500
df_resampled = resample(df, unbalanced_col='ethnicity', n_per_class=OBSERVATIONS_PER_CLASS)

In [None]:
plot_df(df_resampled)

That looks a little better, though not perfect. We'll settle for that for now.

### Train / Validation / Test Split

Using your resampled data and `train_test_split`, create a test set consisting of 20% of the resampled dataset. Then set aside 10% of what was left over from the first split as a validation set. Whatever remains will be your train set.

You should stratify your splits on both 'ethnicity' and 'gender' and use a random state of 109.

Store the results in `train_df`, `val_df`, and `test_df`.

**Hint:** The `stratify` argument can be passed a DataFrame with one or more columns.
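
Note the arithmetic of the two-stage split: 20% goes to the test set, 10% of the remaining 80% (8% of the total) goes to validation, and 72% is left for training. The sketch below shows the same pattern on a made-up toy frame. The toy column values, the variable names, and the random state of 0 are illustrative assumptions, not the values required by the exercise.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame: 2 ethnicity codes x 2 gender codes, 50 rows per combination.
toy = pd.DataFrame({
    "ethnicity": [0, 0, 1, 1] * 50,
    "gender": [0, 1, 0, 1] * 50,
})

# Stage 1: hold out 20% of all rows, stratifying on both columns at once.
rest, toy_test = train_test_split(
    toy, test_size=0.2, stratify=toy[["ethnicity", "gender"]], random_state=0
)
# Stage 2: hold out 10% of what is left, again stratified on both columns.
toy_train, toy_val = train_test_split(
    rest, test_size=0.1, stratify=rest[["ethnicity", "gender"]], random_state=0
)

print(len(toy_train), len(toy_val), len(toy_test))
print(toy_test.groupby(["ethnicity", "gender"]).size())
```

Each ethnicity/gender combination should appear in the same proportion in every split, which is the point of stratifying on both columns.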

In [None]:
# Split the dataframe into train, validation, and test sets
train_df_temp, test_df = ...
train_df, val_df = ...

In [None]:
grader.check("data_split")

**Creating Datasets**

For the sake of consistency, to minimize memory and storage space, and to make training as fast as possible, we will provide pre-made train, validation, and test datasets in the form of TF Dataset pipelines.

These have been balanced for both ethnicity and gender within each ethnicity, similar to what you attempted above in the splitting.

Run the code below to create train, validation, and test datasets. 

Later you may want to play with the batch size, but remember to stay conservative to prevent the kernel from dying.

In [None]:
BATCH_SIZE = 32 # num images in a generator batch
IMG_DIM = (32, 32) # original images are 48x48; the dataset pipeline resizes them to 32x32
train_tfrecord_file = 'data/faces_train_balanced.tfrecord'
val_tfrecord_file = 'data/faces_val_balanced.tfrecord'
test_tfrecord_file = 'data/faces_test_balanced.tfrecord'

train_dataset = load_tfrecord_dataset(train_tfrecord_file, IMG_DIM, BATCH_SIZE)
val_dataset = load_tfrecord_dataset(val_tfrecord_file, IMG_DIM, BATCH_SIZE)
test_dataset = load_tfrecord_dataset(test_tfrecord_file, IMG_DIM, BATCH_SIZE)

We can see what the datasets produce with this code:

In [None]:
# inspect what the datasets produce
train_dataset.element_spec

The **inputs** are 32x32 grayscale images of faces.
The single channel has been repeated across three channels to make them 32x32x3, so they work well with models trained on color images.

The **target** variables match the order of the columns in the DataFrame above.
- The first target is **age** (continuous).
- The second target is **ethnicity** (categorical).
- The third target is **gender** (binary in this dataset).
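
To get a feel for this nesting without touching the real pipeline, the sketch below builds a toy `tf.data` dataset that yields `(images, (age, ethnicity, gender))` batches. The target shapes and dtypes used here are assumptions for illustration; check `element_spec` on the real datasets for the actual values.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for 8 images and their three targets.
images = np.zeros((8, 32, 32, 3), dtype="float32")
age = np.zeros((8, 1), dtype="float32")      # assumed shape and dtype
ethnicity = np.zeros((8, 1), dtype="int64")  # assumed shape and dtype
gender = np.zeros((8, 1), dtype="int64")     # assumed shape and dtype

toy_ds = tf.data.Dataset.from_tensor_slices(
    (images, (age, ethnicity, gender))
).batch(4)

# Each element is a pair: a batch of images and a tuple of three target batches.
x_batch, (age_batch, eth_batch, gen_batch) = next(iter(toy_ds))
print(x_batch.shape, age_batch.shape, eth_batch.shape, gen_batch.shape)
```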

### Transfer Learning with BabyNet

The base of your model will be a very tiny CNN classifier pre-trained on the CIFAR-10 dataset (~80% test accuracy). The model is 10% the size of MobileNetV3Small; we call it `BabyNet`.

Use keras to load this pre-trained model from the `BabyNet.keras` file and store the model in a variable called `babynet`. Then inspect the model summary.

In [None]:
# Load the 'BabyNet' model
babynet = ...
babynet.summary()

In [None]:
grader.check("load_babynet")

### Chop Off the Top!

We want to leverage the feature extraction ability of the convolutional section of the pre-trained network (conv2d & pooling), since we think these features will generalize to our own task of predicting age, ethnicity, and gender from pictures of faces (even though the CIFAR-10 dataset contains no human faces!).

But we *do not care* about the original 'top' of BabyNet since these final layers are more related to the specific CIFAR-10 classification task.

Create a new model, `base_model`, with the same `inputs` as `babynet` but whose outputs come from the final layer in the convolutional section of `babynet` (i.e., before the dimensions get flattened). Be sure to also give your new model the name attribute 'base_model' when you construct it.

Printing the summary is always helpful!
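
The general pattern of cutting a sub-model out of an existing Keras model can be sketched on a toy network. Everything below is a made-up stand-in: `toy_net` is not BabyNet, and the layer names are hypothetical. You will need to read the BabyNet summary to find the right layer for your own model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Toy stand-in for a pre-trained classifier (NOT BabyNet; names are made up).
inputs = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, padding="same", activation="relu", name="toy_conv")(inputs)
x = layers.MaxPooling2D(name="toy_pool")(x)
x = layers.Flatten(name="toy_flatten")(x)
outputs = layers.Dense(10, activation="softmax", name="toy_top")(x)
toy_net = Model(inputs, outputs, name="toy_net")

# New model: same input tensor, output taken from the last layer before flattening.
toy_base = Model(
    inputs=toy_net.input,
    outputs=toy_net.get_layer("toy_pool").output,
    name="toy_base",
)
print(toy_base.output_shape)
```

The new model shares its layers (and therefore its weights) with the original model; nothing is copied.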

In [None]:
# your code here
base_model = ...

base_model.summary()

In [None]:
grader.check("choptop")

### Freeze Weights

In our initial round of training we do *not* want to mangle the pre-trained weights because we think they will be useful. So you should **set the base model's weights to be not trainable**.
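
The effect of the `trainable` attribute can be checked by counting weight tensors. This is a small sketch on a made-up two-layer model (`toy_base` is hypothetical, not the exam's base model).

```python
import tensorflow as tf
from tensorflow.keras import layers, Sequential

toy_base = Sequential(
    [layers.Input(shape=(32, 32, 3)), layers.Conv2D(4, 3), layers.Conv2D(4, 3)],
    name="toy_base",
)

n_before = len(toy_base.trainable_weights)  # one kernel and one bias per conv layer
toy_base.trainable = False                  # freezes every layer inside the model
n_after = len(toy_base.trainable_weights)
n_frozen = len(toy_base.non_trainable_weights)
print(n_before, n_after, n_frozen)
```

The weights are still there after freezing; they simply move from the trainable list to the non-trainable list, which is also what the model summary reports.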


In [None]:
# your code here
# Freeze the base model weights for transfer learning
base_model.trainable = ...
base_model.summary()

In [None]:
grader.check("freeze")

### Data Augmentation

Create a small model to perform data augmentation. It should have one layer to perform a random, horizontal 'flip' followed by another layer to add a small amount of GaussianNoise. Store this model in the variable `data_augmentation` and give it the name attribute "data_aug".

**NOTE:** Do *not* change the name attributes of any of the layers as the default names will be used for comparison in the tests.

**Hint:**
- See what layers are available in `tf.keras.layers`
- It may be easier for you if you also use an input "layer"
- Images are already normalized to be between 0 and 1 (what BabyNet expects) so the amount of noise you want to add should be very small.
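
One detail worth knowing about augmentation layers is that they are only active in training mode. The sketch below checks this for `GaussianNoise` on a constant image; the standard deviation of 0.01 is an arbitrary illustrative value, not a recommendation.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

noise = layers.GaussianNoise(0.01)  # stddev chosen only for illustration
x = tf.fill((1, 32, 32, 3), 0.5)    # constant mid-gray "image" in [0, 1]

out_inference = np.array(noise(x, training=False))  # layer acts as identity
out_training = np.array(noise(x, training=True))    # noise is added

print(np.abs(out_inference - 0.5).max())
print(np.abs(out_training - 0.5).max())
```

This is why the visualization cell further down calls the augmentation model with `training=True`.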

In [None]:
# your code here
data_augmentation = ...

In [None]:
# display data augmentation model summary
data_augmentation.summary()

In [None]:
grader.check("data_aug")

Run the code below to visualize some augmentation examples.

In [None]:
# generate variants of the first image from the training dataset
for images, labels in train_dataset.take(1):
    plt.figure(figsize=(4, 4))
    first_image = images[0]
    for i in range(4):
        ax = plt.subplot(2, 2, i + 1)
        # apply the transformation layers
        augmented_image = data_augmentation(
            tf.expand_dims(first_image, 0), training=True
        )
        # plt.imshow(augmented_image[0].numpy().astype("int32"))
        plt.imshow(augmented_image[0])
        plt.axis("off")
plt.tight_layout()

### Building the Model

Construct your model and store it in the variable called `model`. Its structure should look like this:
- input "layer"
- data augmentation
- base_model
- new intermediate layer(s)
- 3 output layers, give them the name attributes 'age_output', 'ethnicity_output', 'gender_output'

**Hint:**
- The output of the base model is 3D (64 8x8 feature maps). Your new intermediate layer(s) will need to convert this output to 1D before it can be passed on to your output layers. We saw *at least 2 different ways* of doing this in class. Again, see what `tf.keras.layers` has to offer.
- You can experiment with the architecture you add to the base model, but *start out small and simple*. You can pass all the tests in this notebook with fewer than 100k total parameters in your full network.
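
To make the dimension change concrete, the sketch below applies two commonly used Keras layers, `Flatten` and `GlobalAveragePooling2D`, to a dummy tensor with the base model's output shape. A channels-last layout of (8, 8, 64) is assumed, and these are not necessarily the exact approaches covered in class.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Dummy batch of 2 feature-map stacks: (batch, height, width, channels).
feature_maps = tf.zeros((2, 8, 8, 64))

flat = layers.Flatten()(feature_maps)                    # keeps every value
pooled = layers.GlobalAveragePooling2D()(feature_maps)   # one mean per feature map

print(tuple(flat.shape), tuple(pooled.shape))
```

The two options hand very different input sizes to the next layer, which matters a lot for the parameter count of any dense layer that follows.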


In [None]:
# your code here
model = ...
...

In [None]:
# display summary
model.summary()

In [None]:
grader.check("build_model")

### Compile the Model

When dealing with multiple outputs it is useful to pass dictionaries for both the `loss` and `metrics` arguments of `compile`. The dictionary keys are the layer names and their values are the losses or metrics themselves.

Choose appropriate losses for each of the 3 outputs. 

You should monitor the following metrics:
- age: mean absolute error
- ethnicity: accuracy
- gender: accuracy

Create the dictionaries `loss` and `metrics` to pass to the `compile` function.

**Note:** The keys should be *strings* representing the relevant loss or metric (Keras understands these). These strings are expected to be lowercase and any spaces should be replaced by underscores.
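
The dictionary pattern can be sketched on an unrelated toy model with two outputs. The layer names, losses, and metrics below are illustrative assumptions chosen for the toy task (for example, the sparse loss assumes integer class labels); they are not the answer for the face model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Toy two-output model: one regression head and one 3-class head.
inputs = layers.Input(shape=(4,))
hidden = layers.Dense(8, activation="relu")(inputs)
price = layers.Dense(1, name="price_output")(hidden)
color = layers.Dense(3, activation="softmax", name="color_output")(hidden)
toy_model = Model(inputs, [price, color], name="toy_multi_output")

# Keys are the output layer names; values are string identifiers Keras understands.
toy_loss = {
    "price_output": "mean_squared_error",
    "color_output": "sparse_categorical_crossentropy",
}
toy_metrics = {
    "price_output": "mean_absolute_error",
    "color_output": "accuracy",
}
toy_model.compile(optimizer="adam", loss=toy_loss, metrics=toy_metrics)
print(toy_model.output_names)
```

Because the keys are matched by name, a typo in a layer name is a common source of errors at this step.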

For the **optimizer**, we recommend the Adam optimizer, though you might want to try something slightly higher than the default learning rate.

Save your optimizer as `optimizer1`.

You can experiment with the `loss_weights` argument of `compile` if you like, but it should not be necessary in order to pass all tests.

In [None]:
# your code here
optimizer1 = ...
loss = ...
metrics = ...
...

In [None]:
grader.check("compile_model")

### Initial Training

Train the new layers you added to the base model. You'll want to do at least 3 epochs. More would likely help, but start small while you're still testing things.

In [None]:
# your code here
initial_epochs = ...

history1 = model.fit(train_dataset,
                    epochs=initial_epochs,
                    validation_data=val_dataset)

In [None]:
grader.check("initial_training")

### Unfreeze Base Model & Recompile with New Optimizer

Now we will fine-tune the base model. First, you'll need to unfreeze its weights. Remember that you need to recompile the model for the change to take effect. When you print the summary, the model should now show many more trainable parameters.

When recompiling in preparation for fine-tuning you should choose a sensible **learning rate** for the new optimizer. Call this new optimizer `optimizer2`.

In [None]:
# unfreeze the base model and compile
# your code here
optimizer2 = ...

...

In [None]:
# display summary
model.summary()

In [None]:
grader.check("unfreeze")

### Fine-Tuning

Now that *all* the weights in the network are trainable, train your model for a few more epochs. You'll probably want to do at least 2, but more may be beneficial depending on your initial training and model architecture.

**Hint:** remember that adjusting the weights of the pre-trained model also has the potential to be *destructive* if the learning rate is too high.

In [None]:
# your code here
fine_tune_epochs = ...
total_epochs = initial_epochs + fine_tune_epochs

history2 = model.fit(train_dataset,
                    initial_epoch=initial_epochs,  # Resume training
                    epochs=total_epochs,
                    validation_data=val_dataset)

**Display Full Training History**

In [None]:
full_history = {}
for key in history1.history.keys():
    full_history[key] = history1.history[key] + history2.history[key]

fig, axs = plt.subplots(1,3, figsize=(9,4))
axs = axs.flatten()

axs[0].plot(full_history['age_output_'+metrics['age_output']], label='train')
axs[0].plot(full_history['val_age_output_'+metrics['age_output']], label='validation')
axs[0].set_ylabel('MAE')
axs[0].set_title('Age')

axs[1].plot(full_history['ethnicity_output_'+metrics['ethnicity_output']], label='train')
axs[1].plot(full_history['val_ethnicity_output_'+metrics['ethnicity_output']], label='validation')
axs[1].axhline(0.2, c='r', ls=':', label='random chance')
axs[1].set_ylabel('Accuracy')
axs[1].set_title('Ethnicity')

axs[2].plot(full_history['gender_output_'+metrics['gender_output']], label='train')
axs[2].plot(full_history['val_gender_output_'+metrics['gender_output']], label='validation')
axs[2].axhline(0.5, c='r', ls=':', label='random chance')
axs[2].set_ylabel('Accuracy')
axs[2].set_title('Gender')

for ax in axs:
    ax.set_xlabel("Epoch")
    ax.set_xticks(list(range(total_epochs))[::3])
    x = np.arange(initial_epochs-1, total_epochs-1, 0.1)
    ax.fill_between(x, *ax.get_ylim(), alpha=0.2, color='green', label='fine-tuning')
    ax.legend();
    
plt.tight_layout()

In [None]:
grader.check("fine-tuning")

### Evaluate on Test Dataset

Evaluate your final model on the test dataset and store the results in `test_eval`.

You should be able to achieve:
- Age MAE < 14
- Ethnicity Accuracy > 0.3
- Gender Accuracy > 0.6
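
When a model reports several losses and metrics, it can be hard to tell which position in the list returned by `evaluate` holds which value, and the order can depend on the model and the Keras version. The sketch below, on a made-up single-output model, shows that `evaluate` also accepts `return_dict=True` to return the same values keyed by name. This is for inspection only.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Sequential

# Toy single-output regression model (hypothetical, unrelated to the face model).
toy = Sequential([layers.Input(shape=(3,)), layers.Dense(1)])
toy.compile(optimizer="adam", loss="mean_squared_error",
            metrics=["mean_absolute_error"])

x = np.zeros((8, 3), dtype="float32")
y = np.ones((8, 1), dtype="float32")

as_list = toy.evaluate(x, y, verbose=0)                    # positional results
as_dict = toy.evaluate(x, y, verbose=0, return_dict=True)  # results keyed by name
print(as_list)
print(as_dict)
```

The print statements in the cell below index into a list, so `test_eval` itself should stay in list form.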

In [None]:
# your code here
test_eval = ...
print(f"Test Age MAE: {test_eval[4]:.2f}")
print(f"Test Ethnicity ACC: {test_eval[5]:.2f}")
print(f"Test Gender ACC: {test_eval[6]:.2f}")

In [None]:
grader.check("test_eval")

### Predict on a Test Batch

Take one batch from the test dataset and store it as `test_batch`.

Next, use your model to predict on this batch and store the predictions in `y_pred`.

In [None]:
test_batch = ...
y_pred = ...

In [None]:
grader.check("predictions")

### You're almost there!

**There is no more code for you to enter after this point. If your notebook runs to the end you pass the final test!**

**Processing Prediction Vectors**

Here we create vectors `age_pred`, `eth_pred`, and `gen_pred`, each containing the predicted values of one of the 3 target variables for the test batch.

These vectors should all be flat (1D) and should all contain *integers*.
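
The three conversions can be traced on made-up numbers. The sketch assumes the raw outputs have the shapes implied by the cell below: one value per image for age, one probability per class for ethnicity, and a single probability for gender. All values are invented for illustration.

```python
import numpy as np

# Made-up raw outputs for a batch of 3 images (illustrative values only).
raw_age = np.array([[23.6], [41.2], [7.4]])            # regression output, shape (3, 1)
raw_eth = np.array([[0.10, 0.60, 0.10, 0.10, 0.10],    # class probabilities, shape (3, 5)
                    [0.70, 0.10, 0.10, 0.05, 0.05],
                    [0.20, 0.20, 0.20, 0.30, 0.10]])
raw_gen = np.array([[0.31], [0.50], [0.92]])           # single probability, shape (3, 1)

toy_age = np.round(raw_age.flatten()).astype(int)   # nearest whole year
toy_eth = raw_eth.argmax(axis=1)                     # index of the most likely class
toy_gen = (raw_gen.flatten() >= 0.5).astype(int)     # threshold at 0.5

print(toy_age, toy_eth, toy_gen)
```

The integer class codes can then be used directly as indices into `names_ethnicity` and `name_genders`.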

In [None]:
age_pred = np.round(y_pred[0].flatten()).astype(int)
eth_pred = y_pred[1].argmax(axis=1)
gen_pred = (y_pred[2].flatten() >= 0.5).astype(int)

Use this code to extract the images and target variables from the batch. Note the shapes of these objects. You may want to investigate them further.

In [None]:
# input images and targets from batch
images, labels =  next(test_batch.as_numpy_iterator())

In [None]:
print('images.shape:', images.shape)
print('len(labels):', len(labels))
print('labels[0].shape:', labels[0].shape)
print('labels[1].shape:', labels[1].shape)
print('labels[2].shape:', labels[2].shape)

**True Labels**

Here are the true vectors `age_true`, `eth_true`, and `gen_true`, each containing the true values of one of the 3 target variables for the test batch.

These vectors are all flat (1D) and they all contain integers.

In [None]:
age_true, eth_true, gen_true = [y.flatten().astype(int) for y in labels]

Run the code below to inspect some example predictions from your test batch.

In [None]:
fig, axs = plt.subplots(4,4, figsize=(16,16))
axs = axs.flatten()
for i in range(16):
    axs[i].imshow(images[i])
    pred_str = f'{age_pred[i]} {names_ethnicity[eth_pred[i]]} {name_genders[gen_pred[i]]}'
    true_str = f'{age_true[i]} {names_ethnicity[eth_true[i]]} {name_genders[gen_true[i]]}'
    axs[i].set_title('[PRED]\n' + pred_str+'\n'+'[TRUE]\n'+ true_str)
    axs[i].set_xticks([])
    axs[i].set_yticks([])
plt.tight_layout()

OK, so the results are probably not the greatest given the time and the base model's size, but with such a tiny model and dataset it is still surprising that we can do much better than guessing on all three tasks at once!

In [None]:
grader.check("you_did_it")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Run this to generate the zip you need to upload to Gradescope. Make sure all earlier cells were run and their output is still visible (Restart & Run All is always the best way).

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(filtering=False, pdf=False)