<a href="https://colab.research.google.com/github/ayulockin/faceattributes/blob/master/UTK_Face_Attribute_Classifier_with_TF2_0_and_W%26B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Initial Setup and Checks

In [0]:
!pip install wandb -q
!pip install tensorflow-gpu

In [1]:
import tensorflow as tf
print(tf.__version__)

import wandb
from wandb.keras import WandbCallback

2.0.0


In [0]:
!wandb login

## Clone project repo and set paths

In [3]:
!git clone https://github.com/ayulockin/faceattributes.git

Cloning into 'faceattributes'...
remote: Enumerating objects: 23388, done.[K
remote: Total 23388 (delta 0), reused 0 (delta 0), pack-reused 23388[K
Receiving objects: 100% (23388/23388), 115.72 MiB | 42.39 MiB/s, done.
Resolving deltas: 100% (12/12), done.
Checking out files: 100% (23749/23749), done.


In [4]:
!ls

faceattributes	sample_data  wandb


In [5]:
%cd faceattributes

/content/faceattributes


In [6]:
!ls

 datasets		      face_detector   prepareUTKFaceData.py
'EDA of Face Dataset.ipynb'   images	      README.md
 examples		      LICENSE	      test.py


In [7]:
import os

images = os.listdir('images')
print('Total number of images: ', len(images))

Total number of images:  23705


## Labels

In [0]:
import pandas as pd
import numpy as np

In [0]:
labels = pd.read_csv('datasets/face_dataset.csv')

In [10]:
labels.head(10)

Unnamed: 0,image_id,age,gender,ethnicity
0,teripprmot,100,0,0
1,ibjkghymsu,100,0,0
2,dlsaxmcymo,100,1,0
3,oyyopnhvza,100,1,0
4,nhufelmwaw,100,1,0
5,fkozflztvo,100,1,0
6,jfjtfckikm,100,1,0
7,exeinekyai,100,1,0
8,ynybxuwyrx,100,1,2
9,pkwnjssqij,100,1,2


> For each image there are three labels. Since we are building a multi-output classifier we need to prepare the dataset accordingly. 

In [11]:
labels.count()

image_id     23705
age          23705
gender       23705
ethnicity    23705
dtype: int64

In [0]:
# Shuffle dataframe
labels = labels.sample(frac=1).reset_index(drop=True)

## Helper Function

Refer EDA of the dataset for more insight into this.
Click [here](https://github.com/ayulockin/faceattributes/blob/master/EDA%20of%20Face%20Dataset.ipynb).

In [0]:
def groupAge(age):
#     [0, 5, 18, 24, 26, 27, 30, 34, 38, 46, 55, 65, len(ages)])
    if age>=0 and age<5:
        return 0
    elif age>=5 and age<18:
        return 1
    elif age>=18 and age<24:
        return 2
    elif age>=24 and age<26:
        return 3
    elif age>=26 and age<27:
        return 4
    elif age>=27 and age<30:
        return 5
    elif age>=30 and age<34:
        return 6
    elif age>=34 and age<38:
        return 7
    elif age>=38 and age<46:
        return 8
    elif age>=46 and age<55:
        return 9
    elif age>=55 and age<65:
        return 10
    else:
        return 11

## Prepare Data for training and validation

#### Imports

In [0]:
import numpy as np 
import cv2
import random
import matplotlib.pyplot as plt

%matplotlib inline

from tensorflow.keras.utils import to_categorical

#### Train-validation-test Split 

Tasks performed by this cell:
1. Split into train-validation-test in the ratio of 70:20:10
2. Create age group. Refer EDA of the dataset.
3. One hot encode each label.
4. Create separate labels for separate parts of the classifiers.

In [0]:
def formatdata(train_count, validation_count, test_count):
  partitions = {'train': [],
                'validation': [],
                'test': []}
  labels_dict = {'train_age': [], 'train_gender': [], 'train_ethnicity': [],
                 'validation_age': [], 'validation_gender': [], 'validation_ethnicity': [],
                 'test_age': [], 'test_gender': [], 'test_ethnicity': []}
  random.seed(1)

  print("[INFO] Preparing train data....")
  for ID in range(train_count):
    try:
        data = labels.loc[labels['image_id'] == images[ID][:-4]].values
        labels_dict['train_age'].append(to_categorical(groupAge(data[0][1]), num_classes=12, dtype='float32'))
        labels_dict['train_gender'].append(data[0][2])
        labels_dict['train_ethnicity'].append(to_categorical(data[0][3], num_classes=5, dtype='float32'))
        partitions['train'].append(images[ID])
    except IndexError:
        print("[ERROR]", images[ID])
        discared_data.append(images[ID])
  print("[INFO] Done")

  print("[INFO] Preparing validation data....")
  for ID in range(train_count, train_count+validation_count):
    try:
        data = labels.loc[labels['image_id'] == images[ID][:-4]].values
        labels_dict['validation_age'].append(to_categorical(groupAge(data[0][1]), num_classes=12, dtype='float32'))
        labels_dict['validation_gender'].append(data[0][2])
        labels_dict['validation_ethnicity'].append(to_categorical(data[0][3], num_classes=5, dtype='float32'))
        partitions['validation'].append(images[ID])
    except IndexError:
        print("[ERROR]", images[ID])
        discared_data.append(images[ID])
  print("[INFO] Done")

  ## Uncomment to get test split
  print("[INFO] Preparing test data....")
  for ID in range(train_count+validation_count, len(images)):
    try:
        data = labels.loc[labels['image_id'] == images[ID][:-4]].values
        labels_dict['test_age'].append(to_categorical(groupAge(data[0][1]), num_classes=12, dtype='float32'))
        labels_dict['test_gender'].append(data[0][2])
        labels_dict['test_ethnicity'].append(to_categorical(data[0][3], num_classes=5, dtype='float32'))
        partitions['test'].append(images[ID])
    except IndexError:
        print("[ERROR]", images[ID])
        discared_data.append(images[ID])
  print("[INFO] Done")

  return partitions, labels_dict

In [16]:
# train:validation:test = 70:20:10 = 16596:4742:2370

train_count = 5000  
validation_count = 1000
test_count = 100

partitions, labels_dict = formatdata(train_count, validation_count, test_count)

[INFO] Preparing train data....
[INFO] Done
[INFO] Preparing validation data....
[INFO] Done
[INFO] Preparing test data....
[INFO] Done


In [17]:
print("[INFO] Training Data")
print("Size of train data: ", len(partitions['train']))
print("Size of age as label: ", len(labels_dict['train_age']))
print("Size of gender as label: ", len(labels_dict['train_gender']))
print("Size of ethnicity as label: ", len(labels_dict['train_ethnicity']))
print("\n")
print("[INFO] Validation Data")
print("Size of validation data: ", len(partitions['validation']))
print("Size of age as label: ", len(labels_dict['validation_age']))
print("Size of gender as label: ", len(labels_dict['validation_gender']))
print("Size of ethnicity as label: ", len(labels_dict['validation_ethnicity']))
print("\n")
# Uncomment to log test split details
print("[INFO] Test Data")
print("Size of test data: ", len(partitions['test']))
print("Size of age as label: ", len(labels_dict['test_age']))
print("Size of gender as label: ", len(labels_dict['test_gender']))
print("Size of ethnicity as label: ", len(labels_dict['test_ethnicity']))

[INFO] Training Data
Size of train data:  5000
Size of age as label:  5000
Size of gender as label:  5000
Size of ethnicity as label:  5000


[INFO] Validation Data
Size of validation data:  1000
Size of age as label:  1000
Size of gender as label:  1000
Size of ethnicity as label:  1000


[INFO] Test Data
Size of test data:  17705
Size of age as label:  17705
Size of gender as label:  17705
Size of ethnicity as label:  17705


#### Load images on the memory (Good old `x_train` and `x_val`)

In [0]:
import imageio

In [0]:
def loadImages(images, imagesPath):
    print("[INFO] Loading....")
    X = []
    count = 0
    for image in images:
        if count%1000==0:
            print("[INFO] {} images loaded".format(count))
        img = imageio.imread(imagesPath+'/'+image)
        img = np.array(img)
        X.append(img)
        count+=1
    print("[INFO] Done")
    return np.array(X)

In [20]:
print("[INFO] Training Data")
trainX = loadImages(partitions['train'], 'images/')
print("[INFO] Validation Data")
validationX = loadImages(partitions['validation'], 'images/')

[INFO] Training Data
[INFO] Loading....
[INFO] 0 images loaded
[INFO] 1000 images loaded
[INFO] 2000 images loaded
[INFO] 3000 images loaded
[INFO] 4000 images loaded
[INFO] Done
[INFO] Validation Data
[INFO] Loading....
[INFO] 0 images loaded
[INFO] Done


In [0]:
trainX = trainX/255.0
validationX = validationX/255.0

In [22]:
print("[INFO] Training Images: ", trainX.shape)
print("[INFO] Validation Images: ", validationX.shape)

[INFO] Training Images:  (5000, 200, 200, 3)
[INFO] Validation Images:  (1000, 200, 200, 3)


#### Good old `y_train` and `y_val`

In [0]:
trainY = {
    'gender': np.array(labels_dict['train_gender']),
    'ethnicity': np.array(labels_dict['train_ethnicity']),
    'age': np.array(labels_dict['train_age'])
}

validationY = {
    'gender': np.array(labels_dict['validation_gender']),
    'ethnicity': np.array(labels_dict['validation_ethnicity']),
    'age': np.array(labels_dict['validation_age'])
}

trainY['gender'] = trainY['gender'].reshape(trainY['gender'].shape[0], 1)
validationY['gender'] = validationY['gender'].reshape(validationY['gender'].shape[0], 1)

In [24]:
print('Training labels')
print('[INFO] Shape of gender label: ', trainY['gender'].shape)
print('[INFO] Shape of ethnicity label: ', trainY['ethnicity'].shape)
print('[INFO] Shape of age label: ', trainY['age'].shape)
print('\nValidation labels')
print('[INFO] Shape of gender label: ', validationY['gender'].shape)
print('[INFO] Shape of ethnicity label: ', validationY['ethnicity'].shape)
print('[INFO] Shape of age label: ', validationY['age'].shape)

Training labels
[INFO] Shape of gender label:  (5000, 1)
[INFO] Shape of ethnicity label:  (5000, 5)
[INFO] Shape of age label:  (5000, 12)

Validation labels
[INFO] Shape of gender label:  (1000, 1)
[INFO] Shape of ethnicity label:  (1000, 5)
[INFO] Shape of age label:  (1000, 12)


## Weights and Biases configs and init

In [25]:
# Initilize a new wandb run
wandb.init(entity='ayush-thakur', project="multi-output-classifier")

W&B Run: https://app.wandb.ai/ayush-thakur/multi-output-classifier/runs/kor6x8g7

In [0]:
config = wandb.config

In [0]:
config.update(params={'epochs':5, 'gender_loss_wt': 0.5, 'ethnicity_loss_wt':0.5, 'age_loss_wt': 1.0}, allow_val_change=True)

In [0]:
# Hyperparameters
config.epochs = 5
config.batch_size = 32
config.shuffle_buffer = 64
config.optimizer = 'adam'

config.img_width=200
config.img_height=200

config.gender_classes = 2
config.enthnicity_classes = 5
config.age_classes = 12

config.gender_loss_wt = 0.5
config.ethnicity_loss_wt = 0.5
config.age_loss_wt = 1.0

#### Harness the power of `tf.data` input pipeline

In [0]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [0]:
train_dataset = tf.data.Dataset.from_tensor_slices((trainX, trainY))
validation_dataset = tf.data.Dataset.from_tensor_slices((validationX, validationY))

In [0]:
train_dataset = train_dataset.cache().\
    shuffle(buffer_size=config.shuffle_buffer).\
    repeat().\
    batch(config.batch_size).\
    prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

In [0]:
validation_dataset = validation_dataset.batch(config.batch_size)

## Build a Multi-Output Classification Model

In [0]:
import tensorflow.keras.backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense, Flatten, BatchNormalization, Dropout
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint

In [0]:
def gender_classifier(inputLayer):
    x = Conv2D(32, kernel_size=(3,3), padding='same', activation='relu')(inputLayer)
    x = Conv2D(64, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(128, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(128, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(256, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(256, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Flatten()(x)
    x = Dense(256, activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    x = Dense(64, activation='relu')(x)
    x = Dense(config.gender_classes, activation='sigmoid', name='gender')(x)
    
    return x

In [0]:
def ethnicity_classifier(inputLayer):
    x = Conv2D(64, kernel_size=(3,3), padding='same', activation='relu')(inputLayer)
    x = Conv2D(64, kernel_size=(3,3), padding='same', activation='relu')(inputLayer)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(64, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(64, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(128, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(128, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(256, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(256, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(256, activation='relu')(x)
    x = Dense(config.enthnicity_classes, activation='softmax', name='ethnicity')(x)
    
    return x

In [0]:
def age_classifier(inputLayer):
    x = Conv2D(32, kernel_size=(3,3), padding='same', activation='relu')(inputLayer)
    x = Conv2D(32, kernel_size=(3,3), padding='same', activation='relu')(inputLayer)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(64, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(64, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(128, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(128, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Conv2D(256, kernel_size=(3,3), padding='valid', activation='relu')(x)
    x = Conv2D(256, kernel_size=(3,3), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2,2))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.25)(x)
    
    x = Flatten()(x)
    x = Dense(1024, activation='relu')(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(256, activation='relu')(x)
    x = Dense(config.age_classes, activation='softmax', name='age')(x)
    
    return x

In [0]:
K.clear_session()
inputLayer = Input(shape=(config.img_width,config.img_height,3))
gender = gender_classifier(inputLayer)
ethnicity = ethnicity_classifier(inputLayer)
age = age_classifier(inputLayer)
model = Model(inputs=inputLayer, outputs=[gender, ethnicity, age])

In [38]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 200, 200, 3) 0                                            
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 200, 200, 32) 896         input_1[0][0]                    
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 200, 200, 64) 1792        input_1[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_7 (MaxPooling2D)  (None, 100, 100, 32) 0           conv2d_15[0][0]                  
______________________________________________________________________________________________

## Compile model

In [0]:
losses = {
    'gender': 'binary_crossentropy',
    'ethnicity': 'categorical_crossentropy',
    'age': 'categorical_crossentropy'
}

losses_weights = {
    'gender': config.gender_loss_wt,
    'ethnicity': config.ethnicity_loss_wt,
    'age': config.age_loss_wt
}

In [0]:
model.compile(optimizer=config.optimizer, loss=losses, loss_weights=losses_weights, metrics=['accuracy'])

## Train

In [41]:
# %%wandb
hist = model.fit_generator(train_dataset, validation_data=validation_dataset, 
                           epochs=config.epochs, 
                           steps_per_epoch=len(trainX)//config.batch_size, 
                           validation_steps=len(validationX)//config.batch_size,
                           callbacks=[WandbCallback(), tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


## Save your hard work

In [70]:
from google.colab import drive
drive.mount('/content/gdrive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/gdrive


In [71]:
!ls '/content/gdrive/My Drive/BlogforWandB'

assets	datasets  images.zip  saved_model.pb  tmp  variables  wandb


In [0]:
model.save('/content/gdrive/My Drive/BlogforWandB/datasets/model_5e.h5')