### Thien Win
BrainStation Data Science Capstone <br>
April 2022 <br>

</br>

##### Notebook Table of Contents: <br>
[1] Data Scraping and Wrangling <br>
[2] CycleGAN Training <br>
[3] Model Evaluation <br>
<b>[4] FID Score </b><br>
<hr>

### [4] FID Score

##### Recommended Computing: Google Colab Pro(+) / TPU

<hr>

#### Introduction

As demonstrated in the last notebook `[3] Model Evaluation`, when evaluating image quality from a subjective point of view, there can be discrepancies from image to image and from generator to generator.  

GAN models and their associated generated images are notoriously difficult to evaluate. Model evaluation is important as it informs the builder of the correct model, when to stop training or how different changes impact model performance. Out of the several studied areas, I have decided to use the Frèchet Inception Distance (FID) score as a performance metric.

The FID score is a performance metric that calculates the distance between the feature vectors of real images (real Studio Ghibli images in this case) and the feature vectors of the associated generated “fake” images. The FID Score uses the Inception v3 model to this effort. In practice, a lower FID score has been shown to correlate with higher quality generated images with a perfect score of 0.0 indicating that the real and generated images are identical. Unfortunately, there is no baseline metric to determine if a FID score is good (ie a FID score < x is good). I am using it in this case to only compare different levels of training.

In this notebook, I will be defining the FID scoring method and evaluating the generated images from the previous notebook and associated generator to quantitatively select the best performing generator.



In [1]:
#make sure to include scikit-image library into environment

In [2]:
#used during training on Colab

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
pip install tensorflow_addons

Collecting tensorflow_addons
  Downloading tensorflow_addons-0.16.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
[?25l[K     |▎                               | 10 kB 24.7 MB/s eta 0:00:01[K     |▋                               | 20 kB 10.8 MB/s eta 0:00:01[K     |▉                               | 30 kB 9.1 MB/s eta 0:00:01[K     |█▏                              | 40 kB 8.5 MB/s eta 0:00:01[K     |█▌                              | 51 kB 4.7 MB/s eta 0:00:01[K     |█▊                              | 61 kB 5.5 MB/s eta 0:00:01[K     |██                              | 71 kB 5.6 MB/s eta 0:00:01[K     |██▍                             | 81 kB 4.3 MB/s eta 0:00:01[K     |██▋                             | 92 kB 4.8 MB/s eta 0:00:01[K     |███                             | 102 kB 5.3 MB/s eta 0:00:01[K     |███▏                            | 112 kB 5.3 MB/s eta 0:00:01[K     |███▌                            | 122 kB 5.3 MB/s eta 0:00:01[K     |███▉

In [4]:
#import modules and libraries for notebook
import tensorflow as tf
import tensorflow_addons as tfa
import numpy as np
from cycleGAN_functions import *
import math
from scipy.linalg import sqrtm
import matplotlib.pyplot as plt

<hr>

#### FID Functions

The first step is to define the Inception V3 model and functions for calculating activation from the model (embeddings) which is shown as follows:

In [5]:
inception_model = tf.keras.applications.InceptionV3(include_top=False, 
                              weights="imagenet", 
                              pooling='avg')

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5


In [6]:
def compute_embeddings(dataloader, count):
    image_embeddings = []

    for _ in range(count):
        images = next(iter(dataloader))
        embeddings = inception_model.predict(images)

        image_embeddings.extend(embeddings)

    return np.array(image_embeddings)

In [7]:
def calculate_fid(embeddings1, embeddings2):
    mu1 = embeddings1.mean(axis=0)
    mu2 = embeddings2.mean(axis=0)
    ssdiff = np.sum((mu1 - mu2)**2.0)

    sigma1 = np.cov(embeddings1,  rowvar=False)
    sigma2 = np.cov(embeddings2,  rowvar=False)
    covmean = sqrtm(sigma1.dot(sigma2))

    if np.iscomplexobj(covmean):
      covmean = covmean.real

    fid = ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
    fid = round(fid)
    return fid

<hr>

#### SG12 Embeddings

I will calculate the embeddings for each of the 3 trained generators by importing previously generated images in the previous notebook. These generated images are avaialable on the google drive or can be created via the trained models.

In [8]:
#import images from generator trained to epoch 12
gen_SG12 = tf.keras.utils.image_dataset_from_directory(
        '/content/drive/MyDrive/cycleGAN_deliveryservice/generated_imgs/epoch12',
        labels=None,
        label_mode=None,
        class_names=None,
        color_mode='rgb',
        batch_size=None,
        image_size=(256,256),
        shuffle=False,
        seed=123,
        validation_split=None,
        subset=None,
        interpolation='bilinear',
        follow_links=True,
        crop_to_aspect_ratio=False
    )

Found 751 files belonging to 1 classes.


In [9]:
#Inception Model takes 299x299 images
def resize299(image, size=[299,299]):
    '''
    Helper function to resize image to target value i.e. 299x299
    '''
    return tf.image.resize(image, size, preserve_aspect_ratio=True, method='bilinear')

In [10]:
#apply resizing to each photo in dataset
AUTOTUNE = tf.data.AUTOTUNE
BUFFER_SIZE = 1000
BATCH_SIZE = 1

gen_SG12 = gen_SG12.cache().map(
    resize299, num_parallel_calls=AUTOTUNE).batch(BATCH_SIZE)

In [11]:
#calculate embeddings
generated_image_embeddings12 = compute_embeddings(gen_SG12, count=751)

In [12]:
#sanity check for shape
generated_image_embeddings12.shape

(751, 2048)

I will perform a sanity check to ensure that when calculating the FID score for the same feature vectors against itself, it should be 0 since it would be exactly the same.

In [13]:
#sanity check ---> the FID should be 0 since we are comparing the same feature vectors
calculate_fid(generated_image_embeddings12,generated_image_embeddings12)

0

In seeing that the FID score has calculated this properly, I will continue and perform the same for the generator trained to epoch 52 and 100 as well as calculate the embeddings for the real test SG images.

<hr>

#### SG58 Embeddings

In [14]:
#import images from generator trained to epoch 58
gen_SG58 = tf.keras.utils.image_dataset_from_directory(
        '/content/drive/MyDrive/cycleGAN_deliveryservice/generated_imgs/epoch58',
        labels=None,
        label_mode=None,
        class_names=None,
        color_mode='rgb',
        batch_size=None,
        image_size=(256,256),
        shuffle=False,
        seed=None,
        validation_split=None,
        subset=None,
        interpolation='bilinear',
        follow_links=True,
        crop_to_aspect_ratio=False
    )

Found 751 files belonging to 1 classes.


In [15]:
AUTOTUNE = tf.data.AUTOTUNE
BUFFER_SIZE = 1000
BATCH_SIZE = 1

gen_SG58 = gen_SG58.cache().map(
    resize299, num_parallel_calls=AUTOTUNE).batch(BATCH_SIZE)

In [16]:
generated_image_embeddings58 = compute_embeddings(gen_SG58, count=751)

<hr>

#### SG100 Embeddings

In [17]:
gen_SG100 = tf.keras.utils.image_dataset_from_directory(
        '/content/drive/MyDrive/cycleGAN_deliveryservice/generated_imgs/epoch100',
        labels=None,
        label_mode=None,
        class_names=None,
        color_mode='rgb',
        batch_size=None,
        image_size=(256,256),
        shuffle=False,
        seed=None,
        validation_split=None,
        subset=None,
        interpolation='bilinear',
        follow_links=True,
        crop_to_aspect_ratio=False
    )

Found 751 files belonging to 1 classes.


In [18]:
AUTOTUNE = tf.data.AUTOTUNE
BUFFER_SIZE = 1000
BATCH_SIZE = 1

gen_SG100 = gen_SG100.cache().map(
    resize299, num_parallel_calls=AUTOTUNE).batch(BATCH_SIZE)

In [19]:
generated_image_embeddings100 = compute_embeddings(gen_SG100, count=751)

<hr>

#### Real SG Embeddings

In [20]:
real_SG = tf.keras.utils.image_dataset_from_directory(
        '/content/drive/MyDrive/cycleGAN_deliveryservice/data/SG/testA',
        labels=None,
        label_mode=None,
        class_names=None,
        color_mode='rgb',
        batch_size=None,
        image_size=(1038, 1920),
        shuffle=False,
        seed=None,
        validation_split=None,
        subset=None,
        interpolation='bilinear',
        follow_links=True,
        crop_to_aspect_ratio=False
    )

Found 380 files belonging to 1 classes.


In [21]:
def real_SG_preprocess(image):
  image = normalize(image)
  image = center_crop(image)
  image = resize299(image)
  return image

In [22]:
AUTOTUNE = tf.data.AUTOTUNE
BUFFER_SIZE = 1000
BATCH_SIZE = 1

real_SG = real_SG.cache().map(
    real_SG_preprocess, num_parallel_calls=AUTOTUNE).batch(BATCH_SIZE)

In [23]:
real_image_embeddings = compute_embeddings(real_SG, count=380)

<hr>

#### Calculating FID

In [24]:
FID12 = calculate_fid(generated_image_embeddings12, real_image_embeddings)
FID58 = calculate_fid(generated_image_embeddings58, real_image_embeddings)
FID100 = calculate_fid(generated_image_embeddings100, real_image_embeddings)

In [25]:
print("FID12 =", FID12)
print("FID58 =", FID58)
print("FID100 =", FID100)

FID12 = 584583
FID58 = 594082
FID100 = 402783


From the FID Score calculated above, it can be seen that the generator that was trained for 100 epochs had the lowest score which means that it's feature vectors are closest to the feature vectors of the SG test images. 

<hr>

#### Conclusion

As demonstrated in this notebook, we have calculated the FID Score for generators trained to epoch 12, 58, and 100 with respective to Studio Ghibli feature vectors and found that my generator trained to epoch 100 produced the best score. There is still the issue with the non-convergence as seen in the previous notebook but for the problem space created, the results were satisfactory.

Though not perfect, this concludes the notebooks and submission as the project is. For future iterations, I will look to employ different methods and architectures to increase generated image fidelity.