*DELE CA2 Part A Submission*

# CIFAR10 Image Synthesis with GANs

|          Name        |      Class    | Admin No. |
|----------------------|---------------|-----------|
| Timothy Chia Kai Lun | DAAA/FT/2B/02 | P2106911  |

**<u>Objectives</u>**

The aim of this assignment is to research and implement existing GAN architectures and methods to generate new images based on the CIFAR10 dataset.

## 1. About CIFAR10

The CIFAR10 and CIFAR100 datasets are labeled subsets of the much larger Tiny Images dataset, which consists of 80 million images (Shah, 2021). The CIFAR10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. In the context of image generation, we are tasked with synthesizing new images based on the CIFAR10 dataset in the RGB colour space.

## 2. Project Setup

In this section, I will be importing the necessary packages and dataset needed for this assignment. Then, I will conduct an exploration of the dataset to identify any preprocessing steps needed before defining and training our GAN models. I will also be making use of TensorFlow's [`Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) class to create a custom dataset to feed into the GAN models for training.

### 2.1 Importing Packages and CIFAR10 Dataset

In [2]:
import os
import pickle
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from skimage import exposure
from tensorflow.keras.datasets import cifar10
from IPython import display
from tensorflow.data import Dataset
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Dense, Conv2D, Conv2DTranspose, Embedding, Reshape, Flatten, Dropout, BatchNormalization, ReLU, LeakyReLU, MaxPooling2D, Concatenate
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.utils import plot_model

In [3]:
# set matplotlib style
sns.set(rc={'figure.dpi': 120})
sns.set_style('whitegrid')

# set gpu memory growth on every visible GPU (the loop is skipped when none is present)
physical_devices = tf.config.list_physical_devices('GPU')
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)

# set random seed
tf.random.set_seed(42)
np.random.seed(42)

### 2.2 Exploratory Data Analysis

From importing the CIFAR10 dataset, we can see that the images come in two sets, one for training and one for testing. The dataset comes in the form of numpy arrays of sizes (50000, 32, 32, 3) and (10000, 32, 32, 3) for train and test sets respectively. I have also verified that there are only 10 classes for this dataset.

In [4]:
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
n_labels = len(np.unique(train_labels))

print(f'Train image shape: {train_images.shape}')
print(f'Test image shape: {test_images.shape}')
print(f'Number of labels: {n_labels}')

Train image shape: (50000, 32, 32, 3)
Test image shape: (10000, 32, 32, 3)
Number of labels: 10
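
The class balance stated in Section 1 can be checked in the same way as the label count. The sketch below is only an illustration: it uses a small synthetic stand-in array (`demo_labels`, a hypothetical name) with the same layout as `train_labels`, so that it runs without downloading the dataset. On the real data, `train_labels` would take its place.

```python
import numpy as np

# synthetic stand-in for train_labels (assumption): shape (N, 1), 10 classes, 5 images each
demo_labels = np.repeat(np.arange(10), 5).reshape(-1, 1)

# count how many images fall into each class
classes, counts = np.unique(demo_labels, return_counts=True)
is_balanced = bool(np.all(counts == counts[0]))

for cls, count in zip(classes, counts):
    print(f'class {cls}: {count} images')
print(f'Balanced: {is_balanced}')
```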


#### 2.2.1 Inspecting Images

It is important to look at the images we are generating new samples from, as certain images may hinder our generative model from producing proper samples. Images within each class share visible features: for example, airplane images largely have single-coloured backgrounds and pointy or cylindrical shapes.

One thing I notice is that the colours of frogs in the dataset tend to blend in with the background, so our model may have difficulty learning these features.

In [None]:
label_map = {
    0: 'airplane',
    1: 'automobile',
    2: 'bird',
    3: 'cat',
    4: 'deer',
    5: 'dog',
    6: 'frog',
    7: 'horse',
    8: 'ship',
    9: 'truck'
}

def display_images(images, labels, n_images=10):
    fig, axes = plt.subplots(nrows=n_labels, ncols=n_images, figsize=(20, 20))
    for i in range(n_images):
        for j in range(n_labels):
            axes[j, i].imshow(images[labels.flatten() == j][i])
            axes[j, i].set_title(label_map[j])
            axes[j, i].axis('off')
    plt.tight_layout()
    plt.show()

display_images(train_images, train_labels)

![cifar10-images](images\submission_materials\cifar10.png)

#### 2.2.2 Class Colour Distributions

Our models see these images differently from humans: they only "see" each image as an array of pixel values. This is why viewing and understanding the differences between each class's colour distribution is important, as these distributions are what our model will be aiming to approximate.

In [None]:
bins = 32

fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 5), sharex=True, sharey=True)

for i, ax in zip(range(n_labels), axes.flat):
    idx = np.where(train_labels == i)[0]
    ax.hist(train_images[idx, ..., 0].ravel(), bins=bins, color='r', alpha=.7)
    ax.hist(train_images[idx, ..., 1].ravel(), bins=bins, color='g', alpha=.7)
    ax.hist(train_images[idx, ..., 2].ravel(), bins=bins, color='b', alpha=.7)
    ax.set_title(label_map[i])

fig.legend(['Red', 'Green', 'Blue'], loc='upper right', fontsize=12, ncol=3, bbox_to_anchor=(0.592, 1.0), frameon=False)
fig.suptitle('RGB Distribution by Class', fontsize=16, y=1.05)
plt.tight_layout()
plt.show()

![class-rgb-distribution](images\submission_materials\class_distibutions.png)
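
The histograms can be complemented with a numeric summary per class. The following is a minimal sketch, assuming images are stored as `uint8` arrays of shape `(N, 32, 32, 3)` with labels of shape `(N, 1)`, as in `train_images` and `train_labels`. The arrays `demo_images` and `demo_labels` and the helper `channel_stats` are hypothetical stand-ins filled with random values, so the printed numbers say nothing about CIFAR10 itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-ins (assumption): same layout as train_images / train_labels
demo_images = rng.integers(0, 256, size=(40, 32, 32, 3), dtype=np.uint8)
demo_labels = np.repeat(np.arange(10), 4).reshape(-1, 1)

def channel_stats(images, labels, label):
    """Return per-channel (mean, std) over all pixels belonging to one class."""
    subset = images[labels.flatten() == label].astype(np.float64)
    # average over images, rows and columns; keep the channel axis
    return subset.mean(axis=(0, 1, 2)), subset.std(axis=(0, 1, 2))

for label in range(10):
    mean, std = channel_stats(demo_images, demo_labels, label)
    print(label, np.round(mean, 1), np.round(std, 1))
```

Passing `train_images` and `train_labels` instead of the stand-ins would give one red, green and blue mean and standard deviation per class, which can be read alongside the plots above.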

### 2.3 Data Preparation

#### 2.3.1 Combining Train and Test Sets

Because of the unsupervised component of GANs, we are not trying to predict a label associated with the data and we are not trying to generalize any kind of predictions to new data. Rather, we are trying to approximate what the data looks like and generate new samples of data. Hence, I will be combining both sets to be used in training.

In [None]:
# concatenate train and test numpy arrays
images = np.concatenate((train_images, test_images), axis=0)
labels = np.concatenate((train_labels, test_labels), axis=0)

#### 2.3.2 Removing Low Contrast Images

In [None]:
low_contrast_idx = []

# loop over images and check for low contrast
for idx, image in enumerate(images):
    if exposure.is_low_contrast(image, fraction_threshold=0.15):
        low_contrast_idx.append(idx)

# plot images in the low contrast list
fig, axes = plt.subplots(10, 10, figsize=(20, 20))
axes = axes.flatten()

# subset of low contrast images
for i in range(100):
    img_lbl = label_map[labels[low_contrast_idx[i]][0]]
    axes[i].imshow(images[low_contrast_idx[i]])
    axes[i].set_title(f'{img_lbl}\nIndex: {low_contrast_idx[i]}')
    axes[i].axis('off')

plt.tight_layout()
plt.show()

![low-contrast-images](images\submission_materials\low_contrast.png)

In [None]:
# delete low contrast images
images = np.delete(images, low_contrast_idx, axis=0)
labels = np.delete(labels, low_contrast_idx, axis=0)
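
The check above flags an image when its brightness values span only a small fraction of the available range. A simplified NumPy sketch of that idea is given below. The helper `contrast_ratio` is hypothetical, the equal-weight grayscale conversion is an assumption, and scikit-image normalises the range differently internally, so these ratios are not interchangeable with the values used by `exposure.is_low_contrast`.

```python
import numpy as np

def contrast_ratio(image, lower_percentile=1, upper_percentile=99):
    """Fraction of the 0-255 range covered between two brightness percentiles."""
    # crude grayscale: equal-weight channel average (assumption, not skimage's weights)
    gray = image.astype(np.float64).mean(axis=-1)
    low, high = np.percentile(gray, [lower_percentile, upper_percentile])
    return (high - low) / 255.0

rng = np.random.default_rng(0)
flat_image = np.full((32, 32, 3), 120, dtype=np.uint8)                 # uniform grey
noisy_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)   # wide spread

threshold = 0.15
print(contrast_ratio(flat_image) < threshold)    # whether the flat image is flagged
print(contrast_ratio(noisy_image) < threshold)   # whether the noisy image is flagged
```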

### 2.4 Converting to TensorFlow Dataset

In [None]:
BUFFER_SIZE = 10_000
BATCH_SIZE = 128

dataset = Dataset.from_tensor_slices((images, labels))
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
# normalize images to range [-1, 1] as generator will be using tanh activation
dataset = dataset.map(lambda x, y: (tf.cast(x, tf.float32) / 127.5 - 1, y))
image_spec, label_spec = dataset.element_spec 
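
The `map` step rescales pixel values from [0, 255] to [-1, 1], so generated images need the inverse mapping before they can be displayed. A minimal NumPy sketch of both directions follows; the helper names `to_model_range` and `to_pixel_range` are hypothetical and not part of the notebook.

```python
import numpy as np

def to_model_range(pixels):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0

def to_pixel_range(values):
    """Map floats in [-1, 1] back to uint8 pixel values in [0, 255]."""
    return np.clip(np.round((values + 1.0) * 127.5), 0, 255).astype(np.uint8)

pixels = np.arange(256, dtype=np.uint8)   # every possible pixel value
scaled = to_model_range(pixels)
restored = to_pixel_range(scaled)

print(scaled.min(), scaled.max())          # ends of the model range
print(np.array_equal(pixels, restored))    # whether the round trip is lossless
```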

## 3. Experiments

### 3.1 Metrics

In [None]:
class KID(tf.keras.metrics.Metric):
    def __init__(self, name="kid"):
        super().__init__(name=name)
        self.kid_tracker = tf.keras.metrics.Mean()
        self.encoder = tf.keras.Sequential(
            [
                tf.keras.layers.InputLayer(input_shape=(32, 32, 3)),
                # maps inputs in [0, 1] to [0, 255]; images in [-1, 1] must be shifted to [0, 1] first
                tf.keras.layers.Rescaling(255.0),
                tf.keras.layers.Resizing(height=75, width=75),
                tf.keras.layers.Lambda(tf.keras.applications.inception_v3.preprocess_input),
                tf.keras.applications.InceptionV3(
                    include_top=False,
                    input_shape=(75, 75, 3),
                    weights="imagenet",
                ),
                tf.keras.layers.GlobalAveragePooling2D(),
            ],
            name="inception_encoder",
        )

    def polynomial_kernel(self, features_1, features_2):
        feature_dimensions = tf.cast(tf.shape(features_1)[1], dtype=tf.float32)
        return (features_1 @ tf.transpose(features_2) / feature_dimensions + 1.0) ** 3.0

    def update_state(self, real_images, generated_images, sample_weight=None):
        real_features = self.encoder(real_images, training=False)
        generated_features = self.encoder(generated_images, training=False)

        # compute polynomial kernels using the two sets of features
        kernel_real = self.polynomial_kernel(real_features, real_features)
        kernel_generated = self.polynomial_kernel(
            generated_features, generated_features
        )
        kernel_cross = self.polynomial_kernel(real_features, generated_features)

        # estimate the squared maximum mean discrepancy using the average kernel values
        batch_size = tf.shape(real_features)[0]
        batch_size_f = tf.cast(batch_size, dtype=tf.float32)
        mean_kernel_real = tf.reduce_sum(kernel_real * (1.0 - tf.eye(batch_size))) / (
            batch_size_f * (batch_size_f - 1.0)
        )
        mean_kernel_generated = tf.reduce_sum(
            kernel_generated * (1.0 - tf.eye(batch_size))
        ) / (batch_size_f * (batch_size_f - 1.0))
        mean_kernel_cross = tf.reduce_mean(kernel_cross)
        kid = mean_kernel_real + mean_kernel_generated - 2.0 * mean_kernel_cross

        # update the average KID estimate
        self.kid_tracker.update_state(kid)

    def result(self):
        return self.kid_tracker.result()

    def reset_state(self):
        self.kid_tracker.reset_state()
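
The arithmetic inside `update_state` can be checked on plain arrays. The sketch below mirrors the cubic polynomial kernel and the squared maximum mean discrepancy estimate from the `KID` class in NumPy. Random vectors stand in for the Inception features, which is an assumption made purely so the example runs without the pretrained network, and `kid_estimate` is a hypothetical helper name.

```python
import numpy as np

def polynomial_kernel(features_1, features_2):
    """Cubic polynomial kernel, mirroring KID.polynomial_kernel."""
    dims = features_1.shape[1]
    return (features_1 @ features_2.T / dims + 1.0) ** 3

def kid_estimate(real_features, generated_features):
    """Squared-MMD estimate between two equally sized feature batches."""
    n = real_features.shape[0]
    off_diagonal = 1.0 - np.eye(n)
    k_real = polynomial_kernel(real_features, real_features)
    k_gen = polynomial_kernel(generated_features, generated_features)
    k_cross = polynomial_kernel(real_features, generated_features)
    # within-set means exclude the diagonal (self-similarity) terms
    mean_real = np.sum(k_real * off_diagonal) / (n * (n - 1))
    mean_gen = np.sum(k_gen * off_diagonal) / (n * (n - 1))
    return mean_real + mean_gen - 2.0 * np.mean(k_cross)

rng = np.random.default_rng(0)
real = rng.normal(size=(64, 16))
similar = rng.normal(size=(64, 16))          # drawn from the same distribution
shifted = rng.normal(size=(64, 16)) + 3.0    # clearly different distribution

kid_similar = kid_estimate(real, similar)
kid_shifted = kid_estimate(real, shifted)
print(f'same distribution:    {kid_similar:.3f}')
print(f'shifted distribution: {kid_shifted:.3f}')
```

A lower value indicates that the two feature sets are harder to tell apart, which is why the shifted batch scores far higher than the batch drawn from the same distribution.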

### 3.2 Baseline: Conditional DCGAN

#### 3.2.1 Generator Network

In [None]:
def create_generator(latent_dim):
    # foundation for label embedded input
    label_input = Input(shape=(1,), name='label_input')
    label_embedding = Embedding(10, 10, name='label_embedding')(label_input)
    
    # linear activation
    label_embedding = Dense(4 * 4, name='label_dense')(label_embedding)

    # reshape to additional channel
    label_embedding = Reshape((4, 4, 1), name='label_reshape')(label_embedding)
    assert label_embedding.shape == (None, 4, 4, 1)

    # foundation for 4x4 image input
    noise_input = Input(shape=(latent_dim,), name='noise_input')
    noise_dense = Dense(4 * 4 * 128, name='noise_dense')(noise_input)
    noise_dense = ReLU(name='noise_relu')(noise_dense)
    noise_reshape = Reshape((4, 4, 128), name='noise_reshape')(noise_dense)
    assert noise_reshape.shape == (None, 4, 4, 128)

    # concatenate label embedding and image to produce 129-channel output
    concat = Concatenate(name='concatenate')([noise_reshape, label_embedding])
    assert concat.shape == (None, 4, 4, 129)

    # upsample to 8x8
    conv1 = Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same', name='conv1')(concat)
    assert conv1.shape == (None, 8, 8, 128)
    conv1 = ReLU(name='conv1_relu')(conv1)

    # upsample to 16x16
    conv2 = Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same', name='conv2')(conv1)
    assert conv2.shape == (None, 16, 16, 128)
    conv2 = ReLU(name='conv2_relu')(conv2)

    # upsample to 32x32
    conv3 = Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same', name='conv3')(conv2)
    assert conv3.shape == (None, 32, 32, 128)
    conv3 = ReLU(name='conv3_relu')(conv3)

    # output 32x32x3
    output = Conv2D(3, (3, 3), activation='tanh', padding='same', name='output')(conv3)
    assert output.shape == (None, 32, 32, 3)

    model = Model(inputs=[noise_input, label_input], outputs=output, name='generator')

    return model
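
The spatial sizes asserted in `create_generator` follow from a simple rule: with `padding='same'`, a Keras transposed convolution multiplies the input height and width by its stride. A small sketch of that arithmetic in plain Python (the helper name `transposed_conv_same` is hypothetical):

```python
def transposed_conv_same(size, stride):
    """Output height/width of a 'same'-padded transposed convolution."""
    return size * stride

# channels after concatenating the 128 noise maps with the 1 label map
concat_channels = 128 + 1

size = 4            # spatial size of the reshaped noise and label maps
sizes = [size]
for _ in range(3):  # conv1, conv2, conv3, each with stride 2
    size = transposed_conv_same(size, stride=2)
    sizes.append(size)

print(concat_channels)
print(sizes)
```

Three stride-2 layers are therefore exactly what is needed to grow the 4x4 foundation to the 32x32 resolution of CIFAR10.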

#### 3.2.2 Discriminator Network

In [None]:
def create_discriminator():
    # foundation for label embedded input
    label_input = Input(shape=(1,), name='label_input')
    label_embedding = Embedding(10, 10, name='label_embedding')(label_input)
    
    # linear activation
    label_embedding = Dense(32 * 32, name='label_dense')(label_embedding)
    
    # reshape to additional channel
    label_embedding = Reshape((32, 32, 1), name='label_reshape')(label_embedding)
    assert label_embedding.shape == (None, 32, 32, 1)

    # foundation for 32x32 image input
    image_input = Input(shape=(32, 32, 3), name='image_input')

    # concatenate label embedding and image to produce 4-channel input
    concat = Concatenate(name='concatenate')([image_input, label_embedding])
    assert concat.shape == (None, 32, 32, 4)

    # downsample to 16x16
    conv1 = Conv2D(128, kernel_size=3, strides=2, padding='same', name='conv1')(concat)
    assert conv1.shape == (None, 16, 16, 128)
    conv1 = LeakyReLU(alpha=0.2, name='conv1_leaky_relu')(conv1)
    
    # downsample to 8x8
    conv2 = Conv2D(128, kernel_size=3, strides=2, padding='same', name='conv2')(conv1)
    assert conv2.shape == (None, 8, 8, 128)
    conv2 = LeakyReLU(alpha=0.2, name='conv2_leaky_relu')(conv2)
    
    # downsample to 4x4
    conv3 = Conv2D(128, kernel_size=3, strides=2, padding='same', name='conv3')(conv2)
    assert conv3.shape == (None, 4, 4, 128)
    conv3 = LeakyReLU(alpha=0.2, name='conv3_leaky_relu')(conv3)

    # flatten feature maps
    flat = Flatten(name='flatten')(conv3)
    
    output = Dense(units=1, activation='sigmoid', name='output')(flat)

    model = Model(inputs=[image_input, label_input], outputs=output, name='discriminator')

    return model
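
The discriminator mirrors the generator: each stride-2 convolution with `padding='same'` halves the spatial size (rounding up), and the final feature maps are flattened into a single vector for the sigmoid output. A short sketch of the resulting sizes, using the hypothetical helper `conv_same`:

```python
import math

def conv_same(size, stride):
    """Output height/width of a 'same'-padded strided convolution."""
    return math.ceil(size / stride)

size = 32
for name in ('conv1', 'conv2', 'conv3'):
    size = conv_same(size, stride=2)
    print(name, size)

# 128 feature maps of size x size are flattened into one vector
flat_features = size * size * 128
# the output Dense layer has one weight per feature plus one bias
output_params = flat_features * 1 + 1
print(flat_features, output_params)
```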

#### 3.2.3 Training Procedure

In [None]:
class ConditionalDCGAN(Model):
    def __init__(self, generator, discriminator, latent_dim):
        super(ConditionalDCGAN, self).__init__()
        self.generator = generator
        self.discriminator = discriminator
        self.latent_dim = latent_dim

    def compile(self, d_optimizer, g_optimizer, loss_fn):
        super(ConditionalDCGAN, self).compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn
        self.g_loss_metric = tf.keras.metrics.Mean(name='g_loss')
        self.d_real_loss_metric = tf.keras.metrics.Mean(name='d_real_loss')
        self.d_fake_loss_metric = tf.keras.metrics.Mean(name='d_fake_loss')
        self.d_acc_metric = tf.keras.metrics.BinaryAccuracy(name='d_acc')

    @property
    def metrics(self):
        return [self.g_loss_metric, self.d_real_loss_metric, self.d_fake_loss_metric, self.d_acc_metric]

    def train_step(self, data):
        real_images, class_labels = data
        class_labels = tf.cast(class_labels, 'int32')
        batch_size = tf.shape(real_images)[0]

        # train discriminator
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))

        fake_labels = tf.zeros((batch_size, 1))  # (batch_size, 1)
        real_labels = tf.ones((batch_size, 1))  # (batch_size, 1)

        # freeze generator
        self.discriminator.trainable = True
        self.generator.trainable = False
    
        with tf.GradientTape() as disc_tape:
            disc_tape.watch(self.discriminator.trainable_variables)

            generated_images = self.generator([random_latent_vectors, class_labels], training=True)
            real_output = self.discriminator([real_images, class_labels], training=True)
            fake_output = self.discriminator([generated_images, class_labels], training=True)
            
            d_loss_real = self.loss_fn(real_labels, real_output)
            d_loss_fake = self.loss_fn(fake_labels, fake_output)
            d_loss = d_loss_real + d_loss_fake  # -[log(D(x)) + log(1 - D(G(z)))]
        
        disc_grads = disc_tape.gradient(d_loss, self.discriminator.trainable_variables)
        self.d_optimizer.apply_gradients(zip(disc_grads, self.discriminator.trainable_variables))

        # train the generator
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        misleading_labels = tf.ones((batch_size, 1))

        # freeze discriminator
        self.discriminator.trainable = False
        self.generator.trainable = True

        with tf.GradientTape() as gen_tape:
            gen_tape.watch(self.generator.trainable_variables)

            generated_images = self.generator([random_latent_vectors, class_labels], training=True)
            pred_on_fake = self.discriminator([generated_images, class_labels], training=True)
            
            # negative log probability that the discriminator labels the fakes as real
            g_loss = self.loss_fn(misleading_labels, pred_on_fake)  # minimize -log(D(G(z))), i.e. maximize log(D(G(z)))
        
        gen_grads = gen_tape.gradient(g_loss, self.generator.trainable_variables)
        self.g_optimizer.apply_gradients(zip(gen_grads, self.generator.trainable_variables))

        # update metrics
        self.g_loss_metric.update_state(g_loss)
        self.d_real_loss_metric.update_state(d_loss_real)
        self.d_fake_loss_metric.update_state(d_loss_fake)
        self.d_acc_metric.update_state(real_labels, real_output)

        return {
            'g_loss': self.g_loss_metric.result(),
            'd_real_loss': self.d_real_loss_metric.result(),
            'd_fake_loss': self.d_fake_loss_metric.result(),
            'd_acc': self.d_acc_metric.result()
        }
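
The losses used in `train_step` can be reproduced by hand, which makes it easier to see what each term rewards. The sketch below is a NumPy illustration, assuming `loss_fn` is a mean binary cross-entropy on probabilities (which matches the sigmoid output of the discriminator). The discriminator outputs are made-up numbers, and the local `binary_crossentropy` helper is hypothetical.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for probabilities in (0, 1)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

# made-up discriminator outputs for a batch of four images
real_output = np.array([0.9, 0.8, 0.7, 0.95])   # D(x)
fake_output = np.array([0.1, 0.3, 0.2, 0.05])   # D(G(z))

# discriminator: real images labelled 1, generated images labelled 0
d_loss_real = binary_crossentropy(np.ones(4), real_output)    # -mean(log D(x))
d_loss_fake = binary_crossentropy(np.zeros(4), fake_output)   # -mean(log(1 - D(G(z))))
d_loss = d_loss_real + d_loss_fake

# generator: generated images paired with "real" labels, giving -mean(log D(G(z)))
g_loss = binary_crossentropy(np.ones(4), fake_output)

print(f'd_loss: {d_loss:.3f}')
print(f'g_loss: {g_loss:.3f}')
```

With a confident discriminator as in these numbers, the generator loss is large while the discriminator loss is small. Pairing the fakes with labels of one gives the generator a strong gradient in exactly this situation.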

We have seen that including label information is not enough to generate plausible images due to ...

In an effort to improve the GAN model, I will be trying out various methods to stabilize training.

### 4.1 Label Smoothing

### 4.2 Spectral Normalization

### 4.3 Increasing Discriminator Capacity

## 5. Evaluation

## 6. Conclusion

## 7. References

- Shah, A. (2021) MIT 80 million Tiny Images dataset, Kaggle. Available at: https://www.kaggle.com/datasets/aryashah2k/mit-80-million-tiny-images-dataset (Accessed: January 21, 2023).
- Krizhevsky, A., Nair, V. and Hinton, G. (no date) CIFAR-10 and CIFAR-100 datasets. Available at: https://www.cs.toronto.edu/~kriz/cifar.html (Accessed: January 21, 2023).