# Generative Adversarial Networks (GAN) using Keras in Python

Link to the Youtube video tutorial: https://www.youtube.com/watch?v=Mng57Tj18pc&list=PLZsOBAyNTZwboR4_xj-n3K6XBTweC4YVD&index=2&t=129s

<br />

**Concept of the Generative Adversarial Networks (GAN) model:**
1) <img src="hidden\photo1.png" alt="This image is a representation of the simple neural network" style="width: 400px;"/>  <br />
2) <img src="hidden\photo2.png" alt="This image is a representation of the simple neural network" style="width: 400px;"/>  <br />
    1) When we train the Discriminator (discriminator network), the Generator (generator network) must stay fixed: the parameters (weights and biases) of the generator network are not updated (learned) by the cost function value of that training step. We do this by training the discriminator network on its own (not through the GAN model).
    2) When we train the Generator (generator network), the Discriminator (discriminator network) must stay fixed: the parameters (weights and biases) of the discriminator network are not updated (learned) by the cost function value of that training step. We do this by setting the discriminator network as not trainable when creating the GAN model, then training the generator network through the GAN model.
3) <img src="hidden\photo3.png" alt="This image is a representation of the simple neural network" style="width: 400px;"/>  <br />
4) <img src="hidden\photo4.png" alt="This image is a representation of the simple neural network" style="width: 400px;"/>  <br />
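
The alternating "freeze one, train the other" idea in item 2 can be sketched with a toy example. This is only a minimal illustration under stated assumptions, not the Keras model built in this notebook: the one-parameter-pair "networks" `gen` and `disc`, the helper names, the learning rate and the toy data are all hypothetical, and binary cross-entropy is assumed as the cost function.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Hypothetical toy parameters (stand-ins for the real networks)
gen = {"a": 1.0, "b": 0.0}   # generator: x_fake = a * z + b
disc = {"w": 0.1, "c": 0.0}  # discriminator: p_real = sigmoid(w * x + c)

def generate(z):
    return gen["a"] * z + gen["b"]

def discriminate(x):
    return sigmoid(disc["w"] * x + disc["c"])

def train_discriminator(x, y, lr=0.1):
    """One gradient step that updates the discriminator only."""
    grad_logit = discriminate(x) - y  # d(binary cross-entropy)/d(logit)
    disc["w"] -= lr * np.mean(grad_logit * x)
    disc["c"] -= lr * np.mean(grad_logit)

def train_generator(z, lr=0.1):
    """One gradient step that updates the generator only (discriminator frozen)."""
    p = discriminate(generate(z))
    grad_x = (p - 1.0) * disc["w"]  # target label is 1 ("real")
    gen["a"] -= lr * np.mean(grad_x * z)
    gen["b"] -= lr * np.mean(grad_x)

half_batch = 16
real = rng.normal(3.0, 0.5, half_batch)  # toy "real" data
z = rng.normal(0.0, 1.0, half_batch)     # noise for the fake half batch

# Part 1: train the discriminator on real, then fake; the generator stays fixed
gen_before = dict(gen)
train_discriminator(real, np.ones(half_batch))
train_discriminator(generate(z), np.zeros(half_batch))
generator_untouched = gen == gen_before

# Part 2: train the generator; the discriminator stays fixed
disc_before = dict(disc)
train_generator(rng.normal(0.0, 1.0, 2 * half_batch))
discriminator_untouched = disc == disc_before

print(generator_untouched, discriminator_untouched)  # True True
```

In Keras the same effect is obtained without writing gradients by hand: the discriminator is trained directly, and it is marked as not trainable inside the combined GAN model.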

In [8]:
from keras.datasets import mnist # MNIST is the handwritten digit dataset, containing class (handwritten digit) 0 to 9. Each handwritten digit image has dimension of (28,28,1).
from keras.layers import Input, Dense, Reshape, Flatten
from keras.layers import BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Sequential, Model
from keras.optimizers import adam_v2
import matplotlib.pyplot as plt
import numpy as np
import os
import datetime


# Define the image dimensions/shape involved in the GAN model

In [9]:
# Define the image dimensions/shape (which is the generator network output layer size & the discriminator network input layer size). We should define the image dimensions same as the dimensions of the real images (from the dataset)
# Large images take too much time and resources.
img_rows = 28 # the dimension of the real images (from the dataset)
img_cols = 28 # the dimension of the real images (from the dataset)
channels = 1 # grayscale, the dimension of the real images (from the dataset)
img_shape = (img_rows, img_cols, channels) # so, the image shape we defined has dimension of 28 x 28 pixels, in grayscale. Equivalent to (28,28,1).


# Define the function to group the defined layers for the generator network into an object with training/inference features

1) The generator network here consists of only 3 hidden layers as dense layer. (number of neurons in 2nd hidden (dense) layer) > (number of neurons in 1st hidden (dense) layer); (number of neurons in 3rd hidden (dense) layer) > (number of neurons in 2nd hidden (dense) layer)       
2) In short, for the generator network in a GAN model:
    1) The further the hidden (dense) layer, the larger the number of neurons in the hidden (dense) layer
    2) It accepts a 1D array of noise/latent vector (seed) as the input, then it transforms the input into a larger 1D array and reshapes the generated larger 1D array into a 2D array (so that an image is formed). You can treat each element in the 1D array as a feature.
3) We don't use compile() to compile the generator network because we train the generator network by using the GAN model (not the generator network itself only).
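
The shape flow described above (100 noise values, growing dense layers, then a reshape into an image) can be traced with plain NumPy. This is a sketch for checking shapes only, under simplifying assumptions: the weights are random and untrained, and biases and batch normalization are left out. The names `toy_generator` and `weights` are hypothetical and not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
img_shape = (28, 28, 1)
# Layer widths: noise -> 256 -> 512 -> 1024 -> 784 (= 28 * 28 * 1)
layer_sizes = [100, 256, 512, 1024, int(np.prod(img_shape))]

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

# Random, untrained weights: this traces shapes, it does not generate digits
weights = [rng.normal(0.0, 0.05, (n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def toy_generator(noise):
    x = noise
    for w in weights[:-1]:
        x = leaky_relu(x @ w)        # hidden dense layers
    x = np.tanh(x @ weights[-1])     # output layer, values in [-1, 1]
    return x.reshape((-1,) + img_shape)

noise = rng.normal(0.0, 1.0, (4, 100))  # 4 noise vectors of 100 elements each
fake = toy_generator(noise)
print(fake.shape)  # (4, 28, 28, 1)
```

Each noise vector produces exactly one image, and the tanh output range matches real images rescaled to the range from -1 to 1.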


In [10]:
# Given input of noise (latent) vector, the Generator produces an image.
def build_generator():

    noise_shape = (100,) #1D array of size 100 (latent vector / noise)

    # Define your generator network (model)
    # Here we are only using Dense layers. But the network can be more complicated depending on the application. For example, you can use VGG for super res. GAN.
    # Usually, dense layer is used for hidden layer & output layer of a neural network.
    # The generator network here consists of only 3 hidden layers as dense layer. (number of neurons in 2nd hidden (dense) layer) > (number of neurons in 1st hidden (dense) layer); (number of neurons in 3rd hidden (dense) layer) > (number of neurons in 2nd hidden (dense) layer)
    # In short, for a generator network in GAN:
    # 1) The further the hidden (dense) layer, the larger the number of neurons in the hidden (dense) layer
    # 2) It accepts a 1D array of noise/latent vector (seed) as the input, then it transforms the input into a larger 1D array and reshapes the generated larger 1D array into a 2D array (so that an image is formed). You can treat each element in the 1D array as a feature.

    # Define the structure of the generator network:
    model = Sequential() # the generator network is built using sequential model

    model.add(Dense(256, input_shape=noise_shape)) # The 1st hidden (dense) layer of the generator network (2nd layer of the generator network).
    model.add(LeakyReLU(alpha=0.2)) # alpha is the slope applied to negative inputs (the Keras default is 0.3; here we set it to 0.2). It allows a small, non-zero gradient when the input is negative. This helps to mitigate the "dying ReLU" problem, where neurons can sometimes get stuck during training and stop learning
    model.add(BatchNormalization(momentum=0.8)) # momentum controls how quickly the moving mean and moving variance of this layer (used at inference time) follow the statistics of each batch. It is not a setting for the training speed
    model.add(Dense(512)) # The 2nd hidden (dense) layer of the generator network 
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(1024)) # The 3rd hidden (dense) layer of the generator network
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    
    model.add(Dense(np.prod(img_shape), activation='tanh')) # The output (dense) layer of the generator network. The number of neurons in the output layer = np.prod(img_shape). So this output layer will produce a 1D array/vector with dimension of np.prod(img_shape) (means the output vector consists of np.prod(img_shape) elements)
    model.add(Reshape(img_shape)) # The reshape layer, which reshapes the output vector (the 1D array containing np.prod(img_shape) elements) into an array with the shape img_shape, i.e. an image of (28,28,1)

    model.summary() # print the summary of the generator network

    noise = Input(shape=noise_shape) # The input layer of the generator network (the 1st layer of the generator network). The input layer that has noise_shape number of neurons (so that each neuron in the input layer accepts 1 element/feature of the noise/latent vector). Since we are using the older Keras version to build the model, we need to explicitly define the input layer using Input(). Else, the model summary cannot be executed. Reference: https://enjoymachinelearning.com/blog/keras-input-shape/
    img = model(noise) # The variable img stores a generated image (fake data), produced by the generator network. model(x) means use the defined model (generator network) to produce the output, by taking x as the input.

    return Model(noise, img) # Group the layers defined for the generator network into an object with training/inference features, by specifying the variable noise as its input and the variable img as its output.


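# Alpha: the slope that LeakyReLU applies to negative inputs (output = alpha * x when x < 0), so negative inputs still produce a small, non-zero gradient.
# Momentum: in BatchNormalization, the weight given to the previous moving mean/variance each time they are updated with the statistics of a new batch. It does not speed up the training.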
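
To make the two hyperparameters concrete, here is a small NumPy sketch. It assumes the usual definitions: LeakyReLU scales negative inputs by alpha, and batch normalization keeps a moving average of the batch mean, weighted by momentum. The helper names and the sample values are made up for illustration.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Positive inputs pass through unchanged; negative inputs are scaled by alpha
    return np.where(x > 0, x, alpha * x)

def update_moving_mean(moving_mean, batch_mean, momentum=0.8):
    # Moving-average rule: keep `momentum` of the old value, take the rest from the batch
    return momentum * moving_mean + (1.0 - momentum) * batch_mean

activations = leaky_relu(np.array([-2.0, 0.0, 3.0]))

moving_mean = 0.0
history = []
for batch_mean in [10.0, 10.0, 10.0]:  # three batches with the same mean
    moving_mean = update_moving_mean(moving_mean, batch_mean)
    history.append(moving_mean)

print(activations)  # the negative input is scaled to about -0.4
print(history)      # approaches 10 step by step: roughly 2.0, 3.6, 4.88
```

A momentum closer to 1 makes the moving statistics change more slowly; a smaller value makes them follow recent batches more closely.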

# Define the function to group the defined layers for the discriminator network into an object with training/inference features

1) The discriminator network here consists of only 2 hidden layers as dense layer. (number of neurons in 2nd hidden (dense) layer) < (number of neurons in 1st hidden (dense) layer)
2) In short, for the discriminator network in a GAN model:
    1) The further the hidden (dense) layer, the smaller the number of neurons in the hidden (dense) layer
    2) It accepts an image (2D array of pixels) as the input, flattens the image into a 1D array, then transforms the 1D array into smaller and smaller 1D arrays until a single value is produced (the probability score of the image being a real image). You can treat each element in the 1D array as a feature.
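
The discriminator built in the code cell below goes the opposite way to the generator: an image is flattened, passed through shrinking dense layers, and reduced to one probability score. The NumPy sketch below only traces those shapes. It assumes random, untrained weights and omits biases; `toy_discriminator` and `weights` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
img_shape = (28, 28, 1)
# Layer widths: 784 flattened pixels -> 512 -> 256 -> 1 score
layer_sizes = [int(np.prod(img_shape)), 512, 256, 1]

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Random, untrained weights: the scores are meaningless, only the shapes matter
weights = [rng.normal(0.0, 0.05, (n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def toy_discriminator(images):
    x = images.reshape(len(images), -1)  # flatten (28, 28, 1) into 784 values
    for w in weights[:-1]:
        x = leaky_relu(x @ w)            # hidden dense layers
    return sigmoid(x @ weights[-1])      # one probability score per image

images = rng.uniform(-1.0, 1.0, (4,) + img_shape)  # 4 stand-in images in [-1, 1]
scores = toy_discriminator(images)
print(scores.shape)  # (4, 1)
```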


In [11]:
# Given an input image, the Discriminator (discriminator network) outputs the likelihood of the image being real.
# Binary classification - true or false (we're calling it validity)
# A discriminator is just like a binary classifier

def build_discriminator():

    # Define your discriminator network (model)
    # Here we are only using Dense layers.
    # Usually, dense layer is used for hidden layer & output layer of a neural network.
    # The discriminator network here consists of only 2 hidden layers as dense layer. (number of neurons in 2nd hidden (dense) layer) < (number of neurons in 1st hidden (dense) layer)
    # In short, for a discriminator network in GAN:
    # 1) The further the hidden (dense) layer, the smaller the number of neurons in the hidden (dense) layer
    # 2) It accepts an image (2D array of pixels) as the input, flattens the image into a 1D array, then transforms the 1D array into smaller and smaller 1D arrays until a single value is produced (the probability score of the image being a real image). You can treat each element in the 1D array as a feature.

    # Define the structure of the discriminator network:
    model = Sequential() # the discriminator network is built using sequential model

    model.add(Flatten(input_shape=img_shape)) # The flatten layer (2nd layer of the discriminator network). Since the discriminator network receives a 2D image as the input, we need this flatten layer to flatten the 2D image (features/pixel in 2D) into 1D array (features/pixel in 1D)
    model.add(Dense(512)) # The 1st hidden (dense) layer of the discriminator network (3rd layer of the discriminator network).
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(256)) # The 2nd hidden (dense) layer of the discriminator network.
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(1, activation='sigmoid')) # The output (dense) layer of the discriminator network. The number of neurons in the output layer is only 1, because the output of the discriminator network is to tell us if the input image is a real or fake image/data (binary /single class classification problem). So this output layer will produce only a value. The sigmoid activation function is used to convert the output value into a probability score (between 0 and 1).
    model.summary() # print the summary of the discriminator network

    img = Input(shape=img_shape) # The input layer of the discriminator network (the 1st layer of the discriminator network). The input layer accepts an input that has dimension of img_shape.
    validity = model(img) # The variable validity stores the probability score regarding if the input image is a real or fake data, produced by the discriminator network. model(x) means use the defined model (discriminator network) to produce the output, by taking x as the input.

    return Model(img, validity)  # Group the layers defined for the discriminator network into an object with training/inference features, by specifying the variable img as its input and the variable validity as its output.

# The validity is the Discriminator’s guess of input being real or not (in the form of probability score).

# Define the function to perform the GAN model training

1) **GAN Model training procedure:**
    1) Set the number of epochs for training, the batch size (the number of images involved in the training at each epoch), the epoch interval to save the images generated by the generator network (the saved images are generated for visualization purpose only, not involved in the training)
    2) For each epoch:
        1) Part 1: Train Discriminator (Use the discriminator network only)
            1) Use the real and fake (generated) images separately to train the discriminator network to only recognize real images as real images (not fooled by the fake images). At each epoch, we randomly select half batch numbers of the real images from the dataset & generate half batch numbers of random noise/latent vector, before the discriminator network training is started.
            2) The discriminator network accepts real and fake (generated) images as the input, and
            3) provides the probability scores of the given images being the real images as the output
            4) Calculate the cost function value (the total loss of each epoch), then use the cost function value to learn (update) the parameters (weights and biases) of the discriminator network only that allows the discriminator network to only recognize real images as real images (not fooled by the fake images)
        2) Part 2: Train Generator (Use the GAN model [generator network + discriminator network])
            1) Use the GAN model to train the generator network to generate fake images that the discriminator network recognizes as real images (the fake images are given the ground truth of a real image, which is a value of 1)
            2) The GAN model accepts a batch (group) of randomly generated noise vectors as the input (actually it is the generator network accepts the randomly generated noise vectors as the input)
            3) The generator network generates a batch of images (fake images) as its output (For each noise vector, the generator network generates 1 image (fake image))
            4) The discriminator network then accepts that batch of images (fake images) generated by the generator network as its input. At the same time, the discriminator network is set to not trainable (so the weights and biases of the discriminator network will not be affected by the cost function value of this epoch at this stage)
            5) The discriminator network then provides the probability scores of the given images being the real images as the output (as the output of the GAN model)
            6) Calculate the cost function value (the total loss of each epoch), then use the cost function value to learn (update) the parameters (weights and biases) of the generator network only. So when more and more epochs are conducted, the generator network becomes more capable to generate fake images that are similar to the real images recognized by the discriminator network [at least causing the discriminator network to recognize the fake images as real images at high probability]
    
       
2) **VERY IMPORTANT UNDERSTANDING OF GAN TRAINING CONCEPT:**
    1) At each epoch of training (especially at the 1st epoch), the key is to first train the discriminator network to recognize real images by using the real images.
    2) After the discriminator network has been updated with the real images, it is trained again by using the fake images. The reasoning is: once the discriminator network has been trained with the real images to some extent, all it knows is the real images, and its learned weights and biases tend to give a real image a probability score of being a real image > 0.5. When it is then trained with the fake images, it learns to tell that the given fake images are not real images (probability score of being a real image < 0.5).
    3) We assume the ground truth of a real image is a value of 1. Since the definition of neural network training is to update the neural network's weights so that the training loss reduces, training with the real images pushes the probability score of a real image towards 1 (> 0.5). To keep training the discriminator network to recognize the real images, we also train it with the fake images and assume the ground truth of a fake image (adversarial to real image) is a value of 0 (adversarial to 1).
    4) Since the discriminator network has already been trained with the real images to some extent, training it with the fake images pushes its weights and biases to produce a probability score of being a real image < 0.5 (adversarial to > 0.5) for a fake image.
    5) Since the learning of a neural network relies on gradient descent (which uses the partial derivative of the cost function with respect to each weight and bias), each update makes the cost function value smaller, no matter whether the discriminator network is trained with real images or fake images. So the discriminator network training keeps proceeding in the same direction (to recognize real images better and better).
    6) Hence:
        1) With the ground truth of a real image set to a value of 1, the closer the probability score provided by the discriminator network for a real image is to 1, the smaller the cost function value in the training with real images;
        2) With the ground truth of a fake image set to a value of 0, the closer the probability score provided by the discriminator network for a fake image is to 0, the smaller the cost function value in the training with fake images.
    7) That is why the discriminator network is trained in the same direction (to recognize the real images better and better) whether a training step involves real images or fake images.
    8) Now that the discriminator network has some ability to tell that a given fake image is not a real image (can differentiate), the generator network can proceed with its training (to learn the ability to generate fake images that are similar to the real images recognized by the discriminator network) by generating fake images and seeing whether the discriminator network is fooled (recognizes the fake images as real images).
    9) In the generator network training, the fake images are given the ground truth of a real image (value of 1). So at each epoch, the more of the generated (fake) images fool the discriminator network, the more often the discriminator network provides a probability score closer to 1 in that epoch, and the smaller the cost function value of that epoch becomes.
    10) Remember that training updates the neural network's weights so that the training loss reduces, and that here the loss reduces when the discriminator network gives the fake images a probability score of being a real image > 0.5 (that is, when it is fooled). So the generator network is trained towards the same target as the discriminator network: to learn the weights and biases that generate fake images similar to the real images recognized by the discriminator network [at least causing the discriminator network to recognize the fake images as real images at high probability].
    11) In summary, the discriminator network can be treated as a level/challenge for the generator network. If we want to train a generator network to be able to generate very realistic images (the data we want), we must first train the discriminator network to be able to recognize the realistic images (the data we want) well.
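
The statements about the cost function getting smaller can be checked with a few numbers. The sketch below assumes binary cross-entropy as the cost function (the loss is not set in this part of the notebook, so this is an assumption), and the probability scores are made-up values for illustration.

```python
import math

def binary_cross_entropy(y_true, p_real):
    """Loss for one image: y_true is 1 (real) or 0 (fake); p_real is the discriminator's score."""
    return -(y_true * math.log(p_real) + (1 - y_true) * math.log(1 - p_real))

# Discriminator training with a real image (ground truth 1)
real_confident = binary_cross_entropy(1, 0.9)  # score close to 1
real_unsure = binary_cross_entropy(1, 0.6)

# Discriminator training with a fake image (ground truth 0)
fake_caught = binary_cross_entropy(0, 0.1)     # score close to 0
fake_missed = binary_cross_entropy(0, 0.6)     # discriminator is fooled

# Generator training: the fake image is given the ground truth 1
gen_fooling = binary_cross_entropy(1, 0.8)     # discriminator is fooled
gen_failing = binary_cross_entropy(1, 0.2)

print(real_confident < real_unsure)  # True
print(fake_caught < fake_missed)     # True
print(gen_fooling < gen_failing)     # True
```

In all three cases the loss is smaller when the score is closer to the assigned ground truth, which is what lets the same "reduce the loss" rule train the discriminator on real and fake images and train the generator to fool it.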
            

In [12]:
# Now that we have constructed our two models (generator network and discriminator network), it’s time to pit them against each other.
# We do this by defining a training function, loading the dataset, rescaling the dataset (our training images) and setting the ground truths. 
def train(epochs, batch_size=128, save_interval=50): # epochs refers to the number of training epochs, batch_size refers to the number of images/samples we use to train the GAN in each epoch, save_interval means we save generated image samples at every save_interval number of epochs

    # Load the dataset (real images/data)
    (X_train, _), (_, _) = mnist.load_data() # We only get the train set of the dataset

    # Convert each feature (pixel value) of each image into a float number and rescale it into the range from -1 to 1 [by subtracting 127.5 and then dividing by 127.5, where 127.5 = 255/2 and 255 is the maximum pixel value]. This range matches the tanh output of the generator network. (Can also rescale it into the range from 0 to 1 [by dividing by 255, where 255 is the maximum pixel value])
    X_train = (X_train.astype(np.float32) - 127.5) / 127.5

    # Add a new dimension to the variable X_train as the channel dimension (because the size of the variable X_train is (60000,28,28) [60000 images, each image has dimension of 28 x 28 pixels, without having a dimension for color channel] but the output of the generator network and input of the discriminator network have size of (28,28,1), this means the variable X_train lacks a dimension for the channel, so we manually add that dimension to the variable X_train)
    X_train = np.expand_dims(X_train, axis=3) 

    half_batch = int(batch_size / 2) # half of the batch containing the real images & reserve another half of the batch to store the fake data (images generated by the generator network) for training

    # We then loop through a number of epochs. In each epoch, we train the discriminator network by first selecting a random half batch of images from our true dataset, generating a half batch of images from our
    # Generator, feeding both sets of images into our Discriminator separately, and finally computing the
    # loss for the real images and the fake images, as well as the combined (averaged) loss.
    
    for epoch in range(epochs):

        # ---------------------
        #  Part 1) Train Discriminator (Use the discriminator network only)
        #  a) Use the real and fake (generated) images separately to train the discriminator network to only recognize real images as real images (not fooled by the fake images)
        #  b) The discriminator network accepts real and fake (generated) images as the input, and
        #  c) provides the probability scores of the given images being the real images as the output
        #  d) Calculate the cost function value (the total loss of each epoch), then use the cost function value to learn (update) the parameters (weights and biases) of the discriminator network only that allows the discriminator network to only recognize real images as real images (not fooled by the fake images)
        # ---------------------

        # Select a random half batch of real images, by randomly generating half_batch (size) numbers of integers in the range from 0 (low, inclusive) to X_train.shape[0] (high, exclusive). Each randomly generated integer represents the index of an image/sample in the variable X_train. The randomly generated indices for the real images (images in the variable X_train) are stored in the variable idx. Since each integer is randomly generated within the given range, the same integer (index of an image/sample) can appear more than once in idx, so the same real image can be used more than once in one training step.
        idx = np.random.randint(0, X_train.shape[0], half_batch) # Syntax -> np.random.randint(low, high, size)
        imgs = X_train[idx] # Load the features of the real images, whose indices are available in the variable idx, into the variable imgs. In other words, the variable imgs stores the features of the real images that will be used for training.

        # Generate half_batch number of noise vectors (samples) [this means the variable noise contains half_batch numbers of noise vectors], each noise vector has 100 features/elements. Each feature/element is a random number drawn from a normal distribution with centre (mu) of 0 and scale (sigma) of 1.
        noise = np.random.normal(0, 1, (half_batch, 100)) # Syntax -> np.random.normal(mu, sigma, size )

        # Generate a half batch of fake images. This is done by providing the noise vectors to the generator network called generator to predict (generate) images [fake images] (Each noise vector (sample) in the variable noise is also called as noise seed [here, the variable noise has half_batch numbers of noise seeds (samples/rows)], each noise seed having 100 features/elements. The generator network predict (generate) 1 image [fake image] for each noise seed)
        gen_imgs = generator.predict(noise)

        # Train the discriminator on real and fake images, separately
        # Research showed that separate training is more effective. 
        d_loss_real = discriminator.train_on_batch(imgs, np.ones((half_batch, 1)))
            # Information of train_on_batch():
            #   1) According to Keras documentation, train_on_batch() runs a single gradient update on a single batch of data, then returns a scalar loss value. This means train_on_batch() runs the model training using the features and target (ground truth) of a batch of samples (images), followed by automatically calculating the loss of the training, then update the weights of the model, and finally return the calculated training loss in one-shot.
            #   2) According to Stack Overflow, train_on_batch() allows you to expressly update the weights of your model based on a collection of samples you provide, without regard to any fixed batch size. You could use train_on_batch() to directly update the weights of the existing model only on those samples you provide. Reference: https://stackoverflow.com/questions/49100556/what-is-the-use-of-train-on-batch-in-keras
            # np.ones((half_batch, 1)) creates a column vector storing the label (ground truth) of each real image. The label (ground truth) of a real image is a value of 1. This means np.ones((half_batch, 1)) creates an array of shape (half_batch, 1), which has half_batch rows of value 1.
            # For each real image, train_on_batch() uses the discriminator network called discriminator to generate a probability score of that image as a real image, then uses the generated probability scores and the ground truths (value of 1) to calculate the loss.
            # Hence, the variable d_loss_real stores the loss averaged over the real images of this half batch. It is a single scalar, or a list of scalars such as [loss, accuracy] if the discriminator network was compiled with metrics (it does not have a size of (half_batch, 1)).
            # ** VERY IMPORTANT UNDERSTANDING OF GAN TRAINING CONCEPT (the full reasoning is in the markdown section with the same title above):**
            # --> At each epoch, the discriminator network is first trained with the real images (ground truth of 1), then with the fake images (ground truth of 0). In both cases the training reduces the loss, so the probability score is pushed towards 1 for real images and towards 0 for fake images. This means both steps train the discriminator network in the same direction (to recognize real images better and better).
            # --> The generator network is then trained through the GAN model, with the ground truth of a real image (value of 1) assigned to its fake images. The more the fake images fool the discriminator network (probability score closer to 1), the smaller the loss of that epoch becomes, so the generator network learns to generate fake images that are similar to the real images recognized by the discriminator network.
            # --> In summary, the discriminator network can be treated as a level/challenge for the generator network. If we want to train a generator network to be able to generate very realistic images (the data we want), we must first train the discriminator network to be able to recognize the realistic images (the data we want) well.
            
        d_loss_fake = discriminator.train_on_batch(gen_imgs, np.zeros((half_batch, 1)))
            # np.zeros((half_batch, 1)) creates a column vector storing the label (ground truth) of each fake image. The label (ground truth) of a fake image is a value of 0. This means np.zeros((half_batch, 1)) creates an array of shape (half_batch, 1), which has half_batch rows of value 0.
            # For each fake image, train_on_batch() uses the discriminator network called discriminator to generate a probability score of that image as a real image, then uses the generated probability scores and the ground truths (value of 0) to calculate the loss.
            # Hence, the variable d_loss_fake stores the loss averaged over the fake images of this half batch. It is a single scalar, or a list of scalars such as [loss, accuracy] if the discriminator network was compiled with metrics (it does not have a size of (half_batch, 1)).

        # Average the [loss, accuracy] pairs from the real and fake batches and store the result in d_loss. d_loss[0] is the discriminator loss and d_loss[1] is the discriminator accuracy for this epoch.
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) 

# Within the same loop we then train our Generator: we sample the input noise and
# update the Generator so that the Discriminator labels its samples as valid,
# using the loss computed against the "real" label (1).
        # ---------------------
        #  Part 2: Train Generator (Use the GAN model [generator network + discriminator network])
        #  a) (Already done in Part 1 above) The real and fake (generated) images were used separately to train the discriminator network to recognize only real images as real images (not be fooled by the fake images)
        #  b) The GAN model accepts a batch (group) of randomly generated noise vectors as the input (strictly, it is the generator network that accepts the noise vectors as its input)
        #  c) The generator network generates a batch of images (fake images) as its output (for each noise vector, the generator network generates 1 fake image)
        #  d) The discriminator network then accepts that batch of fake images as its input. The discriminator network is set to not trainable here (so its weights and biases are not affected by the cost function value at this stage)
        #  e) The discriminator network then outputs the probability score of each given image being a real image (this is also the output of the GAN model)
        #  f) Calculate the cost function value (the total loss of each epoch), then use it to learn (update) the parameters (weights and biases) of the generator network only. As more and more epochs are run, the generator network becomes more capable of generating fake images that resemble the real images recognized by the discriminator network [at least causing the discriminator network to score the fake images as real with high probability]
        # ---------------------

# Create noise vectors as input for generator. 
# Create as many noise vectors as defined by the batch size. 
# Based on normal distribution. Output will be of size (batch size, 100)
        noise = np.random.normal(0, 1, (batch_size, 100)) # Randomly generate batch_size noise vectors (a full batch this time, not half_batch), each of length 100.

        # The generator wants the discriminator to label the generated samples
        # as valid/real (ones)
        # This is where the generator is trying to trick discriminator into believing
        # the generated image is true/real/valid (hence value of 1 for y)
        valid_y = np.array([1] * batch_size) # Creates a 1D array of batch_size ones (the ground truth of a real image, one per generated image), to trick the discriminator into treating each generated/fake image as a real image

        # The Generator is part of the combined model, where it is directly linked to the Discriminator.
        # Train the Generator with noise as x and 1 as y.
        # Again, the target is 1 because training is adversarial: if the Generator did a great
        # job of fooling the Discriminator, the output would be 1 (true)
        g_loss = combined.train_on_batch(noise, valid_y) # Because g_loss comes from the model called combined, only the parameters (weights and biases) of the generator network are updated (learned) with the cost function value of each epoch (the discriminator network is set as not trainable in the combined model)
            # This is similar to training an autoencoder: to train the encoder we must pass the input data through the whole autoencoder (encoder + decoder). The decoder takes the encoder's output as its input, the decoder's output is compared with the ground truth of the sample to calculate the loss, and that loss is used to update the weights and biases of both the encoder and the decoder.
            # Analogously, to train the generator we must pass the input data through the whole GAN (generator + discriminator). The discriminator takes the generator's output (generated image) as its input and outputs the probability score of that image being a real image. That score is compared with the ground truth of a real image (value of 1, to trick the discriminator into treating the generated image as real) to calculate the loss, and that loss is used to update the weights and biases of the generator network only.
            # Hence, in a training epoch [here, one epoch is one pass of batch_size images through the network], if the discriminator network:
            # 1) recognizes that the generated image is not a real image (probability score < 0.5), the loss of that sample is larger, which raises the total loss of that epoch and tends to make the weights and biases of the generator network update (learn) more.
            # 2) does not recognize that the generated image is not a real image [the discriminator is tricked into treating it as real] (probability score >= 0.5), the loss of that sample is smaller, which lowers the total loss of that epoch and tends to make the weights and biases of the generator network update (learn) less.

# Additionally, in order for us to keep track of our training process, we print the
# progress and save the sample image output depending on the epoch interval specified.  

        # Plot the progress
        print ("Epoch %d: [Discriminator loss= %f, Discriminator accuracy= %.2f%%] [Generator loss= %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))

        # If at save interval => save generated image samples
        if epoch % save_interval == 0:
            save_imgs(epoch)
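
The comments in the training loop argue that the loss is small when the discriminator's score agrees with the label and large when it disagrees. A minimal sketch of that arithmetic, assuming the standard per-sample binary cross-entropy formula. The scores 0.9 and 0.1 and the [loss, accuracy] pairs are hypothetical values chosen for illustration, not outputs of the trained networks, and `bce` is a helper name introduced here:

```python
import math

def bce(label, score, eps=1e-7):
    """Binary cross-entropy for one sample; score is the predicted probability of 'real'."""
    score = min(max(score, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(score) + (1 - label) * math.log(1.0 - score))

# Discriminator training: the loss is small when the score matches the label.
loss_real_correct = bce(1, 0.9)  # real image scored as likely real
loss_fake_correct = bce(0, 0.1)  # fake image scored as likely fake

# Generator training: fake images are paired with the "real" label (1).
loss_gen_fooled = bce(1, 0.9)    # discriminator fooled -> small generator loss
loss_gen_caught = bce(1, 0.1)    # discriminator not fooled -> large generator loss

# Averaging two [loss, accuracy] pairs, as done for d_loss (hypothetical numbers).
d_loss_real = [0.30, 0.90]
d_loss_fake = [0.50, 0.70]
d_loss = [0.5 * (a + b) for a, b in zip(d_loss_real, d_loss_fake)]

print(loss_real_correct, loss_fake_correct, loss_gen_fooled, loss_gen_caught, d_loss)
```

Keras averages this per-sample value over the batch, which is why `train_on_batch()` reports a single loss number per call.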




# Define the function to save the images generated by the generator network at every epoch interval
The generated (fake) images are for visualization only (so that the user can see how well the generator network generates images). They are not involved in the training (updating the weights and biases of the generator and discriminator networks).

In [13]:
# When the specified save_interval is hit, we call the save_imgs() function, which looks as follows.
# In summary, save_imgs() asks the generator network of the GAN to generate 25 images (fake images), each from a randomly generated noise vector. Hence, at a particular epoch, we can get a sense of how well the parameters (weights and biases) of the generator network have been learned by viewing the 25 images it generates at that epoch.

def save_imgs(epoch):
    r, c = 5, 5
    noise = np.random.normal(0, 1, (r * c, 100)) # Randomly generate r * c = 25 noise vectors (samples), each of length 100.
    gen_imgs = generator.predict(noise) # With the latest updated/learned parameters (weights and biases), the generator network generates 1 image (fake image) for each randomly created noise vector stored in the variable noise. So eventually the generator network will generate 25 images.

    # Rescale pixel values to the range 0 - 1 for plotting (this maps [-1, 1] to [0, 1])
    gen_imgs = 0.5 * gen_imgs + 0.5

    # Create the directory (folder) if it does not exist. If the directory already exists, nothing happens.
    image_dir = 'D:/AI_Master_New/Under_Local_Git_Covered/Deep_Learning_Tutorials_codebasics/Generative_Adversarial_Network_GAN/GAN_DigitalSreeni/images'
    os.makedirs(image_dir, exist_ok = True)

    # Create a 5x5 subplot that shows all 25 images at once
    fig, axs = plt.subplots(r, c)
    cnt = 0
    for i in range(r):
        for j in range(c):
            axs[i,j].imshow(gen_imgs[cnt, :,:,0], cmap='gray')
            axs[i,j].axis('off')
            cnt += 1
    time_now  = datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') # Get the current date and time right before saving the created 5x5 subplot
    fig.savefig(os.path.join(image_dir, '%s_mnist_Epoch%d.png' % (time_now, epoch))) # Save the created 5x5 subplot in the directory created above, regardless of the current working directory
    plt.close()
#This function saves our images for us to view
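
The rescaling step and the subplot indexing in `save_imgs()` are both simple arithmetic. A small sketch, assuming the generator's pixel values lie in [-1, 1] (as they would with a tanh output layer; that range is an assumption here). `rescale` and `grid_position` are hypothetical helper names used only for illustration:

```python
def rescale(pixel):
    """Map a pixel value from [-1, 1] to [0, 1], matching 0.5 * gen_imgs + 0.5."""
    return 0.5 * pixel + 0.5

def grid_position(cnt, c=5):
    """Return the (row, column) of the subplot that receives image number cnt."""
    return divmod(cnt, c)

# The darkest, middle, and brightest pixel values after rescaling.
print([rescale(p) for p in (-1.0, 0.0, 1.0)])

# The nested loops fill the grid row by row, so cnt = i * c + j.
print(grid_position(0), grid_position(7), grid_position(24))
```

With `r, c = 5, 5`, image 0 lands in the top-left subplot and image 24 in the bottom-right one.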

# Develop the GAN, by combining the generator network and discriminator network as one

1) This is the main function block of this script
2) Here, we have 5 parts:
    1) Part 1: Discriminator network 
        1) Build the discriminator network (call build_discriminator() to group the defined layers into an object with training/inference features)
        2) Compile the discriminator network (This finalizes the model, freezes all its settings, and prepares it to meet some data)
    2) Part 2: Generator network 
        1) Build the generator network (call build_generator() to group the defined layers into an object with training/inference features)
        2) Compile the generator network (This finalizes the model, freezes all its settings, and prepares it to meet some data)
    3) Part 3: GAN model
        1) Define the structure of the GAN model. GAN model only has 4 layers:
            1) Layer 1: The input layer (the variable z)
            2) Layer 2: The generator network
            3) Layer 3: The discriminator network
            4) Layer 4: The output layer (the variable valid)
        2) Build the GAN model (call Model() to group the defined layers into an object with training/inference features)
        3) Compile the GAN model (This finalizes the model, freezes all its settings, and prepares it to meet some data)
    4) Part 4: Perform GAN model training
    5) Part 5: After the GAN model training is completed, save the trained generator network of the GAN model


In [14]:
########################Main function block of this script##########################################

# Let us also define our optimizer for easy use later on. That way if you change your mind, you can change it easily here
optimizer = adam_v2.Adam(0.0002, 0.5)  # Learning rate = 0.0002 and beta_1 = 0.5 (the decay rate of Adam's first-moment estimate, its momentum-like term).

#**********************Before combining the generator and discriminator networks**********************
#-------------------------Discriminator Network Part-------------------------
# Build and compile the discriminator first:
# Extra information:
#   1) Generator will be trained as part of the combined model, later. 
#   2) Pick the loss function and the type of metric to keep track.                 
#   3) We choose binary cross-entropy as the cost function because this is a binary prediction problem (0 -> fake image, 1 -> real image), for which it is a better
#   loss function than MSE or other alternatives.

# Build a Keras model object called discriminator (the discriminator network), using the network structure defined in build_discriminator().
discriminator = build_discriminator() 
# Compile the discriminator network
discriminator.compile(loss='binary_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy'])

#-------------------------Generator Network Part-------------------------
# Build and compile our generator, pick the loss function:
# Extra information:
#   1) Since we are only generating (faking) images, let us not track any metrics.

# Build a Keras model object called generator (the generator network), using the network structure defined in build_generator().
generator = build_generator() 
# Compile the generator network
generator.compile(loss='binary_crossentropy', optimizer=optimizer)


#**********************Combining the generator and discriminator networks as a GAN model**********************
#-------------------------GAN model Part-------------------------
# In a GAN, the Generator network takes noise z as an input to produce its images.  
z = Input(shape=(100,))   # The input layer of the GAN model. Define the variable z is the input to the GAN (also means the variable z is the input to the generator network in the GAN)
img = generator(z) # call generator() to use generator network to generate 1 image (fake image) per noise vector in the variable z

# This ensures that when we combine our generator and discriminator networks, we only train the Generator.
# While training the generator we do not want the discriminator weights to be adjusted.
# This doesn't affect the discriminator training above, because the discriminator was already compiled while it was still trainable.
discriminator.trainable = False  

# This specifies that, inside the combined model, our Discriminator takes the images generated by our Generator
# and sets its output to a parameter called valid, which indicates
# whether the input is judged real (real image) or not (fake image).
valid = discriminator(img)  # The output layer of the GAN model. Call discriminator() to use discriminator network to perform validity check on all the generated images stored in the variable img


# Here we combined the generator and discriminator models as a GAN model, and also set our loss function and optimizer for the GAN model. 
# Again, we are only training the generator here. 
# The ultimate goal here is for the Generator to fool the Discriminator.  
# The combined model, also known as GAN model  (stacked generator and discriminator) takes
# noise as input => generates images => determines validity

# Build a GAN model called combined. This GAN model accepts the variable z as the input, then provides the variable valid as the output.
combined = Model(z, valid)
# Compile the GAN model
combined.compile(loss='binary_crossentropy', optimizer=optimizer)


#**********************Perform GAN model training**********************
# Perform GAN model training for the number of epochs given by epochs. Each epoch uses one batch of batch_size images. At every save_interval epochs, the generator network generates 25 extra images (fake images, 1 image per randomly generated noise vector) with its latest weights and biases (for visualization only, not involved in the training); the 25 generated images are arranged in one subplot and saved to the local device.
train(epochs=5001, batch_size=32, save_interval=500)


#**********************Save the trained generator network**********************
#Save model for future use to generate fake images
#Not tested yet... make sure right model is being saved..
#Compare with GAN4

# Save the trained generator network (means save the weights and biases of the generator network as a file, so that we can directly use the generator network in other applications)
generator.save('generator_model.h5')  


#Test the model on GAN4_predict...
#Change epochs back to 30K
                
#epochs dictates the number of forward and backward propagations, batch_size
#indicates the number of training samples per forward/backward propagation, and
#save_interval specifies after how many epochs we call our save_imgs function.
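
For the call `train(epochs=5001, batch_size=32, save_interval=500)`, the bookkeeping implied by these three arguments can be worked out directly. This sketch assumes `half_batch = batch_size // 2` and that epochs are numbered from 0 (neither definition is shown in this section, so both are assumptions):

```python
epochs, batch_size, save_interval = 5001, 32, 500
half_batch = batch_size // 2  # assumption: half of each discriminator batch is real, half is fake

# Epochs at which save_imgs(epoch) is called (epoch % save_interval == 0).
save_epochs = [epoch for epoch in range(epochs) if epoch % save_interval == 0]

# Real images drawn for the discriminator over the whole run.
real_images_drawn = epochs * half_batch

# Noise vectors fed to the combined model for generator training.
generator_noise_vectors = epochs * batch_size

print(len(save_epochs), save_epochs[-1], real_images_drawn, generator_noise_vectors)
```

Under these assumptions, using 5001 epochs rather than 5000 means the final epoch (5000) also produces a saved image grid.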

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 512)               401920    
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 512)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 256)               131328    
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 256)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 257       
Total params: 533,505
Trainable params: 533,505
Non-trainable params: 0
________________________________________________
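
The parameter counts in the summary above can be checked by hand: a Dense layer with n inputs and m units has n * m weights plus m biases, while Flatten and LeakyReLU have no parameters. A short check (`dense_params` is a helper name introduced here for illustration):

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

# (inputs, units) for dense_7, dense_8, and dense_9 as listed in the summary.
layer_shapes = [(784, 512), (512, 256), (256, 1)]
counts = [dense_params(n, m) for n, m in layer_shapes]

print(counts, sum(counts))  # [401920, 131328, 257] 533505
```

The 784 inputs of the first Dense layer come from flattening a 28 x 28 x 1 image.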