# **Theory Questions and Answers**

## **Q1: Explain the minimax loss function in GANs and how it ensures competitive training between the generator and discriminator.**

In **Generative Adversarial Networks (GANs)**, the **minimax loss function** defines the adversarial relationship between the **generator (G)** and the **discriminator (D)**. The objective of training a GAN is formulated as:

$$
\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{\text{data}}} [\log D(x)] + \mathbb{E}_{z \sim p_z} [\log (1 - D(G(z)))]
$$

This equation represents a **two-player zero-sum game** where:

- The **discriminator (D)** tries to **maximize** its accuracy in distinguishing real images from fake images.
- The **generator (G)** tries to **minimize** the loss by **fooling the discriminator** into classifying its generated images as real.
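
As a minimal sketch of what the value function measures, the snippet below estimates $V(G, D)$ from two small batches of discriminator probabilities. The function name `value_function` and the probability values are hypothetical, chosen only for illustration.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(G, D) from discriminator probabilities.

    d_real: D(x) for a batch of real samples, each in (0, 1)
    d_fake: D(G(z)) for a batch of generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs, chosen only for illustration.
confident_d = value_function(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
fooled_d = value_function(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(f"confident D: V = {confident_d:.4f}")
print(f"fooled D:    V = {fooled_d:.4f}")
```

A discriminator that separates the two batches well pushes $V$ toward 0 (its maximum), while a fully fooled discriminator that outputs 0.5 everywhere gives $V = -\log 4$.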

### **How it Ensures Competitive Training:**
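1. **Generator Improvement** – The discriminator's gradients tell the generator how to change its samples so they look more real. If the discriminator becomes **too good** too early, however, the $\log(1 - D(G(z)))$ term saturates and the generator's gradients vanish, which is why a non-saturating generator loss is commonly used in practice.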
2. **Discriminator Improvement** – If the generator improves, the discriminator must become more **sensitive** to subtle differences between real and fake data.
3. **Equilibrium** – Ideally, both networks improve until the generated data distribution becomes **indistinguishable** from the real distribution.
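
The equilibrium in point 3 can be made precise with the standard result for a fixed generator, stated here as a sketch. The symbols $p_g$ (the distribution of $G(z)$) and $\mathrm{JSD}$ (the Jensen-Shannon divergence) are notation introduced for this note.

$$
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}, \qquad V(G, D^*) = -\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_g)
$$

Because $\mathrm{JSD} \geq 0$ with equality only when $p_g = p_{\text{data}}$, the best value the generator can reach is $-\log 4$, at which point $D^*(x) = 1/2$ everywhere.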

However, in practice, **training instability** is common, requiring modifications like **WGAN, Feature Matching, or Spectral Normalization**.
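
One source of this instability can be illustrated numerically. Writing $D = \sigma(a)$ for a discriminator logit $a$ on a generated sample, the sketch below compares the gradient of the minimax generator term $\log(1 - \sigma(a))$ with that of the non-saturating alternative $-\log \sigma(a)$. The function names are hypothetical helpers, not part of any library.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def non_saturating_grad(a):
    # d/da [-log(sigmoid(a))] = sigmoid(a) - 1
    return sigmoid(a) - 1.0

# Logit -6 means the discriminator confidently rejects the fake sample.
for logit in (-6.0, 0.0, 6.0):
    print(f"logit={logit:+.1f}  "
          f"saturating={saturating_grad(logit):+.4f}  "
          f"non-saturating={non_saturating_grad(logit):+.4f}")
```

When the discriminator confidently rejects a fake (large negative logit), the minimax gradient is close to zero while the non-saturating gradient stays close to -1, so the generator keeps receiving a usable signal.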

---

## **Q2: What is mode collapse, why does it occur, and how can it be mitigated?**

### **Mode Collapse:**
Mode collapse occurs when the **generator fails to produce diverse outputs** and instead generates only a **few repeated patterns** to fool the discriminator. This reduces the **variety of generated images**, even if they look realistic.

### **Why Does Mode Collapse Occur?**
- **Generator Exploits Weakness in the Discriminator** – If the generator finds a small set of images that consistently fool the discriminator, it may ignore other possible variations.
- **Poor Gradient Flow** – If the loss function leads to **vanishing gradients**, the generator might struggle to learn a broad range of features.
- **Overfitting to a Few Features** – The generator focuses on a **small subset of the data distribution**, neglecting others.

### **How to Mitigate Mode Collapse:**
- **Mini-batch Discrimination** – Lets the discriminator compare samples within a batch, so a batch of near-identical images is easy to flag as fake.
- **Wasserstein GAN with Gradient Penalty (WGAN-GP)** – Replaces the classifier loss with a Wasserstein critic, which provides smoother gradients and makes collapse less likely.
- **Feature Matching Loss** – Instead of only trying to fool the discriminator, the generator is trained to **match feature statistics** of real data.
- **Entropy Regularization** – Penalizes low variation in the generated samples, encouraging more diverse outputs.
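
To make the feature matching idea concrete, here is a minimal sketch that compares batch-mean feature vectors. In a real GAN the features would come from an intermediate discriminator layer; the tiny hand-written feature lists and the helper names below are assumptions for illustration only.

```python
def mean_features(batch):
    """Average each feature dimension over a batch (list of equal-length lists)."""
    n = len(batch)
    return [sum(col) / n for col in zip(*batch)]

def feature_matching_loss(real_feats, fake_feats):
    """Squared L2 distance between the batch-mean feature vectors."""
    mu_real = mean_features(real_feats)
    mu_fake = mean_features(fake_feats)
    return sum((r - f) ** 2 for r, f in zip(mu_real, mu_fake))

# Hypothetical 2-D features: the real batch covers two "modes".
real = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]   # generator repeats one mode
diverse = [[0.9, 0.1], [0.1, 0.9]]     # generator covers both modes

print(feature_matching_loss(real, collapsed))
print(feature_matching_loss(real, diverse))
```

The collapsed batch is penalized because its mean features drift away from the real batch, even though each individual sample matches a real mode.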

---

## **Q3: Explain the role of the discriminator in adversarial training.**

The **discriminator (D)** is a crucial part of the adversarial process in GANs. It acts as a **binary classifier** that distinguishes between **real** images (from the dataset) and **fake** images (generated by G).

### **Key Functions of the Discriminator:**
1. **Guiding the Generator** – The discriminator provides **feedback** (gradients) to the generator, helping it learn how to produce more realistic images.
2. **Improving Itself** – It continuously **learns to better distinguish** real vs. fake images.
3. **Adversarial Competition** – It competes with the generator, forcing it to improve.
4. **Loss Calculation** – It uses **binary cross-entropy loss** or **Wasserstein loss** to measure how well it classifies images.

### **Discriminator Training Process:**
1. The discriminator is **trained on real images** (label = 1).
2. It is also **trained on fake images** generated by **G** (label = 0).
3. It updates its weights to improve classification accuracy.
4. The **generator uses this feedback** to improve itself.

If the discriminator **becomes too strong**, the generator may struggle to improve. Techniques like **label smoothing** and **gradient penalties** can help balance training.
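
As a rough illustration of why label smoothing restrains the discriminator, the sketch below evaluates binary cross-entropy for a real sample with a hard target of 1.0 and a smoothed target of 0.9. The smoothing value 0.9 and the helper name `bce` are assumptions for this example.

```python
import math

def bce(target, prob):
    """Binary cross-entropy for one prediction `prob` in (0, 1)."""
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

for prob in (0.7, 0.9, 0.99):
    hard = bce(1.0, prob)      # hard label: real = 1.0
    smooth = bce(0.9, prob)    # smoothed label: real = 0.9
    print(f"D(x)={prob:.2f}  hard={hard:.4f}  smoothed={smooth:.4f}")
```

With a hard label the loss keeps shrinking as $D(x) \to 1$, rewarding extreme confidence. With the smoothed target the loss is smallest at $D(x) = 0.9$ and grows again beyond it, which discourages overconfident predictions.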

---

## **Q4: How do metrics like IS and FID evaluate GAN performance?**

Since GANs generate **new data** with no ground-truth label per sample, traditional metrics like accuracy **don’t apply**. Instead, **Inception Score (IS)** and **Fréchet Inception Distance (FID)** are used to evaluate **image quality and diversity**.

### **1. Inception Score (IS)**
- Measures **how realistic and diverse** the generated images are.
- Uses a **pre-trained classifier (InceptionV3)** to predict labels for generated images.
- A **higher IS** means the images are:
  - **High quality** (classifier is confident in predictions).
  - **Diverse** (spread across multiple categories).

$$
IS = \exp \left( \mathbb{E}_x \left[ D_{KL}\left( p(y|x) \,\|\, p(y) \right) \right] \right)
$$

**Limitations:**  
- Doesn’t compare against real images.
- Can be **misleading**: images that the classifier labels confidently but that look unrealistic still score well, and so does a generator that emits only a single image per class.
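
The IS formula can be evaluated directly once the class probabilities $p(y|x)$ are available. The sketch below uses hand-written probability rows in place of real InceptionV3 outputs; the function name and the toy inputs are assumptions for illustration.

```python
import math

def inception_score(probs):
    """probs: list of per-image class distributions p(y|x), each summing to 1."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal p(y): average the conditionals over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    # Mean KL(p(y|x) || p(y)); terms with p(y|x) = 0 contribute 0.
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(pk * math.log(pk / mk)
                       for pk, mk in zip(p, marginal) if pk > 0)
    mean_kl /= n
    return math.exp(mean_kl)

# Toy predictions over 3 classes (hypothetical, not real classifier output).
sharp_and_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1.0, 0.0, 0.0]] * 3
uncertain = [[1 / 3, 1 / 3, 1 / 3]] * 3

for name, probs in [("sharp and diverse", sharp_and_diverse),
                    ("collapsed", collapsed),
                    ("uncertain", uncertain)]:
    print(f"{name}: IS = {inception_score(probs):.4f}")
```

The score reaches its maximum (the number of classes) only when predictions are both confident and evenly spread over classes. It falls to 1 when the generator collapses to one class or when the classifier is uncertain about every image.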

---

### **2. Fréchet Inception Distance (FID)**
- Measures **how close the generated images are to real images** in feature space.
- Computes the **distance** between feature distributions of real vs. fake images using InceptionV3.
- A **lower FID score** means:
  - The generator produces images closer to real data.
  - Both quality and diversity are better matched, since the mean and the covariance of the features are compared.

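$$
FID = \| \mu_r - \mu_g \|^2 + \text{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)
$$

Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of the InceptionV3 features of the real and generated images, respectively.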

**Limitations:**  
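- Is biased for small sample sizes, so scores are only comparable when computed on the same number of images.
- Depends on the feature extractor (InceptionV3), which may not suit images far from its training domain.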
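
A minimal NumPy sketch of the FID formula follows. It assumes the features have already been extracted; random arrays stand in for InceptionV3 activations here, and `fid_from_features` is a hypothetical helper name. The trace of the matrix square root is computed from the eigenvalues of $\Sigma_r \Sigma_g$, which is one common way to evaluate it.

```python
import numpy as np

def fid_from_features(feats_real, feats_gen):
    """FID between two feature sets of shape (num_samples, feature_dim)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the
    # eigenvalues of S_r S_g; clip tiny negatives from round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)

# Stand-in features (assumption: 4-D instead of InceptionV3's pooled features).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))
shifted_feats = real_feats + 3.0   # same covariance, mean shifted by 3 per dim

print(fid_from_features(real_feats, real_feats))
print(fid_from_features(real_feats, shifted_feats))
```

Identical feature sets give a distance of approximately zero. Shifting every feature by 3 leaves the covariance term unchanged and contributes $4 \times 3^2 = 36$ through the mean term.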


In [None]:
import os
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import layers

# 1. Hyperparameters


In [None]:
BATCH_SIZE = 256
NOISE_DIM = 128   # The dimensionality of the random noise vector
EPOCHS = 300
BUFFER_SIZE = 50000  # CIFAR-10 has 50K training images
SAVE_EVERY = 10      # Save generated images every 10 epochs
NUM_EXAMPLES_TO_GENERATE = 16  # Number of images to generate for snapshot


# 2. Load and Preprocess CIFAR-10
##    CIFAR-10 images are 32x32x3.

In [None]:
# Each image is in the range [0,255]. We will normalize to [-1,1].
# This helps the generator learn better.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = (x_train.astype('float32') - 127.5) / 127.5  # Scale to [-1, 1]

# Create tf.data.Dataset
train_dataset = tf.data.Dataset.from_tensor_slices(x_train)
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
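
As a quick sanity check on the preprocessing above, the sketch below applies the same scaling to a few sample pixel values, inverts it the way the plotting code does, and works out how many batches one epoch contains when the last partial batch is dropped. The array `pixels` is a made-up example, not CIFAR-10 data.

```python
import numpy as np

# A few representative 8-bit pixel values (illustrative only).
pixels = np.array([0, 127, 128, 255], dtype=np.uint8)

scaled = (pixels.astype("float32") - 127.5) / 127.5   # [0, 255] -> [-1, 1]
restored = (scaled + 1.0) / 2.0                        # [-1, 1] -> [0, 1] for display

print(scaled)
print(restored)

# With 50,000 images, batch size 256, and drop_remainder=True,
# the trailing 80 images of each shuffled epoch are skipped.
steps_per_epoch = 50_000 // 256
dropped = 50_000 - steps_per_epoch * 256
print(steps_per_epoch, dropped)
```

The [-1, 1] range matches the generator's `tanh` output, so real and generated images reach the discriminator on the same scale.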


# 3. Generator Model
##    - Input: random noise vector of dimension NOISE_DIM
##    - Output: 32x32x3 image

In [None]:
def make_generator_model():
    model = tf.keras.Sequential(name="Generator")

    # 1) Start with a dense layer to project and reshape.
    model.add(layers.Dense(4*4*512, use_bias=False, input_shape=(NOISE_DIM,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Reshape((4, 4, 512)))

    # 2) Upsampling to 8x8
    model.add(layers.Conv2DTranspose(256, (4, 4), strides=(2, 2), padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # 3) Upsampling to 16x16
    model.add(layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # 4) Upsampling to 32x32
    model.add(layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), padding='same', use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    # 5) Final layer: produce 32x32x3, with tanh activation for output in [-1,1].
    model.add(layers.Conv2DTranspose(3, (4, 4), strides=(1, 1), padding='same', use_bias=False,
                                     activation='tanh'))

    return model
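
The upsampling comments in the generator can be checked with a little arithmetic. With `padding='same'`, a Keras `Conv2DTranspose` layer multiplies the spatial size by its stride, so the sketch below traces the feature-map size through the four transposed convolutions. The helper `conv_transpose_same` is a hypothetical name used only here.

```python
def conv_transpose_same(size, stride):
    """Spatial size after Conv2DTranspose with padding='same': size * stride."""
    return size * stride

size = 4                      # after Reshape((4, 4, 512))
trace = [size]
for stride in (2, 2, 2, 1):   # strides of the four Conv2DTranspose layers
    size = conv_transpose_same(size, stride)
    trace.append(size)

print(" -> ".join(f"{s}x{s}" for s in trace))
```

The final stride-1 layer keeps the 32x32 resolution and only reduces the channel count to 3.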

# 4. Discriminator Model
##    - Input: 32x32x3 image (either real or fake)
##    - Output: real/fake score (logits)

In [None]:
def make_discriminator_model():
    model = tf.keras.Sequential(name="Discriminator")

    # 1) Downsample: 32x32x3 -> 16x16x64
    model.add(layers.Conv2D(64, (4, 4), strides=(2, 2), padding='same',
                            input_shape=(32, 32, 3)))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    # 2) Downsample: 16x16x64 -> 8x8x128
    model.add(layers.Conv2D(128, (4, 4), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    # 3) Downsample: 8x8x128 -> 4x4x256
    model.add(layers.Conv2D(256, (4, 4), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    # Flatten and final dense for classification
    model.add(layers.Flatten())
    model.add(layers.Dense(1))  # Outputs a single logit

    return model
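
For a sense of the discriminator's size, the following sketch recomputes the output shapes and the trainable parameter count by hand. It assumes the layer settings shown above (4x4 kernels, stride 2, `padding='same'`, biases enabled); the helper names are hypothetical.

```python
import math

def conv_same(size, stride):
    """Spatial size after Conv2D with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

size, channels, total = 32, 3, 0
for filters in (64, 128, 256):
    total += conv_params(4, channels, filters)
    size, channels = conv_same(size, 2), filters
    print(f"{size}x{size}x{channels}")

flat = size * size * channels   # Flatten output length
total += flat + 1               # Dense(1): one weight per input plus a bias
print(flat, total)
```

LeakyReLU and Dropout add no trainable parameters, so the count comes entirely from the three convolutions and the final dense layer.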

# 5. Loss Functions & Optimizers

In [None]:
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_disc_loss = real_loss + fake_loss
    return total_disc_loss

def generator_loss(fake_output):
    # Non-saturating generator loss: label fakes as real (1) and minimize
    # -log D(G(z)) instead of minimizing log(1 - D(G(z))) from the minimax objective.
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator = make_generator_model()
discriminator = make_discriminator_model()

# Adam optimizer
generator_optimizer = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4, beta_1=0.5)
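
To build intuition for the scale of these losses, the sketch below re-implements binary cross-entropy on a single logit in plain Python and evaluates both losses when the discriminator is maximally uncertain (logit 0, probability 0.5). The function names are hypothetical stand-ins for the TensorFlow versions above.

```python
import math

def bce_from_logit(label, logit):
    """Numerically stable binary cross-entropy on a raw logit."""
    # Equivalent to -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logit).
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def d_loss(real_logit, fake_logit):
    return bce_from_logit(1.0, real_logit) + bce_from_logit(0.0, fake_logit)

def g_loss(fake_logit):
    return bce_from_logit(1.0, fake_logit)

print(f"G loss at logit 0: {g_loss(0.0):.4f}")
print(f"D loss at logit 0: {d_loss(0.0, 0.0):.4f}")
```

At that point the generator loss is $\ln 2 \approx 0.6931$ and the discriminator loss is $2 \ln 2 \approx 1.3863$. Logged losses that hover near these values suggest the discriminator is close to chance, though this alone says nothing about image quality.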

# 6. Image Saving Utility

In [None]:
seed = tf.random.normal([NUM_EXAMPLES_TO_GENERATE, NOISE_DIM])

# Directory to save generated images (no error if it already exists)
os.makedirs('generated_images', exist_ok=True)

def generate_and_save_images(model, epoch, test_input):
    """Generate and save images to disk for visualization."""
    predictions = model(test_input, training=False)

    # Rescale from [-1, 1] -> [0, 1] for display
    predictions = (predictions + 1) / 2.0

    fig = plt.figure(figsize=(4,4))
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i+1)
        plt.imshow(predictions[i])
        plt.axis('off')

    plt.tight_layout()
    plt.savefig(f'generated_images/image_at_epoch_{epoch:03d}.png')
    plt.close(fig)

# 7. Single Training Step

In [None]:
@tf.function
def train_step(real_images):
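    # Match the noise batch to the incoming batch so the step also works
    # if the dataset is ever built without drop_remainder=True.
    noise = tf.random.normal([tf.shape(real_images)[0], NOISE_DIM])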

    # Record operations for gradient computation
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss  = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # Compute and apply gradients for generator
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))

    # Compute and apply gradients for discriminator
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator,
                                                discriminator.trainable_variables))

    return gen_loss, disc_loss


# 8. Full Training Loop

In [None]:
def train(dataset, epochs):
    for epoch in range(1, epochs+1):
        gen_loss_list = []
        disc_loss_list = []

        for batch_images in dataset:
            g_loss, d_loss = train_step(batch_images)
            gen_loss_list.append(g_loss)
            disc_loss_list.append(d_loss)

        # Simple tracking of epoch losses
        avg_g_loss = np.mean(gen_loss_list)
        avg_d_loss = np.mean(disc_loss_list)
        print(f"Epoch {epoch}/{epochs} | G Loss: {avg_g_loss:.4f} | D Loss: {avg_d_loss:.4f}")

        # Save images every 'SAVE_EVERY' epochs
        if epoch % SAVE_EVERY == 0:
            generate_and_save_images(generator, epoch, seed)

    # Final save at the end of training, unless the loop already saved this epoch
    if epochs % SAVE_EVERY != 0:
        generate_and_save_images(generator, epochs, seed)


# 9. Kick off training

In [None]:
train(train_dataset, EPOCHS)

Epoch 1/300 | G Loss: 0.9925 | D Loss: 1.1687
Epoch 2/300 | G Loss: 0.7111 | D Loss: 1.3840
Epoch 3/300 | G Loss: 0.7231 | D Loss: 1.3679
Epoch 4/300 | G Loss: 0.7129 | D Loss: 1.3737
Epoch 5/300 | G Loss: 0.7033 | D Loss: 1.3846
Epoch 6/300 | G Loss: 0.7158 | D Loss: 1.3746
Epoch 7/300 | G Loss: 0.7069 | D Loss: 1.3804
Epoch 8/300 | G Loss: 0.7021 | D Loss: 1.3789
Epoch 9/300 | G Loss: 0.7117 | D Loss: 1.3783
Epoch 10/300 | G Loss: 0.6922 | D Loss: 1.3841
Epoch 11/300 | G Loss: 0.7019 | D Loss: 1.3843
Epoch 12/300 | G Loss: 0.7035 | D Loss: 1.3789
Epoch 13/300 | G Loss: 0.7029 | D Loss: 1.3724
Epoch 14/300 | G Loss: 0.7029 | D Loss: 1.3792
Epoch 15/300 | G Loss: 0.7006 | D Loss: 1.3716
Epoch 16/300 | G Loss: 0.7116 | D Loss: 1.3701
Epoch 17/300 | G Loss: 0.6976 | D Loss: 1.3771
Epoch 18/300 | G Loss: 0.7295 | D Loss: 1.3519
Epoch 19/300 | G Loss: 0.7065 | D Loss: 1.3807
Epoch 20/300 | G Loss: 0.7132 | D Loss: 1.3768
Epoch 21/300 | G Loss: 0.7147 | D Loss: 1.3751
Epoch 22/300 | G Loss: