Code adapted from [wiseodd on GitHub](https://github.com/wiseodd/generative-models/blob/master/GAN/vanilla_gan/gan_tensorflow.py).

## GANs vs CGANs

I'm going to show the implementation of a CGAN (conditional GAN) here, because it's almost the same as a GAN but a little bit cooler. The only difference is in the input. With a GAN, the generator takes in just a vector of noise. In a CGAN, the generator and the discriminator also take in a one-hot vector naming the digit they're supposed to generate or judge. The GAN is trying to learn the distribution of digits, so it makes sense that the networks would learn better if we tell them which digit they're dealing with, since each digit has its own distribution.
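
To make the difference in inputs concrete, here is a minimal NumPy sketch (not part of the original notebook) that builds one generator input for each case. The sizes 100 and 10 are the ones used later in this post; the choice of the digit 7 is arbitrary.

```python
import numpy as np

NOISE_SIZE = 100   # length of the noise vector
N_DIGITS = 10      # number of MNIST classes

rng = np.random.default_rng(0)

# Plain GAN: the generator input is noise only
z = rng.uniform(-1.0, 1.0, size=(1, NOISE_SIZE))

# CGAN: append a one-hot label, here asking for the digit 7
y = np.zeros((1, N_DIGITS))
y[0, 7] = 1.0
cgan_input = np.concatenate([z, y], axis=1)

print(z.shape)           # (1, 100)
print(cgan_input.shape)  # (1, 110)
```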

## Setup

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os

# Fetch MNIST Dataset using the supplied Tensorflow Utility Function
mnist = input_data.read_data_sets("data/MNIST_data/", one_hot=True)

Extracting data/MNIST_data/train-images-idx3-ubyte.gz
Extracting data/MNIST_data/train-labels-idx1-ubyte.gz
Extracting data/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting data/MNIST_data/t10k-labels-idx1-ubyte.gz


## Building the Networks

One new thing here is Xavier initialization, which isn't specific to GANs, but the original author of this code decided to use it. There's a great [blog post](http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization) that explains the statistics behind it, but the main idea is that it's a way of initializing weights such that values don't blow up or vanish as they flow through the network. In TensorFlow, it's implemented by default as drawing each weight $W$ from a zero-mean uniform distribution with $$ \operatorname{Var}(W) = \frac{2}{n_\text{in} + n_\text{out}} $$ where $n_\text{in}$ is the number of inputs into the layer and $n_\text{out}$ is the number of outputs.
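
As a rough check of that variance, here is a NumPy sketch of the uniform variant (my own helper `xavier_uniform`, not TensorFlow's code). A uniform distribution on $[-a, a]$ has variance $a^2/3$, so choosing $a = \sqrt{6 / (n_\text{in} + n_\text{out})}$ gives exactly the variance above.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    """Sample an [n_in, n_out] weight matrix from U(-limit, limit).

    limit = sqrt(6 / (n_in + n_out)) makes the variance
    limit**2 / 3 = 2 / (n_in + n_out).
    """
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

rng = np.random.default_rng(0)
# 110 inputs and 128 outputs: the shape of the generator's first layer
W = xavier_uniform(110, 128, rng)

target_var = 2.0 / (110 + 128)
print(W.var(), target_var)  # the two values should be close
```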

### Generator

Our generator for this implementation is just a simple feed-forward network taking in a $1 \times 100$ vector of noise concatenated with a $1 \times 10$ one-hot vector, and transforming it into a $1 \times 784$ MNIST image, with a $1 \times 128$ ReLU hidden layer in the middle. The output of the last layer is run through a sigmoid to map the logits to a valid pixel intensity between 0 and 1. The exact formula is $$\operatorname{sigmoid}(x) = \frac{1}{1 + \exp(-x)}$$

In [2]:
# The size of the noise vector
NOISE_SIZE = 100
HIDDEN_SIZE = 128
IMAGE_SIZE = 28*28
N_DIGITS = 10

# The input vector of noise
Z = tf.placeholder(tf.float32, shape=[None, NOISE_SIZE])
# The one-hot label of the digit to generate
Y = tf.placeholder(tf.float32, shape=[None, N_DIGITS])
# 1st layer's weights and bias
G_W1 = tf.get_variable('G_W1', shape=[NOISE_SIZE + N_DIGITS, HIDDEN_SIZE],
                       initializer=tf.contrib.layers.xavier_initializer())
G_b1 = tf.get_variable('G_b1', shape=[HIDDEN_SIZE],
                       initializer=tf.zeros_initializer())

# 2nd layer's weights and bias
G_W2 = tf.get_variable('G_W2', shape=[HIDDEN_SIZE, IMAGE_SIZE],
                       initializer=tf.contrib.layers.xavier_initializer())
G_b2 = tf.get_variable('G_b2', shape=[IMAGE_SIZE],
                       initializer=tf.zeros_initializer())

# The trainable generator variables
theta_G = [G_W1, G_W2, G_b1, G_b2]

def generator(z, y):
    ''' The generator net.
    '''
    inputs = tf.concat(axis=1, values=[z, y])
    G_h1 = tf.nn.relu(tf.matmul(inputs, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)

    return G_prob
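
To see the shapes flow through the layers without building a TensorFlow graph, here is a NumPy sketch of the same forward pass. The weights are random stand-ins (an assumption for illustration; the real ones are learned), and `generator_np` is a hypothetical name.

```python
import numpy as np

NOISE_SIZE, N_DIGITS, HIDDEN_SIZE, IMAGE_SIZE = 100, 10, 128, 28 * 28
rng = np.random.default_rng(0)

# Random stand-in weights, not the trained ones
W1 = rng.normal(0.0, 0.1, size=(NOISE_SIZE + N_DIGITS, HIDDEN_SIZE))
b1 = np.zeros(HIDDEN_SIZE)
W2 = rng.normal(0.0, 0.1, size=(HIDDEN_SIZE, IMAGE_SIZE))
b2 = np.zeros(IMAGE_SIZE)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator_np(z, y):
    inputs = np.concatenate([z, y], axis=1)   # [batch, 110]
    h1 = np.maximum(inputs @ W1 + b1, 0.0)    # ReLU, [batch, 128]
    return sigmoid(h1 @ W2 + b2)              # [batch, 784]

z = rng.uniform(-1.0, 1.0, size=(4, NOISE_SIZE))
y = np.eye(N_DIGITS)[[0, 1, 2, 3]]            # one-hot labels for digits 0-3
images = generator_np(z, y)
print(images.shape)  # (4, 784)
```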

### Discriminator

Our discriminator is pretty much the same, except that it transforms a $1 \times 784$ MNIST image concatenated with a one-hot vector into a single scalar between 0 and 1. That scalar is the probability that the input image is a "real" MNIST image, given the digit it's supposed to be.

In [3]:
# The input image
X = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE])

# 1st layer's weights and bias
D_W1 = tf.get_variable('D_W1', shape=[IMAGE_SIZE + N_DIGITS, HIDDEN_SIZE],
                       initializer=tf.contrib.layers.xavier_initializer())
D_b1 = tf.get_variable('D_b1', shape=[HIDDEN_SIZE],
                       initializer=tf.zeros_initializer())

# 2nd layer's weights and bias
D_W2 = tf.get_variable('D_W2', shape=[HIDDEN_SIZE, 1],
                       initializer=tf.contrib.layers.xavier_initializer())
D_b2 = tf.get_variable('D_b2', shape=[1],
                       initializer=tf.zeros_initializer())

# The trainable discriminator variables
theta_D = [D_W1, D_W2, D_b1, D_b2]

def discriminator(x, y):
    '''The discriminator net.
    '''
    inputs = tf.concat(axis=1, values=[x, y])
    D_h1 = tf.nn.relu(tf.matmul(inputs, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)

    return D_prob, D_logit

## Defining Loss

We will define two loss functions, one for our generator and one for our discriminator.

The discriminator is trying to give the real MNIST images a probability as close to 1 as possible, while also giving the generated images a probability close to 0. We can construct its loss function by summing the cross-entropy for each goal's respective logits.

The generator's goal is even simpler: it just wants the discriminator's output for its generated images to be as close to 1 as possible. Its loss function can just be the cross-entropy between the output logits for the generated images and 1.

In [4]:
# Image created by the generator
G_sample = generator(Z, Y)

# Discriminator's output for the real MNIST image
D_real, D_logit_real = discriminator(X, Y)
# Discriminator's output for the generated MNIST image
D_fake, D_logit_fake = discriminator(G_sample, Y)

# Discriminator wants high probability for the real image
D_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=D_logit_real,
        labels=tf.ones_like(D_logit_real)))
# Discriminator also wants low probability for the generated image
D_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        logits=D_logit_fake,
        labels=tf.zeros_like(D_logit_fake)))
# We sum these to get our total discriminator loss
D_loss = D_loss_real + D_loss_fake

# Generator wants high probability for the generated image
G_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=D_logit_fake,
                                                                labels=tf.ones_like(D_logit_fake)))
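
For intuition about what `tf.nn.sigmoid_cross_entropy_with_logits` computes, here is a NumPy sketch using the numerically stable form given in the TensorFlow documentation, $\max(x, 0) - xz + \log(1 + e^{-|x|})$ for logit $x$ and label $z$. The logit values are made up for illustration.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    # Stable form of
    #   -z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x))
    x, z = logits, labels
    return np.maximum(x, 0.0) - x * z + np.log1p(np.exp(-np.abs(x)))

# Hypothetical discriminator logits for a small batch
logit_real = np.array([2.0, 0.5, 3.0])    # real images
logit_fake = np.array([-1.0, 0.0, -2.5])  # generated images

# Discriminator: real images labeled 1, generated images labeled 0
d_loss = (
    sigmoid_cross_entropy_with_logits(logit_real, np.ones_like(logit_real)).mean()
    + sigmoid_cross_entropy_with_logits(logit_fake, np.zeros_like(logit_fake)).mean()
)
# Generator: wants the same generated images labeled 1
g_loss = sigmoid_cross_entropy_with_logits(logit_fake, np.ones_like(logit_fake)).mean()

print(d_loss, g_loss)
```

Note that the two networks score the same `logit_fake` against opposite labels, which is what makes the game adversarial.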

## Training

Training runs in batches: each iteration draws 128 real MNIST images with their labels, generates 128 images for those same labels, and runs the two optimizers, first the discriminator's and then the generator's. Every 1000 iterations, the program displays 10 generated images, one per digit, and reports the current loss.

In [None]:
def sample_Z(m, n):
    '''Returns a uniform sample of values between
    -1 and 1 of size [m, n].
    '''
    return np.random.uniform(-1., 1., size=[m, n])

def plot(samples):
    '''Plots up to 16 generated images on a 4x4 grid.
    '''
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig

# The optimizer for each net
D_optimizer = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
G_optimizer = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)

BATCH_SIZE = 128

image_input = np.identity(N_DIGITS)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for it in range(1000000):
        # Show one generated image per digit (10 in total)
        if it % 1000 == 0:
            samples = sess.run(G_sample,
                               feed_dict={
                                   Z: sample_Z(N_DIGITS, NOISE_SIZE),
                                   Y: image_input
                               })
            fig = plot(samples)
            plt.show()

        # Get a batch of real MNIST images
        X_batch, Y_batch = mnist.train.next_batch(BATCH_SIZE)
        
        # Run our optimizers
        _, D_loss_curr = sess.run([D_optimizer, D_loss],
                                  feed_dict={
                                      X: X_batch,
                                      Z: sample_Z(BATCH_SIZE, NOISE_SIZE),
                                      Y: Y_batch
                                  })
        _, G_loss_curr = sess.run([G_optimizer, G_loss],
                                  feed_dict={
                                      Z: sample_Z(BATCH_SIZE, NOISE_SIZE),
                                      Y: Y_batch
                                  })

        # Report loss
        if it % 1000 == 0:
            print('Iter: {}'.format(it))
            print('D loss: {:.4}'.format(D_loss_curr))
            print('G loss: {:.4}'.format(G_loss_curr))
            print()
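
Because `image_input` is the identity matrix, its row $i$ is the one-hot label for digit $i$, which is why the sampled grid shows one image per digit. The sketch below (with a hypothetical `one_hot` helper that is not in the original code) shows how labels for any other choice of digits could be built the same way.

```python
import numpy as np

N_DIGITS = 10
NOISE_SIZE = 100

def one_hot(digits, n_classes=N_DIGITS):
    """Hypothetical helper: map digit labels to one-hot rows."""
    return np.eye(n_classes)[np.asarray(digits)]

# The training loop's labels: one row per digit, 0 through 9
image_input = np.identity(N_DIGITS)
assert np.array_equal(image_input, one_hot(range(N_DIGITS)))

# Labels and noise for a full 4x4 grid of a single digit, e.g. sixteen 3s
labels = one_hot([3] * 16)
noise = np.random.uniform(-1.0, 1.0, size=(16, NOISE_SIZE))
print(labels.shape, noise.shape)  # (16, 10) (16, 100)
```

Feeding these as `Y: labels` and `Z: noise` in the `sess.run(G_sample, ...)` call should then produce sixteen different samples of the same digit.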

## Results

<img style="width: 50%; margin: auto;" src="imgs/ipynb/GAN-loss.png"></img>
<figure class='inb'><img src='imgs/ipynb/GAN-0.png'></img><figcaption>Iteration 0</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-1.png'></img><figcaption>Iteration 1</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-2.png'></img><figcaption>Iteration 2</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-3.png'></img><figcaption>Iteration 3</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-4.png'></img><figcaption>Iteration 4</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-5.png'></img><figcaption>Iteration 5</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-10.png'></img><figcaption>Iteration 10</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-50.png'></img><figcaption>Iteration 50</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-100.png'></img><figcaption>Iteration 100</figcaption></figure>
<figure class='inb'><img src='imgs/ipynb/GAN-200.png'></img><figcaption>Iteration 200</figcaption></figure>