```python
import tensorflow as tf
```

# Define generator and discriminator: the old way

TensorFlow allows us to define computational graphs. We can therefore describe both the generator $G$ and the discriminator $D$ as two interacting subgraphs.

A TensorFlow-specific feature that we have to take into account when defining a GAN the "old" way (i.e. without using a `tf.estimator`) is the scope of the nodes.

## TensorFlow: nodes' scope

Every node (variable and op) in the computational graph has a unique name. TensorFlow's naming system works like a filesystem directory structure:

```
/root/tree/leaf_1
/root/tree/leaf_2
```

In this case, the scope of both `leaf_1` and `leaf_2` is `/root/tree`. Obviously, within the same scope it is **not** possible to have two nodes with the same name.
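A minimal sketch of this naming scheme with variables follows. It assumes the `tensorflow.compat.v1` module (available in TensorFlow 1.14+ and 2.x), rather than the plain `import tensorflow as tf` used in this notebook, so that it also runs on a TF 2 installation. Nested `tf.variable_scope` blocks act like nested directories, and requesting an existing variable name in the same scope without enabling reuse raises a `ValueError`.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # TF1-style graph mode; also works on TensorFlow 2.x

graph = tf.Graph()
with graph.as_default():
    # Nested variable scopes behave like nested directories.
    with tf.variable_scope("root"):
        with tf.variable_scope("tree"):
            leaf_1 = tf.get_variable("leaf_1", shape=[1])
            leaf_2 = tf.get_variable("leaf_2", shape=[1])

    print(leaf_1.name)  # root/tree/leaf_1:0
    print(leaf_2.name)  # root/tree/leaf_2:0

    # Re-entering the same scope and asking for an existing name fails,
    # because variable reuse has not been enabled.
    duplicate_error = None
    try:
        with tf.variable_scope("root"):
            with tf.variable_scope("tree"):
                tf.get_variable("leaf_1", shape=[1])
    except ValueError as err:
        duplicate_error = err
    print(duplicate_error is not None)  # True
```

The `:0` suffix is the index of the output tensor; the part before it is the full scoped name.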

When describing the discriminator, we'll need to feed the network both the real and the generated samples. We could do this in two different ways:

1. Manually create the input batch by concatenating the real and the generated samples along the first dimension, and do the same for the labels.
2. Exploit the `reuse` feature of `tf.variable_scope`, which allows us to define two different subgraphs that share the same variables, and use them to feed the real and the generated data separately.

We're going to use this second option because it's easier to use and understand (and more elegant).
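The second option can be sketched as follows. The one-layer `discriminator` here is a hypothetical toy stand-in, not the DCGAN discriminator, and the code again assumes `tensorflow.compat.v1` so it runs on TF 1.14+ or 2.x. Calling the same function twice, the second time with `reuse=True`, builds two subgraphs (one per input) that read the same `discriminator/w` and `discriminator/b` variables.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # TF1-style graph mode; also works on TensorFlow 2.x


def discriminator(x, reuse=False):
    """Hypothetical toy discriminator: a single linear layer returning logits."""
    with tf.variable_scope("discriminator", reuse=reuse):
        w = tf.get_variable("w", shape=[int(x.shape[1]), 1])  # default Glorot init
        b = tf.get_variable("b", shape=[1], initializer=tf.zeros_initializer())
        return tf.matmul(x, w) + b


graph = tf.Graph()
with graph.as_default():
    real = tf.placeholder(tf.float32, shape=[None, 4], name="real")
    fake = tf.placeholder(tf.float32, shape=[None, 4], name="fake")

    d_real = discriminator(real)              # creates discriminator/w and discriminator/b
    d_fake = discriminator(fake, reuse=True)  # new ops, same variables

    # Only one set of discriminator variables exists in the graph.
    d_vars = [v.name for v in tf.trainable_variables()]
    print(d_vars)  # ['discriminator/w:0', 'discriminator/b:0']

    with tf.Session(graph=graph) as sess:
        sess.run(tf.global_variables_initializer())
        batch = np.ones((2, 4), dtype=np.float32)
        out_real, out_fake = sess.run(
            [d_real, d_fake], feed_dict={real: batch, fake: batch}
        )
    # Same weights and same input, so the two subgraphs produce the same logits.
    print(np.allclose(out_real, out_fake))  # True
```

Because the variables are shared, gradients flowing through either `d_real` or `d_fake` update the same discriminator weights.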

## DCGAN

To produce results that humans can easily interpret, the architecture we're going to implement is DCGAN<sup>[1](#1)</sup> (Deep Convolutional GAN).

This was the first GAN to use convolutional neural networks (CNNs) for both the generator and the discriminator. CNNs are particularly well suited to images, since convolutions exploit local spatial structure and share the same weights across all positions.

### Generator

![DCGAN generator](images/DCGAN_G.png)

The generator architecture is essentially the "decoder" part of a standard semantic segmentation network.

In general, semantic segmentation architectures are composed of two parts:

1. The encoder: several convolutional layers that learn to extract features and to encode the input into a low-dimensional representation.
2. The decoder: several "deconvolutional" layers that learn to extract the semantics from the low-dimensional representation.

The DCGAN generator can easily be seen as the decoder of a semantic segmentation network that learns to generate realistic samples starting from values sampled from a random distribution, rather than from the low-dimensional encoding of an input.
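A minimal sketch of the decoder-style shape progression: the latent vector is projected and reshaped into a small, deep volume, then each upsampling block doubles the spatial size and halves the number of channels, with the last block emitting the image channels. The default numbers follow the 64x64 generator described in the DCGAN paper<sup>[1](#1)</sup> (100-dimensional $z$ projected to 4x4x1024); the helper `dcgan_generator_shapes` is hypothetical and only computes shapes.

```python
def dcgan_generator_shapes(z_dim=100, base_hw=4, base_channels=1024,
                           n_upsample=4, out_channels=3):
    """Return the (H, W, C) shape after each generator stage (illustrative)."""
    shapes = [(z_dim,)]                  # latent vector z
    hw, ch = base_hw, base_channels
    shapes.append((hw, hw, ch))          # "project and reshape"
    for i in range(n_upsample):
        hw *= 2                          # each block doubles H and W
        # halve the channels, except the last block, which outputs the image
        ch = out_channels if i == n_upsample - 1 else ch // 2
        shapes.append((hw, hw, ch))
    return shapes


for shape in dcgan_generator_shapes():
    print(shape)
```

This is the mirror image of an encoder, which would shrink the spatial size while increasing the depth.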

We can start by defining some helper functions that will make the architecture easier to define.

### deconv2d

An excellent post<sup>[2](#2)</sup> showed that the deconvolution (transposed convolution) operation tends to produce checkerboard artifacts. This layer applies the correction suggested in that post, in order to remove, or at least reduce, these artifacts.
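One way to see where the checkerboard comes from, following the uneven-overlap argument in [2], is to count how many kernel taps write to each output position of a 1-D transposed convolution. The sketch below (the helper name `transposed_conv_overlap` is hypothetical) assumes no padding and no output cropping. With kernel size 3 and stride 2, interior outputs alternate between 1 and 2 contributions; in 2-D this alternation becomes a checkerboard.

```python
import numpy as np


def transposed_conv_overlap(n_in, kernel_size, stride):
    """Number of kernel taps that write to each output position of a 1-D
    transposed convolution (no padding, no output cropping)."""
    n_out = (n_in - 1) * stride + kernel_size
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):
        # Input element i is "stamped" onto outputs [i*stride, i*stride + k).
        counts[i * stride : i * stride + kernel_size] += 1
    return counts


print(transposed_conv_overlap(n_in=6, kernel_size=3, stride=2))
# [1 1 2 1 2 1 2 1 2 1 2 1 1]
print(transposed_conv_overlap(n_in=6, kernel_size=4, stride=2))
# [1 1 2 2 2 2 2 2 2 2 2 2 1 1]
```

When the kernel size is divisible by the stride the interior overlap is even, whereas a stride-1 convolution applied after resizing gives every output position the same number of taps by construction.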

**Long story short**: do not use the `tf.nn.conv2d_transpose` operation; instead, resize the input and apply a standard convolution to the resized input.
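The resize step is easy to picture in NumPy. In this sketch (the helper `upsample_nearest_2x` is hypothetical), nearest-neighbour upsampling by an exact factor of 2 copies every pixel into a 2x2 block of an NHWC array. For an exact 2x factor with default settings this should match what `tf.image.resize_nearest_neighbor` computes, but treat that equivalence as an assumption rather than a guarantee for every option combination.

```python
import numpy as np


def upsample_nearest_2x(batch):
    """Nearest-neighbour 2x upsampling of an NHWC array: each pixel is
    repeated along H (axis 1) and W (axis 2)."""
    return batch.repeat(2, axis=1).repeat(2, axis=2)


x = np.arange(4, dtype=np.float32).reshape(1, 2, 2, 1)  # one 2x2 single-channel image
y = upsample_nearest_2x(x)
print(y.shape)        # (1, 4, 4, 1)
print(y[0, :, :, 0])  # rows: [0 0 1 1], [0 0 1 1], [2 2 3 3], [2 2 3 3]
```

The following stride-1 convolution then sees a smooth, evenly covered input, so no output position is favoured over its neighbours.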

Moreover, this layer pads the input before the convolution, so that the information at the border is fully captured by the convolution instead of being discarded.
The padding is 2 zeros per side (along the H and W dimensions), so that a 5x5 convolution with `VALID` padding removes exactly this border and, with unit strides, the output is exactly twice the input size.
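The size arithmetic can be checked directly. A `VALID` convolution maps a side of length $n$ to $\lfloor (n - k)/s \rfloor + 1$; after upsampling ($n = 2H$) and padding by $p = 2$ on each side, a $k = 5$, $s = 1$ convolution gives $(2H + 4 - 5) + 1 = 2H$. The helper below is a hypothetical illustration using the same defaults as the layer.

```python
def deconv2d_output_size(input_size, kernel_size=5, pad=2, stride=1, scale=2):
    """Spatial size (H or W) after upsample -> zero-pad -> VALID convolution."""
    upsampled = scale * input_size
    padded = upsampled + 2 * pad
    # VALID convolution: floor((padded - kernel) / stride) + 1
    return (padded - kernel_size) // stride + 1


for size in (4, 8, 16, 32):
    print(size, "->", deconv2d_output_size(size))
# 4 -> 8, 8 -> 16, 16 -> 32, 32 -> 64
```

The same rule holds for other odd kernels as long as the padding per side is `kernel_size // 2`.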


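```python
import tensorflow as tf


def deconv2d(inputs, filters, strides=(1, 1), activation=tf.nn.relu):
    """"Deconvolution" layer.

    It uses upsampling with nearest neighbor interpolation to reduce the
    presence of checkerboard artifacts.
    With strides=(1, 1) the output WxH is two times the input WxH.
    The kernel size is always 5x5.
    """
    # Static spatial dimensions of the NHWC input (TF 1.x Dimension objects)
    input_h, input_w = inputs.shape[1].value, inputs.shape[2].value
    # 1. Nearest-neighbor upsampling: (H, W) -> (2H, 2W)
    upsampled = tf.image.resize_nearest_neighbor(
        inputs, (2 * input_h, 2 * input_w), name="UpSample2D"
    )
    # 2. Pad 2 zeros per side along H and W, so that the 5x5 VALID
    #    convolution below gives back exactly (2H, 2W) with unit strides
    pad = tf.pad(upsampled, [[0, 0], [2, 2], [2, 2], [0, 0]], mode="CONSTANT")
    # 3. Standard convolution on the upsampled, padded input
    layer = tf.layers.conv2d(
        pad,
        filters=filters,
        kernel_size=5,
        padding="VALID",
        use_bias=False,
        activation=activation,
        strides=strides,
    )
    return layer
```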

**References**:

<a id="1">[1]</a>: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks https://arxiv.org/pdf/1511.06434

<a id="2">[2]</a>: Deconvolution and Checkerboard Artifacts https://distill.pub/2016/deconv-checkerboard/