Code adapted from [affinelayer's TensorFlow implementation](https://github.com/affinelayer/pix2pix-tensorflow). I highly recommend checking out [his post](https://affinelayer.com/pix2pix/) for a high-level overview of how the model works; this post is meant more as a line-by-line walkthrough of the code.

## Load Data

First, let's load our data. To start, we're going to work with the facades dataset. Each input image is a color-coded label map of a building facade, with different colors representing different features like windows, doors, etc. Our goal is to produce the photo of the facade from its label map.

Unlike prior posts, we're going to use some of TensorFlow's utilities for loading data: we get our list of files, put them into a queue, have a `WholeFileReader` read each file, and decode it as a JPEG. Each file holds an image pair side by side: the left half is the target photo and the right half is the annotated version.

<img src="imgs/ipynb/pix2pix/input_example.jpg">

We'll preprocess the image to have pixel values in `[-1, 1]` and then cut it in half, assigning the left half to our target and the right half to our input.
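
A minimal NumPy sketch of the same scale-and-split step may make the slicing easier to follow before reading the queue-based TensorFlow version. It runs on a made-up 256x512 array (the size is an assumption for illustration), and the helper name `split_pair` is hypothetical:

```python
import numpy as np

def split_pair(raw):
    """Split a side-by-side [height, width, 3] image in [0, 1] into (target, input) in [-1, 1]."""
    preprocess = lambda x: x * 2 - 1  # [0, 1] => [-1, 1]
    width = raw.shape[1]
    target = preprocess(raw[:, :width // 2, :])  # left half: the photo
    inp = preprocess(raw[:, width // 2:, :])     # right half: the label map
    return target, inp

# Hypothetical example: a 256x512 image whose left half is black and right half is white.
raw = np.concatenate([np.zeros((256, 256, 3)), np.ones((256, 256, 3))], axis=1)
target, inp = split_pair(raw)
print(target.shape, inp.shape)  # (256, 256, 3) (256, 256, 3)
print(target.min(), inp.max())  # -1.0 1.0
```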

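```python
import tensorflow as tf  # TensorFlow 1.x (queue-based input pipeline)
import numpy as np
import glob
import math

BATCH_SIZE = 1
IMAGE_SIZE = 256  # each half is resized to 256x256, the size the generator expects

## Load data ##
input_paths = glob.glob('data/pix2pix/facades/train/*.jpg')  # All jpgs
path_queue = tf.train.string_input_producer(input_paths, shuffle=True)  # Produces image paths
reader = tf.WholeFileReader()
paths, contents = reader.read(path_queue)  # (filename, raw file bytes)
rawInput = tf.image.decode_jpeg(contents, channels=3)
rawInput = tf.image.convert_image_dtype(rawInput, dtype=tf.float32)  # uint8 => [0, 1]

# [height, width, channel]
rawInput.set_shape([None, None, 3])
width = tf.shape(rawInput)[1]

preprocess = lambda x: x * 2 - 1  # [0, 1] => [-1, 1]
targets = preprocess(rawInput[:, :width // 2, :])  # Left side: photo
inputs = preprocess(rawInput[:, width // 2:, :])   # Right side: labels

# tf.train.batch (and the conv helpers below) need fully defined static shapes
targets = tf.image.resize_images(targets, [IMAGE_SIZE, IMAGE_SIZE])
inputs = tf.image.resize_images(inputs, [IMAGE_SIZE, IMAGE_SIZE])

paths, inputs, targets = tf.train.batch([paths, inputs, targets], batch_size=BATCH_SIZE)
steps_per_epoch = int(math.ceil(len(input_paths) / BATCH_SIZE))
# Note: queue runners must be started (tf.train.start_queue_runners) inside a
# session before these tensors can be evaluated.
```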

## Some Helper Functions

Before we start building the model, we need to define a couple of helper functions.

### Conv & Deconv

If you're not familiar with convolution, check out my post on [Convolutional Neural Nets](/posts/cnn-mnist.html). Deconvolution (or transposed convolution) was new to me. A strided convolution applies its filters in a way that shrinks the image, effectively downsampling it. Deconvolution does the opposite, upsampling an image to output a larger tensor. There is a lot of debate as to what to call this: deconvolution has a well-defined meaning in signal processing as the inverse of convolution, which is not what this operation does. Because of this, many people opt for the name transposed convolution instead, which is also what the TensorFlow API uses (`tf.nn.conv2d_transpose`). I'm just going to use `deconv` in code because it's easier to type.

In practice, deconvolution is just ordinary convolution applied to an input with extra padding around and between its pixels:

<div style='text-align: center'>
<figure style="display: inline-block;">
<img src="imgs/ipynb/pix2pix/conv.gif">
<figcaption>Convolution with Stride 2</figcaption>
</figure>
<figure style="display: inline-block;">
<img src="imgs/ipynb/pix2pix/deconv.gif">
<figcaption>Deconvolution with Stride 2</figcaption>
</figure>
<p>Animations from <a href="https://github.com/vdumoulin/conv_arithmetic">here</a></p>
</div>
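
To make the "padding between pixels" picture concrete, here is a 1D NumPy sketch (my own illustration; `transposed_conv1d` and `zero_insert_then_conv` are hypothetical helper names). The first function uses the scatter definition of a transposed convolution, where each input pixel adds a scaled copy of the kernel to the output. The second inserts `stride - 1` zeros between input pixels, pads `kernel - 1` zeros around the edges, and slides the flipped kernel over the result like an ordinary convolution. The two agree:

```python
import numpy as np

def transposed_conv1d(x, w, stride):
    """Scatter definition: each input element adds a scaled copy of the kernel."""
    k = len(w)
    y = np.zeros((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        y[i * stride:i * stride + k] += xi * w
    return y

def zero_insert_then_conv(x, w, stride):
    """Same result via an ordinary sliding-window convolution on a padded input."""
    k = len(w)
    z = np.zeros((len(x) - 1) * stride + 1)
    z[::stride] = x                                # stride - 1 zeros between input pixels
    z = np.pad(z, k - 1)                           # k - 1 zeros of padding around the edges
    return np.correlate(z, w[::-1], mode="valid")  # sliding dot product with the flipped kernel

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 10.0, 100.0, 1000.0])
print(transposed_conv1d(x, w, stride=2))
print(np.allclose(transposed_conv1d(x, w, 2), zero_insert_then_conv(x, w, 2)))  # True
```

This sketch keeps the full-length output. The `deconv` helper below passes `padding="SAME"` with an output shape of exactly twice the input size, so TensorFlow crops the full result down to that size; the zero-insertion idea is the same.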

```python
# Both helpers create a variable named "filter", so every call must happen
# inside its own tf.variable_scope (see the generator and discriminator).

def conv(batch_input, out_channels, stride):
    ''' 4x4 convolution with 1 pixel of zero padding per side and the given stride. '''
    in_channels = int(batch_input.get_shape()[3])
    # The trainable filter we create for the conv: [height, width, in, out]
    conv_filter = tf.get_variable("filter",
                                  [4, 4, in_channels, out_channels],
                                  dtype=tf.float32,
                                  initializer=tf.random_normal_initializer(0, 0.02))

    # Pad height and width by 1 pixel on each side
    padded_input = tf.pad(batch_input,
                          [[0, 0], [1, 1], [1, 1], [0, 0]],
                          mode="CONSTANT")
    # Output of the conv
    conv = tf.nn.conv2d(padded_input,
                        conv_filter,
                        [1, stride, stride, 1],
                        padding="VALID")
    return conv

def deconv(batch_input, out_channels):
    ''' 4x4 transposed convolution with stride 2, doubling height and width. '''
    batch, in_height, in_width, in_channels = [int(d) for d in batch_input.get_shape()]

    # The trainable filter we create for the deconv: [height, width, out, in]
    conv_filter = tf.get_variable("filter",
                                  [4, 4, out_channels, in_channels],
                                  dtype=tf.float32,
                                  initializer=tf.random_normal_initializer(0, 0.02))
    # Output of the deconv
    conv = tf.nn.conv2d_transpose(batch_input,
                                  conv_filter,
                                  [batch, in_height * 2, in_width * 2, out_channels],
                                  [1, 2, 2, 1],
                                  padding="SAME")
    return conv
```
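
The shape comments in the generator and discriminator all follow from the arithmetic of these two helpers. A small pure-Python sketch (the function names are hypothetical, not part of the model) makes it explicit: `conv` pads one pixel per side and then runs a 4x4 VALID convolution, so an `n`-pixel side becomes `(n + 2 - 4) // stride + 1`, while `deconv` always asks for exactly twice its input size.

```python
def conv_output_size(in_size, stride, kernel=4, pad=1):
    """Spatial size after `conv`: pad `pad` pixels per side, then a VALID conv."""
    return (in_size + 2 * pad - kernel) // stride + 1

def deconv_output_size(in_size, stride=2):
    """`deconv` requests an output of exactly in_size * stride (padding="SAME")."""
    return in_size * stride

# Stride 2 halves the size; stride 1 shrinks it by one pixel.
print(conv_output_size(256, stride=2))  # 128
print(conv_output_size(32, stride=1))   # 31
print(conv_output_size(31, stride=1))   # 30
print(deconv_output_size(128))          # 256
```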

### Leaky ReLU

Leaky ReLU is a variation of normal ReLU that helps prevent dead neurons. With ReLU, values less than 0 are set to 0 and have zero gradient. A neuron can therefore get stuck outputting 0 for every input, and once in that state it can't recover, since no gradient flows back through it. Leaky ReLU fixes this by giving negative inputs a small positive slope `a` instead of a flat zero, so they map to `a * x`. This is equivalent to the following:

\begin{align*}
\operatorname{ReLU}(x) &= \max(0,x) \\
\operatorname{LReLU}(x,a) &= \frac{1+a}{2}x + \frac{1-a}{2} \vert x \vert  \\
&= \begin{cases}
    x,&  x \geq 0\\
    ax,& x < 0
\end{cases}
\end{align*}

<div style="text-align:center">
<img style="margin: 5px; border: 1px solid black; display: inline-block; width: 30%" src="imgs/ipynb/pix2pix/relu.png"><img style="margin: 5px; border: 1px solid black; display: inline-block; width: 30%" src="imgs/ipynb/pix2pix/lrelu.png"></div>

```python
def lrelu(x, a):
    ''' Leaky ReLU. '''
    return (0.5 * (1 + a)) * x + (0.5 * (1 - a)) * tf.abs(x)
```
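
As a quick sanity check (a NumPy re-implementation with the hypothetical name `lrelu_np`, not part of the TensorFlow graph), the closed form in `lrelu` matches the piecewise definition. `a = 0.2` is the slope used throughout this model, and `a = 0` recovers plain ReLU:

```python
import numpy as np

def lrelu_np(x, a):
    """Closed form used by `lrelu`: ((1 + a) / 2) * x + ((1 - a) / 2) * |x|."""
    return (0.5 * (1 + a)) * x + (0.5 * (1 - a)) * np.abs(x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
piecewise = np.where(x >= 0, x, 0.2 * x)  # x for x >= 0, a * x otherwise
print(lrelu_np(x, 0.2))
print(np.allclose(lrelu_np(x, 0.2), piecewise))  # True
```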

### Batch Normalization

Batch Normalization is a cool general technique for improving training. It normalizes each channel of a layer's input to zero mean and unit variance, using statistics computed over the batch and spatial dimensions, and then applies a trainable offset and scale so the network can adjust how much normalization is actually applied.

```python
def batchnorm(inp):
    ''' Batch Normalization over the batch, height and width axes. '''
    channels = inp.get_shape()[3]
    # Trainable shift and scale, one of each per channel
    offset = tf.get_variable("offset", [channels], dtype=tf.float32, initializer=tf.zeros_initializer())
    scale = tf.get_variable("scale", [channels], dtype=tf.float32, initializer=tf.random_normal_initializer(1.0, 0.02))
    # Per-channel mean and variance of the current batch
    mean, variance = tf.nn.moments(inp, axes=[0, 1, 2], keep_dims=False)
    variance_epsilon = 1e-5  # avoids division by zero
    normalized = tf.nn.batch_normalization(inp, mean, variance, offset, scale, variance_epsilon=variance_epsilon)
    return normalized
```
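
The same computation in NumPy (a sketch; `batchnorm_np` is a hypothetical name) shows what `tf.nn.moments` and `tf.nn.batch_normalization` are doing: statistics are taken per channel over the batch, height and width axes, and with `offset = 0` and `scale = 1` the output has roughly zero mean and unit standard deviation per channel.

```python
import numpy as np

def batchnorm_np(inp, offset, scale, eps=1e-5):
    """NumPy version of `batchnorm`: per-channel stats over batch, height and width."""
    mean = inp.mean(axis=(0, 1, 2))     # shape [channels]
    variance = inp.var(axis=(0, 1, 2))  # population variance, like tf.nn.moments
    return (inp - mean) / np.sqrt(variance + eps) * scale + offset

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(2, 4, 4, 3))  # [batch, height, width, channels]
out = batchnorm_np(x, offset=np.zeros(3), scale=np.ones(3))
print(np.round(out.mean(axis=(0, 1, 2)), 6))  # close to 0 for every channel
print(np.round(out.std(axis=(0, 1, 2)), 3))   # close to 1 for every channel
```

Note that `batchnorm` always uses the statistics of the current batch. With `BATCH_SIZE = 1` those statistics come from a single image, so in practice it acts as per-image (instance) normalization.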

## Generator

<figure>
<img src="imgs/ipynb/pix2pix/generator.png">
<img style="display:block;width:50%;margin:auto;" src="imgs/ipynb/pix2pix/units.png">
<figcaption>Images from <a href="https://affinelayer.com/pix2pix/">affinelayer</a></figcaption>
</figure>

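### Skip Layers

The generator is an encoder-decoder: the encoder repeatedly halves the spatial size down to 1x1, and the decoder doubles it back up to 256x256. Skip connections concatenate each decoder layer's input with the output of the mirrored encoder layer, so fine spatial detail can bypass the bottleneck. This encoder-decoder-with-skips layout is usually called a U-Net.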

```python
# Number of generator filters
NGF = 64

def create_generator(generator_inputs):
    ''' Creates our generator (a U-Net) for the given inputs. '''
    layers = []

    # encoder_1: [batch, 256, 256, 3] => [batch, 128, 128, ngf]
    with tf.variable_scope("encoder_1"):
        output = conv(generator_inputs, NGF, stride=2)
        layers.append(output)

    layer_specs = [
        NGF * 2, # encoder_2: [batch, 128, 128, ngf] => [batch, 64, 64, ngf * 2]
        NGF * 4, # encoder_3: [batch, 64, 64, ngf * 2] => [batch, 32, 32, ngf * 4]
        NGF * 8, # encoder_4: [batch, 32, 32, ngf * 4] => [batch, 16, 16, ngf * 8]
        NGF * 8, # encoder_5: [batch, 16, 16, ngf * 8] => [batch, 8, 8, ngf * 8]
        NGF * 8, # encoder_6: [batch, 8, 8, ngf * 8] => [batch, 4, 4, ngf * 8]
        NGF * 8, # encoder_7: [batch, 4, 4, ngf * 8] => [batch, 2, 2, ngf * 8]
        NGF * 8, # encoder_8: [batch, 2, 2, ngf * 8] => [batch, 1, 1, ngf * 8]
    ]

    for out_channels in layer_specs:
        # Each layer gets its own scope so its "filter", "offset" and "scale" variables don't collide
        with tf.variable_scope("encoder_%d" % (len(layers) + 1)):
            rectified = lrelu(layers[-1], 0.2)
            # [batch, in_height, in_width, in_channels] => [batch, in_height/2, in_width/2, out_channels]
            convolved = conv(rectified, out_channels, stride=2)
            output = batchnorm(convolved)
            layers.append(output)

    layer_specs = [
        (NGF * 8, 0.5),   # decoder_8: [batch, 1, 1, ngf * 8] => [batch, 2, 2, ngf * 8 * 2]
        (NGF * 8, 0.5),   # decoder_7: [batch, 2, 2, ngf * 8 * 2] => [batch, 4, 4, ngf * 8 * 2]
        (NGF * 8, 0.5),   # decoder_6: [batch, 4, 4, ngf * 8 * 2] => [batch, 8, 8, ngf * 8 * 2]
        (NGF * 8, 0.0),   # decoder_5: [batch, 8, 8, ngf * 8 * 2] => [batch, 16, 16, ngf * 8 * 2]
        (NGF * 4, 0.0),   # decoder_4: [batch, 16, 16, ngf * 8 * 2] => [batch, 32, 32, ngf * 4 * 2]
        (NGF * 2, 0.0),   # decoder_3: [batch, 32, 32, ngf * 4 * 2] => [batch, 64, 64, ngf * 2 * 2]
        (NGF, 0.0),       # decoder_2: [batch, 64, 64, ngf * 2 * 2] => [batch, 128, 128, ngf * 2]
    ]

    num_encoder_layers = len(layers)
    for decoder_layer, (out_channels, dropout) in enumerate(layer_specs):
        skip_layer = num_encoder_layers - decoder_layer - 1
        with tf.variable_scope("decoder_%d" % (skip_layer + 1)):
            if decoder_layer == 0:
                # The first decoder layer takes the innermost encoder output
                # directly, so there is nothing to concatenate yet
                inp = layers[-1]
            else:
                # Skip connection: concatenate with the mirrored encoder layer
                inp = tf.concat([layers[-1], layers[skip_layer]], axis=3)

            rectified = tf.nn.relu(inp)
            # [batch, in_height, in_width, in_channels] => [batch, in_height*2, in_width*2, out_channels]
            output = deconv(rectified, out_channels)
            output = batchnorm(output)

            if dropout > 0.0:
                output = tf.nn.dropout(output, keep_prob=1 - dropout)

            layers.append(output)

    # decoder_1: [batch, 128, 128, ngf * 2] => [batch, 256, 256, 3]
    with tf.variable_scope("decoder_1"):
        inp = tf.concat([layers[-1], layers[0]], axis=3)
        rectified = tf.nn.relu(inp)
        output = deconv(rectified, 3)
        output = tf.tanh(output)  # outputs in [-1, 1], matching the preprocessed targets
        layers.append(output)

    return layers[-1]
```
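
The channel counts in the decoder comments come from the skip connections: every decoder layer after the first sees its predecessor's output concatenated with the mirrored encoder output. A small bookkeeping sketch (pure Python, not part of the model; `decoder_input_channels` is a hypothetical name) reproduces those numbers from the `layer_specs` above:

```python
NGF = 64

# Output channels of encoder_1 .. encoder_8, transcribed from `create_generator`.
encoder_channels = [NGF, NGF * 2, NGF * 4, NGF * 8, NGF * 8, NGF * 8, NGF * 8, NGF * 8]
# Output channels of decoder_8 .. decoder_2, then decoder_1 (the RGB image).
decoder_channels = [NGF * 8, NGF * 8, NGF * 8, NGF * 8, NGF * 4, NGF * 2, NGF, 3]

def decoder_input_channels(encoder_channels, decoder_channels):
    """Channels fed to each decoder layer after concatenating its skip connection."""
    n = len(encoder_channels)
    inputs = []
    for d in range(len(decoder_channels)):
        if d == 0:
            inputs.append(encoder_channels[-1])  # innermost layer: no concat
        else:
            skip = n - d - 1                     # index of the mirrored encoder layer
            inputs.append(decoder_channels[d - 1] + encoder_channels[skip])
    return inputs

print(decoder_input_channels(encoder_channels, decoder_channels))
# [512, 1024, 1024, 1024, 1024, 512, 256, 128]
```

Because the encoder halves and the decoder doubles the spatial size, each concatenated pair also has matching height and width, which `tf.concat` along `axis=3` requires.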

## Discriminator

<figure>
<img src="imgs/ipynb/pix2pix/discriminator.png">
<figcaption>Image from <a href="https://affinelayer.com/pix2pix/">affinelayer</a></figcaption>
</figure>

```python
# Number of discriminator filters
NDF = 64

def create_discriminator(discrim_inputs, discrim_targets):
    ''' Outputs a [batch, 30, 30, 1] grid of real/fake probabilities for an (input, target) pair. '''
    n_layers = 3
    layers = []

    # 2x [batch, height, width, in_channels] => [batch, height, width, in_channels * 2]
    inp = tf.concat([discrim_inputs, discrim_targets], axis=3)

    # layer_1: [batch, 256, 256, in_channels * 2] => [batch, 128, 128, ndf]
    with tf.variable_scope("layer_1"):
        convolved = conv(inp, NDF, stride=2)
        rectified = lrelu(convolved, 0.2)
        layers.append(rectified)

    # layer_2: [batch, 128, 128, ndf] => [batch, 64, 64, ndf * 2]
    # layer_3: [batch, 64, 64, ndf * 2] => [batch, 32, 32, ndf * 4]
    # layer_4: [batch, 32, 32, ndf * 4] => [batch, 31, 31, ndf * 8]
    for i in range(n_layers):
        with tf.variable_scope("layer_%d" % (len(layers) + 1)):
            out_channels = NDF * min(2**(i+1), 8)
            stride = 1 if i == n_layers - 1 else 2  # last layer here has stride 1
            convolved = conv(layers[-1], out_channels, stride=stride)
            normalized = batchnorm(convolved)
            rectified = lrelu(normalized, 0.2)
            layers.append(rectified)

    # layer_5: [batch, 31, 31, ndf * 8] => [batch, 30, 30, 1]
    with tf.variable_scope("layer_%d" % (len(layers) + 1)):
        convolved = conv(layers[-1], out_channels=1, stride=1)
        output = tf.sigmoid(convolved)
        layers.append(output)

    return layers[-1]
```
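
The discriminator's output is a 30x30 grid rather than a single score, and each output unit only sees a patch of the 256x256 input. The standard receptive-field recurrence, sketched below in pure Python with the (kernel, stride) list transcribed by hand from `create_discriminator`, gives a 70x70 patch; this is why this design is often called a 70x70 PatchGAN. The helper name `receptive_field` is hypothetical, and padding does not change the result.

```python
# (kernel, stride) for layer_1 .. layer_5 of `create_discriminator`.
discrim_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

def receptive_field(layers):
    """Input pixels (per side) seen by one output unit of a stack of conv layers."""
    field, jump = 1, 1  # field size, and spacing in input pixels between adjacent units
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field

print(receptive_field(discrim_layers))  # 70
```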

## Loss

## Train

## Results