![](pics/header.png)

# Deep Learning: Generative Adversarial Network (GAN)

Kevin Walchko

---

## References and Examples

- [CartoonGAN](https://video.udacity-data.com/topher/2018/November/5bea23cd_cartoongan/cartoongan.pdf)
- [StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks](https://arxiv.org/abs/1612.03242)
- [Generative Adversarial Nets](https://arxiv.org/pdf/1406.2661.pdf)
- DCGAN: [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
- Batch normalization (used by DCGAN): [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf)

## Overview

<img src="pics/gan/gan2.png" width="50%">

- Creates new output based on training from real-world data
    - Generator: creates output from random noise input, learning to mimic the probability distribution of the real data
    - Discriminator: estimates the probability that its input is real rather than fake
- Since the Generator is trying to fool the Discriminator, it constantly changes its output to move uphill along the function learned by the Discriminator, until the Discriminator assigns a probability of 50% to the data it sees (it can no longer tell real from fake)
    - In the picture above, see how the distribution on the left eventually moves on top of the real data distribution on the right?
- GANs are framed with game theory: G and D play a two-player minimax game against each other
- Run 2 optimizers, one for G and one for D
    - Adam is a good optimizer for both
    - Use [BCEWithLogitsLoss](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html#torch.nn.BCEWithLogitsLoss) since you are just determining fake (0) or real (1)
        - `d_loss = nn.BCEWithLogitsLoss()(logits, labels*0.9)` (real labels smoothed from 1 to 0.9)
        - `g_loss = nn.BCEWithLogitsLoss()(logits, flipped_labels)` (the generator wants its fakes labeled as real)
        - Image -> D -> logits -> sigmoid -> probabilities
        - This combines a sigmoid activation function and binary cross-entropy loss in one function.
    - All hidden layers will have [Leaky ReLU](https://pytorch.org/docs/stable/nn.html#torch.nn.LeakyReLU) applied to their output

## Architecture

<img src="pics/gan/gan_network.png" width="50%">

> **Note:** You can do the sigmoid in the loss function if you want. Here we will use the `nn.BCEWithLogitsLoss`. This loss combines a `Sigmoid` layer and the `BCELoss` in one single class. This version is more numerically stable than using a plain `Sigmoid` followed by a `BCELoss` as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
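
To make the stability argument in the note concrete, here is a minimal pure-Python sketch for a single logit. It is not PyTorch's implementation; the function names are hypothetical, and the stable form `max(x, 0) - x*y + log(1 + exp(-|x|))` is the usual algebraic rewrite of sigmoid followed by binary cross-entropy.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def naive_bce(logit, label):
    """Sigmoid followed by binary cross-entropy; fails once p rounds to exactly 0 or 1."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def stable_bce_with_logits(logit, label):
    """Same loss rewritten so exp() only ever sees a non-positive argument."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# For moderate logits both versions agree
print(naive_bce(2.0, 1.0), stable_bce_with_logits(2.0, 1.0))

# For a confident wrong prediction the naive version hits log(0)
try:
    naive_bce(40.0, 0.0)
except ValueError as err:
    print("naive version failed:", err)
print(stable_bce_with_logits(40.0, 0.0))
```

With a logit of 40 and a label of 0, the sigmoid rounds to exactly 1.0 in double precision, so the naive version evaluates `log(0)`, while the rewritten form returns a loss of about 40.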

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Discriminator(nn.Module):

    def __init__(self, input_size, hidden_dim, output_size):
        super(Discriminator, self).__init__()
        
        # define hidden linear layers
        self.fc1 = nn.Linear(input_size, hidden_dim*4)
        self.fc2 = nn.Linear(hidden_dim*4, hidden_dim*2)
        self.fc3 = nn.Linear(hidden_dim*2, hidden_dim)
        
        # final fully-connected layer
        self.fc4 = nn.Linear(hidden_dim, output_size)
        
        # dropout layer 
        self.dropout = nn.Dropout(0.3)
        
        
    def forward(self, x):
        # flatten image
        x = x.view(-1, 28*28)
        # all hidden layers
        x = F.leaky_relu(self.fc1(x), 0.2) # (input, negative_slope=0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc2(x), 0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc3(x), 0.2)
        x = self.dropout(x)
        # final layer: raw logits (the sigmoid is applied inside nn.BCEWithLogitsLoss)
        out = self.fc4(x)

        return out
    
class Generator(nn.Module):

    def __init__(self, input_size, hidden_dim, output_size):
        super(Generator, self).__init__()
        
        # define hidden linear layers
        self.fc1 = nn.Linear(input_size, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim*2)
        self.fc3 = nn.Linear(hidden_dim*2, hidden_dim*4)
        
        # final fully-connected layer
        self.fc4 = nn.Linear(hidden_dim*4, output_size)
        
        # dropout layer 
        self.dropout = nn.Dropout(0.3)
        

    def forward(self, x):
        # all hidden layers
        x = F.leaky_relu(self.fc1(x), 0.2) # (input, negative_slope=0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc2(x), 0.2)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc3(x), 0.2)
        x = self.dropout(x)
        # final layer with tanh applied, so generated pixels lie in [-1, 1]
        out = F.tanh(self.fc4(x))
        
        return out

# Calculate losses
def real_loss(D_out, smooth=False):
    batch_size = D_out.size(0)
    # label smoothing
    if smooth:
        # smooth, real labels = 0.9
        labels = torch.ones(batch_size)*0.9
    else:
        labels = torch.ones(batch_size) # real labels = 1
        
    # numerically stable loss
    criterion = nn.BCEWithLogitsLoss()
    # calculate loss
    loss = criterion(D_out.squeeze(), labels)
    return loss

def fake_loss(D_out):
    batch_size = D_out.size(0)
    labels = torch.zeros(batch_size) # fake labels = 0
    criterion = nn.BCEWithLogitsLoss()
    # calculate loss
    loss = criterion(D_out.squeeze(), labels)
    return loss


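# instantiate the networks (these sizes are illustrative assumptions for 28x28 images)
D = Discriminator(input_size=28*28, hidden_dim=32, output_size=1)
G = Generator(input_size=100, hidden_dim=32, output_size=28*28)

# learning rate for optimizers
lr = 0.002

# Create optimizers for the discriminator and generator
d_optimizer = optim.Adam(D.parameters(), lr)
g_optimizer = optim.Adam(G.parameters(), lr)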
```
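
The pieces above (two optimizers, smoothed real labels, flipped labels for the generator) could be combined into one training step roughly as follows. This is a self-contained sketch, not the notes' own training loop: the tiny `nn.Sequential` networks, the sizes, and the random "real" batch are stand-in assumptions so the block runs by itself.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
z_size, img_size, batch_size = 16, 28 * 28, 8  # illustrative sizes (assumptions)

# Tiny stand-in networks; D outputs one raw logit per image, G outputs values in [-1, 1]
D = nn.Sequential(nn.Linear(img_size, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))
G = nn.Sequential(nn.Linear(z_size, 32), nn.LeakyReLU(0.2),
                  nn.Linear(32, img_size), nn.Tanh())

criterion = nn.BCEWithLogitsLoss()
d_optimizer = optim.Adam(D.parameters(), lr=0.002)
g_optimizer = optim.Adam(G.parameters(), lr=0.002)

# Placeholder "real" batch scaled to [-1, 1] to match the tanh output range
real_images = torch.rand(batch_size, img_size) * 2 - 1

# Discriminator step: real images get smoothed label 0.9, fakes get label 0
d_optimizer.zero_grad()
d_real_loss = criterion(D(real_images).squeeze(1), torch.ones(batch_size) * 0.9)
fake_images = G(torch.randn(batch_size, z_size))
# detach() keeps this step from back-propagating into the generator
d_fake_loss = criterion(D(fake_images.detach()).squeeze(1), torch.zeros(batch_size))
d_loss = d_real_loss + d_fake_loss
d_loss.backward()
d_optimizer.step()

# Generator step: flipped labels, fakes are scored against label 1 (real)
g_optimizer.zero_grad()
g_loss = criterion(D(G(torch.randn(batch_size, z_size))).squeeze(1),
                   torch.ones(batch_size))
g_loss.backward()
g_optimizer.step()

print(f"d_loss={d_loss.item():.4f}  g_loss={g_loss.item():.4f}")
```

In a real run the two steps alternate for every batch, and `real_images` comes from a data loader instead of random numbers.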

## Deep Convolutional GAN (DCGAN)

<table> 
    <tr>
        <td><img src="pics/gan/dcgan-g.png" width="691px"></td>
        <td><img src="pics/gan/dcgan-d.png" width="691px"></td>
    </tr>
</table>

> Architecture guidelines for stable Deep Convolutional GANs:
>
> - Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
> - Use batchnorm (mean=0, variance=1) in both the generator and the discriminator.
> - Remove fully connected hidden layers for deeper architectures.
> - Use ReLU activation in generator for all layers except for the output, which uses Tanh.
> - Use LeakyReLU activation in the discriminator for all layers.
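
As a rough illustration of these guidelines, the sketch below builds a very small generator and discriminator for 16x16 RGB images. The channel counts, image size, and latent size are assumptions chosen to keep the example short, not values from the DCGAN paper. Batch norm is left off the generator output and the discriminator input, and the discriminator's last layer emits raw logits for use with `nn.BCEWithLogitsLoss`.

```python
import torch
import torch.nn as nn

z_dim = 100  # latent vector size (assumed)

generator = nn.Sequential(
    # (N, z_dim, 1, 1) -> (N, 128, 4, 4): fractional-strided (transposed) convolution
    nn.ConvTranspose2d(z_dim, 128, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(),
    # -> (N, 64, 8, 8)
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    # -> (N, 3, 16, 16), tanh on the output layer
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)

discriminator = nn.Sequential(
    # (N, 3, 16, 16) -> (N, 64, 8, 8): strided convolution instead of pooling
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    # -> (N, 128, 4, 4)
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2),
    # -> (N, 1, 1, 1): one raw logit per image, no fully connected layer
    nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=0),
)

z = torch.randn(2, z_dim, 1, 1)
fake = generator(z)
logits = discriminator(fake).view(-1)
print(fake.shape, logits.shape)
```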

## Batch Normalization

Batch normalization normalizes the output of a previous layer by subtracting the batch mean and dividing by the batch standard deviation. This results in:

- **Networks train faster:** Each training iteration will actually be slower because of the extra calculations during the forward pass and the additional learnable parameters to update during back propagation. However, the network should converge much more quickly, so training should be faster overall.
- **Allows higher learning rates:** Gradient descent usually requires small learning rates for the network to converge. And as networks get deeper, their gradients get smaller during back propagation so they require even more iterations. Using batch normalization allows us to use much higher learning rates, which further increases the speed at which networks train.
- **Makes weights easier to initialize:** Weight initialization can be difficult, and it's even more difficult when creating deeper networks. Batch normalization seems to allow us to be much less careful about choosing our initial starting weights.
- **Makes more activation functions viable:** Some activation functions do not work well in some situations. Sigmoids lose their gradient pretty quickly, which makes them hard to use in deep networks. And ReLUs often die out during training, where they stop learning completely, so we need to be careful about the range of values fed into them. Because batch normalization regulates the values going into each activation function, non-linearities that don't seem to work well in deep networks actually become viable again.
- **Simplifies the creation of deeper networks:** Because of the first 4 items listed above, it is easier to build and faster to train deeper neural networks when using batch normalization. And it's been shown that deeper networks generally produce better results, so that's great.
- **Provides a bit of regularization:** Batch normalization adds a little noise to your network. In some cases, such as in Inception modules, batch normalization has been shown to work as well as dropout. But in general, consider batch normalization as a bit of extra regularization, possibly allowing you to reduce some of the dropout you might add to a network.
- **May give better results overall:** Some tests seem to show batch normalization actually improves the training results. However, it's really an optimization to help train faster, so you shouldn't think of it as a way to make your network better. But since it lets you train networks faster, you can iterate over more designs more quickly. It also lets you build deeper networks, which are usually better. So when you factor in everything, you're probably going to end up with better results if you build your networks with batch normalization.

Basic equations:

$$
\begin{aligned}
\mu_B &\leftarrow \frac{1}{m} \sum^m_{i=1} x_i \\
\sigma^2_B &\leftarrow \frac{1}{m} \sum^m_{i=1} (x_i - \mu_B)^2 \\
\hat x_i &\leftarrow \frac{x_i - \mu_B}{\sqrt{\sigma^2_B + \epsilon}} \\
y_i &\leftarrow \gamma \hat x_i + \beta
\end{aligned}
$$

where $m$ is the batch size, $\epsilon$ is any small positive value (e.g., 0.001) to ensure we don't divide by zero, $\hat x_i$ is the normalized value, and both $\gamma$ and $\beta$ are learnable parameters that scale and shift the normalized value to give $y_i$, which is fed into the next layer.
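
The four equations translate almost line for line into code. Below is a minimal pure-Python sketch for one feature across a batch; the function name `batch_norm` and the sample values are made up for illustration, and `gamma` and `beta` are passed as fixed numbers, whereas in a real layer they are learned.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-3):
    """Apply the batch-norm equations to a list of values for one feature."""
    m = len(xs)
    mu = sum(xs) / m                                         # batch mean
    var = sum((x - mu) ** 2 for x in xs) / m                 # batch variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]    # normalize
    return [gamma * xh + beta for xh in x_hat]               # scale and shift

batch = [1.0, 2.0, 3.0, 4.0]
out = batch_norm(batch)
print(out)

# gamma and beta let the network undo or reshape the normalization
shifted = batch_norm(batch, gamma=2.0, beta=5.0)
print(shifted)
```

The output of the first call has zero mean and a variance slightly below 1 (because of $\epsilon$); the second call has mean 5, showing how $\beta$ moves the center.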

## PyTorch Batch Normalization

- Layers followed by batch normalization do not need a bias term, because the learnable shift $\beta$ takes over that role. So, for linear or convolutional layers, set `bias=False` if you plan to add batch normalization on the outputs.
- You can use PyTorch's [nn.BatchNorm1d](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html#torch.nn.BatchNorm1d) layer to handle the math on linear outputs or [nn.BatchNorm2d](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html#torch.nn.BatchNorm2d) for 2D outputs, like filtered images from convolutional layers.
- You add the batch normalization layer before calling the activation function, so it always goes layer -> batch norm -> activation.
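
A short sketch of the three points above, with layer sizes chosen arbitrarily for illustration: the convolution and linear layers drop their bias, and each block follows the layer -> batch norm -> activation order.

```python
import torch
import torch.nn as nn

# Convolutional block: conv (no bias) -> BatchNorm2d -> activation
conv_block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(8),   # one gamma/beta pair per output channel
    nn.ReLU(),
)

# Fully connected block: linear (no bias) -> BatchNorm1d -> activation
fc_block = nn.Sequential(
    nn.Linear(10, 4, bias=False),
    nn.BatchNorm1d(4),
    nn.ReLU(),
)

x_img = torch.randn(5, 3, 12, 12)
x_vec = torch.randn(5, 10)
y_img = conv_block(x_img)
y_vec = fc_block(x_vec)

print(y_img.shape, y_vec.shape)
print(conv_block[0].bias)         # None: the conv layer has no bias
print(conv_block[1].bias.shape)   # BatchNorm2d's learnable shift (beta)
```

Call `model.eval()` before inference so the batch norm layers use their running statistics instead of the statistics of the current batch.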

## Pix2Pix and CycleGAN

## Loss Function



## Logits

Logits are the vector of raw (non-normalized) predictions that a classification model generates, which is ordinarily then passed to a normalization function. If the model is solving a multi-class classification problem, logits typically become an input to the softmax function. The softmax function then generates a vector of (normalized) probabilities with one value for each possible class. For a binary real/fake decision, as in the GAN discriminator, the single logit is passed through a sigmoid instead.

- Stack Overflow: [ref](https://stackoverflow.com/a/60543547/5374768)
- Google Machine Learning Glossary: [ref](https://developers.google.com/machine-learning/glossary#logits)
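
A small pure-Python sketch of turning logits into probabilities. The logit values are made up for illustration; subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(logits):
    """Multi-class case: map a list of logits to probabilities that sum to 1."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logit):
    """Binary case: map a single logit to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

class_logits = [2.0, 1.0, -1.0]   # hypothetical raw model outputs
probs = softmax(class_logits)
print(probs, sum(probs))

print(sigmoid(0.0))   # a logit of 0 means the model is undecided
```

The largest logit always maps to the largest probability, so taking the argmax of the logits gives the same predicted class as taking the argmax of the probabilities.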

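A quick check of what `bias=False` does to a convolutional layer:

```python
from torch import nn

# With bias=False the layer has no bias tensor, so accessing .data fails
try:
    c = nn.Conv2d(32, 1, 3, bias=False)
    print(c.__class__.__name__)
    print(c.bias.data)
except Exception as e:
    print(e)
```

```
Conv2d
'NoneType' object has no attribute 'data'
```

```python
# With the default bias=True the layer has one bias value per output channel
try:
    cc = nn.Conv2d(32, 1, 3)
    print(cc.bias.data)
except Exception as e:
    print(e)
```

```
tensor([0.0078])
```

The bias value is randomly initialized, so the number printed will differ from run to run.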
