# Generative Adversarial Networks (GAN)

 - Practicum: [Week9](https://www.youtube.com/watch?v=xYc11zyZ26M) - [Instant](https://youtu.be/xYc11zyZ26M?t=2947)
 
 
A generative adversarial network (GAN) is a network that includes:
 - a module `COST` that tries to distinguish whether the input comes from the original dataset, `vector x` (lower output), or was synthetically generated, `vector ^x` (higher output).
 - a module `GENERATOR` that tries to generate synthetic inputs, `vector ^x`, from a random distribution, making them as similar to the original dataset as possible.

![generative-adversarial-network](./res/generative-adversarial-network.png)

The `COST` module should assign a low cost to the pink x and a high cost to the blue x hat. The `GENERATOR` module tries to trick (fool) the `COST` module by creating synthetic xs that are very similar to the original xs.

Geometrically, the network starts from a point in the latent space, and the generator converts this latent point into a point in the input space. The `cost` model then judges whether this new point follows the distribution of the real training points.

![gan_space](./res/gan_space.png)
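To make this mapping concrete, here is a toy 1-D sketch in which a fixed affine map stands in for the generator (an assumption for illustration: there is no cost network and no training). Sampling points in the latent space and passing them through the map produces a new distribution in the input space.

```python
import random
import statistics

# Hypothetical fixed "generator": an affine map from a 1-D latent space to a 1-D input space.
# A real GAN learns this mapping with a neural network instead of fixing it by hand.
def generator(z, scale=2.0, shift=3.0):
    return scale * z + shift

random.seed(0)
latent = [random.gauss(0.0, 1.0) for _ in range(100_000)]   # points z ~ N(0, 1) in latent space
samples = [generator(z) for z in latent]                    # their images x_hat = G(z)

# The latent N(0, 1) is pushed forward to (approximately) N(3, 2**2) in the input space.
print(statistics.mean(samples), statistics.stdev(samples))
```

Training a GAN amounts to adjusting this map until the pushed-forward distribution matches the distribution of the real data.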

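To train the models we minimize the corresponding loss functions:
 - to train the `cost network` we use an `MSE loss function` that pushes the cost of real `x` down (low cost) and the cost of synthetic `x` up (high cost).
 - to train the `generator network`, the loss function is the `cost network` itself: the generator minimizes the cost that the cost network assigns to its generated samples.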

![gan_train_loss](./res/gan_train_loss.png)
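A minimal sketch of one training step for each module, assuming small MLPs, plain SGD and a hypothetical target `margin = 1.0` for the cost of synthetic samples (none of these values come from the lecture). It follows the two bullets above: an MSE loss for the cost network, and the cost itself as the generator's loss. The DCGAN code later in this notebook uses a binary cross-entropy loss instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
latent_dim, input_dim, batch = 2, 2, 16
margin = 1.0  # hypothetical "high cost" target for synthetic samples

# Small MLPs standing in for the GENERATOR and COST modules (sizes are arbitrary).
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, input_dim))
# Softplus keeps the cost non-negative, so the generator cannot push it to -infinity.
cost = nn.Sequential(nn.Linear(input_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())
opt_c = torch.optim.SGD(cost.parameters(), lr=0.01)
opt_g = torch.optim.SGD(generator.parameters(), lr=0.01)

x = torch.randn(batch, input_dim) + 3.0   # toy "real" data
z = torch.randn(batch, latent_dim)        # latent samples

# Cost step: MSE pulls C(x) toward 0 (low cost) and C(x_hat) toward margin (high cost).
x_hat = generator(z).detach()             # detached: this step must not update the generator
c_real, c_fake = cost(x).squeeze(1), cost(x_hat).squeeze(1)
loss_c = (F.mse_loss(c_real, torch.zeros_like(c_real))
          + F.mse_loss(c_fake, torch.full_like(c_fake, margin)))
opt_c.zero_grad()
loss_c.backward()
opt_c.step()

# Generator step: the loss is the cost that the cost network assigns to generated samples.
# (backward also fills the cost network's .grad, but opt_c.zero_grad() clears it next time.)
opt_g.zero_grad()
loss_g = cost(generator(z)).mean()
loss_g.backward()
opt_g.step()
print(loss_c.item(), loss_g.item())
```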

Possible problems:
 - Mode collapse: the `generator network` maps **all** the points in the latent space to a single point of the real dataset.
 - Vanishing gradients
 - Unstable convergence
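One crude way to spot the first problem (mode collapse) is to measure how spread out a batch of generated samples is. The helper `mean_pairwise_distance` below is a hypothetical diagnostic written for this illustration, not a standard GAN metric: a fully collapsed batch has an average pairwise distance of zero.

```python
import torch

def mean_pairwise_distance(samples):
    """Average Euclidean distance between all distinct pairs of rows."""
    n = samples.size(0)
    # Exact (non-matmul) mode so identical rows give a distance of exactly 0.
    d = torch.cdist(samples, samples, compute_mode="donot_use_mm_for_euclid_dist")
    return (d.sum() / (n * (n - 1))).item()   # the n diagonal entries are zero

torch.manual_seed(0)
diverse = torch.randn(64, 2)                  # spread-out "generated" samples
collapsed = torch.randn(1, 2).repeat(64, 1)   # every sample identical: full collapse
print(mean_pairwise_distance(diverse), mean_pairwise_distance(collapsed))
```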

-----

In this notebook, we are going to inspect code written by professionals and available in the examples section of the PyTorch GitHub repository ([link](https://github.com/pytorch/examples/tree/master/dcgan)).

We are going to copy the raw code and split it into the following cells to analyse it and add comments. The original code is a script that you execute from the command line.

**THIS CODE IS NOT EXECUTABLE BECAUSE IT USES INSTRUCTIONS THAT ARE ONLY VALID FROM THE COMMAND LINE. IT IS ONLY FOR MANUAL INSPECTION.**

In [None]:
# Imports
from __future__ import print_function
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils

The original code is a script executed from the command line. This cell defines the command-line arguments it accepts.

In [None]:
parser = argparse.ArgumentParser()
parser.add_argument('--dataset', required=True, help='cifar10 | lsun | mnist |imagenet | folder | lfw | fake')
parser.add_argument('--dataroot', required=False, help='path to dataset')
parser.add_argument('--workers', type=int, help='number of data loading workers', default=2)
parser.add_argument('--batchSize', type=int, default=64, help='input batch size')
parser.add_argument('--imageSize', type=int, default=64, help='the height / width of the input image to network')
parser.add_argument('--nz', type=int, default=100, help='size of the latent z vector')
parser.add_argument('--ngf', type=int, default=64, help='base number of generator feature maps')
parser.add_argument('--ndf', type=int, default=64, help='base number of discriminator feature maps')
parser.add_argument('--niter', type=int, default=25, help='number of epochs to train for')
parser.add_argument('--lr', type=float, default=0.0002, help='learning rate, default=0.0002')
parser.add_argument('--beta1', type=float, default=0.5, help='beta1 for adam. default=0.5')
parser.add_argument('--cuda', action='store_true', help='enables cuda')
parser.add_argument('--dry-run', action='store_true', help='check a single training cycle works')
parser.add_argument('--ngpu', type=int, default=1, help='number of GPUs to use')
parser.add_argument('--netG', default='', help="path to netG (to continue training)")
parser.add_argument('--netD', default='', help="path to netD (to continue training)")
parser.add_argument('--outf', default='.', help='folder to output images and model checkpoints')
parser.add_argument('--manualSeed', type=int, help='manual seed')
parser.add_argument('--classes', default='bedroom', help='comma separated list of classes for the lsun data set')

Parse and print all the options

In [None]:
opt = parser.parse_args()
print(opt)
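Because `parse_args()` reads `sys.argv` by default, calling it inside a notebook picks up the kernel's own arguments instead. One way around this, sketched below on a trimmed copy of the parser (only four of its options), is to pass an explicit argument list. Note that argparse turns `--dry-run` into the attribute `opt.dry_run`.

```python
import argparse

# A trimmed-down copy of the parser above.
parser = argparse.ArgumentParser()
parser.add_argument('--dataset', required=True, help='cifar10 | lsun | mnist | imagenet | folder | lfw | fake')
parser.add_argument('--dataroot', required=False, help='path to dataset')
parser.add_argument('--batchSize', type=int, default=64, help='input batch size')
parser.add_argument('--dry-run', action='store_true', help='check a single training cycle works')

# Passing an explicit list bypasses sys.argv, so this also works inside a notebook.
opt = parser.parse_args(['--dataset', 'fake', '--dry-run'])
print(opt)
```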

Create the output folder for the generated images and the model checkpoints

In [None]:
try:
    os.makedirs(opt.outf)
except OSError:
    pass

Set the random seeds: a fixed seed makes the random generation deterministic, so results are reproducible. If no seed is given, a random one is chosen and printed.

In [None]:
if opt.manualSeed is None:
    opt.manualSeed = random.randint(1, 10000)
print("Random Seed: ", opt.manualSeed)
random.seed(opt.manualSeed)
torch.manual_seed(opt.manualSeed)
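A quick check of what seeding buys: resetting both seeds to the same value reproduces the same random numbers. The helper `draw` is hypothetical. On the GPU, exact reproducibility can still be lost, for example because `cudnn.benchmark = True` may select different convolution algorithms between runs.

```python
import random
import torch

def draw(seed):
    random.seed(seed)          # Python's RNG
    torch.manual_seed(seed)    # PyTorch's RNG (noise, weight initialization, shuffling)
    return random.random(), torch.randn(3)

a = draw(1234)
b = draw(1234)
print(a[0] == b[0], torch.equal(a[1], b[1]))
```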

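Enable the cuDNN auto-tuner, which benchmarks several convolution algorithms and picks the fastest one for the input sizes in use (helpful here because the input size never changes)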

In [None]:
cudnn.benchmark = True

Warn if a CUDA device is available but the `--cuda` flag was not given

In [None]:
if torch.cuda.is_available() and not opt.cuda:
    print("WARNING: You have a CUDA device, so you should probably run with --cuda")

Check that a dataset path was given (`--dataroot` is required for every dataset except `fake`)

In [None]:
if opt.dataroot is None and str(opt.dataset).lower() != 'fake':
    raise ValueError("`dataroot` parameter is required for dataset \"%s\"" % opt.dataset)

Load the dataset (several datasets are supported). `nc` is the number of image channels: 3 for color images, 1 for MNIST.

In [None]:
if opt.dataset in ['imagenet', 'folder', 'lfw']:
    # folder dataset
    dataset = dset.ImageFolder(root=opt.dataroot,
                               transform=transforms.Compose([
                                   transforms.Resize(opt.imageSize),
                                   transforms.CenterCrop(opt.imageSize),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                               ]))
    nc=3


elif opt.dataset == 'lsun':
    classes = [ c + '_train' for c in opt.classes.split(',')]
    dataset = dset.LSUN(root=opt.dataroot, classes=classes,
                        transform=transforms.Compose([
                            transforms.Resize(opt.imageSize),
                            transforms.CenterCrop(opt.imageSize),
                            transforms.ToTensor(),
                            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                        ]))
    nc=3
elif opt.dataset == 'cifar10':
    dataset = dset.CIFAR10(root=opt.dataroot, download=True,
                           transform=transforms.Compose([
                               transforms.Resize(opt.imageSize),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))
    nc=3

elif opt.dataset == 'mnist':
    dataset = dset.MNIST(root=opt.dataroot, download=True,
                         transform=transforms.Compose([
                             transforms.Resize(opt.imageSize),
                             transforms.ToTensor(),
                             transforms.Normalize((0.5,), (0.5,)),
                         ]))
    nc=1

elif opt.dataset == 'fake':
    dataset = dset.FakeData(image_size=(3, opt.imageSize, opt.imageSize),
                            transform=transforms.ToTensor())
    nc=3

assert dataset
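Every branch except `fake` normalizes with mean 0.5 and standard deviation 0.5. A small worked example of what `transforms.Normalize` does to a single pixel value (plain Python arithmetic, not the torchvision implementation):

```python
# transforms.Normalize computes (x - mean) / std per channel.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    return y * std + mean

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))   # 0.0 -> -1.0, 0.5 -> 0.0, 1.0 -> 1.0
```

This maps the [0, 1] range produced by `ToTensor()` onto [-1, 1], the output range of the generator's final `Tanh`, so real and fake images share the same scale. When saving images, `vutils.save_image(..., normalize=True)` rescales them back to [0, 1] using their min and max values.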

Wrap the dataset in a PyTorch `DataLoader`, which yields shuffled batches of `batchSize` images

In [None]:
dataloader = torch.utils.data.DataLoader(dataset, batch_size=opt.batchSize,
                                         shuffle=True, num_workers=int(opt.workers))

Define the device and the network size parameters

In [None]:
device = torch.device("cuda:0" if opt.cuda else "cpu")  # Device that holds the models and tensors
ngpu = int(opt.ngpu) # Number of GPUs
nz = int(opt.nz)     # Size of the latent vector z
ngf = int(opt.ngf)   # Base number of generator feature maps
ndf = int(opt.ndf)   # Base number of discriminator (cost) feature maps

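Custom weight initialization: convolution weights are drawn from a normal distribution with mean 0 and standard deviation 0.02, batch-norm weights from a normal distribution with mean 1 and standard deviation 0.02, and batch-norm biases are set to zero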

In [None]:
# custom weights initialization called on netG and netD
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        torch.nn.init.normal_(m.weight, 1.0, 0.02)
        torch.nn.init.zeros_(m.bias)
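A quick sanity check of the initialization on a hypothetical two-layer network: after `apply(weights_init)`, the convolution weights should have a mean near 0 and a standard deviation near 0.02, and the batch-norm weights a mean near 1. Note that `find('Conv')` also matches `ConvTranspose2d`, so the generator's layers are initialized the same way.

```python
import torch
import torch.nn as nn

# Same rule as weights_init above, copied so this cell is self-contained.
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        torch.nn.init.normal_(m.weight, 1.0, 0.02)
        torch.nn.init.zeros_(m.bias)

torch.manual_seed(0)
net = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128))
net.apply(weights_init)    # .apply visits every submodule recursively

conv, bn = net[0], net[1]
print(conv.weight.mean().item(), conv.weight.std().item())   # close to 0 and 0.02
print(bn.weight.mean().item(), bn.bias.abs().max().item())   # close to 1, and exactly 0
```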

Define the generator network

In [None]:
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(    ngf,      nc, 4, 2, 1, bias=False),
            nn.Tanh()                            ## Output from -1 to +1
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output

# Instantiate the network
netG = Generator(ngpu).to(device)

# Initialize weights
netG.apply(weights_init) 

# Load previous state to continue previous training
if opt.netG != '':
    netG.load_state_dict(torch.load(opt.netG))
print(netG)
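The `# state size` comments can be checked with the output-size formula of `nn.ConvTranspose2d` (dilation 1, no output padding): `out = (in - 1) * stride - 2 * padding + kernel`. A small worked sketch for the five layers above:

```python
def conv_transpose_out(size, kernel, stride, padding):
    # Output size of nn.ConvTranspose2d with dilation=1 and output_padding=0
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) of the five ConvTranspose2d layers in Generator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
sizes = [1]                      # z enters as an nz x 1 x 1 "image"
for k, s, p in layers:
    sizes.append(conv_transpose_out(sizes[-1], k, s, p))
print(sizes)  # [1, 4, 8, 16, 32, 64]
```

The final 64 x 64 size is why this architecture matches the default `--imageSize 64`; other sizes would need a different stack of layers.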

Define the discriminator (cost) network

In [None]:
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)

        return output.view(-1, 1).squeeze(1)

# Instantiate the network
netD = Discriminator(ngpu).to(device)

# Initialize weights
netD.apply(weights_init)

# Load previous state to continue previous training
if opt.netD != '':
    netD.load_state_dict(torch.load(opt.netD))
print(netD)
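Similarly, the discriminator's sizes follow the `nn.Conv2d` formula `out = floor((in + 2 * padding - kernel) / stride) + 1` (dilation 1). Each stride-2 layer halves the image, and the last layer reduces the 4 x 4 map to a single value per image, which `forward` flattens to shape `(batch,)`:

```python
def conv_out(size, kernel, stride, padding):
    # Output size of nn.Conv2d with dilation=1
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of the five Conv2d layers in Discriminator
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]
sizes = [64]                     # an nc x 64 x 64 input image
for k, s, p in layers:
    sizes.append(conv_out(sizes[-1], k, s, p))
print(sizes)  # [64, 32, 16, 8, 4, 1]
```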

Define the loss function: binary cross-entropy (BCE)

In [None]:
criterion = nn.BCELoss()
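`nn.BCELoss` computes, for a prediction `p` and a label `y`, `-(y * log(p) + (1 - y) * log(1 - p))`, averaged over the batch by default. A small worked example with hypothetical discriminator outputs shows that, with label 1 for real and 0 for fake, the sum of the two losses is the negated discriminator objective `log(D(x)) + log(1 - D(G(z)))` from the training loop comment:

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and one label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

d_real, d_fake = 0.9, 0.2      # hypothetical outputs D(x) and D(G(z))
loss_d = bce(d_real, 1) + bce(d_fake, 0)

# Label 1 keeps only -log(D(x)); label 0 keeps only -log(1 - D(G(z))).
assert math.isclose(loss_d, -(math.log(d_real) + math.log(1 - d_fake)))
print(loss_d)
```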

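Define a fixed batch of noise (reused to visualize the generator's progress during training) and the label values for real and fake data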

In [None]:
fixed_noise = torch.randn(opt.batchSize, nz, 1, 1, device=device)
real_label = 1
fake_label = 0

Set up the optimizers: one Adam optimizer per network

In [None]:
optimizerD = optim.Adam(netD.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))

For a dry run, train for only one epoch (the loop also stops after the first batch)

In [None]:
if opt.dry_run:
    opt.niter = 1

Training loop

In each epoch we iterate over the whole dataset and train both networks (discriminator and generator) in alternation. Each step uses one batch of real inputs and one batch of generated inputs.

For each batch, we first train the discriminator and then the generator.

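To train the discriminator, we first use the batch of real data and then the batch of fake data produced by the generator (`fake = netG(noise)`). First, we clear the previous gradients of the discriminator (`netD.zero_grad()`). Then we compute the gradients of the loss for both types of input (`errD_real.backward()` and `errD_fake.backward()`); since PyTorch accumulates gradients, after the second call the discriminator's gradients are the sum of both. Finally, we apply the training step (`optimizerD.step()`); `errD = errD_real + errD_fake` is the total loss, computed only for logging. The fake batch is passed as `fake.detach()` so that this step does not propagate gradients into the generator.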
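The two key details of this step, `detach()` and gradient accumulation, can be checked on hypothetical tiny linear models (not the DCGAN networks):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
netG = nn.Linear(4, 2)                                # hypothetical tiny generator
netD = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())   # hypothetical tiny discriminator
criterion = nn.BCELoss()

real = torch.randn(8, 2)
fake = netG(torch.randn(8, 4))                        # graph from noise to fake is recorded

netD.zero_grad()
err_real = criterion(netD(real).squeeze(1), torch.ones(8))
err_real.backward()
grad_after_real = netD[0].weight.grad.clone()

# detach() cuts the graph at `fake`, so this backward pass stops at the discriminator.
err_fake = criterion(netD(fake.detach()).squeeze(1), torch.zeros(8))
err_fake.backward()                                   # adds onto the gradients from err_real

print(netG.weight.grad)                               # None: the generator got no gradient
```

Because `netG.weight.grad` stays `None`, the discriminator update cannot change the generator, and `netD[0].weight.grad` now holds the sum of the real and fake gradients, which is what the optimizer step uses.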


To train the generator, we clear the previous gradients of the generator (`netG.zero_grad()`). Then we compute the loss from the cost network's output on the fake data, using the *real* label (`label.fill_(real_label); output = netD(fake); errG = criterion(output, label)`): this loss is low when the discriminator classifies the fake data as real. Finally, we compute the gradient and perform the training step of the generator.
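Why use the real label instead of minimizing `log(1 - D(G(z)))` directly? With `D(G(z)) = sigmoid(a)`, the two generator losses have different gradients with respect to the logit `a`. The worked example below (plain Python, derivatives computed by hand) shows that when the discriminator easily rejects fakes (`D(G(z))` near 0), `log(1 - D)` gives almost no gradient, while `-log(D)`, which is what the BCE loss with the real label computes, still gives a strong one. This is one way of mitigating the vanishing-gradient problem listed at the top.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Gradients with respect to the discriminator's pre-sigmoid logit a, where D(G(z)) = sigmoid(a):
#   minimax loss       log(1 - D)  ->  d/da = -sigmoid(a)
#   non-saturating    -log(D)      ->  d/da = -(1 - sigmoid(a))
for a in (-6.0, 0.0, 6.0):
    d = sigmoid(a)
    print(f"D(G(z))={d:.4f}  minimax grad={-d:.4f}  non-saturating grad={-(1 - d):.4f}")
```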

Each line of the code below has its own comment explaining what the instruction does.

In [None]:
for epoch in range(opt.niter):
    for i, data in enumerate(dataloader, 0):
        
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        
        #  ---- train with real
        netD.zero_grad()                                       # Clean previous gradient computations
        real_cpu = data[0].to(device)                          # Move the real image batch to the device (data = (images, labels))
        batch_size = real_cpu.size(0)                          # Get the batch size of the real data
        label = torch.full((batch_size,), real_label,          # Set the real label to the real data
                           dtype=real_cpu.dtype, device=device)

        output = netD(real_cpu)                                # Forward the discriminator model with the real data
        errD_real = criterion(output, label)                   # Compute loss 
        errD_real.backward()                                   # Calculate gradient 
        D_x = output.mean().item()                             # Mean discriminator output on the real data

        
        # ---- train with fake
        
        ## Generate the fake data with the generator network
        noise = torch.randn(batch_size, nz, 1, 1, device=device) # Z (latent space): sample batch_size noise vectors
        fake = netG(noise)                                       # Generate batch_size fake points - forward the generator with the noise
        
        label.fill_(fake_label)                                # Set the fake label to the fake data
        output = netD(fake.detach())                           # Forward the discriminator model with the fake data
        errD_fake = criterion(output, label)                   # Compute loss 
        errD_fake.backward()                                   # Calculate gradient 
        D_G_z1 = output.mean().item()                          # Mean discriminator output on the fake data
        
        errD = errD_real + errD_fake                           # Total loss, for logging only (the two backward() calls already summed the gradients)
        optimizerD.step()                                      # Perform training step

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()                                       # Clean previous gradient computations
        label.fill_(real_label)                                # Fake labels are real for generator cost
        output = netD(fake)                                    # Forward the cost model with the fake data (generator loss is the cost output)
        errG = criterion(output, label)                        # Compute loss 
        errG.backward()                                        # Calculate gradient 
        D_G_z2 = output.mean().item()                          # Calculate the mean cost output with the fake data as real data
        optimizerG.step()                                      # Perform training step

        ## LOG DEBUG INFO
        print('[%d/%d][%d/%d] Loss_D: %.4f Loss_G: %.4f D(x): %.4f D(G(z)): %.4f / %.4f'
              % (epoch, opt.niter, i, len(dataloader),
                 errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        
        ## Every 100 steps, store debug images
        if i % 100 == 0:
            # Save the real images used in this step
            vutils.save_image(real_cpu,
                    '%s/real_samples.png' % opt.outf,
                    normalize=True)
            # Save what the generator produces from the fixed noise (it should improve over time)
            fake = netG(fixed_noise)
            vutils.save_image(fake.detach(),
                    '%s/fake_samples_epoch_%03d.png' % (opt.outf, epoch),
                    normalize=True)
        
        # In a dry run, stop after the first batch
        if opt.dry_run:
            break
            
    # After each epoch do checkpointing of the models
    torch.save(netG.state_dict(), '%s/netG_epoch_%d.pth' % (opt.outf, epoch))
    torch.save(netD.state_dict(), '%s/netD_epoch_%d.pth' % (opt.outf, epoch))
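The checkpoints are plain `state_dict`s, so they can be restored the same way the `--netG`/`--netD` options do. A minimal round-trip sketch, assuming a hypothetical small module in place of `Generator` and a temporary folder in place of `opt.outf`:

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_net():
    # Hypothetical stand-in for Generator; any nn.Module is saved and restored the same way.
    return nn.Sequential(nn.Linear(100, 64), nn.Tanh())

torch.manual_seed(0)
netG = make_net()
path = os.path.join(tempfile.mkdtemp(), 'netG_epoch_0.pth')
torch.save(netG.state_dict(), path)            # what the loop does after every epoch

restored = make_net()                          # freshly initialized, different weights
restored.load_state_dict(torch.load(path))     # what --netG does before training resumes

noise = torch.randn(4, 100)
with torch.no_grad():                          # sampling only, so no autograd graph is needed
    out_a, out_b = netG(noise), restored(noise)
print(torch.equal(out_a, out_b))
```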

Now, if you want to execute this code, follow the instructions on this [page](https://github.com/pytorch/examples/tree/master/dcgan).