# GAN: theory and applications

*Adversarial Training (also called GAN, for Generative Adversarial Networks) is the most interesting idea in the last 10 years of ML* <sup>[1](#1)</sup>

GANs are a framework for estimating generative models via an adversarial process in which two models, a **discriminator** $D$ and a **generator** $G$, are trained simultaneously.

The aim of the generative model $G$ is to capture the data distribution, while the discriminative model $D$ estimates the probability that a sample came from the training data rather than from $G$.

The power of the adversarial training framework comes from the fact that both $D$ and $G$ can be non-linear, parametric mapping functions such as **neural networks**, and the whole system can be trained end to end using **gradient descent**.

To learn a generator distribution $p_g$ over the data **$x$**, the generator builds a mapping from a **prior** noise distribution $p_z(z)$ to the data space as $G(z;\theta_g)$.

The discriminator $D(x;\theta_d)$ outputs a single scalar representing the probability that $x$ came from real data rather than $p_g$.
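The two mappings can be sketched in a few lines of plain Python. This is only a toy illustration, not the architecture of any particular GAN: it assumes one-dimensional data, a standard normal prior $p_z$, an affine generator and a logistic-regression discriminator. The function names and parameter values are hypothetical; in practice both $G$ and $D$ would be neural networks.

```python
import math
import random

def sample_prior(rng):
    """Draw z from the prior p_z (assumed here to be a standard normal)."""
    return rng.gauss(0.0, 1.0)

def generator(z, theta_g):
    """Toy G(z; theta_g): an affine map from noise space to data space."""
    scale, shift = theta_g
    return scale * z + shift

def discriminator(x, theta_d):
    """Toy D(x; theta_d): logistic regression, returns a probability in (0, 1)."""
    weight, bias = theta_d
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

rng = random.Random(0)
theta_g = (2.0, 1.0)   # hypothetical generator parameters
theta_d = (0.5, -0.2)  # hypothetical discriminator parameters

z = sample_prior(rng)                  # z ~ p_z(z)
fake = generator(z, theta_g)           # a sample from p_g
p_real = discriminator(fake, theta_d)  # D's belief that the sample is real
```

Sampling from $p_g$ never requires writing $p_g$ down explicitly: it is defined implicitly by pushing prior noise through $G$.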

The original GAN framework poses this problem as a **min-max** game in which the two players ($G$ and $D$) compete one against the other.

## Generator

The generator is responsible for learning a distribution $p_g$ good enough that a sample $Y \sim p_g$ can fool the discriminator.

## Discriminator

The discriminator is responsible for classifying its input, for example a generated sample $Y = G(z)$ scored as $D(Y) = D(G(z))$, into one of two classes: **real** vs **fake**.

The generator and the discriminator compete against each other, playing the following zero-sum min-max game with value function $V_{GAN}(D,G)$:

$$ \min_G \max_D V_{GAN}(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] $$
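Since both expectations are taken over distributions we can sample from, the value function can be approximated by averaging over finite batches. The sketch below is a minimal illustration of that idea, not a training procedure. The helper `estimate_value`, the Gaussian stand-ins for $p_{data}$ and $p_z$, and the two hand-written discriminators are all assumptions made for the example.

```python
import math
import random

def estimate_value(D, G, real_samples, noise_samples):
    """Monte Carlo estimate of V_GAN(D, G) from finite samples."""
    real_term = sum(math.log(D(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - D(G(z))) for z in noise_samples) / len(noise_samples)
    return real_term + fake_term

rng = random.Random(0)
real_samples = [rng.gauss(4.0, 1.0) for _ in range(1000)]   # assumed p_data
noise_samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # assumed p_z

# Hypothetical generator: passes the noise through unchanged.
G = lambda z: z

# A discriminator that cannot tell real from fake at all.
D_confused = lambda x: 0.5
# A discriminator that separates the two (decision threshold at x = 2).
D_informed = lambda x: 1.0 / (1.0 + math.exp(-(4.0 * x - 8.0)))

v_confused = estimate_value(D_confused, G, real_samples, noise_samples)
v_informed = estimate_value(D_informed, G, real_samples, noise_samples)
```

A discriminator that always outputs $0.5$ yields $\log 0.5 + \log 0.5 = -\log 4$, whatever the samples are. A discriminator that separates real from generated samples obtains a larger value, which is why $D$ maximizes $V_{GAN}$ while $G$ tries to drive it down.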

## Intuitive explanation

We want to train the **discriminator** $D$ to correctly classify the values sampled from the real data (**maximize** $\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]$) and, **at the same time**, to output a probability $D(G(z))$ close to zero when given a fake sample $G(z), z \sim p_z(z)$, by **maximizing** $\mathbb{E}_{z \sim p_z(z)}[\log(1- D(G(z)))]$.

The **generator**, instead, is trained to *fool the discriminator*: it learns to produce samples that are more and more similar to the ones sampled from the real data distribution, and it does this by **minimizing** $\mathbb{E}_{z \sim p_z(z)}[\log(1- D(G(z)))]$.

Note: *the min-max game is played only on the second term of the equation: the first term does not depend on $G$, so it has no impact when updating the generator parameters*.
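To make the note concrete, take the gradient of the value function with respect to the generator parameters $\theta_g$ while holding $D$ fixed. The term $\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$ contains no $\theta_g$, so its gradient is zero and only the second term remains:

$$ \nabla_{\theta_g} V_{GAN}(D,G) = \nabla_{\theta_g} \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z;\theta_g)))] $$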

## Non-saturating value function

As Goodfellow et al. point out in the original GAN paper<sup>[2](#2)</sup>, the previous equation may not provide sufficient gradient for $G$ to learn well. Early in learning, when the samples generated by $G$ are poor, $D$ can reject them with high confidence because they are clearly different from the training data. In this case, $\log(1 - D(G(z)))$ **saturates**.

The proposed solution is to train $G$ to **maximize** $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$. This means that the two networks optimize two different but interacting objectives (they play the same game, in a different manner):

$$V_{GAN}(D,G) = \begin{cases}
D: & \max_D \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \\
G: & \max_G \mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]
\end{cases}$$
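The difference between the two generator objectives shows up in their gradients. The sketch below assumes the discriminator output is a sigmoid of a logit $a$, that is $D(G(z)) = \sigma(a)$, and compares the derivative of each objective with respect to $a$. The function names are made up for this illustration. A very negative logit corresponds to a discriminator that confidently rejects the generated sample.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    """d/da log(1 - sigmoid(a)) = -sigmoid(a): the original objective."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da log(sigmoid(a)) = 1 - sigmoid(a): the alternative objective."""
    return 1.0 - sigmoid(a)

# Compare gradient magnitudes as D becomes less confident that the sample is fake.
for logit in (-6.0, -3.0, 0.0):
    print(logit, abs(saturating_grad(logit)), abs(non_saturating_grad(logit)))
```

When $D$ rejects a sample with high confidence (very negative logit), the gradient of $\log(1 - D(G(z)))$ is close to zero, while the gradient of $\log D(G(z))$ is close to one. Both objectives push the logit in the same direction; only the strength of the learning signal differs.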


<a id="1">[1]</a>: According to Yann LeCun's answer on Quora: https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-in-deep-learning/answer/Yann-LeCun

<a id="2">[2]</a>: Generative Adversarial Networks https://arxiv.org/pdf/1406.2661.pdf