# StyleGAN

Look at this: [https://thispersondoesnotexist.com/](https://thispersondoesnotexist.com/)! Would you believe that the person in this picture is not real? Well, they are not! Feel free to refresh the page to see more examples. These pictures are generated by [StyleGAN](https://arxiv.org/pdf/1812.04948.pdf), a groundbreaking architecture in the world of GANs!

StyleGAN uses many techniques that we have seen in the course (e.g., progressive growing) but also relies on a novel approach to controlled generation. The figure below describes the architecture of the network:

<img src='assets/stylegan.png' width=60% />


Wow! We have a completely new network in our GAN. This network is called the **mapping network**. Its architecture is fairly simple: it consists of only 8 fully connected layers. The authors argue that mapping the latent vector $z$ to a new latent vector $w$ facilitates **disentanglement**.

We talked about disentanglement in the course: when modifying the latent vector $z$ to control an aspect of the generated image, we often face entangled features. For example, longer hair can be correlated with a more feminine face. Using the mapping network in StyleGAN facilitates the decorrelation of such features.
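To make the idea concrete, here is a minimal sketch of a network that maps $z$ to $w$ with 8 fully connected layers. The class name `MappingNetwork`, the dimensions of 512, and the LeakyReLU activation are assumptions made for illustration; they are not taken from this notebook.

```python
import torch
import torch.nn as nn


class MappingNetwork(nn.Module):
    """Hypothetical sketch: maps a latent vector z to a latent vector w."""

    def __init__(self, z_dim: int = 512, w_dim: int = 512, num_layers: int = 8):
        super().__init__()
        layers = []
        in_dim = z_dim
        for _ in range(num_layers):
            layers.append(nn.Linear(in_dim, w_dim))
            layers.append(nn.LeakyReLU(0.2))  # assumed activation
            in_dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


# a batch of 4 latent vectors z (batch size chosen arbitrarily)
z = torch.randn(4, 512)
mapping = MappingNetwork()
w = mapping(z)
print(w.shape)
```

The output $w$ has one vector per sample in the batch, and it is this vector, rather than $z$, that the rest of the generator consumes.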

What happens next? The generated latent vector $w$ is injected into a "classic" generator network. However, this network has two components you are not yet familiar with, as seen in the figure above. After each convolution layer, the authors add **noise**. The convolution output with added noise is then fed into an **adaptive instance normalization (AdaIN) layer**.

In this notebook, you will implement a **noise injection layer** and the **AdaIN layer**.

## Noise injection

Noise injection helps with something that the authors call **stochastic variation**. They argue that many aspects of a human face are stochastic, such as hair curls or freckles. By adding random noise at different levels in the generator, they can create more variability without changing the overall image. For example, in the image below, we can see how different noise vectors affect the placement of the hair.

<br>
<img src='assets/stochastic_variation.png' width=40% />
<br>

After each convolution layer in the generator, the authors added a noise injection layer. Random Gaussian noise is scaled by a learned per-channel factor and added to the output. Let's look at an example together:
* let's say that the output shape of the convolution layer is `(1, 256, 32, 32)`, where 256 is the number of channels and 32x32 the spatial dimensions.
* we create a random noise matrix of dimensions `(1, 1, 32, 32)`.
* we multiply this random noise by a learned scaling factor of dimensions `(1, 256, 1, 1)`, which broadcasts to `(1, 256, 32, 32)`, and add the result to the convolution output. This learned scaling factor is initialized with zeros.
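The shapes in the example above can be checked with a short broadcasting sketch. The variable names are illustrative only, and the tensors are random stand-ins rather than real convolution outputs or learned parameters.

```python
import torch

conv_out = torch.randn(1, 256, 32, 32)  # stand-in for a convolution output
noise = torch.randn(1, 1, 32, 32)       # one noise value per spatial location
scale = torch.zeros(1, 256, 1, 1)       # one scaling factor per channel, zero-initialized

scaled_noise = scale * noise            # broadcasts to (1, 256, 32, 32)
out = conv_out + scaled_noise

print(scaled_noise.shape)
print(torch.equal(out, conv_out))       # zero scale: the noise has no effect yet
```

Because the scaling factor starts at zero, the layer initially passes its input through unchanged, and the network learns during training how much noise each channel should receive.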

For the first exercise of this notebook, you will implement the `ApplyNoise` layer.

<br>
<br>
<details>
<summary>
<font size="3" color="black">
<b>Click for tips</b>
</font>
</summary>

* You can read about implementing custom PyTorch modules [here](https://pytorch.org/tutorials/beginner/examples_nn/two_layer_net_module.html).
</details>


In [1]:
import torch
import torch.nn as nn

import tests

In [2]:
class ApplyNoise(nn.Module):
    """
    Noise injection layer with learnable parameters.
    
    args:
    - channels: number of channels of the input
    """
    def __init__(self, channels: int):
        super(ApplyNoise, self).__init__()
        self.channels = channels
        self.weights = nn.Parameter(torch.zeros(1, channels, 1, 1))
    
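    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # one noise value per spatial location, shared across channels;
        # created on the same device and with the same dtype as the input
        noise = torch.randn(1, 1, x.shape[2], x.shape[3], device=x.device, dtype=x.dtype)
        # the learned per-channel weights scale the noise before it is added
        x = x + self.weights * noise
        return x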

In [3]:
apply_noise = ApplyNoise(512)

In [4]:
tests.check_apply_noise(apply_noise)

Congrats, you successfully implemented a noise injection layer!


## Adaptive instance normalization


Adaptive instance normalization (AdaIN) is a variation of the **Instance Normalization layer**. In the course, we discussed the importance of Batch Normalization layers. However, we have also seen that in some cases (e.g., when using gradient penalties), Batch Normalization is not the preferred type of normalization layer.

This figure from the [Group Normalization](https://arxiv.org/pdf/1803.08494.pdf) paper helps illustrate the differences between the normalization layers. In the figure below, $H$ and $W$ are the spatial dimensions, $C$ the channel dimension and $N$ the batch dimension.


<br>
<img src='assets/normalization_layers.png' width=80% />
<br>

In [**Batch Normalization**](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html), we normalize pixels of the same channel, across the batch and spatial dimensions.

In [**Layer Normalization**](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html), we normalize pixels of the same batch index, across the channel and spatial dimensions.

In [**Instance Normalization**](https://pytorch.org/docs/stable/generated/torch.nn.InstanceNorm2d.html), we normalize pixels of the same batch index and channel, across the spatial dimensions only.

In [**Group Normalization**](https://pytorch.org/docs/stable/generated/torch.nn.GroupNorm.html), we split the channels into groups and normalize pixels of the same batch index and channel group, across the channels of that group and the spatial dimensions.
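One way to see the difference is to look at which dimensions the statistics are computed over. The sketch below uses an arbitrary tensor shape and a group count of 2, both chosen only for illustration, and compares a manual instance normalization against `torch.nn.functional.instance_norm`.

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 6, 4, 4)  # (N, C, H, W), arbitrary sizes

batch_mean = x.mean(dim=(0, 2, 3), keepdim=True)     # one mean per channel
layer_mean = x.mean(dim=(1, 2, 3), keepdim=True)     # one mean per sample
instance_mean = x.mean(dim=(2, 3), keepdim=True)     # one mean per sample and channel

# group normalization with 2 groups of 3 channels each
groups = 2
grouped = x.view(8, groups, 6 // groups, 4, 4)
group_mean = grouped.mean(dim=(2, 3, 4), keepdim=True)  # one mean per sample and group

print(batch_mean.shape, layer_mean.shape, instance_mean.shape, group_mean.shape)

# manual instance normalization (biased variance, eps as in the PyTorch default)
instance_var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
manual = (x - instance_mean) / torch.sqrt(instance_var + 1e-5)
print(torch.allclose(manual, F.instance_norm(x), atol=1e-5))
```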

The AdaIN layer is an extension of the Instance Normalization layer. It takes as input both the output of the previous convolution layer and the latent vector $w$. Then it performs the following:
* map the latent vector $w$ to the style vectors $(y_{s}, y_{b})$ through learned affine transformations (fully connected layers).
* calculate the output of the layer using the following equation: $y_{s} \cdot \mathrm{IN}(x) + y_{b}$, where $\mathrm{IN}(x)$ is the input $x$ fed through an instance normalization layer.
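
Written out per channel, the equation above takes the following form, as in the StyleGAN paper. Here $x_i$ is the $i$-th feature map of one sample, $\mu(x_i)$ and $\sigma(x_i)$ are its mean and standard deviation over the spatial dimensions, and $y_{s,i}$ and $y_{b,i}$ are the $i$-th components of the style vectors:

$$
\mathrm{AdaIN}(x_i, y) = y_{s,i} \, \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}
$$

Each feature map is first normalized to zero mean and unit variance, then rescaled and shifted by values that depend on $w$. This is how the style controls the statistics of every channel.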

For the second exercise in this notebook, you will implement the AdaIN layer.

<br>
<br>
<details>
<summary>
<font size="3" color="black">
<b>Click for tips</b>
</font>
</summary>

1. You can use the torch Instance Normalization module for an easier implementation.
</details>

In [5]:
class AdaIN(nn.Module):
    """
    Adaptive Instance Normalization layer
    
    args:
    - channels: number of channels of the input
    - w_dim: dimension of the latent vector w
    
    inputs:
    - x: float32 tensor of dim [N, C, H, W]
    - w: float32 tensor of dim [N, W_DIM]
    """
    def __init__(self, channels: int, w_dim: int):
        super(AdaIN, self).__init__()
        self.channels = channels
        self.w_dim = w_dim
        self.instance_norm = nn.InstanceNorm2d(channels)
        self.linear_s = nn.Linear(w_dim, channels)
        self.linear_b = nn.Linear(w_dim, channels)
        
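    def forward(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # normalize each channel of each sample across its spatial dimensions
        x = self.instance_norm(x)
        # map w to a per-channel scale and bias, reshaped to [N, C, 1, 1] for broadcasting
        ys = self.linear_s(w)[..., None, None]
        yb = self.linear_b(w)[..., None, None]
        return x * ys + yb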

In [6]:
adain = AdaIN(512, 128)

In [7]:
tests.check_adain(adain)

Congrats, you successfully implemented the AdaIN layer!
