<div style="width: 100%; overflow: hidden;">
    <a href="http://www.uc.pt/fctuc/dei/">
    <div style="display: block;margin-left: auto;margin-right: auto; width: 50%;"><img src="https://eden.dei.uc.pt/~naml/images_ecos/dei25.png"  /></div>
    </a>
</div>

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms
from torchvision.utils import save_image
import numpy as np
import os
import matplotlib.pyplot as plt
from torchvision.datasets import ImageFolder
import torchvision
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")

print(device)

In [None]:
# Define the directory where your images are stored (os.path.join works on any OS)
image_directory = os.path.join('data-students', 'TRAIN')

# # Alternatively, define the directory where your images are stored in Google Drive
# image_directory = '/content/drive/My Drive/your_folder_name/TRAIN'

In [None]:

transform = transforms.Compose([
    transforms.Resize((64,64)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize to [-1, 1]
])

train_dataset = ImageFolder(root=image_directory, transform=transform)


print(train_dataset.class_to_idx)  # Show all classes and their indices

# Keep only the class whose folder name is target_class
target_class = '6'
target_class_index = train_dataset.class_to_idx[target_class]

# Filter indices using the stored labels (avoids loading every image from disk)
target_indices = [i for i, label in enumerate(train_dataset.targets) if label == target_class_index]

# Create a subset containing only the desired class
target_dataset = Subset(train_dataset, target_indices)

# DataLoader for the subset
train_loader = DataLoader(target_dataset, batch_size=128, shuffle=True)


In [None]:
# Show one batch of images from train_loader

for images, labels in train_loader:
    plt.figure(figsize=(16, 8))
    plt.axis("off")
    plt.title("Training Images")
    # normalize=True rescales the pixel values to [0, 1] for display
    plt.imshow(np.transpose(torchvision.utils.make_grid(images, nrow=16, normalize=True).cpu(), (1, 2, 0)))
    break

In [None]:
def show_images(images):
    plt.figure(figsize=(8, 8))
    for i in range(images.shape[0]):
        plt.subplot(4, 4, i + 1)
        plt.imshow(np.transpose(images[i].detach().cpu().numpy(), (1, 2, 0)), interpolation='nearest')
        plt.axis('off')
    plt.show()

<h2><font color='#172241'>1. Introduction</font></h2>

In this class we are going to discuss and implement Generative Adversarial Networks (GANs). A GAN combines two deep neural networks: a discriminator D and a generator G. The generator G receives noise as input and outputs a fake sample, trying to replicate the data distribution used as input for D. The discriminator D receives real data and fake samples as input and tries to distinguish between them. These components are trained simultaneously as adversaries, with the aim of producing both a strong generator and a strong discriminator. The image below presents an overview of the entire process.

<img src="GANDiagram.png">


Over the years, several GAN models and architectures have been proposed in the literature [\[1\]](https://arxiv.org/abs/1701.07875)[\[2\]](https://arxiv.org/abs/1912.04958). These models are quite advanced and produce impressive results, but they require large amounts of computational resources.

In our case we are going to implement a Deep Convolutional GAN (DCGAN). This specific model assumes that the Generator and the Discriminator are deep convolutional neural networks. In particular, the Generator makes use of [ConvTranspose2d](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html) layers for upsampling the images.

We will create a GAN to replicate images from the CIFAR-10 dataset [\[1\]](https://www.cs.toronto.edu/~kriz/cifar.html)[\[2\]](https://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf). 

The CIFAR-10 dataset is widely used to benchmark state-of-the-art methods, allowing a comparison between different CNN architectures. The dataset is composed of 60000 32x32 RGB images, where each image is assigned to one of 10 classes:
- 'airplane'
- 'automobile'
- 'bird' 
- 'cat' 
- 'deer'
- 'dog'
- 'frog'
- 'horse'
- 'ship'
- 'truck'

The 10 classes are mutually exclusive, and there are exactly 6000 images per class. The dataset is split into train and test sets, with 50000 images used for training and the remaining 10000 images used for testing. The test set contains exactly 1000 randomly selected images from each class.


To simplify we will only use images from the class `cat`.

To train a GAN:
- The generator maps a random vector x from the latent space, of shape (latent_dim,), to images of shape (32,32,3).
- The discriminator network receives images of shape (32,32,3) from the generator and from the real dataset, and produces a score estimating the probability of each image being real.
- The GAN network joins the generator and the discriminator, i.e., GAN = D(G(x)). The GAN uses the generator to map latent vectors to images, and the discriminator then assesses the realism of the latent vectors as decoded by the generator.
- The discriminator is trained on fake and real images.
- The generator is trained using the gradients of the GAN loss with respect to the generator's weights. This means that, at every step, we move the weights of the generator in a direction that will make the discriminator more likely to classify as "real" the images decoded by the generator, i.e. we train the generator to fool the discriminator.
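
For reference, the adversarial game described in the list above is usually written as the minimax objective from the original GAN formulation. The notation here is an assumption chosen to match the list: x is the latent vector, and $x_{\text{real}}$ (a symbol introduced only for this formula) is an image drawn from the training data.

$$
\min_G \max_D \; \mathbb{E}_{x_{\text{real}} \sim p_{\text{data}}}\big[\log D(x_{\text{real}})\big] + \mathbb{E}_{x \sim \mathcal{N}(0, I)}\big[\log\big(1 - D(G(x))\big)\big]
$$

In practice the generator is often trained to minimize $-\mathbb{E}_{x}\big[\log D(G(x))\big]$ instead of the second term, which corresponds to labelling the generated images as "real" when updating the generator.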


### Some Problems and Some Tricks
Building and training GANs is extremely difficult. Two common problems in GAN training are vanishing gradients and mode collapse. Vanishing gradients occur when the discriminator D becomes perfect and no longer makes mistakes. The loss then goes to zero, no gradient flows back to the generator network, and the GAN's progress stagnates. In mode collapse, the generator captures only a small portion of the data distribution provided as input to the discriminator. This is undesirable, since we want to reproduce the whole distribution of the data.
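
To make the vanishing gradient problem more concrete, here is a minimal sketch that compares two generator losses when the discriminator is almost certain that a generated image is fake. The logit value of -10 is a hypothetical number chosen for illustration; it is not taken from this notebook's training run.

```python
import torch

# Hypothetical scenario: a near-perfect discriminator assigns a very negative
# logit to a generated image, so D(G(x)) = sigmoid(logit) is close to 0.
logit = torch.tensor(-10.0, requires_grad=True)

# Saturating generator loss taken from the minimax objective: log(1 - D(G(x)))
saturating = torch.log(1 - torch.sigmoid(logit))
(grad_saturating,) = torch.autograd.grad(saturating, logit)

# Non-saturating alternative: -log(D(G(x))), i.e. BCE against the "real" label
non_saturating = -torch.log(torch.sigmoid(logit))
(grad_non_saturating,) = torch.autograd.grad(non_saturating, logit)

print(f"saturating gradient:     {grad_saturating.item():.6f}")
print(f"non-saturating gradient: {grad_non_saturating.item():.6f}")
```

The first gradient should be close to zero while the second stays close to -1, which is one reason generators are commonly trained against "real" labels rather than by directly minimizing log(1 - D(G(x))).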

Over the years some tips and tricks have been proposed to help build GANs. Keep in mind that most of these tricks are just expert knowledge from people that have spent countless hours working on these models. 

Here are a few tricks [\[3\]](https://www.manning.com/books/deep-learning-with-python):
- `tanh` should be used as the last activation in the Generator, instead of `sigmoid`
- The latent space is created using a normal distribution (Gaussian distribution), not a uniform distribution
- Stochasticity is good. Since GAN training results in a dynamic equilibrium, GANs are likely to get "stuck" in all sorts of ways. Introducing randomness during training helps prevent this. We introduce randomness in two ways: 1) we use dropout in the discriminator, 2) we add some random noise to the labels for the discriminator.
- Sparse gradients, i.e. when the network does not receive enough signals to adjust its weights, can hinder GAN training. There are two things that can induce gradient sparsity: 1) max pooling operations, 2) ReLU activations. Instead of max pooling, we recommend using strided convolutions for downsampling, and we recommend using a LeakyReLU layer instead of a ReLU activation. It is similar to ReLU but it relaxes sparsity constraints by allowing small negative activation values.
- In generated images, it is common to see "checkerboard artifacts" caused by unequal coverage of the pixel space in the generator. To fix this, we use a kernel size that is divisible by the stride size whenever we use a strided `ConvTranspose2d` or `Conv2d`, in both the generator and the discriminator.
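
The "unequal coverage" behind checkerboard artifacts can be counted directly. The sketch below is an illustration only: `coverage_values` is a hypothetical helper, and the all-ones kernel is an assumption used to count how many kernel taps land on each output pixel of a transposed convolution.

```python
import torch
import torch.nn as nn

def coverage_values(kernel_size, stride, size=6):
    """Return the distinct overlap counts in the interior of a transposed-conv output."""
    layer = nn.ConvTranspose2d(1, 1, kernel_size, stride, bias=False)
    with torch.no_grad():
        layer.weight.fill_(1.0)                    # every kernel tap contributes 1
        out = layer(torch.ones(1, 1, size, size))  # each pixel = number of taps that hit it
    m = kernel_size                                # crop the border, where fewer kernels overlap
    return torch.unique(out[0, 0, m:-m, m:-m]).tolist()

even_coverage = coverage_values(kernel_size=4, stride=2)    # kernel divisible by stride
uneven_coverage = coverage_values(kernel_size=3, stride=2)  # kernel not divisible by stride
print(even_coverage)
print(uneven_coverage)
```

With a kernel size of 4 and a stride of 2, every interior pixel is covered the same number of times. With a kernel size of 3 and a stride of 2, the counts alternate, and that alternation is the pattern that shows up as a checkerboard.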






### The Generator

Let's start by building the generator model. It turns a vector (from the latent space; during training it will be sampled at random) into a candidate image. One of the many issues that commonly arise with GANs is that the generator gets stuck producing images that look like noise. A possible solution is to use dropout in both the discriminator and the generator.
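
Before reading the model, it can help to work out how each `ConvTranspose2d` layer changes the spatial size. The sketch below applies the output-size formula from the PyTorch documentation (assuming `dilation=1` and `output_padding=0`) to the kernel, stride and padding values used by the `Generator` class in this notebook. `conv_transpose_out` is a hypothetical helper name.

```python
def conv_transpose_out(size, kernel_size, stride, padding):
    """Output size of nn.ConvTranspose2d with dilation=1 and output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel_size

# (kernel_size, stride, padding) of the five ConvTranspose2d layers in the Generator
generator_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector enters as a (100, 1, 1) tensor
sizes = []
for kernel_size, stride, padding in generator_layers:
    size = conv_transpose_out(size, kernel_size, stride, padding)
    sizes.append(size)

print(sizes)
```

The first layer expands the 1x1 input to 4x4, and each of the following stride-2 layers doubles the resolution until the 64x64 image size used by the data loader is reached.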

In [None]:
# torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1,
# bias=True, dilation=1, padding_mode='zeros', device=None, dtype=None)


class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            
            nn.ConvTranspose2d(100, 1024, 4, 1, 0, bias=False),  # Output: (1024, 4, 4)
            nn.BatchNorm2d(1024),
            nn.ReLU(True),

            nn.ConvTranspose2d(1024, 512, 4, 2, 1, bias=False),  # Output: (512, 8, 8)
            nn.BatchNorm2d(512),
            nn.ReLU(True),

            nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),  # Output: (256, 16, 16)
            nn.BatchNorm2d(256),
            nn.ReLU(True),

            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # Output: (128, 32, 32)
            nn.BatchNorm2d(128),
            nn.ReLU(True),

            nn.ConvTranspose2d(128, 3, 4, 2, 1, bias=False),     # Output: (3, 64, 64)
            nn.Tanh()  # Squashes the output to [-1, 1]
        )

    def forward(self, input):
        return self.main(input)
    
    # def forward(self, input):
    #     for layer in self.main:
    #         input = layer(input)
    #         print(input.shape)  # print the output shape after each layer
    #     return input


In [None]:
# # To inspect the Generator's output shapes and adjust the Discriminator accordingly
# # Create an instance of the Generator class
# gen = Generator()

# # Create a random input tensor
# input_tensor = torch.randn(1, 100, 1, 1)

# # Call the forward method
# output = gen.forward(input_tensor)

### The Discriminator

The Discriminator model takes as input a candidate image (real or synthetic) and classifies it into one of two classes: "generated image" or "real image that comes from the training set".

In [None]:
#torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True,
# padding_mode='zeros', device=None, dtype=None)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 128, 4, 2, 1, bias=False),  # Output: (128, 32, 32)
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),

            nn.Conv2d(128, 256, 4, 2, 1, bias=False),  # Output: (256, 16, 16)
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),

            nn.Conv2d(256, 512, 4, 2, 1, bias=False),  # Output: (512, 8, 8)
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),

            nn.Conv2d(512, 1024, 4, 2, 1, bias=False),  # Output: (1024, 4, 4)
            nn.BatchNorm2d(1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),

            nn.Conv2d(1024, 1, 4, 1, 0, bias=False),    # Output: (1, 1, 1)
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.model(input).view(-1, 1).squeeze(1)

    # def forward(self, input):
    #     print(f'Input shape: {input.shape}')  # print the shape of the input
        
    #     for layer in self.model:
    #         input = layer(input)
    #         print(f'Output shape: {input.shape}')  # print the output shape after each layer
            
    #     final_output = input.view(-1, 1).squeeze(1)
    #     print(f'Final output shape: {final_output.shape}')  # print the shape of the final output
    #     return final_output


In [None]:
# # Create an instance of the Discriminator class
# disc = Discriminator()

# # Create a random input tensor that matches the generator's output shape: (batch_size, 3, 64, 64)
# input_tensor = torch.randn(128, 3, 64, 64)

# # Call the forward method
# output = disc.forward(input_tensor)

# # The result should have shape (batch_size,), since forward() flattens the (batch_size, 1, 1, 1) output

### The Adversarial Network
Finally, we need to chain the Generator and the Discriminator, i.e., create the GAN.
The GAN turns latent space points into a classification decision, `fake` or `real`, and it is meant to be trained with labels that always say "these are real images". Training the GAN therefore updates the weights of the generator in a way that makes the discriminator more likely to predict "real" when looking at fake images. Very importantly, the discriminator must stay frozen during this step: its weights are not updated when training the GAN. If the discriminator weights could be updated during this process, we would be training the discriminator to always predict "real", which is not what we want.
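
One way to keep the discriminator fixed during the generator update in PyTorch is to step only an optimizer that holds the generator's parameters. The sketch below illustrates this with toy linear models; `toy_G`, `toy_D` and their sizes are assumptions made for illustration and are not the networks defined in this notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy stand-ins (assumptions for illustration, not the notebook's networks)
toy_G = nn.Linear(4, 2)
toy_D = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())

opt_G = optim.SGD(toy_G.parameters(), lr=0.1)  # holds only the generator's parameters

d_before = [p.detach().clone() for p in toy_D.parameters()]
g_before = [p.detach().clone() for p in toy_G.parameters()]

noise = torch.randn(8, 4)
target_real = torch.ones(8, 1)  # generated samples are labelled "real"

opt_G.zero_grad()
loss = nn.BCELoss()(toy_D(toy_G(noise)), target_real)
loss.backward()  # gradients flow through toy_D back to toy_G
opt_G.step()     # but only toy_G's weights are updated

d_unchanged = all(torch.equal(a, b) for a, b in zip(d_before, toy_D.parameters()))
g_changed = any(not torch.equal(a, b) for a, b in zip(g_before, toy_G.parameters()))
print(d_unchanged, g_changed)
```

The discriminator still takes part in the forward and backward pass, so the generator receives a useful gradient, but its own weights are left untouched because no optimizer step is applied to them.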

In [None]:
netG = Generator().to(device)
netD = Discriminator().to(device)

optimizerD = optim.Adam(netD.parameters(), lr=0.0001, betas=(0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=0.0001, betas=(0.5, 0.999))

### How to Train the DCGAN
- Load the Dataset
- Get all the images for the cat class
- Normalize the images so all pixel values are between -1 and 1 (matching the `tanh` output of the Generator)
- Define:
    - Maximum number of iterations
    - Batch size

- For each batch in each epoch:
    - For k steps:
        1. Randomly generate a sample from the latent space using a normal distribution with size=(batch_size, latent_dim).
        2. Generate images with the `Generator` using the latent sample from step 1.
        3. Combine the generated images from step 2 with real images from the training dataset.
        4. Train the `Discriminator` using the combined batch of images, with the corresponding targets, either "real" or "fake".
    - Then, once per batch:
        5. Generate a sample from the latent space using a normal distribution with size=(batch_size, latent_dim).
        6. Train the `Generator` using the sample from step 5, with the generated images labelled as "real". This updates only the weights of the `Generator`, moving them towards getting the `Discriminator` to predict "these are real images" for generated images, i.e. this trains the generator to fool the discriminator.
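
The six steps above can be sketched on a toy problem. Everything in this block is an assumption made for illustration: the tiny fully connected `toy_G` and `toy_D`, the random `real_batch`, the hard 0/1 targets, and the value `k = 2`. It is a sketch of the procedure, not the training loop used in this notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
latent_dim, data_dim, batch_size, k = 8, 2, 16, 2  # hypothetical toy settings

toy_G = nn.Sequential(nn.Linear(latent_dim, data_dim), nn.Tanh())
toy_D = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())
opt_G = optim.Adam(toy_G.parameters(), lr=1e-3)
opt_D = optim.Adam(toy_D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real_batch = torch.rand(batch_size, data_dim) * 2 - 1  # stand-in for real data in [-1, 1]

for _ in range(k):
    # Steps 1-2: sample the latent space and generate fake samples
    z = torch.randn(batch_size, latent_dim)
    fake_batch = toy_G(z).detach()  # detach: the generator is not updated here
    # Step 3: combine fake and real samples, with targets 0 = fake and 1 = real
    combined = torch.cat([fake_batch, real_batch])
    targets = torch.cat([torch.zeros(batch_size, 1), torch.ones(batch_size, 1)])
    # Step 4: train the discriminator on the combined batch
    opt_D.zero_grad()
    loss_D = bce(toy_D(combined), targets)
    loss_D.backward()
    opt_D.step()

# Step 5: sample the latent space again
z = torch.randn(batch_size, latent_dim)
# Step 6: train the generator with the generated samples labelled "real"
opt_G.zero_grad()
loss_G = bce(toy_D(toy_G(z)), torch.ones(batch_size, 1))
loss_G.backward()
opt_G.step()

print(f"loss_D={loss_D.item():.4f}  loss_G={loss_G.item():.4f}")
```

Processing the real and fake batches in two separate discriminator passes, with their gradients accumulated before a single optimizer step, is a common variation on step 3.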


In [None]:
criterion = nn.BCELoss()

real_label = 0.9
fake_label = 0.1
num_epochs = 1000

for epoch in range(num_epochs):
    for i, (data, _) in enumerate(train_loader):

        # --- Update the Discriminator: real batch ---
        netD.zero_grad()
        real_cpu = data.to(device)
        batch_size = real_cpu.size(0)
        label = torch.full((batch_size,), real_label, dtype=torch.float, device=device)

        output = netD(real_cpu)
        errD_real = criterion(output, label)
        errD_real.backward()
        D_x = output.mean().item()  # mean D output on real images

        # --- Update the Discriminator: fake batch ---
        noise = torch.randn(batch_size, 100, 1, 1, device=device)
        fake = netG(noise)
        label.fill_(fake_label)

        # detach() stops gradients from flowing into the Generator here
        output = netD(fake.detach())

        errD_fake = criterion(output, label)
        errD_fake.backward()
        D_G_z1 = output.mean().item()  # mean D output on fakes before the D update

        errD = errD_real + errD_fake
        optimizerD.step()

        # --- Update the Generator ---
        netG.zero_grad()
        noise = torch.randn(batch_size, 100, 1, 1, device=device)
        fake = netG(noise)
        label.fill_(real_label)  # fake labels are real for generator cost
        output = netD(fake)
        errG = criterion(output, label)
        errG.backward()
        D_G_z2 = output.mean().item()  # mean D output on fakes after the D update
        optimizerG.step()  # only the Generator's weights are updated

        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(train_loader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

    # Save outputs


print("Training complete.")

In [None]:
output.shape

In [None]:
real_cpu.shape

In [None]:
data.shape

In [None]:
torch.save(netG.state_dict(), './netG.pth')
torch.save(netD.state_dict(), './netD.pth')

### Generate images using the trained generator

In [None]:
def generate_images(generator, num_images):
    with torch.no_grad():  # Disable gradient tracking while generating images
        noise = torch.randn(num_images, 100, 1, 1, device=device)  # 100 is the size of the noise vector
        generated_images = generator(noise)
        generated_images = (generated_images + 1) / 2  # Rescale images from [-1, 1] to [0, 1]
        return generated_images



In [None]:
images = generate_images(netG, 16)  # Generate 16 images
show_images(images)