<div style="text-align:center;font-size:1.5em">

**Authors: Anastasiia Karpova, Valentin Abribat, William Liaw**

Academic report presented to Télécom Paris as an activity of the course Photographie computationelle / Méthodes par patchs (IMA206).

**Palaiseau**
</div>

**ABSTRACT**

This document presents the codebase and accompanying documentation for Project 8 of Group 11, focusing on the re-implementation of the SinGAN architecture. SinGAN, originally introduced by Shaham, Dekel, and Michaeli (2019), is a generative model designed to learn from a single natural image. The re-implementation is based on:

    Shaham, T. R., Dekel, T., & Michaeli, T. (2019). SinGAN: Learning a Generative Model from a Single Natural Image. arXiv. https://doi.org/10.48550/arxiv.1905.01164

and its official source code available at:

    Tamarott. (n.d.). GitHub - tamarott/SinGAN: Official PyTorch implementation of the paper: “SinGAN: Learning a Generative Model from a Single Natural Image.” GitHub. https://github.com/tamarott/SinGAN

Additionally, we conducted experiments on two aspects: comparing different padding functions for the noise and image inputs, and replacing the adversarial cost function used to train the generator with a patch-based Fréchet distance.

**TABLE OF CONTENTS**

1. Modules imports
2. Hyperparameter definitions
   1. Workspace
   2. Load, input, save configurations
   3. Networks
   4. Pyramid
   5. Optimization
   6. Fréchet Distance
3. Auxiliary functions
4. SinGAN
5. Models
6. Training
   1. Training functions
   2. Main
7. Random Samples
   1. Main
8. SIFID
   1. SIFID functions
   2. Main

**DISCLAIMER**

The sections *Modules imports*, *Hyperparameter definitions*, *Auxiliary functions*, *SinGAN* and *Models* must be run at the start of every session; otherwise the code may raise errors or behave unexpectedly. Once they have been run, the following sections can be executed independently:

- Training: trains a model
- Random Samples: generates random samples from the trained models stored in a directory (by default: `TrainedModels`)
- SIFID: evaluates the fake images in a given directory (by default: `Output/RandomSamples/<img_name>/gen_start_scale=X`) against a real image (by default: `Input/Images/birds.png`) using the SIFID metric

# Modules imports

The following modules are used throughout this work:

In [None]:
import math
import os
import pathlib
import random

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from matplotlib.pyplot import imread
from torchvision import models
from torchvision.io import read_image
from torchvision.transforms import v2
from tqdm import tqdm
from torchaudio.functional import frechet_distance

%matplotlib inline

# Hyperparameter definitions

The implementation relies on the following hyperparameters:

## Workspace

In [None]:
not_cuda = False  # disables cuda

In [None]:
device = torch.device("cpu" if not_cuda or not torch.cuda.is_available() else "cuda:0")

## Load, input, save configurations

In [None]:
netG_path = ""  # path to netG (to continue training)
netD_path = ""  # path to netD (to continue training)
manualSeed = None  # manual seed (a random one is drawn if None)
nc_z = 3  # number of noise channels
nc_im = 3  # number of image channels
out = "Output"  # output folder

In [None]:
if manualSeed is None:
    manualSeed = random.randint(1, 10000)
random.seed(manualSeed)
torch.manual_seed(manualSeed)

print("Random Seed: ", manualSeed)

## Networks

In [None]:
nfc = 32  # number of feature components
min_nfc = 32  # number of minimal feature components
kernel_size = 3  # kernel size
num_layer = 5  # number of layers
stride = 1  # stride
padd_size = 0  # net pad size
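
With `padd_size = 0`, every convolution in the generator and discriminator is a "valid" convolution that trims `kernel_size - 1` pixels from each spatial dimension. After `num_layer` blocks the feature map has shrunk by `(kernel_size - 1) * num_layer` pixels, which is why the training and sampling code pads noise and images by `(kernel_size - 1) * num_layer / 2` on each side. The sketch below checks this with a bare stack of convolutions (normalization and activations are omitted because they do not change shapes); the helper `total_padding` is a hypothetical name used only for illustration.

```python
import torch
import torch.nn as nn


def total_padding(kernel_size: int, num_layer: int) -> int:
    """Per-side padding that compensates num_layer unpadded, stride-1 convolutions."""
    return (kernel_size - 1) * num_layer // 2


kernel_size, num_layer = 3, 5
pad = total_padding(kernel_size, num_layer)  # 5 pixels on each side

# Stack of unpadded stride-1 convolutions, shaped like the SinGAN blocks
stack = nn.Sequential(
    *[nn.Conv2d(3, 3, kernel_size, stride=1, padding=0) for _ in range(num_layer)]
)

h, w = 40, 60
x = torch.randn(1, 3, h + 2 * pad, w + 2 * pad)  # padded input
with torch.no_grad():
    y = stack(x)
print(pad, tuple(y.shape))  # 5 (1, 3, 40, 60)
```

Padding the input rather than each layer keeps the interior of every convolution free of border artifacts, at the cost of having to choose how the outer frame is filled (the padding modes compared in this work).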

## Pyramid

In [None]:
# Image is resized if it's bigger than max_size
# Pyramid of images scales: 1, scale_factor, scale_factor^2, ... scale_factor^n
scale_factor = 0.75  # pyramid scale factor
noise_amp_init = 0.1  # initial weight of the additive noise
min_size = 25  # image minimal size at the smallest scale
max_size = 250  # image maximal size at the biggest scale
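
The pyramid scales follow a geometric progression: the coarsest image is the original multiplied by `scale_factor**stop_scale`, and each finer level divides that factor by `scale_factor` until it reaches 1. How `stop_scale` is derived is not shown in this section, so the sketch below *assumes* it is the largest number of downscalings that keeps the shorter side at or above `min_size`, and it ignores the `max_size` resize. The function `pyramid_shapes` is hypothetical; it mirrors the `math.ceil` rounding used in `train`.

```python
import math


def pyramid_shapes(h, w, scale_factor=0.75, min_size=25):
    """Return (height, width) of every pyramid level, coarsest first.

    Assumption: stop_scale is the largest integer k such that
    min(h, w) * scale_factor**k >= min_size.
    """
    stop_scale = math.floor(math.log(min_size / min(h, w)) / math.log(scale_factor))
    scales = [scale_factor ** k for k in range(stop_scale, -1, -1)]
    return [(math.ceil(s * h), math.ceil(s * w)) for s in scales]


shapes = pyramid_shapes(250, 250)
print(len(shapes), shapes[0], shapes[-1])  # 9 (26, 26) (250, 250)
```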

## Optimization

In [None]:
niter = 1000  # number of training iterations per scale
gamma = 0.1  # learning-rate scheduler decay factor
lr_g = 0.0005  # generator learning rate
lr_d = 0.0005  # discriminator learning rate
beta1 = 0.5  # beta1 for Adam
Gsteps = 1  # generator inner steps per iteration
Dsteps = 3  # discriminator inner steps per iteration
lambda_grad = 0.1  # gradient penalty weight
alpha = 10  # reconstruction loss weight
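
For reference, the per-scale losses minimized by `train_single_scale` can be written as follows, with $\lambda$ = `lambda_grad`, $\alpha$ = `alpha` and $\sigma$ = `noise_amp`. This is a transcription of what the code computes (a WGAN-GP critic loss plus a mean-squared reconstruction term), not a quotation from the paper:

$$
\mathcal{L}_D = -\operatorname{mean}\big[D(x)\big] + \operatorname{mean}\big[D(\tilde{x})\big] + \lambda\,\operatorname{mean}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1-\epsilon)\,\tilde{x},\ \ \epsilon \sim \mathcal{U}[0,1]
$$

$$
\mathcal{L}_G = -\operatorname{mean}\big[D(\tilde{x})\big] + \alpha\,\operatorname{MSE}\big(G(\sigma z_{\text{opt}} + \bar{x}_{\text{rec}},\ \bar{x}_{\text{rec}}),\ x\big),
\qquad \tilde{x} = G(\sigma z + \bar{x},\ \bar{x})
$$

Here $x$ is the real image at the current scale, $\bar{x}$ the upsampled random sample from the coarser scales (`prev`), $\bar{x}_{\text{rec}}$ the upsampled reconstruction (`z_prev`), and the means are taken over the critic's output map; in the penalty, the norm is taken over channels at each pixel and then averaged over pixels.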

## Fréchet Distance

In [None]:
patch_size = 11  # patch size
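
The Fréchet-distance experiments model the set of all `patch_size`×`patch_size` RGB patches of an image as a Gaussian and compare two images through the closed-form Fréchet (2-Wasserstein) distance between Gaussians, $d^2 = \lVert\mu_1-\mu_2\rVert^2 + \operatorname{Tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\big)$. The notebook delegates this computation to `torchaudio.functional.frechet_distance`. The NumPy sketch below is an independent illustration (the names `patch_gaussian` and `gaussian_frechet` are hypothetical) that evaluates the trace term through the symmetric matrix $\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}$, whose eigenvalues are those of $\Sigma_1\Sigma_2$.

```python
import numpy as np


def patch_gaussian(img, patch_size):
    """Mean and covariance of all patch_size x patch_size patches of an (H, W, C) image."""
    windows = np.lib.stride_tricks.sliding_window_view(
        img, (patch_size, patch_size), axis=(0, 1)
    )
    # windows: (H-p+1, W-p+1, C, p, p) -> one row per patch, same order as get_mu_sigma
    P = windows.reshape(-1, img.shape[2] * patch_size * patch_size)
    return P.mean(axis=0), np.cov(P, rowvar=False)


def _psd_sqrt(m):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T


def gaussian_frechet(mu1, s1, mu2, s2):
    """Squared Fréchet distance between N(mu1, s1) and N(mu2, s2)."""
    r1 = _psd_sqrt(s1)
    cross = _psd_sqrt(r1 @ s2 @ r1)  # its trace equals Tr((s1 s2)^{1/2})
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * cross))


rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
mu, sigma = patch_gaussian(img, patch_size=3)
print(mu.shape, sigma.shape)  # (27,) (27, 27)
print(gaussian_frechet(mu, sigma, mu, sigma))        # close to 0
print(gaussian_frechet(mu, sigma, mu + 0.1, sigma))  # close to 27 * 0.1**2
```

With `patch_size = 11`, each patch vector has `3 * 11 * 11 = 363` entries, so the covariance is a 363×363 matrix per image.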

# Auxiliary functions

The following helper functions are used throughout the notebook:

In [None]:
def descale(x):
    """
    Descale a tensor from the range [-1, 1] to [0, 1].
    """
    out = (x + 1) / 2
    return out.clamp(0, 1)


def scale(x):
    """
    Scale a tensor from the range [0, 1] to [-1, 1].
    """
    out = (x - 0.5) * 2
    return out.clamp(-1, 1)


def convert_image_np(img):
    """
    Convert a PyTorch tensor to a NumPy array with values in the range [0, 1].
    """
    return np.clip(descale(img).squeeze(0).cpu().numpy().transpose(1, 2, 0), 0, 1)


def calc_gradient_penalty(netD, real_data, fake_data, lambda_, device):
    """
    Calculate the gradient penalty for WGAN-GP
    """
    alpha = torch.rand(1, 1)
    alpha = alpha.expand(real_data.size())
    alpha = alpha.to(device)

    interpolates = alpha * real_data + ((1 - alpha) * fake_data)

    interpolates = interpolates.to(device)
    interpolates = torch.autograd.Variable(interpolates, requires_grad=True)

    disc_interpolates = netD(interpolates)

    gradients = torch.autograd.grad(
        outputs=disc_interpolates,
        inputs=interpolates,
        grad_outputs=torch.ones(disc_interpolates.size()).to(device),
        create_graph=True,
        retain_graph=True,
        only_inputs=True,
    )[0]

    gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean() * lambda_
    return gradient_penalty


def upsampling(im, sx, sy):
    """
    Upsample an image to the specified size using bilinear interpolation.
    """
    m = nn.Upsample(size=[round(sx), round(sy)], mode="bilinear", align_corners=True)
    return m(im)


def get_mu_sigma(sample, device, patch_size=patch_size):
    """
    Calculate the mean and covariance of patches from an image tensor.
    """
    P = (
        sample.squeeze(0)
        .permute(1, 2, 0)
        .unfold(0, patch_size, 1)
        .unfold(1, patch_size, 1)
    )
    P = torch.reshape(P, (-1, 3 * patch_size * patch_size)).T

    mu = torch.mean(P, dim=1).to(device)
    sigma = torch.cov(P).to(device)

    return mu, sigma


def get_mean_border(image):
    """
    Calculate the mean pixel value of the borders of an image.
    """
    top_border = image[:, 0, :]
    bottom_border = image[:, -1, :]
    left_border = image[:, :, 0]
    right_border = image[:, :, -1]

    border_pixels = torch.cat(
        [top_border, left_border, right_border, bottom_border], dim=1
    )

    return border_pixels.mean()


def get_padding(mode_pad, pad_image, pad_noise, image=None):
    """
    Get padding layers for noise and image based on the specified mode:
    0 = zero, 1 = constant (mean of the image border, requires `image`),
    2 = replication, 3 = circular.
    """
    if mode_pad == 0:
        m_noise = nn.ZeroPad2d(pad_noise)
        m_image = nn.ZeroPad2d(pad_image)
    elif mode_pad == 1:
        pad_const = get_mean_border(image.squeeze(0))
        m_noise = nn.ConstantPad2d(pad_noise, pad_const)
        m_image = nn.ConstantPad2d(pad_image, pad_const)
    elif mode_pad == 2:
        m_noise = nn.ReplicationPad2d(pad_noise)
        m_image = nn.ReplicationPad2d(pad_image)
    elif mode_pad == 3:
        m_noise = nn.CircularPad2d(pad_noise)
        m_image = nn.CircularPad2d(pad_image)
    else:
        raise ValueError(f"Unknown padding mode: {mode_pad}")
    return m_noise, m_image


def get_padding_singan(mode_pad, pad_image, image=None):
    """
    Get the padding layer for images based on the specified mode:
    0 = zero, 1 = constant (mean of the image border, requires `image`),
    2 = replication, 3 = circular.
    """
    if mode_pad == 0:
        m_image = nn.ZeroPad2d(pad_image)
    elif mode_pad == 1:
        pad_const = get_mean_border(image.squeeze(0))
        m_image = nn.ConstantPad2d(pad_image, pad_const)
    elif mode_pad == 2:
        m_image = nn.ReplicationPad2d(pad_image)
    elif mode_pad == 3:
        m_image = nn.CircularPad2d(pad_image)
    else:
        raise ValueError(f"Unknown padding mode: {mode_pad}")
    return m_image
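
The four `mode_pad` values select zero padding (0), constant padding with the mean of the image border (1), replication of edge pixels (2) and circular wrap-around (3). The sketch below applies the same four strategies to a 3×3 toy image with `torch.nn.functional.pad`, which performs the same operations as the `nn.*Pad2d` modules used above; `border_mean` is a hypothetical stand-in for `get_mean_border`.

```python
import torch
import torch.nn.functional as F

x = torch.arange(1.0, 10.0).reshape(1, 1, 3, 3)  # toy image with values 1..9


def border_mean(img):
    """Mean of the outermost rows and columns (corners counted twice, as in get_mean_border)."""
    border = torch.cat(
        [img[..., 0, :], img[..., -1, :], img[..., :, 0], img[..., :, -1]], dim=-1
    )
    return border.mean().item()


pad = (1, 1, 1, 1)  # left, right, top, bottom
padded = {
    0: F.pad(x, pad, mode="constant", value=0.0),             # zero
    1: F.pad(x, pad, mode="constant", value=border_mean(x)),  # mean of the border
    2: F.pad(x, pad, mode="replicate"),                       # edge replication
    3: F.pad(x, pad, mode="circular"),                        # wrap-around
}
for mode, t in padded.items():
    print(mode, t[0, 0, 0].tolist())  # first row of the padded 5x5 image
```

Note that `get_padding` applies the same mode to the noise maps, so with modes 2 and 3 the padded noise frame repeats existing noise samples instead of containing fresh ones, and with mode 1 it is filled with the image's border mean.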

# SinGAN

In [None]:
def SinGAN_generate(
    Gs,
    Zs,
    reals,
    NoiseAmp,
    mode,
    input_name,
    in_s=None,
    scale_v=1,
    scale_h=1,
    n=0,
    gen_start_scale=0,
    num_samples=500,
    mode_pad=0,
):
    """
    This function generates fake random samples using the SinGAN architecture
    """
    if in_s is None:
        in_s = torch.zeros(reals[0].shape, device=device)
    images_cur = []
    for G, Z_opt, noise_amp in zip(Gs, Zs, NoiseAmp):
        pad1 = int(((kernel_size - 1) * num_layer) / 2)
        m = get_padding_singan(mode_pad, pad1, reals[0])
        nzx = (Z_opt.shape[2] - pad1 * 2) * scale_v
        nzy = (Z_opt.shape[3] - pad1 * 2) * scale_h

        images_prev = images_cur
        images_cur = []

        for i in range(0, num_samples, 1):
            if n == 0:
                z_curr = torch.randn(
                    1,
                    1,
                    round(nzx),
                    round(nzy),
                    device=device,
                )
                z_curr = z_curr.expand(1, 3, z_curr.shape[2], z_curr.shape[3])
            else:
                z_curr = torch.randn(
                    1,
                    nc_z,
                    round(nzx),
                    round(nzy),
                    device=device,
                )

            z_curr = m(z_curr)

            if images_prev == []:
                I_prev = m(in_s)
            else:
                I_prev = images_prev[i]
                I_prev = v2.functional.resize(
                    I_prev,
                    (
                        round(nzx),
                        round(nzy),
                    ),
                )
                I_prev = m(I_prev)

            if n < gen_start_scale:
                z_curr = Z_opt

            z_in = noise_amp * z_curr + I_prev
            I_curr = G(z_in.detach(), I_prev)

            if n == len(reals) - 1:
                if mode == "train" or mode == "random_samples":
                    dir_to_save = f"{out}/RandomSamples/{input_name[:-4]}/gen_start_scale={gen_start_scale}"
                elif mode == "random_samples_arbitraty_size":
                    dir_to_save = f"{out}/RandomSamples_ArbitrerySizes/{input_name[:-4]}/scale_v={scale_v}_scale_h={scale_h}"

                os.makedirs(dir_to_save, exist_ok=True)

                plt.imsave(
                    f"{dir_to_save}/{i}.png",
                    convert_image_np(I_curr.detach()),
                    vmin=0,
                    vmax=1,
                )
            images_cur.append(I_curr)
        n += 1
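
Stripped of file handling, each iteration of the outer loop in `SinGAN_generate` resizes the previous scale's output, pads it, adds scaled noise and lets the generator predict a residual on top of it. The sketch below reproduces that data flow with untrained toy generators so it runs in isolation. `TinyGenerator`, `sample_coarse_to_fine`, the shapes and the noise amplitudes are all hypothetical; zero padding (`mode_pad=0`) is assumed, and `F.interpolate` stands in for the `v2.functional.resize` call used in the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGenerator(nn.Module):
    """Hypothetical residual generator: 5 unpadded 3x3 convs added to the cropped input image."""

    def __init__(self, channels=3, width=8, num_layer=5):
        super().__init__()
        layers, c_in = [], channels
        for _ in range(num_layer - 1):
            layers += [nn.Conv2d(c_in, width, 3), nn.LeakyReLU(0.2)]
            c_in = width
        layers += [nn.Conv2d(c_in, channels, 3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, z_in, prev):
        x = self.net(z_in)
        ind = (prev.shape[2] - x.shape[2]) // 2  # crop the padded image back to x's size
        return x + prev[:, :, ind : prev.shape[2] - ind, ind : prev.shape[3] - ind]


def sample_coarse_to_fine(generators, shapes, noise_amps, pad=5):
    img = torch.zeros(1, 3, *shapes[0])
    for scale, (G, (h, w), amp) in enumerate(zip(generators, shapes, noise_amps)):
        img = F.interpolate(img, size=(h, w), mode="bilinear", align_corners=True)
        # Coarsest scale: one noise channel copied to RGB, as in the notebook
        z = torch.randn(1, 1 if scale == 0 else 3, h, w).expand(1, 3, h, w)
        prev = F.pad(img, (pad,) * 4)
        z_in = amp * F.pad(z, (pad,) * 4) + prev
        img = G(z_in, prev)
    return img


shapes = [(26, 26), (35, 35), (47, 47)]
gens = [TinyGenerator() for _ in shapes]
with torch.no_grad():
    out = sample_coarse_to_fine(gens, shapes, noise_amps=[1.0, 0.1, 0.1])
print(tuple(out.shape))  # (1, 3, 47, 47)
```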

# Models

In [None]:
class ConvBlock(nn.Sequential):
    """
    Convolutional Block from SinGAN
    """
    def __init__(self, in_channel, out_channel, kernel_size, padd, stride):
        super(ConvBlock, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_channel,
                out_channel,
                kernel_size=kernel_size,
                padding=padd,
                stride=stride,
            ),
        )
        self.add_module("norm", nn.BatchNorm2d(out_channel))
        self.add_module("LeakyRelu", nn.LeakyReLU(0.2, inplace=True))

In [None]:
class Generator(nn.Module):
    """
    Generator Block from SinGAN
    """
    def __init__(self, nc_im, nfc, min_nfc, kernel_size, padd_size, num_layer):
        super(Generator, self).__init__()
        # Receives an "image"
        self.head = ConvBlock(nc_im, nfc, kernel_size, padd_size, 1)
        self.body = nn.Sequential()
        for i in range(num_layer - 2):
            # Each block halves the number of channels (never below min_nfc)
            N = math.floor(nfc / (2 ** (i + 1)))
            block = ConvBlock(
                max(2 * N, min_nfc),
                max(N, min_nfc),
                kernel_size,
                padd_size,
                1,
            )
            self.body.add_module(f"block{i + 1}", block)
        # Outputs an image
        self.tail = nn.Sequential(
            nn.Conv2d(
                max(N, min_nfc),
                nc_im,
                kernel_size=kernel_size,
                stride=1,
                padding=padd_size,
            ),
            nn.Tanh(),
        )

    def forward(self, x, y):
        x = self.head(x)
        x = self.body(x)
        x = self.tail(x)
        ind = int((y.shape[2] - x.shape[2]) / 2)
        y = y[:, :, ind : (y.shape[2] - ind), ind : (y.shape[3] - ind)]
        return x + y

In [None]:
class Discriminator(nn.Module):
    """
    Discriminator Block from SinGAN
    """
    def __init__(self, nc_im, nfc, min_nfc, kernel_size, padd_size, num_layer):
        super(Discriminator, self).__init__()
        # Receives an "image"
        self.head = ConvBlock(nc_im, nfc, kernel_size, padd_size, 1)
        self.body = nn.Sequential()
        for i in range(num_layer - 2):
            # Each block halves the number of channels (never below min_nfc)
            N = nfc // (2 ** (i + 1))
            block = ConvBlock(
                max(2 * N, min_nfc),
                max(N, min_nfc),
                kernel_size,
                padd_size,
                1,
            )
            self.body.add_module(f"block{i + 1}", block)
        # Outputs a label
        self.tail = nn.Conv2d(
            max(N, min_nfc),
            1,
            kernel_size=kernel_size,
            stride=1,
            padding=padd_size,
        )

    def forward(self, x):
        x = self.head(x)
        x = self.body(x)
        x = self.tail(x)
        return x
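
Because every block uses a stride-1, unpadded `kernel_size`×`kernel_size` convolution, each value of the discriminator's output map depends on a window of $1 + \texttt{num\_layer}\,(\texttt{kernel\_size}-1)$ input pixels per side, i.e. 11×11 with the defaults. The discriminator therefore scores overlapping 11×11 patches, a size that coincides with `patch_size = 11`. The sketch below measures the receptive field empirically by back-propagating from one output location through a bare convolution stack; `measured_receptive_field` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def measured_receptive_field(kernel_size=3, num_layer=5, size=31):
    """Width of the input region that influences the central output of a conv stack."""
    stack = nn.Sequential(
        *[nn.Conv2d(1, 1, kernel_size, bias=False) for _ in range(num_layer)]
    )
    for conv in stack:
        nn.init.constant_(conv.weight, 1.0)  # positive weights: gradients cannot cancel
    x = torch.zeros(1, 1, size, size, requires_grad=True)
    y = stack(x)
    c = y.shape[2] // 2
    y[0, 0, c, c].backward()
    # Count input rows that received a non-zero gradient
    return int((x.grad[0, 0].abs().sum(dim=1) > 0).sum().item())


print(measured_receptive_field())  # 11
print(1 + 5 * (3 - 1))             # 11
```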

In [None]:
def weights_init(m):
    """
    Performs weights initialization on a given model
    """
    classname = m.__class__.__name__
    if classname.find("Conv2d") != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find("Norm") != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)


def init_models(
    netG_path,
    netD_path,
    nc_im,
    nfc,
    min_nfc,
    kernel_size,
    padd_size,
    num_layer,
    device,
    verbose=False,
):
    """
    Initialize models for the SinGAN architecture
    """
    netG = Generator(nc_im, nfc, min_nfc, kernel_size, padd_size, num_layer).to(device)
    netG.apply(weights_init)
    if netG_path != "":
        netG.load_state_dict(torch.load(netG_path, map_location=device))
    netD = Discriminator(nc_im, nfc, min_nfc, kernel_size, padd_size, num_layer).to(
        device
    )
    netD.apply(weights_init)
    if netD_path != "":
        netD.load_state_dict(torch.load(netD_path, map_location=device))
    if verbose:
        print(netG)
        print(netD)
    return netD, netG

# Training

## Training functions

In [None]:
def draw_concat(
    Gs, Zs, reals, NoiseAmp, in_s, mode, nc_z, m_noise, m_image, scale_factor
):
    """
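    Passes an image through the already-trained generators, from coarse to fine:
    at each scale the image is cropped and padded, noise is added (fresh random
    noise in "rand" mode, the fixed Z_opt maps in "rec" mode), the generator is
    applied, and the result is upsampled to the size of the next scale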
    """
    G_z = in_s
    if len(Gs) > 0:
        if mode == "rand":
            count = 0
            pad_noise = int(((kernel_size - 1) * num_layer) / 2)
            for G, Z_opt, real_curr, real_next, noise_amp in zip(
                Gs, Zs, reals, reals[1:], NoiseAmp
            ):
                if count == 0:
                    z = torch.randn(
                        1,
                        1,
                        Z_opt.shape[2] - 2 * pad_noise,
                        Z_opt.shape[3] - 2 * pad_noise,
                        device=device,
                    )
                    z = z.expand(1, 3, z.shape[2], z.shape[3])
                else:
                    z = torch.randn(
                        1,
                        nc_z,
                        Z_opt.shape[2] - 2 * pad_noise,
                        Z_opt.shape[3] - 2 * pad_noise,
                        device=device,
                    )
                z = m_noise(z)
                G_z = G_z[:, :, 0 : real_curr.shape[2], 0 : real_curr.shape[3]]
                G_z = m_image(G_z)
                z_in = noise_amp * z + G_z
                G_z = G(z_in.detach(), G_z)
                G_z = v2.functional.resize(
                    G_z,
                    (
                        math.ceil(G_z.shape[2] / scale_factor),
                        math.ceil(G_z.shape[3] / scale_factor),
                    ),
                )
                G_z = G_z[:, :, 0 : real_next.shape[2], 0 : real_next.shape[3]]
                count += 1
        if mode == "rec":
            count = 0
            for G, Z_opt, real_curr, real_next, noise_amp in zip(
                Gs, Zs, reals, reals[1:], NoiseAmp
            ):
                G_z = G_z[:, :, 0 : real_curr.shape[2], 0 : real_curr.shape[3]]
                G_z = m_image(G_z)
                z_in = noise_amp * Z_opt + G_z
                G_z = G(z_in.detach(), G_z)
                G_z = v2.functional.resize(
                    G_z,
                    (
                        math.ceil(G_z.shape[2] / scale_factor),
                        math.ceil(G_z.shape[3] / scale_factor),
                    ),
                )
                G_z = G_z[:, :, 0 : real_next.shape[2], 0 : real_next.shape[3]]
                count += 1

    return G_z
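
In `train_single_scale`, the `"rec"` output of `draw_concat` (the upsampled reconstruction from the coarser scales) sets the noise amplitude of each new scale as `noise_amp = noise_amp_init * RMSE(real, reconstruction)`; the coarsest scale uses `noise_amp = 1`. Scales where the coarser levels already reconstruct the image well therefore receive little noise. A minimal sketch of that rule, with a hypothetical function name:

```python
import torch
import torch.nn.functional as F


def noise_amplitude(real, rec_upsampled, noise_amp_init=0.1):
    """noise_amp_init times the root-mean-square error between real and reconstruction."""
    rmse = torch.sqrt(F.mse_loss(rec_upsampled, real))
    return noise_amp_init * rmse.item()


real = torch.ones(1, 3, 8, 8)
print(noise_amplitude(real, torch.zeros_like(real)))      # 0.1  (RMSE = 1)
print(noise_amplitude(real, torch.full_like(real, 0.5)))  # 0.05 (RMSE = 0.5)
```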

In [None]:
def train_single_scale(
    netD,
    netG,
    reals,
    Gs,
    Zs,
    in_s,
    NoiseAmp,
    kernel_size,
    num_layer,
    stride,
    alpha,
    nc_z,
    lr_d,
    lr_g,
    beta1,
    scale_num,
    noise_amp_init,
    scale_factor,
    mode_pad,
    dir_to_save,
    draw_plots=True,
):
    """
    Trains a single scale of the SinGAN pyramid with a WGAN-GP objective
    (Wasserstein GAN with gradient penalty) plus a reconstruction loss
    """
    real = reals[len(Gs)]
    nzx = real.shape[2]
    nzy = real.shape[3]
    pad_noise = round((kernel_size - 1) * num_layer / 2)
    pad_image = round((kernel_size - 1) * num_layer / 2)
    m_noise, m_image = get_padding(mode_pad, pad_image, pad_noise, real)

    z_opt = torch.zeros((1, nc_z, nzx, nzy), device=device)
    z_opt = m_noise(z_opt)

    # setup optimizer
    optimizerD = optim.Adam(netD.parameters(), lr=lr_d, betas=(beta1, 0.999))
    optimizerG = optim.Adam(netG.parameters(), lr=lr_g, betas=(beta1, 0.999))
    schedulerD = optim.lr_scheduler.MultiStepLR(
        optimizer=optimizerD, milestones=[1600], gamma=gamma
    )
    schedulerG = optim.lr_scheduler.MultiStepLR(
        optimizer=optimizerG, milestones=[1600], gamma=gamma
    )

    criterion = nn.MSELoss()
    loss = nn.MSELoss()

    errD2plot = []
    errG2plot = []
    D_real2plot = []
    D_fake2plot = []
    z_opt2plot = []
    xplot = []
    for epoch in (pbar := tqdm(range(niter))):
        pbar.set_postfix_str(f"scale: {scale_num}")
        if Gs == []:
            z_opt = torch.randn(1, 1, nzx, nzy, device=device)
            z_opt = m_noise(z_opt.expand(1, 3, nzx, nzy))
            noise_ = torch.randn(1, 1, nzx, nzy, device=device)
            noise_ = m_noise(noise_.expand(1, 3, nzx, nzy))
        else:
            noise_ = torch.randn(1, nc_z, nzx, nzy, device=device)
            noise_ = m_noise(noise_)

        # (1) Update D network: maximize D(x) - D(G(z)) (WGAN-GP critic)
        for j in range(Dsteps):
            # train with real
            optimizerD.zero_grad()
            output = netD(real)
            errD_real = -output.mean()
            errD_real.backward(retain_graph=True)
            D_x = -errD_real.item()

            # train with fake
            if j == 0 and epoch == 0:
                if Gs == []:
                    prev = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    in_s = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    prev = m_image(prev)
                    z_prev = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    z_prev = m_noise(z_prev)
                    noise_amp = 1
                else:
                    prev = draw_concat(
                        Gs,
                        Zs,
                        reals,
                        NoiseAmp,
                        in_s,
                        "rand",
                        nc_z,
                        m_noise,
                        m_image,
                        scale_factor,
                    )
                    prev = m_image(prev)
                    z_prev = draw_concat(
                        Gs,
                        Zs,
                        reals,
                        NoiseAmp,
                        in_s,
                        "rec",
                        nc_z,
                        m_noise,
                        m_image,
                        scale_factor,
                    )
                    RMSE = torch.sqrt(criterion(real, z_prev))
                    noise_amp = noise_amp_init * RMSE
                    z_prev = m_image(z_prev)
            else:
                prev = draw_concat(
                    Gs,
                    Zs,
                    reals,
                    NoiseAmp,
                    in_s,
                    "rand",
                    nc_z,
                    m_noise,
                    m_image,
                    scale_factor,
                )
                prev = m_image(prev)

            if Gs == []:
                noise = noise_
            else:
                noise = noise_amp * noise_ + prev

            fake = netG(noise.detach(), prev)
            output = netD(fake.detach())
            errD_fake = output.mean()
            errD_fake.backward(retain_graph=True)
            D_G_z = output.mean().item()

            gradient_penalty = calc_gradient_penalty(
                netD, real, fake, lambda_grad, device
            )
            gradient_penalty.backward()

            errD = errD_real + errD_fake + gradient_penalty
            optimizerD.step()

        # (2) Update G network: maximize D(G(z))
        for j in range(Gsteps):
            optimizerG.zero_grad()
            if j == 0 and epoch == 0:
                if Gs == []:
                    prev = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    in_s = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    prev = m_image(prev)
                    noise_amp = 1
                else:
                    prev = draw_concat(
                        Gs,
                        Zs,
                        reals,
                        NoiseAmp,
                        in_s,
                        "rand",
                        nc_z,
                        m_noise,
                        m_image,
                        scale_factor,
                    )
                    prev = m_image(prev)
                    z_prev = draw_concat(
                        Gs,
                        Zs,
                        reals,
                        NoiseAmp,
                        in_s,
                        "rec",
                        nc_z,
                        m_noise,
                        m_image,
                        scale_factor,
                    )
                    RMSE = torch.sqrt(criterion(real, z_prev))
                    noise_amp = noise_amp_init * RMSE
                    z_prev = m_image(z_prev)

            if Gs == []:
                noise = noise_
            else:
                noise = noise_amp * noise_ + prev

            fake = netG(noise.detach(), prev)
            output = netD(fake)
            errG = -output.mean()
            errG.backward(retain_graph=True)
            if alpha != 0:
                Z_opt = noise_amp * z_opt + z_prev
                rec_loss = alpha * loss(netG(Z_opt.detach(), z_prev), real)
                rec_loss.backward(retain_graph=True)
                rec_loss = rec_loss.detach()
            else:
                Z_opt = z_opt
                rec_loss = 0

            optimizerG.step()

        if epoch % round(niter / 50) == 0 or epoch == (niter - 1):
            xplot.append(epoch)
            errD2plot.append(errD.detach().cpu())
            errG2plot.append((errG.detach() + rec_loss).cpu())
            D_real2plot.append(D_x)
            D_fake2plot.append(D_G_z)
            z_opt2plot.append(rec_loss.cpu())

        if epoch % round(niter / 4) == 0 or epoch == (niter - 1):
            plt.imsave(
                f"{dir_to_save}/{scale_num}/fake_sample.png",
                convert_image_np(fake.detach()),
                vmin=0,
                vmax=1,
            )
            plt.imsave(
                f"{dir_to_save}/{scale_num}/G(z_opt).png",
                convert_image_np(netG(Z_opt.detach(), z_prev).detach()),
                vmin=0,
                vmax=1,
            )

            torch.save(z_opt, f"{dir_to_save}/{scale_num}/z_opt.pt")

        schedulerD.step()
        schedulerG.step()

    if draw_plots:
        fig = plt.figure(figsize=(10, 6), dpi=80)

        plt.title(f"Discriminator loss - scale {scale_num}", fontsize=22)

        plt.plot(xplot, errD2plot, label="Discriminator loss")
        plt.plot(xplot, D_real2plot, label="Critic output on real, D(x)")
        plt.plot(xplot, D_fake2plot, label="Critic output on fake, D(G(z))")

        plt.gca().spines[["top", "right"]].set_alpha(0)
        plt.gca().spines[["bottom", "left"]].set_alpha(0.3)
        plt.grid(alpha=0.3)
        plt.xticks(fontsize=12)
        plt.yticks(fontsize=12)
        plt.legend()
        plt.savefig(f"{dir_to_save}/{scale_num}/discriminator_loss_plot.png")
        plt.show()

        fig = plt.figure(figsize=(10, 6), dpi=80)

        plt.title(f"Generator loss - scale {scale_num}", fontsize=22)

        plt.plot(xplot, errG2plot, label="Generator loss")
        plt.plot(xplot, z_opt2plot, label="Reconstruction loss")

        plt.gca().spines[["top", "right"]].set_alpha(0)
        plt.gca().spines[["bottom", "left"]].set_alpha(0.3)
        plt.grid(alpha=0.3)
        plt.xticks(fontsize=12)
        plt.yticks(fontsize=12)
        plt.legend()
        plt.savefig(f"{dir_to_save}/{scale_num}/generator_loss_plot.png")
        plt.show()

        fig, axs = plt.subplots(1, 3, figsize=(15, 5), dpi=80)

        axs[0].imshow(convert_image_np(fake.detach()))
        axs[0].set_title("Fake sample")
        axs[0].axis("off")

        axs[1].imshow(convert_image_np(netG(Z_opt.detach(), z_prev).detach()))
        axs[1].set_title("G(z_opt)")
        axs[1].axis("off")

        axs[2].imshow(convert_image_np(real))
        axs[2].set_title("Real")
        axs[2].axis("off")

        plt.tight_layout()
        plt.savefig(f"{dir_to_save}/{scale_num}/imgs_plot.png")
        plt.show()

    torch.save(netG.state_dict(), f"{dir_to_save}/{scale_num}/netG.pt")
    torch.save(netD.state_dict(), f"{dir_to_save}/{scale_num}/netD.pt")
    torch.save(z_opt, f"{dir_to_save}/{scale_num}/z_opt.pt")

    return z_opt, in_s, netG, noise_amp
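
`calc_gradient_penalty` takes the L2 norm of the critic's input gradient over the channel dimension (`dim=1`), i.e. per pixel, and averages $(\lVert\nabla\rVert_2 - 1)^2$ over pixels. A quick sanity check, assuming a hypothetical 1×1-convolution critic with weights $w$: its gradient at every pixel is $w$, so the penalty must equal $\lambda(\lVert w\rVert_2 - 1)^2$ whatever the real and fake images are. The function below is a close re-implementation for standalone use; to our reading, `detach().requires_grad_(True)` plays the role of the deprecated `torch.autograd.Variable(..., requires_grad=True)` wrapper.

```python
import torch
import torch.nn as nn


def gradient_penalty(netD, real, fake, lambda_):
    """WGAN-GP penalty with per-pixel gradient norms, mirroring calc_gradient_penalty."""
    eps = torch.rand(1, 1, 1, 1).expand_as(real)
    interpolates = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
    out = netD(interpolates)
    grads = torch.autograd.grad(
        outputs=out,
        inputs=interpolates,
        grad_outputs=torch.ones_like(out),
        create_graph=True,
    )[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean() * lambda_


critic = nn.Conv2d(3, 1, kernel_size=1, bias=False)
with torch.no_grad():
    critic.weight.copy_(torch.tensor([3.0, 4.0, 0.0]).view(1, 3, 1, 1))  # ||w|| = 5

real, fake = torch.randn(1, 3, 8, 8), torch.randn(1, 3, 8, 8)
gp = gradient_penalty(critic, real, fake, lambda_=0.1)
print(round(gp.item(), 4))  # 1.6, i.e. 0.1 * (5 - 1)**2
```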

In [None]:
def train_single_scale_frechet_distance(
    netG,
    reals,
    Gs,
    Zs,
    in_s,
    NoiseAmp,
    kernel_size,
    num_layer,
    stride,
    alpha,
    nc_z,
    lr_d,
    lr_g,
    beta1,
    scale_num,
    noise_amp_init,
    scale_factor,
    mode_pad,
    dir_to_save,
    draw_plots=True,
):
    """
    Trains a single scale of the SinGAN pyramid without a discriminator, using
    the Fréchet distance between the real and fake patch distributions (each
    modeled as a Gaussian) plus a reconstruction loss as the cost function
    """
    real = reals[len(Gs)]
    nzx = real.shape[2]
    nzy = real.shape[3]
    pad_noise = round((kernel_size - 1) * num_layer / 2)
    pad_image = round((kernel_size - 1) * num_layer / 2)
    m_noise, m_image = get_padding(mode_pad, pad_image, pad_noise, real)

    z_opt = torch.zeros((1, nc_z, nzx, nzy), device=device)
    z_opt = m_noise(z_opt)

    # setup optimizer
    optimizerG = optim.Adam(netG.parameters(), lr=lr_g, betas=(beta1, 0.999))
    schedulerG = optim.lr_scheduler.MultiStepLR(
        optimizer=optimizerG, milestones=[1600], gamma=gamma
    )

    criterion = nn.MSELoss()
    loss = nn.MSELoss()

    errG2plot = []
    z_opt2plot = []
    xplot = []
    for epoch in (pbar := tqdm(range(niter))):
        pbar.set_postfix_str(f"scale: {scale_num}")
        if Gs == []:
            z_opt = torch.randn(1, 1, nzx, nzy, device=device)
            z_opt = m_noise(z_opt.expand(1, 3, nzx, nzy))
            noise_ = torch.randn(1, 1, nzx, nzy, device=device)
            noise_ = m_noise(noise_.expand(1, 3, nzx, nzy))
        else:
            noise_ = torch.randn(1, nc_z, nzx, nzy, device=device)
            noise_ = m_noise(noise_)

        # Update G network: minimize the patch Fréchet distance between real and G(z)
        for j in range(Gsteps):
            optimizerG.zero_grad()
            if j == 0 and epoch == 0:
                if Gs == []:
                    prev = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    in_s = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    prev = m_image(prev)
                    z_prev = torch.zeros((1, nc_z, nzx, nzy), device=device)
                    z_prev = m_noise(z_prev)
                    noise_amp = 1
                else:
                    prev = draw_concat(
                        Gs,
                        Zs,
                        reals,
                        NoiseAmp,
                        in_s,
                        "rand",
                        nc_z,
                        m_noise,
                        m_image,
                        scale_factor,
                    )
                    prev = m_image(prev)
                    z_prev = draw_concat(
                        Gs,
                        Zs,
                        reals,
                        NoiseAmp,
                        in_s,
                        "rec",
                        nc_z,
                        m_noise,
                        m_image,
                        scale_factor,
                    )
                    RMSE = torch.sqrt(criterion(real, z_prev))
                    noise_amp = noise_amp_init * RMSE
                    z_prev = m_image(z_prev)

            if Gs == []:
                noise = noise_
            else:
                noise = noise_amp * noise_ + prev

            fake = netG(noise.detach(), prev)
            mu_real, sigma_real = get_mu_sigma(real, device)
            mu_fake, sigma_fake = get_mu_sigma(fake, device)
            errG = frechet_distance(mu_real, sigma_real, mu_fake, sigma_fake)
            errG.backward(retain_graph=True)
            if alpha != 0:
                Z_opt = noise_amp * z_opt + z_prev
                rec_loss = alpha * loss(netG(Z_opt.detach(), z_prev), real)
                rec_loss.backward(retain_graph=True)
                rec_loss = rec_loss.detach()
            else:
                Z_opt = z_opt
                rec_loss = 0

            optimizerG.step()

        if epoch % round(niter / 50) == 0 or epoch == (niter - 1):
            xplot.append(epoch)
            errG2plot.append((errG.detach() + rec_loss).cpu())
            z_opt2plot.append(rec_loss.cpu())

        if epoch % round(niter / 4) == 0 or epoch == (niter - 1):
            plt.imsave(
                f"{dir_to_save}/{scale_num}/fake_sample.png",
                convert_image_np(fake.detach()),
                vmin=0,
                vmax=1,
            )
            plt.imsave(
                f"{dir_to_save}/{scale_num}/G(z_opt).png",
                convert_image_np(netG(Z_opt.detach(), z_prev).detach()),
                vmin=0,
                vmax=1,
            )

            torch.save(z_opt, f"{dir_to_save}/{scale_num}/z_opt.pt")

        schedulerG.step()

    if draw_plots:
        fig = plt.figure(figsize=(10, 6), dpi=80)

        plt.title(f"Generator loss - scale {scale_num}", fontsize=22)

        plt.plot(xplot, errG2plot, label="Generator loss")
        plt.plot(xplot, z_opt2plot, label="Reconstruction loss")

        plt.gca().spines[["top", "right"]].set_alpha(0)
        plt.gca().spines[["bottom", "left"]].set_alpha(0.3)
        plt.grid(alpha=0.3)
        plt.xticks(fontsize=12)
        plt.yticks(fontsize=12)
        plt.legend()
        plt.savefig(f"{dir_to_save}/{scale_num}/generator_loss_plot.png")
        plt.show()

        fig, axs = plt.subplots(1, 3, figsize=(15, 5), dpi=80)

        axs[0].imshow(convert_image_np(fake.detach()))
        axs[0].set_title("Fake sample")
        axs[0].axis("off")

        axs[1].imshow(convert_image_np(netG(Z_opt.detach(), z_prev).detach()))
        axs[1].set_title("G(z_opt)")
        axs[1].axis("off")

        axs[2].imshow(convert_image_np(real))
        axs[2].set_title("Real")
        axs[2].axis("off")

        plt.tight_layout()
        plt.savefig(f"{dir_to_save}/{scale_num}/imgs_plot.png")
        plt.show()

    torch.save(netG.state_dict(), f"{dir_to_save}/{scale_num}/netG.pt")
    torch.save(z_opt, f"{dir_to_save}/{scale_num}/z_opt.pt")

    return z_opt, in_s, netG, noise_amp
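
The generator loss in `train_single_scale_frechet_distance` is a Fréchet distance between the Gaussian statistics returned by `get_mu_sigma` for the real and the fake image. Assuming the notebook's `frechet_distance` (defined in an earlier section, not shown here) implements the standard closed form $d^2 = \lVert\mu_1-\mu_2\rVert^2 + \mathrm{Tr}\left(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\right)$, the NumPy sketch below reproduces that formula on arbitrary samples. `gaussian_stats` and `frechet_distance_sketch` are hypothetical names used only for illustration.

```python
import numpy as np


def gaussian_stats(samples):
    """Mean vector and covariance matrix of an (n_samples, n_features) array."""
    mu = samples.mean(axis=0)
    sigma = np.cov(samples, rowvar=False)
    return mu, sigma


def frechet_distance_sketch(mu1, sigma1, mu2, sigma2):
    """Squared Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Tr((sigma1 sigma2)^(1/2)) is obtained from the symmetric matrix
    sigma1^(1/2) sigma2 sigma1^(1/2), which has the same eigenvalues as
    sigma1 sigma2, so no non-symmetric matrix square root is needed.
    """
    # Symmetric square root of sigma1 from its eigendecomposition
    w, v = np.linalg.eigh(sigma1)
    sqrt_sigma1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    m = sqrt_sigma1 @ sigma2 @ sqrt_sigma1
    # Clip tiny negative eigenvalues caused by round-off before the square root
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
mu, sigma = gaussian_stats(rng.normal(size=(500, 3)))
print(frechet_distance_sketch(mu, sigma, mu + 1.0, sigma))  # about 3: only the means differ
```

Computing the trace term through $\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}$ is a choice of this sketch; implementations such as pytorch-fid call `scipy.linalg.sqrtm` on $\Sigma_1\Sigma_2$ instead.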

In [None]:
def train(
    real,
    stop_scale,
    scale_factor,
    nfc,
    min_nfc,
    dir_to_save,
    netG_path,
    netD_path,
    noise_amp_init,
    device,
    frechet=True,
    mode_pad=0,
):
    """
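    Trains SinGAN on `real`, one scale at a time from the coarsest to the finest,
    and returns the generators, reconstruction noises, image pyramid and noise amplitudes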
    """
    Gs = []
    Zs = []
    reals = []
    NoiseAmp = []
    in_s = 0
    scale_num = 0
    nfc_prev = 0

    # Create pyramid
    scales = np.logspace(
        start=stop_scale, stop=0, num=stop_scale + 1, base=scale_factor
    )
    for scale in scales:
        curr_real = v2.functional.resize(
            real, (math.ceil(scale * real.shape[2]), math.ceil(scale * real.shape[3]))
        )
        reals.append(curr_real)

    # Iterating on the pyramid (coarsest scale first)
    nfc_init, min_nfc_init = nfc, min_nfc
    for scale_num, curr_real in enumerate(reals):
        # Every 4 steps on the pyramid the number of feature channels doubles (capped at 128)
        nfc = min(nfc_init * (2 ** math.floor(scale_num / 4)), 128)
        min_nfc = min(min_nfc_init * (2 ** math.floor(scale_num / 4)), 128)

        os.makedirs(f"{dir_to_save}/{scale_num}", exist_ok=True)

        plt.imsave(
            f"{dir_to_save}/{scale_num}/real_scale.png",
            convert_image_np(curr_real),
            vmin=0,
            vmax=1,
        )

        if frechet:
            _, G_curr = init_models(
                netG_path,
                "",
                nc_im,
                nfc,
                min_nfc,
                kernel_size,
                padd_size,
                num_layer,
                device,
            )
        else:
            D_curr, G_curr = init_models(
                netG_path,
                netD_path,
                nc_im,
                nfc,
                min_nfc,
                kernel_size,
                padd_size,
                num_layer,
                device,
            )

        # When the number of features is unchanged from the previous scale (within a
        # group of 4 neighbouring scales), start from that scale's trained weights
        if nfc_prev == nfc:
            G_curr.load_state_dict(torch.load(f"{dir_to_save}/{scale_num - 1}/netG.pt"))
            if not frechet:
                D_curr.load_state_dict(
                    torch.load(f"{dir_to_save}/{scale_num - 1}/netD.pt")
                )

        if frechet:
            z_curr, in_s, G_curr, noise_amp = train_single_scale_frechet_distance(
                G_curr,
                reals,
                Gs,
                Zs,
                in_s,
                NoiseAmp,
                kernel_size,
                num_layer,
                stride,
                alpha,
                nc_z,
                lr_d,
                lr_g,
                beta1,
                scale_num,
                noise_amp_init,
                scale_factor,
                mode_pad,
                dir_to_save,
            )
        else:
            z_curr, in_s, G_curr, noise_amp = train_single_scale(
                D_curr,
                G_curr,
                reals,
                Gs,
                Zs,
                in_s,
                NoiseAmp,
                kernel_size,
                num_layer,
                stride,
                alpha,
                nc_z,
                lr_d,
                lr_g,
                beta1,
                scale_num,
                noise_amp_init,
                scale_factor,
                mode_pad,
                dir_to_save,
            )

        # Persistence: freeze the trained models and save the pyramid state so far
        G_curr = G_curr.requires_grad_(False).eval()
        if not frechet:
            D_curr = D_curr.requires_grad_(False).eval()
        Gs.append(G_curr)
        Zs.append(z_curr)
        NoiseAmp.append(noise_amp)
        torch.save(Zs, f"{dir_to_save}/Zs.pt")
        torch.save(Gs, f"{dir_to_save}/Gs.pt")
        torch.save(reals, f"{dir_to_save}/reals.pt")
        torch.save(NoiseAmp, f"{dir_to_save}/NoiseAmp.pt")

        nfc_prev = nfc

        if frechet:
            del G_curr
        else:
            del D_curr, G_curr

    return Gs, Zs, reals, NoiseAmp
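
The pyramid built at the start of `train` and the per-scale width schedule can be checked in isolation. The sketch below mirrors the `np.logspace` scales and the `math.ceil` sizes, and computes the number of feature channels from the initial `nfc`, doubling every 4 scales up to 128 as the comment in `train` describes. The image size (101×203), `stop_scale=3`, `scale_factor=0.75` and `nfc_init=32` are illustrative assumptions, and `pyramid_sizes` / `nfc_schedule` are hypothetical helper names.

```python
import math

import numpy as np


def pyramid_sizes(height, width, stop_scale, scale_factor):
    """(h, w) of every pyramid level, coarsest first, as built in `train`."""
    scales = np.logspace(start=stop_scale, stop=0, num=stop_scale + 1, base=scale_factor)
    return [(math.ceil(s * height), math.ceil(s * width)) for s in scales]


def nfc_schedule(nfc_init, num_levels, cap=128):
    """Feature channels per level: doubled every 4 levels, capped at `cap`."""
    return [min(nfc_init * 2 ** (level // 4), cap) for level in range(num_levels)]


print(pyramid_sizes(101, 203, stop_scale=3, scale_factor=0.75))
# [(43, 86), (57, 115), (76, 153), (101, 203)]
print(nfc_schedule(32, 10))
# [32, 32, 32, 32, 64, 64, 64, 64, 128, 128]
```

Levels that share a width (for example the first four above) are the ones where `train` can warm-start a generator from the previous scale's weights.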

## Main

In [None]:
input_dir = "Input/Images"  # input image dir
input_name = "birds.png"  # input image name
mode = "train"  # task to run
frechet = False
mode_pad = 0
num_samples = 500

# for random_samples:
gen_start_scale = 0  # generation start scale

dir_to_save = (
    f"TrainedModels/{input_name[:-4]}/scale_factor={scale_factor},alpha={alpha}"
)

In [None]:
if os.path.exists(dir_to_save):
    print("Trained model already exists")
else:
    os.makedirs(dir_to_save, exist_ok=True)

    real = read_image(f"{input_dir}/{input_name}")
    real = v2.functional.to_dtype(real, dtype=torch.float32, scale=True)
    real = real[:nc_im]
    real = scale(real)
    real = real.unsqueeze(0)
    real = real.to(device)

    num_scales = math.ceil(math.log(min_size / min(real.shape[2:]), scale_factor)) + 1
    scale_to_stop = math.ceil(
        math.log(
            min(max_size, max(real.shape[2:])) / max(real.shape[2:]),
            scale_factor,
        )
    )

    stop_scale = num_scales - scale_to_stop
    # The finest scale of the pyramid is the image's original size, capped at max_size
    scale0 = min(max_size / max(real.shape[2:]), 1)
    real = v2.functional.resize(
        real, (math.ceil(scale0 * real.shape[2]), math.ceil(scale0 * real.shape[3]))
    )

    # Corrected scale factor: stop_scale steps take the shorter side exactly to min_size
    corrected_scale_factor = (min_size / min(real.shape[2:])) ** (1 / (stop_scale))

    Gs, Zs, reals, NoiseAmp = train(
        real,
        stop_scale,
        corrected_scale_factor,
        nfc,
        min_nfc,
        dir_to_save,
        netG_path,
        netD_path,
        noise_amp_init,
        device,
        frechet,
        mode_pad,
    )
    SinGAN_generate(
        Gs,
        Zs,
        reals,
        NoiseAmp,
        mode,
        input_name,
        gen_start_scale=gen_start_scale,
        num_samples=num_samples,
        mode_pad=mode_pad,
    )
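
The cell above derives the number of scales, the finest resolution and a corrected scale factor from `min_size`, `max_size` and `scale_factor`. The sketch below repeats that arithmetic on plain integers so the result can be inspected without loading an image. `pyramid_params` is a hypothetical helper, and the sizes and hyperparameter values in the example (`min_size=25`, `max_size=250`, `scale_factor=0.75`) are assumptions chosen for illustration, not necessarily the notebook's configuration.

```python
import math


def pyramid_params(height, width, min_size, max_size, scale_factor):
    """Mirror of the pyramid arithmetic in the training Main cell."""
    num_scales = math.ceil(math.log(min_size / min(height, width), scale_factor)) + 1
    scale_to_stop = math.ceil(
        math.log(min(max_size, max(height, width)) / max(height, width), scale_factor)
    )
    stop_scale = num_scales - scale_to_stop
    # Finest level: original size, downscaled only if it exceeds max_size
    scale0 = min(max_size / max(height, width), 1)
    height, width = math.ceil(scale0 * height), math.ceil(scale0 * width)
    # Factor that maps the shorter side onto min_size in exactly stop_scale steps
    corrected = (min_size / min(height, width)) ** (1 / stop_scale)
    return stop_scale, (height, width), corrected


stop_scale, finest, factor = pyramid_params(200, 250, min_size=25, max_size=250, scale_factor=0.75)
print(stop_scale, finest, round(factor, 4))  # 9 (200, 250) 0.7937
```

With these values the corrected factor is $2^{-1/3} \approx 0.794$ rather than $0.75$, so that $200 \cdot 0.794^{9} = 25$ and the coarsest level lands exactly on `min_size`.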

# Random samples

## Main

In [None]:
input_dir = "Input/Images"  # input image dir
input_name = "birds.png"
mode = "random_samples"  # random_samples | random_samples_arbitrary_sizes
num_samples = 500

# for random_samples:
gen_start_scale = 0  # generation start scale

# for random_samples_arbitrary_sizes:
scale_h = 1.5  # horizontal resize factor for random samples
scale_v = 1  # vertical resize factor for random samples

In [None]:
Gs = []
Zs = []
reals = []
NoiseAmp = []
if mode == "random_samples":
    dir_to_save = f"{out}/RandomSamples/{input_name[:-4]}/gen_start_scale={gen_start_scale}"
elif mode == "random_samples_arbitrary_sizes":
    dir_to_save = f"{out}/RandomSamples_ArbitrerySizes/{input_name[:-4]}/scale_v={scale_v}_scale_h={scale_h}"  # random_samples_arbitrary_sizes
dir_to_save_train = (
    f"TrainedModels/{input_name[:-4]}/scale_factor={scale_factor},alpha={alpha}"
)

In [None]:
if os.path.exists(dir_to_save):
    if mode == "random_samples":
        print(
            f"random samples for image {input_name}, start scale={gen_start_scale}, already exist"
        )
    elif mode == "random_samples_arbitrary_sizes":
        print(
            f"random samples for image {input_name} at size: scale_h={scale_h}, scale_v={scale_v}, already exist"
        )
else:
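    if not os.path.exists(dir_to_save_train):
        raise FileNotFoundError(f"No trained model found in {dir_to_save_train}")
    os.makedirs(dir_to_save, exist_ok=True)

    # Gs.pt stores whole nn.Module objects, which torch.load only unpickles with
    # weights_only=False (the default became True in PyTorch 2.6)
    Gs = torch.load(f"{dir_to_save_train}/Gs.pt", weights_only=False)
    Zs = torch.load(f"{dir_to_save_train}/Zs.pt")
    reals = torch.load(f"{dir_to_save_train}/reals.pt")
    NoiseAmp = torch.load(f"{dir_to_save_train}/NoiseAmp.pt")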

    if mode == "random_samples":
        real = reals[gen_start_scale]
        real_down = upsampling(real, real.shape[2], real.shape[3])
        if gen_start_scale == 0:
            in_s = torch.zeros(real_down.shape, device=device)
        else:
            in_s = upsampling(real_down, real_down.shape[2], real_down.shape[3])

        SinGAN_generate(
            Gs,
            Zs,
            reals,
            NoiseAmp,
            mode,
            input_name,
            gen_start_scale=gen_start_scale,
            num_samples=num_samples,
            mode_pad=mode_pad,
        )

    elif mode == "random_samples_arbitrary_sizes":
        real = reals[gen_start_scale]
        real_down = upsampling(real, scale_v * real.shape[2], scale_h * real.shape[3])
        if gen_start_scale == 0:
            in_s = torch.zeros(real_down.shape, device=device)
        else:
            in_s = upsampling(real_down, real_down.shape[2], real_down.shape[3])

        SinGAN_generate(
            Gs,
            Zs,
            reals,
            NoiseAmp,
            mode,
            input_name,
            in_s=in_s,
            scale_v=scale_v,
            scale_h=scale_h,
            num_samples=num_samples,
            mode_pad=mode_pad,
        )
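
In `random_samples_arbitrary_sizes` mode, the coarsest image is resized by `scale_v` vertically and `scale_h` horizontally, and an all-zero map of that shape serves as the starting image `in_s`. The sketch below shows the resulting shapes. It assumes that the notebook's `upsampling` helper (defined in an earlier section) is a bilinear resize to `(round(sx), round(sy))`, as in the official SinGAN code; `upsampling_sketch` is a hypothetical stand-in and the 26×34 coarsest size is illustrative.

```python
import torch
import torch.nn.functional as F


def upsampling_sketch(im, sx, sy):
    """Hypothetical stand-in for `upsampling`: bilinear resize to (round(sx), round(sy))."""
    return F.interpolate(im, size=(round(sx), round(sy)), mode="bilinear", align_corners=True)


coarsest = torch.rand(1, 3, 26, 34)  # illustrative coarsest-scale image
scale_v, scale_h = 1, 1.5
real_down = upsampling_sketch(coarsest, scale_v * coarsest.shape[2], scale_h * coarsest.shape[3])
# Float zeros with the target shape; torch.full(shape, 0) would infer an int64 tensor
in_s = torch.zeros(real_down.shape, device=real_down.device)
print(tuple(real_down.shape), in_s.dtype)  # (1, 3, 26, 51) torch.float32
```

Only the spatial size of the coarsest map changes; because every generator is fully convolutional, the finer scales then follow the enlarged shape.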

# SIFID

## SIFID functions

In [None]:
class InceptionV3(nn.Module):
    """Pretrained InceptionV3 network returning feature maps"""

    # Index of default block of inception to return,
    # corresponds to output of final average pooling
    DEFAULT_BLOCK_INDEX = 3

    # Maps feature dimensionality to their output blocks indices
    BLOCK_INDEX_BY_DIM = {
        64: 0,  # First max pooling features
        192: 1,  # Second max pooling features
        768: 2,  # Pre-aux classifier features
        2048: 3,  # Final average pooling features
    }

    def __init__(
        self,
        output_blocks=[DEFAULT_BLOCK_INDEX],
        resize_input=False,
        normalize_input=True,
        requires_grad=False,
    ):
        """Build pretrained InceptionV3

        Parameters
        ----------
        output_blocks : list of int
            Indices of blocks to return features of. Possible values are:
                - 0: corresponds to output of first max pooling
                - 1: corresponds to output of second max pooling
                - 2: corresponds to output which is fed to aux classifier
                - 3: corresponds to output of final average pooling
        resize_input : bool
            If true, bilinearly resizes input to width and height 299 before
            feeding input to model. As the network without fully connected
            layers is fully convolutional, it should be able to handle inputs
            of arbitrary size, so resizing might not be strictly needed
        normalize_input : bool
            If true, scales the input from range (0, 1) to the range the
            pretrained Inception network expects, namely (-1, 1)
        requires_grad : bool
            If true, parameters of the model require gradient. Possibly useful
            for finetuning the network
        """
        super(InceptionV3, self).__init__()

        self.resize_input = resize_input
        self.normalize_input = normalize_input
        self.output_blocks = sorted(output_blocks)
        self.last_needed_block = max(output_blocks)

        assert self.last_needed_block <= 3, "Last possible output block index is 3"

        self.blocks = nn.ModuleList()

        inception = models.inception_v3(weights="IMAGENET1K_V1")

        # Block 0: input to maxpool1
        block0 = [
            inception.Conv2d_1a_3x3,
            inception.Conv2d_2a_3x3,
            inception.Conv2d_2b_3x3,
        ]

        self.blocks.append(nn.Sequential(*block0))

        # Block 1: maxpool1 to maxpool2
        if self.last_needed_block >= 1:
            block1 = [
                nn.MaxPool2d(kernel_size=3, stride=2),
                inception.Conv2d_3b_1x1,
                inception.Conv2d_4a_3x3,
            ]
            self.blocks.append(nn.Sequential(*block1))

        # Block 2: maxpool2 to aux classifier
        if self.last_needed_block >= 2:
            block2 = [
                nn.MaxPool2d(kernel_size=3, stride=2),
                inception.Mixed_5b,
                inception.Mixed_5c,
                inception.Mixed_5d,
                inception.Mixed_6a,
                inception.Mixed_6b,
                inception.Mixed_6c,
                inception.Mixed_6d,
                inception.Mixed_6e,
            ]
            self.blocks.append(nn.Sequential(*block2))

        # Block 3: aux classifier to final avgpool
        if self.last_needed_block >= 3:
            block3 = [
                inception.Mixed_7a,
                inception.Mixed_7b,
                inception.Mixed_7c,
            ]
            self.blocks.append(nn.Sequential(*block3))

        if self.last_needed_block >= 4:
            block4 = [nn.AdaptiveAvgPool2d(output_size=(1, 1))]
            self.blocks.append(nn.Sequential(*block4))

        for param in self.parameters():
            param.requires_grad = requires_grad

    def forward(self, inp):
        """Get Inception feature maps

        Parameters
        ----------
        inp : torch.Tensor
            Input tensor of shape Bx3xHxW. Values are expected to be in
            range (0, 1)

        Returns
        -------
        List of torch.Tensor, corresponding to the selected output
        blocks, sorted ascending by index
        """
        outp = []
        x = inp

        if self.resize_input:
            x = F.interpolate(x, size=(299, 299), mode="bilinear", align_corners=False)

        if self.normalize_input:
            x = 2 * x - 1  # Scale from range (0, 1) to range (-1, 1)

        for idx, block in enumerate(self.blocks):
            x = block(x)
            if idx in self.output_blocks:
                outp.append(x)

            if idx == self.last_needed_block:
                break

        return outp

In [None]:
def get_activations(files, model, batch_size=1, dims=64, cuda=False, verbose=False):
    """Calculates the activations of the selected Inception block for all images.

    Params:
    -- files       : List of image files paths
    -- model       : Instance of inception model
    -- batch_size  : Batch size of images for the model to process at once.
                     Make sure that the number of samples is a multiple of
                     the batch size, otherwise some samples are ignored. This
                     behavior is retained to match the original FID score
                     implementation.
    -- dims        : Dimensionality of features returned by Inception
    -- cuda        : If set to True, use GPU
    -- verbose     : If set to True, the number of processed batches is reported.
    Returns:
    -- A numpy array of shape (num images * feature-map height * width, dims)
       holding one row of activations per spatial position of the feature maps.
    """
    model.eval()

    if len(files) % batch_size != 0:
        print(
            (
                "Warning: number of images is not a multiple of the "
                "batch size. Some samples are going to be ignored."
            )
        )
    if batch_size > len(files):
        print(
            (
                "Warning: batch size is bigger than the data size. "
                "Setting batch size to data size"
            )
        )
        batch_size = len(files)

    n_batches = len(files) // batch_size
    n_used_imgs = n_batches * batch_size

    # One (batch_size * H * W, C) activation matrix per batch: every spatial
    # position of the feature map is treated as one sample (as in SIFID)
    pred_list = []

    with torch.no_grad():
        for i in range(n_batches):
            if verbose:
                print("\rPropagating batch %d/%d" % (i + 1, n_batches), end="", flush=True)
            start = i * batch_size
            end = start + batch_size

            images = np.array([imread(str(f)).astype(np.float32) for f in files[start:end]])

            # Keep the RGB channels and reshape to (n_images, 3, height, width)
            images = images[:, :, :, 0:3]
            images = images.transpose((0, 3, 1, 2))
            images /= 255

            batch = torch.from_numpy(images).type(torch.FloatTensor)
            if cuda:
                batch = batch.cuda()

            pred = model(batch)[0]

            # No global average pooling: the spatial feature map is kept so that
            # the statistics describe the internal distribution of each image
            pred_list.append(
                pred.cpu()
                .numpy()
                .transpose(0, 2, 3, 1)
                .reshape(batch_size * pred.shape[2] * pred.shape[3], -1)
            )

    if verbose:
        print("done")

    # Concatenate all batches instead of keeping only the last one
    return np.concatenate(pred_list, axis=0)


def calculate_activation_statistics(
    files, model, batch_size=1, dims=64, cuda=False, verbose=False
):
    """Calculation of the statistics used by the FID.
    Params:
    -- files       : List of image files paths
    -- model       : Instance of inception model
    -- batch_size  : The images numpy array is split into batches with
                     batch size batch_size. A reasonable batch size
                     depends on the hardware.
    -- dims        : Dimensionality of features returned by Inception
    -- cuda        : If set to True, use GPU
    -- verbose     : If set to True, the number of processed batches is
                     reported.
    Returns:
    -- mu    : The mean over samples of the activations of the inception model.
    -- sigma : The covariance matrix of the activations of the inception model.
    """
    act = get_activations(files, model, batch_size, dims, cuda, verbose)
    mu = np.mean(act, axis=0)
    sigma = np.cov(act, rowvar=False)
    return torch.from_numpy(mu), torch.from_numpy(sigma)


def calculate_sifid_given_paths(real_path, path2, batch_size, cuda, dims, suffix):
    """Calculates the SIFID between the real image and every generated image in path2"""

    block_idx = InceptionV3.BLOCK_INDEX_BY_DIM[dims]

    model = InceptionV3([block_idx])
    if cuda:
        model.cuda()

    real_path = pathlib.Path(real_path)

    path2 = pathlib.Path(path2)
    files2 = sorted(list(path2.glob(f"*.{suffix}")))

    m1, s1 = calculate_activation_statistics(
        [real_path], model, batch_size, dims, cuda
    )

    # One SIFID value per generated image, each compared with the real image
    fid_values = []
    for f2 in tqdm(files2):
        m2, s2 = calculate_activation_statistics(
            [f2], model, batch_size, dims, cuda
        )
        fid_values.append(frechet_distance(m1, s1, m2, s2))
    return fid_values
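
SIFID differs from FID in where its samples come from. A single image yields many samples because each spatial position of its feature map is one `dims`-dimensional vector, which is why `get_activations` reshapes the map instead of pooling it. The NumPy sketch below reproduces that reshape and the resulting statistics on a random array standing in for Inception features. The 10×12 map size and the helper name `spatial_activation_stats` are illustrative assumptions.

```python
import numpy as np


def spatial_activation_stats(feature_map):
    """Mean and covariance of an (N, C, H, W) feature map, treating every
    spatial position of every image as one C-dimensional sample."""
    n, c, h, w = feature_map.shape
    # (N, C, H, W) -> (N, H, W, C) -> (N * H * W, C), the same reshape as get_activations
    act = feature_map.transpose(0, 2, 3, 1).reshape(n * h * w, c)
    return act, act.mean(axis=0), np.cov(act, rowvar=False)


rng = np.random.default_rng(0)
feats = rng.normal(size=(1, 64, 10, 12))  # stands in for block-0 features (dims=64) of one image
act, mu, sigma = spatial_activation_stats(feats)
print(act.shape, mu.shape, sigma.shape)  # (120, 64) (64,) (64, 64)
```

Row `h * W + w` of `act` is the channel vector at position `(h, w)`, so `mu` and `sigma` summarise the patch-level feature distribution that SIFID compares between the real and the generated image.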

## Main

In [None]:
input_name = "birds.png"
real_path = f"Input/Images/{input_name}"  # Path to the real images
fake_directory = f"Output/RandomSamples/{input_name[:-4]}/gen_start_scale=0"  # Path to generated images
gpu = "0"  # GPU to use (leave blank for CPU only)
images_suffix = input_name[-3:]  # image file suffix

In [None]:
# Only takes effect if CUDA has not already been initialized in this kernel
os.environ["CUDA_VISIBLE_DEVICES"] = gpu

sifid_values = calculate_sifid_given_paths(
    real_path, fake_directory, 1, gpu != "", 64, images_suffix
)

sifid_values = np.asarray(sifid_values, dtype=np.float32)
np.save(f"{fake_directory}/SIFID.npy", sifid_values)
print("SIFID: ", sifid_values.mean())

# Results

In [None]:
print(f"{'frechet' if frechet else 'default'}\n\t`niter`: ${niter}$, `Gsteps`: ${Gsteps}$, `scale_factor`: ${scale_factor}$, `mode_pad`: ${mode_pad}$")

In [None]:
for directory in ["TrainedModels", "Output"]:
    os.rename(directory, f"{directory}_{'frechet' if frechet else 'default'}_{niter}_{Gsteps}_{scale_factor}_{mode_pad}")

- Default SinGAN
  - `niter`: $1000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $0$
    - `birds.png`
      - Execution time: $28$ min
      - SIFID: `2.1939146e-05` 
  - `niter`: $1000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $1$
    - `birds.png`
      - Execution time: $30$ min
      - SIFID: `1.2019249e-05` 
  - `niter`: $1000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $2$
    - `birds.png`
      - Execution time: $29$ min
      - SIFID: `1.227873e-05` 
  - `niter`: $1000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $3$
    - `birds.png`
      - Execution time: $24$ min
      - SIFID: `0.00011935688` 
  - `niter`: $2000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $0$
    - `birds.png`
      - Execution time: $60$ min
      - SIFID: `8.802319e-06` 
  - `niter`: $2000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $2$
    - `birds.png`
      - Execution time: $62$ min
      - SIFID: `1.383028e-05` 
  - `niter`: $2000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $3$
    - `birds.png`
      - Execution time: $52$ min
      - SIFID: `0.00013286904` 
- SinGAN with Fréchet Inception Distance cost function
  - `niter`: $1000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $0$
    - `birds.png`
      - Execution time: $16$ min
      - SIFID: `1.1237572e-05` 
  - `niter`: $2000$, `Gsteps`: $1$, `scale_factor`: $0.75$, `mode_pad`: $0$
    - `birds.png`
      - Execution time: $31$ min
      - SIFID: `1.1789973e-05`