# Variational Auto-Encoders

Variational Autoencoders (VAEs) are generative models that learn a latent representation of high-dimensional data. They are neural networks that encode and decode data probabilistically: instead of mapping each input to a single point, the encoder maps it to a probability distribution over a low-dimensional latent space. The goal is to capture the underlying structure of the data by modeling its probability distribution in that latent space.

VAEs are composed of two main parts: an encoder and a decoder. The encoder maps the input data to a low-dimensional latent space, which acts as a compressed representation of the input. The decoder then maps points in the latent space back to the input data space. The two networks are trained jointly so that the reconstructed data is close to the original input, while a regularization term (described below) keeps the latent distribution well behaved.

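The encoder of a VAE typically consists of a series of neural network layers that map the input data to the parameters of a distribution over the latent space. This distribution is usually a Gaussian with diagonal covariance, so the encoder outputs a mean vector and a variance vector. In practice it outputs the log-variance, which can take any real value. A latent point is then sampled from this Gaussian and fed into the decoder. The sampling is done with the reparameterization trick so that the whole network stays differentiable.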
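The reparameterization trick writes the sample as $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. All the randomness then sits in $\epsilon$, and gradients can flow through $\mu$ and $\sigma$.

The following NumPy sketch draws many samples for one fixed mean and log-variance. It then checks that their empirical mean and standard deviation match $\mu$ and $\sigma = \exp(\tfrac{1}{2}\log\sigma^2)$. The helper name `reparameterize` mirrors the PyTorch method later in this notebook, and the numbers are arbitrary example values.

```python
import numpy as np

def reparameterize(mean, log_var, rng):
    """Sample z ~ N(mean, exp(log_var)) as mean + std * epsilon, with epsilon ~ N(0, I)."""
    std = np.exp(0.5 * log_var)                 # sigma = exp(log(sigma^2) / 2)
    epsilon = rng.standard_normal(mean.shape)   # all randomness lives here
    return mean + std * epsilon

rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0])
log_var = np.log(np.array([0.25, 4.0]))  # variances 0.25 and 4.0, i.e. std 0.5 and 2.0

# Repeat the same (mean, log_var) many times and look at the sample statistics
n = 100_000
z = reparameterize(np.tile(mean, (n, 1)), np.tile(log_var, (n, 1)), rng)
print(z.mean(axis=0))  # close to [1.0, -2.0]
print(z.std(axis=0))   # close to [0.5, 2.0]
```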

The decoder of a VAE is also typically composed of a series of neural network layers that map the latent space back to the input data space. The output of the decoder is the reconstructed input data.

During training, the VAE minimizes a loss function that is the sum of two terms:

- the reconstruction loss, which measures the difference between the input data and the reconstructed data;
- the Kullback-Leibler (KL) divergence, which measures how far the encoder's latent distribution is from the prior, a standard (unit) Gaussian $\mathcal{N}(0, I)$.

The KL term keeps the latent space close to the distribution that new samples are later drawn from.
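Written out, this is the standard VAE objective (the negative evidence lower bound) for a single input $x$. The sketch assumes what the code in this notebook does: a diagonal Gaussian encoder $q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$, and a decoder whose sigmoid outputs $\hat{x}$ are read as per-pixel Bernoulli probabilities. The symbols $\phi$, the pixel index $i$, and the latent index $j$ over $d$ = `latent_dim` dimensions are notation introduced here.

$$
\mathcal{L}(x) = \underbrace{-\sum_{i} \Big[x_i \log \hat{x}_i + (1 - x_i)\log(1 - \hat{x}_i)\Big]}_{\text{reconstruction loss (binary cross-entropy)}} + \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,\mathcal{N}(0, I)\big)}_{\text{KL loss}}
$$

$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))\,\|\,\mathcal{N}(0, I)\big) = -\frac{1}{2}\sum_{j=1}^{d}\Big(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\Big)
$$

Here $\hat{x}$ is the decoder output for one $z$ sampled from $q_\phi(z \mid x)$. The code predicts $\log\sigma_j^2$ directly (`z_log_var` in Keras, `log_var` in PyTorch), so the KL term appears there as `1 + log_var - mean**2 - exp(log_var)`, summed over latent dimensions.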

Once trained, a VAE can generate new data samples by drawing latent vectors from the standard normal prior and passing them through the decoder. VAEs have been applied to a wide range of tasks, including image generation, anomaly detection, and data imputation.

# With Deep Learning Libraries

# Using Keras

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 2  # Size of latent space

# Reparameterization trick: z = z_mean + exp(0.5 * z_log_var) * epsilon, epsilon ~ N(0, I).
# Implemented as a layer so the encoder can output the sampled z directly.
class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

# Define encoder: 28x28x1 image -> (z_mean, z_log_var, z)
encoder_inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling(name="z")([z_mean, z_log_var])

encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")

# Define decoder: latent vector -> 28x28x1 image with pixel values in [0, 1]
latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(latent_inputs)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)  # 14x14
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)  # 28x28
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)

decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")

# Define VAE with custom train/test steps: the loss depends on z_mean and z_log_var,
# so it is computed here instead of being passed to compile()
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def vae_losses(self, data):
        # The encoder returns three outputs and z is already sampled
        # (the original code unpacked only two values here, which raised a ValueError)
        z_mean, z_log_var, z = self.encoder(data)
        reconstruction = self.decoder(z)
        # Mean per-pixel binary cross-entropy, scaled by 28 * 28 to get a per-image sum
        reconstruction_loss = tf.reduce_mean(
            keras.losses.binary_crossentropy(data, reconstruction)
        ) * 28 * 28
        # Closed-form KL divergence to N(0, I): sum over latent dims, mean over the batch
        kl_loss = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
        )
        return reconstruction_loss, kl_loss

    def train_step(self, data):
        if isinstance(data, tuple):
            data = data[0]
        with tf.GradientTape() as tape:
            reconstruction_loss, kl_loss = self.vae_losses(data)
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {
            "loss": total_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }

    def test_step(self, data):
        # Same loss without a gradient update; used when fit() gets validation_data
        if isinstance(data, tuple):
            data = data[0]
        reconstruction_loss, kl_loss = self.vae_losses(data)
        return {
            "loss": reconstruction_loss + kl_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }

    def call(self, inputs):
        _, _, z = self.encoder(inputs)
        return self.decoder(z)

# Instantiate VAE model
vae = VAE(encoder, decoder)

# Compile the model (only the optimizer is needed; the loss is computed in train_step)
vae.compile(optimizer=keras.optimizers.Adam())

1 - Training the model:

To train the VAE model, call the fit() method on the VAE object and pass in your training images. No target labels are needed, because the loss is computed from the inputs inside train_step. For example:

In [None]:
from tensorflow.keras.datasets import mnist

# Load MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize images and reshape them
train_images = train_images.astype("float32") / 255.0
test_images = test_images.astype("float32") / 255.0
train_images = np.reshape(train_images, (train_images.shape[0], 28, 28, 1))
test_images = np.reshape(test_images, (test_images.shape[0], 28, 28, 1))

# Train the model
vae.fit(train_images, epochs=10, batch_size=128, validation_data=(test_images, None))


This will train the VAE model on the MNIST dataset for 10 epochs with a batch size of 128.

2 - Generating new images:

To generate new images from the VAE model, sample points from the standard normal prior over the latent space and decode them into images with the decoder model. For example:

In [None]:
import matplotlib.pyplot as plt

# Sample 10 random points from the standard normal prior over the latent space
random_latent_vectors = np.random.normal(size=(10, latent_dim))

# Decode the random latent vectors into images of shape (10, 28, 28, 1)
generated_images = decoder.predict(random_latent_vectors)

# Plot the generated images in a 2 x 5 grid
plt.figure(figsize=(10, 4))
for i in range(generated_images.shape[0]):
    plt.subplot(2, 5, i + 1)
    plt.imshow(generated_images[i, :, :, 0], cmap='gray')
    plt.axis('off')
plt.show()

This will generate 10 random points in the latent space, decode them into images, and plot them.
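With a 2D latent space, a common variation is to decode a regular grid of latent points instead of random ones. This shows how the generated digits change across the space.

The sketch below builds such a grid. `latent_grid` is a hypothetical helper. Spacing the points by standard-normal quantiles, which concentrates them where the prior has most of its mass, is one choice among several; a plain `np.linspace` range also works.

```python
import numpy as np
from statistics import NormalDist

def latent_grid(n, eps=0.05):
    """Return an (n * n, 2) array of latent points on a grid of N(0, 1) quantiles.

    `eps` trims the extreme quantiles so that inv_cdf stays finite.
    """
    quantiles = np.linspace(eps, 1 - eps, n)
    coords = np.array([NormalDist().inv_cdf(q) for q in quantiles])
    zx, zy = np.meshgrid(coords, coords)
    return np.stack([zx.ravel(), zy.ravel()], axis=1)

grid = latent_grid(5)
print(grid.shape)  # (25, 2)
```

This assumes the trained Keras decoder above and `latent_dim = 2`. Under those assumptions, `decoder.predict(latent_grid(15))` would return 225 images of shape (28, 28, 1), which can be tiled into a single figure.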

3 - Encoding existing images:

To encode existing images into their latent representations using the encoder model, you can simply call the predict() method on the encoder model and pass in the images. For example:

In [None]:
import matplotlib.pyplot as plt

# Encode the test images; the encoder returns [z_mean, z_log_var, z]
z_mean, _, _ = encoder.predict(test_images)

# Plot the latent means, colored by digit label
# (mnist.load_data() returns integer labels 0-9, not one-hot vectors, so no argmax is needed)
plt.figure(figsize=(10, 10))
plt.scatter(z_mean[:, 0], z_mean[:, 1], c=test_labels, cmap='viridis')
plt.colorbar()
plt.xlabel('z[0]')
plt.ylabel('z[1]')
plt.show()

This encodes the test images with the encoder model and plots their latent means (z_mean) in a 2D scatter plot. A direct 2D plot works because latent_dim is 2. Each point is colored by the digit label of its image.
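One way to summarize the scatter plot numerically is to average the latent means within each digit class. The NumPy sketch below does this. `class_centroids` is a hypothetical helper, and the toy arrays stand in for `z_mean` and `test_labels`.

```python
import numpy as np

def class_centroids(z_mean, labels):
    """Map each label to the mean of the latent vectors of its examples."""
    labels = np.asarray(labels)
    return {int(c): z_mean[labels == c].mean(axis=0) for c in np.unique(labels)}

# Toy stand-ins for encoder.predict(test_images)[0] and test_labels
z_toy = np.array([[0.0, 1.0], [2.0, 3.0], [-1.0, -1.0], [-3.0, -1.0]])
labels_toy = np.array([7, 7, 1, 1])
centroids = class_centroids(z_toy, labels_toy)
print(centroids[7])  # [1. 2.]
print(centroids[1])  # [-2. -1.]
```

With the trained model, `class_centroids(z_mean, test_labels)` would give one 2D point per digit. Decoding those points with `decoder.predict` is a rough way to view a "typical" image per class, assuming each class forms a reasonably compact cluster.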

# Using PyTorch

In [None]:
import torch
import torch.nn as nn

# Define encoder
class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()

        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(7*7*64, 16)
        self.fc_mean = nn.Linear(16, latent_dim)
        self.fc_log_var = nn.Linear(16, latent_dim)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        mean = self.fc_mean(x)
        log_var = self.fc_log_var(x)
        return mean, log_var

# Define decoder
class Decoder(nn.Module):
    def __init__(self, latent_dim):
        super(Decoder, self).__init__()

        self.fc1 = nn.Linear(latent_dim, 7*7*64)
        self.reshape = nn.Unflatten(-1, (64, 7, 7))
        self.conv1 = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.conv2 = nn.ConvTranspose2d(32, 1, kernel_size=3, stride=2, padding=1, output_padding=1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.reshape(x)
        x = torch.relu(self.conv1(x))
        x = torch.sigmoid(self.conv2(x))
        return x

# Define VAE
class VAE(nn.Module):
    def __init__(self, latent_dim):
        super(VAE, self).__init__()

        self.latent_dim = latent_dim
        self.encoder = Encoder(latent_dim)
        self.decoder = Decoder(latent_dim)

    def encode(self, x):
        mean, log_var = self.encoder(x)
        return mean, log_var

    def reparameterize(self, mean, log_var):
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        return mean + eps * std

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mean, log_var = self.encode(x)
        z = self.reparameterize(mean, log_var)
        return self.decode(z), mean, log_var

    def loss_function(self, recon_x, x, mu, logvar):
        BCE = nn.functional.binary_cross_entropy(recon_x.view(-1, 784), x.view(-1, 784), reduction='sum')
        KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return BCE + KLD

# Instantiate VAE model
vae = VAE(latent_dim=2)

# Define optimizer
optimizer = torch.optim.Adam(vae.parameters())

To train the model, you will need a dataset of images to feed into the VAE. Here is an example of how you could train the VAE on the MNIST dataset:

In [None]:
import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define dataset and dataloader
batch_size = 128
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = dsets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Instantiate VAE model
vae = VAE(latent_dim=2)

# Define optimizer
optimizer = torch.optim.Adam(vae.parameters())

# Training loop
epochs = 10
for epoch in range(epochs):
    total_loss = 0
    for batch_idx, (x, _) in enumerate(train_loader):
        optimizer.zero_grad()

        # Forward pass
        recon_x, mu, log_var = vae(x)

        # Compute loss
        loss = vae.loss_function(recon_x, x, mu, log_var)

        # Backward pass and update weights
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(train_loader.dataset)
    print(f'Epoch {epoch+1}/{epochs}, Average Loss: {avg_loss:.4f}')

In this example, we first define a dataset and dataloader for MNIST using PyTorch's DataLoader class. We then instantiate the VAE model and an Adam optimizer. Finally, we loop over the dataset for a fixed number of epochs, computing the loss and updating the weights after each batch.

The loss is the sum of two terms:

- the reconstruction loss (binary cross-entropy summed over pixels);
- the Kullback-Leibler divergence (KLD) between the encoder's distribution and a unit Gaussian prior.

The KLD term pulls each encoded distribution toward the prior. This regularizes the latent space, so that points sampled from $\mathcal{N}(0, I)$ decode to plausible images. Both terms are summed over the batch, so the printed average loss is obtained by dividing the epoch total by the number of training images.
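To make the two loss terms concrete, here is a NumPy sketch that mirrors `loss_function` above. The name `vae_loss_terms`, the toy inputs, and the `eps` clipping are assumptions for illustration, not part of the original code.

The sketch runs two sanity checks:

- a decoder that predicts 0.5 for every pixel pays $\log 2$ nats of binary cross-entropy per pixel;
- the closed-form KLD agrees with a Monte Carlo estimate of $\mathbb{E}_q[\log q(z) - \log p(z)]$.

```python
import numpy as np

def vae_loss_terms(recon_x, x, mu, log_var, eps=1e-7):
    """Return (BCE, KLD), each summed over the whole batch, mirroring loss_function."""
    recon_x = np.clip(recon_x, eps, 1 - eps)  # keep log() finite
    bce = -np.sum(x * np.log(recon_x) + (1 - x) * np.log(1 - recon_x))
    kld = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return bce, kld

rng = np.random.default_rng(0)

# Check 1: predicting 0.5 for every pixel costs log(2) nats per pixel, and an
# encoder output equal to the prior (mu = 0, log_var = 0) has zero KL divergence.
x = (rng.random((4, 784)) > 0.5).astype(float)  # 4 toy binary "images"
bce, kld = vae_loss_terms(np.full_like(x, 0.5), x, np.zeros((4, 2)), np.zeros((4, 2)))
print(bce / x.size, np.log(2))  # both approximately 0.6931

# Check 2: for one latent dimension, the closed-form KLD matches a Monte Carlo
# estimate of E_q[log q(z) - log p(z)] with q = N(mu, sigma^2) and p = N(0, 1).
mu, log_var = 0.8, np.log(0.3)
sigma = np.exp(0.5 * log_var)
z = mu + sigma * rng.standard_normal(200_000)
log_q = -0.5 * (np.log(2 * np.pi) + log_var + ((z - mu) / sigma) ** 2)
log_p = -0.5 * (np.log(2 * np.pi) + z**2)
kl_mc = np.mean(log_q - log_p)
_, kl_closed = vae_loss_terms(np.full(1, 0.5), np.zeros(1), np.array(mu), np.array(log_var))
print(kl_mc, kl_closed)  # close to each other (about 0.572)
```

PyTorch's `binary_cross_entropy` handles `log(0)` with its own internal clamping rather than the `eps` clip used here. For predictions strictly inside (0, 1), the two computations agree.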