# DC-GAN (Deep Convolutional Generative Adversarial Network)

I recently went through TensorFlow's tutorials on GANs (Generative Adversarial Networks) and found this [link](https://www.tensorflow.org/tutorials/generative/dcgan).

In short, the tutorial builds a model that generates handwritten digit images using the Keras API. Here, I'm going to use TensorLayerX's API to build the GAN instead, so some parts may not follow the TensorFlow tutorial exactly.

Things to install:
- tensorflow
- tensorlayerx
- torch (only if you need its backend; otherwise you can skip it)
- matplotlib
- numpy
- pillow
- tqdm (used for the progress bar in the training loop)

The goal here is to make a generative model that generates new handwritten digit images. 

In [1]:
import tensorflow;
from tensorflow.keras.datasets import mnist;
import numpy;

In [2]:
import torch;

torch.cuda.is_available()

True

## Dataset Loading & Pre-Processing

MNIST is a database of handwritten digits commonly used for training. Each sample is a digit from 0 to 9 written by hand, digitized, and resized to a 28 x 28 grayscale image.

The TensorFlow Keras API has a helper that downloads the MNIST dataset and returns it already split into train and test sets. If I'm not mistaken, TensorLayerX has one too, but I preferred this one because I have used it before.

Checking `.shape` shows that, by default, the training set holds 60K samples while the test set holds 10K. I split that 10K test set 50:50 to carve a validation set out of it.

| Splits | Total data |
|---|---|
| Train | 60000 |
| Test | 5000 |
| Val | 5000 |

In [3]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data();

# Split the test and val by 50:50
test_val_images_split = numpy.array_split(test_images, 2);
test_val_labels_split = numpy.array_split(test_labels, 2);

test_images = test_val_images_split[0];
test_labels = test_val_labels_split[0];

val_images = test_val_images_split[1];
val_labels = test_val_labels_split[1];

train_images.shape, test_images.shape, val_images.shape

((60000, 28, 28), (5000, 28, 28), (5000, 28, 28))

## Data Preparation

With the dataset loaded and split, I will prepare the data using TensorLayerX's API. This part is crucial, since we can't feed the data to the model without first reshaping it into a proper tensor format.

As mentioned above, the images have the following dimensions:

| Attribute | Size |
|---|---|
| Width | 28px |
| Height | 28px |
| Color Channels | 1 (grayscale) |

From the information above, we can conclude that each image tensor should have the shape (28 -> Height, 28 -> Width, 1 -> Color Channel), which is not the same as the loaded shape of (28, 28).

The data preparation follows the data loading pattern, which encapsulates the process of loading and preprocessing data. Since the remaining preprocessing steps below change the nature of the data, such as converting it to another data format, it is best to encapsulate them in a class.

So what we will do in this part is:

1. Set TensorLayerX's backend to TensorFlow or Torch
2. Create the Data Loader class
3. Reshape the features to (28, 28, 1) per image and convert them to float32, since that is easier for the model to read.
4. Scale the feature values to the range [0, 1] by dividing them by 255.
5. Register all train-test-val data with the Data Loader
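
Steps 3 and 4 can be tried out on dummy data before touching the real dataset. The sketch below is only an illustration: `dummy_images` is a made-up stand-in for a few MNIST images, not part of the notebook. It also shows the alternative scaling to [-1, 1] that the TensorFlow DCGAN tutorial uses, which matches the output range of a tanh layer.

```python
import numpy

# Hypothetical stand-in for MNIST: 4 images of 28 x 28 uint8 pixels.
rng = numpy.random.default_rng(0)
dummy_images = rng.integers(0, 256, size=(4, 28, 28), dtype=numpy.uint8)

# Step 3: add the channel axis and convert to float32.
as_float = dummy_images.reshape((len(dummy_images), 28, 28, 1)).astype("float32")

# Step 4 (as done in this notebook): scale to [0, 1].
scaled_01 = as_float / 255

# Alternative (as in the TensorFlow DCGAN tutorial): scale to [-1, 1].
scaled_pm1 = (as_float - 127.5) / 127.5

print(scaled_01.shape, scaled_01.dtype)
print(scaled_01.min() >= 0.0, scaled_01.max() <= 1.0)
print(scaled_pm1.min() >= -1.0, scaled_pm1.max() <= 1.0)
```

Which range to pick depends on the generator's last activation: a tanh output produces values in [-1, 1], so real images scaled to the same range are directly comparable.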

In [4]:
import os;

# Set TensorLayerX's backend to TensorFlow or Torch.
# TL_BACKEND has to be set before tensorlayerx is imported; otherwise the default TensorFlow backend is used.
os.environ["TL_BACKEND"] = "tensorflow"; # Use TensorFlow's backend
# os.environ["TL_BACKEND"] = "torch"; # Uncomment this line (and comment out the one above) to use Torch's backend

import tensorlayerx;
from tensorlayerx import expand_dims, convert_to_tensor, float32, squeeze;
from tensorlayerx.dataflow import IterableDataset;

Using TensorFlow backend.


In [5]:
# Create Data Loader class
class DatasetLoader(IterableDataset):
    def __init__(self, feature, label):

        # Reshape the features to (N, 28, 28, 1) and convert them to float32, since that is easier for the model to read.
        # Scale the values to [0, 1] by dividing them by 255.
        self.data = feature.reshape((len(feature), 28, 28, 1)).astype('float32') / 255;
        self.label = label.astype('int32');
    
        print(f"Data Shape: {self.data.shape}");
        print(f"Label Shape: {self.label.shape}");

    def __getitem__(self, index):
        data = self.data[index];
        label = self.label[index];

        return data, label;

    def __len__(self):
        return len(self.data);

    def __iter__(self):
        for i in range(len(self.data)):
            yield self.data[i], self.label[i];

In [6]:
# Register all train-test-val data with Data Loader

train_set = DatasetLoader(train_images, train_labels);
test_set = DatasetLoader(test_images, test_labels);
val_set = DatasetLoader(val_images, val_labels);

Data Shape: (60000, 28, 28, 1)
Label Shape: (60000,)
Data Shape: (5000, 28, 28, 1)
Label Shape: (5000,)
Data Shape: (5000, 28, 28, 1)
Label Shape: (5000,)


In [7]:
data, label = train_set[0];

data.shape, label

((28, 28, 1), 5)

## Model Architecture

A GAN architecture consists of 2 models: a Generator and a Discriminator.

The Generator model is intended to generate *fake* handwriting. By *fake*, I mean that the model generates new handwritten digit samples rather than copying existing ones.
The Discriminator model is intended to tell us whether a given image is a real sample or one produced by the Generator.

With the information above, we can assume that the generator and discriminator models must have at least the following layers:

1. Generator

| Layer type | Specification | Purpose |
|---|---|---|
| Input Layer | Dense -> Shape (100, ) | Takes the random noise vector (the latent input) from which the model draws an image. |
| Output Layer | ConvTranspose2d -> Shape (28, 28, 1) | The image drawn by the model. |

2. Discriminator

| Layer type | Specification | Purpose |
|---|---|---|
| Input Layer | Conv2d -> Shape (28, 28, 1) | Must match the Generator's output shape. |
| Output Layer | Linear -> Shape (1) | A single score, read as a boolean (truthy or falsy), that says whether the input image is real or generated by the Generator. |
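
To see how a 100-value noise vector can end up as a 28 x 28 image, it helps to track the spatial size through the transposed convolutions. The sketch below assumes the TensorFlow convention for `"SAME"` padding, where a transposed convolution multiplies the spatial size by its stride; other backends may pad differently. The helper name `same_transposed_conv_size` is made up for this illustration.

```python
def same_transposed_conv_size(size: int, stride: int) -> int:
    """Spatial output size of a transposed convolution with "SAME" padding
    (TensorFlow convention, assumed here): the input size times the stride."""
    return size * stride


# The dense layer produces 7 * 7 * 256 values, reshaped to a 7 x 7 map.
dense_units = 7 * 7 * 256
print(dense_units)  # 12544

# Strides of the three transposed convolutions in the generator.
size = 7
sizes = [size]
for stride in (1, 2, 2):
    size = same_transposed_conv_size(size, stride)
    sizes.append(size)

print(sizes)  # [7, 7, 14, 28]
```

Under that assumption, the stride-1 layer keeps the 7 x 7 map and each stride-2 layer doubles it, which gives 14 x 14 and finally 28 x 28.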

Before we jumping into 

In [8]:
from tensorlayerx.nn import Module, Conv2d, Linear, BatchNorm, MaxPool2d, Flatten, Input, Reshape, ConvTranspose2d;
from tensorlayerx import LeakyReLU, Tanh, Sigmoid;
from tensorlayerx.metrics import Accuracy;
from tensorlayerx.losses import binary_cross_entropy;

import numpy;

import tensorlayerx as tlx;

In [14]:
# Generator Model

class MNIST_Model_G(Module):

    def __init__(self):
        super(MNIST_Model_G, self).__init__();

        # I don't know why this can't be from tlx.initializers import TruncatedNormal
        w_init = tlx.initializers.TruncatedNormal(stddev = 0.02);
        b_init = tlx.initializers.TruncatedNormal(mean = 1.0, stddev = 0.02);
    
        self.input = Input(shape = (None, 100,));
        
        self.dense1 = Linear(7 * 7 * 256, b_init = b_init);
        self.bn1 = BatchNorm();
        self.act1 = LeakyReLU();
        self.reshape1 = Reshape(shape = (-1, 7, 7, 256)); # -1 keeps the batch dimension
        self.conv_trans1 = ConvTranspose2d(out_channels = 128, kernel_size = (5, 5), stride = (1, 1), padding = "SAME", b_init = b_init);
        self.reshape2 = Reshape(shape = (-1, 7, 7, 128));
        self.bn2 = BatchNorm();
        self.act2 = LeakyReLU();
        self.conv_trans2 = ConvTranspose2d(out_channels = 64, kernel_size = (5, 5), stride = (2, 2), padding = "SAME", b_init = b_init);
        self.reshape3 = Reshape(shape = (-1, 14, 14, 64));
        self.bn3 = BatchNorm();
        self.act3 = LeakyReLU();
        self.conv_trans3 = ConvTranspose2d(out_channels = 1, kernel_size = (5, 5), act = Tanh, stride = (2, 2), padding = "SAME", b_init = b_init);
        self.output = Reshape(shape = (-1, 28, 28, 1));
        

        # self.input = Input(shape = (64, 28, 28, 1));
        # self.conv1 = Conv2d(out_channels = 64, kernel_size = (3, 3), act = LeakyReLU, padding = "SAME", W_init = w_init, b_init = b_init, data_format = "channel_first", name = "conv1");
        # self.conv2 = Conv2d(out_channels = 64, kernel_size = (3, 3), act = LeakyReLU, padding = "SAME", W_init = w_init, b_init = b_init, data_format = "channel_first", name = "conv2");
        
        # self.output = Conv2d(out_channels = 1, kernel_size = (1, 1), act = Tanh, padding = "SAME", W_init = w_init, b_init = b_init, data_format = "channel_first", name = "output");

    def forward(self, x):
        x = self.dense1(x);
        x = self.bn1(x);
        x = self.act1(x);
        x = self.reshape1(x);

        x = self.conv_trans1(x);
        x = self.reshape2(x);
        x = self.bn2(x);
        x = self.act2(x);

        x = self.conv_trans2(x);
        x = self.reshape3(x);
        x = self.bn3(x);
        x = self.act3(x);

        x = self.conv_trans3(x);

        x = self.output(x);

        return x;

    def construct(self, x):
        x = self.input(x);
        x = self.dense1(x);
        x = self.bn1(x);
        x = self.act1(x);
        x = self.reshape1(x);

        x = self.conv_trans1(x);
        x = self.reshape2(x);
        x = self.bn2(x);
        x = self.act2(x);

        x = self.conv_trans2(x);
        x = self.reshape3(x);
        x = self.bn3(x);
        x = self.act3(x);

        x = self.conv_trans3(x);
        
        out = self.output(x);

        return out;

class G_With_Loss(Module):
    def __init__(self, G_Net: Module, D_Net: Module, loss_fn):
        super(G_With_Loss, self).__init__();

        self.G_Net = G_Net;
        self.D_Net = D_Net;
        self.loss_function = loss_fn;

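    def forward(self, data, ground_truth):
        # `ground_truth` is kept only so the (data, label) call signature stays the same; it is not used.
        fake_image = self.G_Net(data);
        logits_fake = self.D_Net(fake_image);

        # The generator wants the discriminator to score its images as real (1).
        valid = tlx.convert_to_tensor(numpy.ones(logits_fake.shape), dtype = tlx.float32);
        loss = self.loss_function(logits_fake, valid);

        return loss;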

In [16]:
class MNIST_Model_D(Module):
    
    def __init__(self):
        super(MNIST_Model_D, self).__init__();

        # I don't know why this can't be from tlx.initializers import TruncatedNormal
        w_init = tlx.initializers.TruncatedNormal(stddev = 0.02);
        b_init = tlx.initializers.TruncatedNormal(mean = 1.0, stddev = 0.02);

        self.input = Input(shape = (28, 28, 1));
        
        # The images are stored as (28, 28, 1), so the channel axis comes last.
        self.conv1 = Conv2d(out_channels = 64, kernel_size = (3, 3), act = LeakyReLU, padding = "SAME", W_init = w_init, b_init = b_init, data_format = "channels_last", name = "conv1");
        self.pool1 = MaxPool2d(kernel_size = (2, 2), name = "pool1");

        self.conv2 = Conv2d(out_channels = 32, kernel_size = (3, 3), act = LeakyReLU, padding = "SAME", W_init = w_init, b_init = b_init, data_format = "channels_last", name = "conv2");
        self.pool2 = MaxPool2d(kernel_size = (2, 2), name = "pool2");

        self.flat = Flatten(name = "flat");
        self.output = Linear(
            out_features = 1,
            W_init = tlx.initializers.TruncatedNormal(stddev = 5e-2),
            b_init = tlx.initializers.TruncatedNormal(mean = 1, stddev = 2e-2),
            act = Sigmoid,
            name = "output"
        );

    def forward(self, x):
        x = self.conv1(x);
        x = self.pool1(x);
        
        x = self.conv2(x);
        x = self.pool2(x);

        x = self.flat(x);
        x = self.output(x);

        return x;

    def construct(self, x):
        x = self.input(x);

        x = self.conv1(x);
        x = self.pool1(x);
        
        x = self.conv2(x);
        x = self.pool2(x);

        x = self.flat(x);
        out = self.output(x);

        return out;

class D_With_Loss(Module):
    def __init__(self, G_Net: Module, D_Net: Module, loss_fn):
        super(D_With_Loss, self).__init__();

        self.G_Net = G_Net;
        self.D_Net = D_Net;
        self.loss_function = loss_fn;


    def forward(self, real_data, fake_data):
        logits_real = self.D_Net(real_data);

        fake_image = self.G_Net(fake_data);
        logits_fake = self.D_Net(fake_image);

        # Real images are labelled 1 and generated images 0.
        valid = tlx.convert_to_tensor(numpy.ones(logits_real.shape), dtype = tlx.float32);
        fake = tlx.convert_to_tensor(numpy.zeros(logits_fake.shape), dtype = tlx.float32);

        loss = self.loss_function(logits_real, valid) + self.loss_function(logits_fake, fake);

        return loss;
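
The usual GAN convention is that the discriminator is trained to output 1 for real images and 0 for generated ones, while the generator is trained so that its images are scored as 1. The plain NumPy sketch below illustrates that with a mean squared error; the discriminator scores and the local `mse` helper are made up for the example and are not TensorLayerX's own loss function.

```python
import numpy


def mse(pred, target):
    """Plain mean squared error, written out for this illustration."""
    return float(numpy.mean((pred - target) ** 2))


# Hypothetical discriminator scores for 3 real and 3 generated images.
d_real = numpy.array([0.9, 0.8, 0.7])
d_fake = numpy.array([0.2, 0.1, 0.3])

# Discriminator: real -> 1, generated -> 0.
d_loss = mse(d_real, numpy.ones_like(d_real)) + mse(d_fake, numpy.zeros_like(d_fake))

# Generator: wants its images to be scored as real (1).
g_loss = mse(d_fake, numpy.ones_like(d_fake))

print(f"D loss: {d_loss:.4f}")
print(f"G loss: {g_loss:.4f}")
```

With these scores the discriminator is doing well, so its loss is small while the generator's loss is large. As the generator improves, the two move in opposite directions.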
        

## Adversarial Training

At the time of writing this code, I believe that every adversarial training use case requires its own treatment. For MNIST, for example, the G network has to be given a set of random numbers (noise) as its input, which is then fed through the model.
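
A small sketch of drawing that noise with NumPy follows. The batch size of 256 and the noise length of 100 are example values. One thing to watch for is that the first positional argument of `numpy.random.normal` is the mean, not the shape, so the shape has to be passed as `size`.

```python
import numpy

batch_size, noise_dim = 256, 100

# Pitfall: the list is read as the mean (loc), so only two numbers are drawn,
# centred on 256 and 100.
wrong = numpy.random.normal([batch_size, noise_dim])
print(wrong.shape)  # (2,)

# Intended: one row of standard-normal noise per image in the batch.
noise = numpy.random.normal(size = (batch_size, noise_dim)).astype("float32")
print(noise.shape)  # (256, 100)
```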

There are 2 trainings to be done in here:

1. G initialization
This will actually train the Generator model to make fake samples.

2. Adversarial Training
This will 

In [17]:
from tensorlayerx.model import TrainOneStep; 
from tensorlayerx.optimizers import SGD;

from tqdm import tqdm;
from time import time;

In [18]:
mse = tlx.losses.mean_squared_error;

# Training Config
epoch = 10;
generator = MNIST_Model_G();
discriminator = MNIST_Model_D();

# Use new names for the instances so the G_With_Loss / D_With_Loss classes are not overwritten.
g_with_loss = G_With_Loss(generator, discriminator, mse);
d_with_loss = D_With_Loss(generator, discriminator, mse);

G_trainer = TrainOneStep(g_with_loss, SGD(), generator.trainable_weights);
D_trainer = TrainOneStep(d_with_loss, SGD(), discriminator.trainable_weights);

[TLX] Input  _inputlayer_3: (None, 100)
[TLX] Linear  linear_2: 12544 No Activation
[TLX] BatchNorm batchnorm_4: momentum: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TLX] Reshape reshape_5
[TLX] ConvTranspose2d convtranspose2d_4: out_channels: 128 kernel_size: (5, 5) stride: (1, 1) pad: SAME act: No Activation output_padding: 0 groups: 1
[TLX] Reshape reshape_6
[TLX] BatchNorm batchnorm_5: momentum: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TLX] ConvTranspose2d convtranspose2d_5: out_channels: 64 kernel_size: (5, 5) stride: (2, 2) pad: SAME act: No Activation output_padding: 0 groups: 1
[TLX] Reshape reshape_7
[TLX] BatchNorm batchnorm_6: momentum: 0.900000 epsilon: 0.000010 act: No Activation is_train: True
[TLX] ConvTranspose2d convtranspose2d_6: out_channels: 1 kernel_size: (5, 5) stride: (2, 2) pad: SAME act: Tanh output_padding: 0 groups: 1
[TLX] Reshape reshape_8
[TLX] Input  _inputlayer_4: (28, 28, 1)
[TLX] Conv2d conv1: out_channels : 64 k

In [24]:
progress_epoch = [];
progress_train_G_loss = [];
progress_train_D_loss = [];
progress_train_G_acc = [];
progress_train_D_acc = [];
progress_val_G_loss = [];
progress_val_D_loss = [];
progress_val_G_acc = [];
progress_val_D_acc = [];

for i in range(epoch):
    start_time = time();
    print(f"Training Epoch {i+1} / {epoch}", end = " ");

    # Training phase
    generator.set_train();
    discriminator.set_train();
    train_n_iter = 0;
    train_D_loss, train_D_acc = 0, 0;
    train_G_loss, train_G_acc = 0, 0;

    for X_batch, y_batch in tqdm(train_set):
        # Draw a (256, 100) batch of noise; the shape has to be passed as `size`.
        noise = tlx.convert_to_tensor(numpy.random.normal(size = (256, 100)), dtype = tlx.float32);

        D_loss = D_trainer(X_batch, noise);
        G_loss = G_trainer(noise, y_batch);

        train_D_loss += D_loss;
        train_G_loss += G_loss;
        train_n_iter += 1;

        # Calculate accuracy
        # metric_train.update(logits, y_batch);
        # train_acc += metric_train.result();
        # metric_train.reset();

    # Validation phase

    # network.set_eval();
    # val_loss, val_acc, val_n_iter = 0, 0, 0;

    # for X_batch, y_batch in val_set_loader:
    #     loss = trainer(X_batch, y_batch);
    #     val_loss += loss;

    #     val_n_iter += 1;

    #     logits = network(X_batch);

    #     # Calculate accuracy
    #     metric_val.update(logits, y_batch);
    #     val_acc += metric_val.result();
        # metric_val.reset();

    time_done = int(time() - start_time);

    train_D_loss = train_D_loss / train_n_iter;
    train_G_loss = train_G_loss / train_n_iter;
    # val_loss = val_loss / val_n_iter;
    # val_acc = val_acc / val_n_iter;

    progress_epoch.append(i+1);
    progress_train_D_loss.append(train_D_loss);
    progress_train_G_loss.append(train_G_loss);
    # progress_val_acc.append(val_acc);
    # progress_val_loss.append(val_loss);

    print(f"Epoch {i+1} - {time_done}s - train D loss: {train_D_loss} - train G loss: {train_G_loss}");

Training Epoch 1 / 10 

  0%|          | 0/60000 [00:00<?, ?it/s]


AttributeError: 'numpy.ndarray' object has no attribute 'get_shape'
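
`get_shape` is a method of TensorFlow tensors, so the traceback suggests that a layer received a raw `numpy.ndarray` instead of a backend tensor. That would fit the loop above, which iterates over the dataset object directly and therefore gets one unbatched NumPy sample at a time. I have not verified a fix in this notebook. A likely direction is to feed the model batches that are converted with `tlx.convert_to_tensor` first, as is already done for the noise. The plain NumPy sketch below only shows the batching part; `iterate_batches` and `dummy` are made-up names for the illustration.

```python
import numpy


def iterate_batches(features, batch_size):
    """Yield consecutive slices of `features`, each with a leading batch axis."""
    for start in range(0, len(features), batch_size):
        yield features[start:start + batch_size]


# Hypothetical stand-in for the prepared training images.
dummy = numpy.zeros((1000, 28, 28, 1), dtype = "float32")

shapes = [batch.shape for batch in iterate_batches(dummy, 256)]
print(shapes)
# [(256, 28, 28, 1), (256, 28, 28, 1), (256, 28, 28, 1), (232, 28, 28, 1)]
```

Each batch would then be converted to a tensor before it is passed to the trainers. The last batch is smaller than the others, so the noise batch should be sized from the real batch rather than hard-coded.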