#  **ICT303 - Assignment 1**

**Your name: Ang Jin Wei**

**Student ID: 34792195**

**Email: xsolsticegfx@gmail.com**


## **1. Description**

We would like to develop, using a Multilayer Perceptron (MLP), a computer program that takes images of handwritten text, finds the written characters in the image, and displays them.

To achieve this, we will proceed in steps:

1. Develop and train an MLP for the recognition of handwritten characters from images. In the first instance, the images are assumed to contain only one handwritten character.
2. Train and test the MLP, and evaluate its performance by using loss curves and proper accuracy/performance measures.
3. Improve the performance of the MLP by tuning its hyperparameters.
4. Extend the program you developed to localize (detect) and recognize handwritten characters in an image that contains multiple handwritten characters.

For this purpose, we will use the following dataset for training, validation and testing: https://www.kaggle.com/datasets/dhruvildave/english-handwritten-characters-dataset.

You are required to justify every design choice. Justifications should be theoretical and validated with experiments.

It is important that you start as early as possible. Coding is usually easy. However, training neural networks and tuning their hyperparameters takes time.

## **2. Marking Guide**

- The overall structure of the program - it should follow the structure we used so far in the labs **[30 Marks]**. This includes:
  - A class that defines the network architecture that extends the class `nn.Module`. It should have a constructor method (`__init__()`) and a forward function (`forward()`)
  - The Trainer class
  - A main function

- Training working and running on GPU **[10 marks]**

- Curves for training loss and validation loss plotted and training stopped when the network starts to overfit (i.e., when the validation loss starts to increase). You must use TensorBoard to visualize curves and monitor performance **[10 marks]**

- Testing code properly working. **[10 marks]**

- Hyperparameters fine-tuned and the best ones selected. **[10 marks]**

- Quality of the discussions **[20 marks]**: did the student discuss various design choices, including the hyperparameters or any choices they made to improve the performance? Any design choice should be properly justified.

- Extension to the localization of the characters **[10 marks]**

## **3. What to submit**

You need to upload to LMS the notebook as well as a folder that contains the .py files you created. All classes should be implemented in .py files. The notebook will serve as documentation of your work and will contain the code that demonstrates the training, validation and testing of the MLP models that you created.




# Assignment

## 1. Introduction

This assignment implements a computer program that uses a Multilayer Perceptron (MLP) to take images of handwritten text, find the characters written in the image, and display them.

Notes:
1. This program was written in Visual Studio Code using locally downloaded datasets. The .py files and datasets are zipped and uploaded to the LMS.
2. I am running the program on an NVIDIA GeForce RTX 3080 graphics card.
3. The program is **not made to work** inside a Colab notebook because it requires the local dataset. Please run the program from the zipped file with the datasets included. To run it in a Colab notebook, you will have to manually upload the local dataset to the notebook, following the same folder/data structure.



## 2. Organizing of data files

In [None]:
import os
import shutil
import pandas as pd

# Read CSV file (it has columns 'image' and 'label')
dir_path = os.path.dirname(os.path.realpath(__file__))
dataset_folder = 'Data/lowercase.csv'
image_folder = 'Data/Img'
csv_path = os.path.join(dir_path, dataset_folder)
df = pd.read_csv(csv_path)

# Path to the folder containing all images
image_folder = os.path.join(dir_path, image_folder)

# Loop through each row in the CSV file
for index, row in df.iterrows():
    image_filename = row['image']
    class_label = str(row['label'])  # Ensure class labels are converted to strings

    # Create a folder for the class if it doesn't exist
    class_folder = os.path.join(image_folder, class_label)
    os.makedirs(class_folder, exist_ok=True)

    # Move the image to the class folder
    source_path = os.path.join(image_folder, image_filename)
    destination_path = os.path.join(class_folder, image_filename)

    # Check if the file exists before moving
    if os.path.exists(source_path):
        shutil.move(source_path, destination_path)
    else:
        print(f"File {image_filename} not found.")

print("Images organized into folders based on class labels.")

The data files are organized using the program above, as follows:

1. I separate the CSV file into two files (lowercase.csv and english.csv).
2. The paths in each CSV file are renamed from Img/ImgName.png to just ImgName.png.
3. I run the program on both CSV files.
4. The images are moved into their respective class folders so they can be loaded more easily for training later.
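The renaming in step 2 can be scripted rather than done by hand. The following is a minimal sketch, assuming the CSV columns are named `image` and `label` (as in the script above) and that the paths use forward slashes. The helper name `strip_image_prefix` and the sample rows are hypothetical.

```python
import csv
import io
import posixpath


def strip_image_prefix(csv_text):
    """Return CSV text whose 'image' column holds bare file names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # "Img/ImgName1.png" -> "ImgName1.png"
        row["image"] = posixpath.basename(row["image"])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample rows in the original "Img/<name>.png" layout.
sample = "image,label\nImg/ImgName1.png,0\nImg/ImgName2.png,A\n"
print(strip_image_prefix(sample))
```

For a real file, the same function could be applied to the text read from the CSV file and the result written back before running the organizing script.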

## 3. Importing dependencies



In [None]:
import os
import torch
from torch import nn
from tqdm import tqdm
from torch.utils.data import DataLoader
from torchvision import transforms
import torchvision
from torchvision.datasets import ImageFolder
from torch.utils.tensorboard import SummaryWriter

We import the following dependencies to be used in the program.

os: The os module provides a way to interact with the operating system, and it is used here to handle file paths and directory operations.

torch: This is the core PyTorch library, which provides tensor computation functionalities. PyTorch is commonly used for building and training neural network models.

nn (torch.nn): The nn module in PyTorch provides classes for building and training neural networks. It includes various neural network layers, loss functions, and other utilities.

tqdm: tqdm is a library for adding progress bars to loops. In this code, it is used to visualize the progress of training epochs using the tqdm function.

DataLoader (torch.utils.data.DataLoader): The DataLoader class is part of PyTorch's torch.utils.data module, and it provides an efficient way to load and iterate over datasets during training.

transforms (torchvision.transforms): The transforms module from torchvision is used for defining and applying various image transformations. In this code, it is used to compose a set of transformations on the input images.

torchvision: This is a PyTorch library specifically designed for computer vision tasks. It includes datasets, models, and utilities for working with image data.

ImageFolder (torchvision.datasets.ImageFolder): ImageFolder is a dataset class from torchvision.datasets that simplifies the loading of image datasets by assuming a specific directory structure.

SummaryWriter (torch.utils.tensorboard.SummaryWriter): The SummaryWriter class is part of PyTorch's TensorBoard integration, allowing for the logging of training metrics and visualizations that can be viewed in TensorBoard.

## 4. Model (MLP) Class

In [None]:
class MLP(nn.Module):
    def __init__(self, input_size=224*224*3, output_size=62, lr=0.001):
        super(MLP, self).__init__()

        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(input_size, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU(),
            nn.Dropout(0.5),  # Add dropout for regularization
            nn.Linear(1024, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, output_size),
        )

        self.lr = lr

    def forward(self, x):
        return self.layers(x)

    def configure_optimizers(self):
        #return torch.optim.SGD(self.parameters(), self.lr, momentum=0.9)
        return torch.optim.Adam(self.parameters(), self.lr)

    def loss(self, y_hat, y):
        fn = nn.CrossEntropyLoss()
        return fn(y_hat, y)

The MLP (Multilayer Perceptron) model follows a common design for a fully connected image classifier. Here is a brief description of the design choices:

1. **Input Layer:**

The model starts with a Flatten() layer, which is used to flatten the input image tensor. This is necessary when dealing with image data, as it transforms the multi-dimensional input into a flat vector.
2. **Hidden Layers:**

The model consists of three fully connected (linear) hidden layers with 1024, 512, and 256 neurons, respectively. These layers are responsible for learning hierarchical representations of the input data, capturing complex patterns and features.
3. **Batch Normalization:**

Batch normalization layers (nn.BatchNorm1d()) are inserted after the first two linear layers. Batch normalization helps stabilize and accelerate training by normalizing the inputs of each layer across the mini-batch.
4. **Activation Function:**

ReLU (Rectified Linear Unit) activation functions (nn.ReLU()) are applied after each batch normalization layer, and directly after the third linear layer, which has no batch normalization. ReLU introduces non-linearity to the model, allowing it to learn complex relationships in the data.
5. **Dropout:**

Dropout layers (nn.Dropout(0.5)) are included after the ReLU activations of the first and second hidden layers. Dropout is a regularization technique that helps prevent overfitting by randomly setting a fraction of its input units (here 50%) to zero during training. This helps improve the generalization of the model.
6. **Output Layer:**

The final linear layer produces the output with a size equal to the specified output_size (62 classes: 10 digits, 26 uppercase letters, and 26 lowercase letters). No softmax layer is added to the model itself; the layer outputs raw scores (logits), and the softmax is applied inside the loss function.
7. **Loss Function:**

The model uses the CrossEntropyLoss (nn.CrossEntropyLoss()) as the loss function. CrossEntropyLoss is suitable for multi-class classification problems, and it combines the softmax activation with the negative log likelihood loss.
8. **Optimizer:**

The model is configured to use the Adam optimizer (torch.optim.Adam()) during training. Adam is an adaptive learning rate optimization algorithm that is well-suited for a wide range of optimization problems.
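
To make the loss function in point 7 concrete, here is a minimal pure-Python sketch of the cross-entropy for a single sample, written as a softmax followed by a negative log likelihood. The function name `cross_entropy` and the example logits are illustrative only. With its default settings, `nn.CrossEntropyLoss` additionally averages this quantity over the batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    # Subtract the max logit for numerical stability (does not change the result).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_softmax_target = logits[target] - log_sum_exp
    return -log_softmax_target


# Two equal logits: each class has probability 0.5, so the loss is log(2).
print(cross_entropy([0.0, 0.0], target=0))
# A confident, correct prediction gives a loss close to 0.
print(cross_entropy([10.0, 0.0, 0.0], target=0))
```

This is why the model's final layer can output raw logits: the softmax is already part of the loss computation.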

Overall, this MLP architecture is a simple baseline for image classification. The use of batch normalization, dropout, and ReLU activations contributes to better training stability and generalization.
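
As a rough check on the size of this architecture, the number of learnable parameters can be worked out by hand. The sketch below uses the layer sizes from the MLP class above; the helper names `linear_params` and `batchnorm_params` are hypothetical, and the batch normalization count assumes the default learnable scale and shift.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Learnable scale and shift of a 1-D batch normalization layer."""
    return 2 * n_features


input_size = 224 * 224 * 3  # flattened RGB image
sizes = [input_size, 1024, 512, 256, 62]

linear = [linear_params(a, b) for a, b in zip(sizes, sizes[1:])]
total = sum(linear) + batchnorm_params(1024) + batchnorm_params(512)

for (a, b), n in zip(zip(sizes, sizes[1:]), linear):
    print(f"Linear({a}, {b}): {n:,} parameters")
print(f"Total learnable parameters: {total:,}")
print(f"Share held by the first layer: {linear[0] / total:.1%}")
```

Almost all of the parameters sit in the first linear layer, because flattening a 224x224 RGB image gives 150,528 inputs.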

## 5. Trainer Class

In [None]:
class Trainer:
    def __init__(self, n_epochs, patience=5, log_dir='logs'):
        self.max_epochs = n_epochs
        self.patience = patience
        self.current_patience = 0
        self.best_validation_loss = float('inf')
        self.log_dir = log_dir
        self.writer = SummaryWriter(log_dir=self.log_dir)  # Use self.log_dir
        self.global_step = 0  # Initialize global_step attribute
        self.validation_data = None

    def fit(self, model, data):
        self.data = data
        self.optimizer = model.configure_optimizers()
        self.model = model

        for epoch in range(self.max_epochs):
            self.fit_epoch(epoch)  # fit_epoch logs the loss itself and returns nothing

            # Calculate validation loss
            validation_loss = self.calculate_validation_loss()

            # Log validation loss
            self.writer.add_scalar('validation_loss', validation_loss, len(self.data) * epoch)

            # Check for early stopping
            if validation_loss < self.best_validation_loss:
                self.best_validation_loss = validation_loss
                self.current_patience = 0
            else:
                self.current_patience += 1

            if self.current_patience >= self.patience:
                print(f"Early stopping at epoch {epoch + 1} due to no improvement in validation loss.")
                break

        print("Training process has finished")

    def fit_epoch(self, epoch):
        current_loss = 0.0
        epoch_acc = 0

        for i, data in enumerate(tqdm(self.data, desc=f"Epoch {epoch+1}/{self.max_epochs}")):
            inputs, target = data
            inputs = inputs.to('cuda')
            target = target.to('cuda')

            self.optimizer.zero_grad()

            outputs = self.model(inputs)  # inputs are already on the GPU

            loss = self.model.loss(outputs, target)
            loss.backward()
            self.optimizer.step()

            current_loss += loss.item()

            # Accuracy calculation
            #correct_prediction = torch.argmax(outputs, 1) == target
            #correct_prediction = correct_prediction.sum()
            #epoch_acc += correct_prediction

            # Log training loss to TensorBoard
            if i % 10 == 9:
                self.writer.add_scalar('training_loss', current_loss / 10, len(self.data) * epoch + i)
                current_loss = 0.0

    def calculate_validation_loss(self):
        self.model.eval()
        validation_loss = 0.0
        total_batches = 0

        with torch.no_grad():
            for inputs, target in self.validation_data:
                inputs, target = inputs.to('cuda'), target.to('cuda')
                outputs = self.model(inputs)
                loss = self.model.loss(outputs, target)
                validation_loss += loss.item()
                total_batches += 1

        self.model.train()
        return validation_loss / total_batches if total_batches > 0 else 0.0

    def set_validation_data(self, validation_data):
        self.validation_data = validation_data

1. **Initialization:**

The class is initialized with the maximum number of epochs (n_epochs), the early stopping patience (patience, 5 by default), and an optional directory for logging (log_dir, 'logs' by default).
It initializes the TensorBoard SummaryWriter (self.writer) to log training metrics for visualization.
2. **Training Process (fit method):**

The fit method is responsible for training the provided model (model) using the specified training data (data).
It sets up the optimizer using the model's configure_optimizers method.
It iterates over epochs and calls the fit_epoch method for each epoch.
3. **Training Epoch (fit_epoch method):**

The fit_epoch method handles the training for a single epoch. It iterates over the provided training data in batches.
For each batch, it performs the forward and backward pass, updating the model's parameters.
It accumulates the training loss and logs its average to TensorBoard every 10 batches. The accuracy calculation is currently commented out.
4. **Validation Loss Calculation (calculate_validation_loss method):**

The calculate_validation_loss method evaluates the model on the validation data and returns the average validation loss.
It sets the model to evaluation mode (self.model.eval()), which disables dropout and makes batch normalization use its running statistics instead of per-batch statistics.
The calculated validation loss is logged in the fit method for monitoring the model's performance.
5. **Validation Data Setup (set_validation_data method):**

The set_validation_data method allows setting the validation data for later evaluation during training.
6. **Device Usage:**

The trainer assumes the use of a GPU ('cuda') for training and validation. It transfers each batch of input data to the GPU using .to('cuda'); the model itself is moved to the GPU in the main function.
The model is set back to training mode after validation (self.model.train()) so that dropout and per-batch normalization statistics are active again.
7. **Logging to TensorBoard:**

The training loss and the validation loss are logged to TensorBoard using the SummaryWriter (self.writer). This enables real-time monitoring of the training progress.

The addition of TensorBoard logging further enhances the transparency of the training process, allowing for detailed monitoring of training and validation metrics. The design promotes modularity and flexibility, making it easy to adapt the trainer for various datasets and model architectures.

8. **Early Stopping:**

The Trainer class incorporates an early stopping mechanism to prevent overfitting by monitoring the validation loss. It is implemented with a patience parameter (self.patience), which is the number of consecutive epochs without improvement in validation loss that are tolerated before training stops.

During each epoch, the Trainer calculates the validation loss using the calculate_validation_loss method. If the validation loss improves on the best value so far (i.e., decreases), the current patience counter (self.current_patience) is reset to zero. If there is no improvement, the counter is incremented.

The early stopping condition is checked after each epoch. If the current patience counter reaches the specified patience value, the training process is halted, and a message is printed indicating that early stopping occurred.

This mechanism helps prevent the model from continuing to train when the validation loss ceases to improve, avoiding overfitting on the training data.
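
The patience logic can be isolated from the training loop to see when it triggers. This is a minimal sketch that mirrors the counter used in the Trainer class; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(validation_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best:
            best = loss
            waited = 0  # improvement: reset the counter
        else:
            waited += 1  # no improvement this epoch
        if waited >= patience:
            return epoch
    return None


# Hypothetical validation losses: the best value is reached at epoch 2.
losses = [1.00, 0.90, 0.95, 0.96, 0.97, 0.80]
print(early_stopping_epoch(losses, patience=3))
```

In this example training stops before the sixth epoch is reached, so the later improvement to 0.80 is never seen. A larger patience trades longer training for a lower chance of stopping on a temporary plateau.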

## **Overall**

The design choices aim to provide a flexible and organized structure for training neural network models, including features like validation, logging, and GPU acceleration. The use of TensorBoard facilitates visualizing key training metrics, aiding in model evaluation and tuning.
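
The way the training loss is placed on the TensorBoard x-axis can be sketched without TensorBoard itself. The function below reproduces the step and averaging arithmetic used in fit_epoch (one point per 10 batches, at step `len(self.data) * epoch + i`); the name `windowed_training_log` and the loss values are hypothetical.

```python
def windowed_training_log(batch_losses, epoch, batches_per_epoch, window=10):
    """Return (global_step, mean_loss) pairs, one per full window of batches."""
    points = []
    running = 0.0
    for i, loss in enumerate(batch_losses):
        running += loss
        if i % window == window - 1:
            points.append((batches_per_epoch * epoch + i, running / window))
            running = 0.0
    # Batches left over after the last full window are not logged.
    return points


# Hypothetical epoch 2 (0-based) with 25 batches, each with loss 0.5.
print(windowed_training_log([0.5] * 25, epoch=2, batches_per_epoch=25))
```

Because the validation loss is logged at step `len(self.data) * epoch`, both curves share the same batch-based x-axis and can be compared directly in TensorBoard.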

## 6. Main Function

In [None]:
if __name__ == "__main__":
    # Relates to CUDA device-side assertions (a debugging aid); it does not by itself enable the GPU
    os.environ['TORCH_USE_CUDA_DSA'] = '1'
    # Set the path to your local path
    current_directory = os.path.dirname(os.path.realpath(__file__))
    dataset_folder = 'Data'
    local_dataset_path = os.path.join(current_directory, dataset_folder)

    transform = transforms.Compose([
      transforms.ToTensor(),
      transforms.Resize((224, 224)) #900, 1200 => 224, 224
    ])

    train_dataset = torchvision.datasets.ImageFolder(root=local_dataset_path, transform=transform)
    batch_size = 30
    trainloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)

    validation_dataset = torchvision.datasets.ImageFolder(root=local_dataset_path, transform=transform)
    validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)

    print(train_dataset.classes)

    # 2. The MLP model
    mlp_model = MLP(lr=0.003)
    mlp_model = mlp_model.to('cuda')
    mlp_model.train()

    # 3. Training the network
    # 3.1. Creating the trainer class
    trainer = Trainer(n_epochs=35)
    trainer.set_validation_data(trainloader)

    # 3.2. Training the model
    trainer.fit(mlp_model, trainloader)

    classes = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a_', 'b_', 'c_', 'd_', 'e_', 'f_', 'g_', 'h_', 'i_', 'j_', 'k_', 'l_', 'm_', 'n_', 'o_', 'p_', 'q_', 'r_', 's_', 't_', 'u_', 'v_', 'w_', 'x_', 'y_', 'z_']

    testset = torchvision.datasets.ImageFolder(root=local_dataset_path, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=True, num_workers=4, pin_memory=True)

    dataiter = iter(testloader)
    images, labels = next(dataiter)

    images = images.to('cuda')
    labels = labels.to('cuda')

    # calculate accuracy
    def calculate_accuracy(model, dataloader, device='cuda'):
        model.eval()  # Set the model to evaluation mode
        correct_predictions = 0
        total_samples = 0

        with torch.no_grad():
            for inputs, labels in dataloader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, predicted = torch.max(outputs, 1)
                correct_predictions += (predicted == labels).sum().item()
                total_samples += labels.size(0)

        accuracy = correct_predictions / total_samples
        model.train()  # Set the model back to training mode
        return accuracy

    #train_accuracy = calculate_accuracy(mlp_model, trainloader)
    #print(f'Training Accuracy: {train_accuracy * 100:.2f}%')

    test_accuracy = calculate_accuracy(mlp_model, testloader)
    print(f'Test Accuracy: {test_accuracy * 100:.2f}%')
    print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(images.shape[0])))

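    mlp_model.eval()  # disable dropout and use running BatchNorm statistics for inference
    with torch.no_grad():
        output = mlp_model(images)
    estimated_labels = torch.max(output, 1).indices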

    print('Estimated Labels: ', ' '.join(f'{classes[estimated_labels[j]]:5s}' for j in range(images.shape[0])))

The main function in this code serves as the entry point for executing the entire script. Here's a short description of the design choices made in the main function:

1. **CUDA Environment Configuration:**

The script sets the environment variable 'TORCH_USE_CUDA_DSA', which relates to CUDA device-side assertions (a debugging aid). It does not by itself enable GPU usage; the GPU is used because the model and the data batches are moved to it with .to('cuda').
2. **Dataset Loading and Transformation:**

The script defines a set of transformations using transforms.Compose from torchvision.transforms to preprocess the input images. These transformations include converting images to tensors and resizing them to a specific size (224x224).
It creates instances of ImageFolder for the training, validation, and test datasets using the specified transformation. All three currently read from the same folder (local_dataset_path).
3. **Model Initialization:**

An instance of the MLP model (mlp_model) is created with a specified learning rate (lr=0.003). The model is then moved to the GPU ('cuda') using the .to('cuda') method.
4. **Trainer Initialization and Training:**

An instance of the Trainer class (trainer) is created with a maximum number of epochs set to 35. The validation data is set to the training loader using the set_validation_data method.
The model is trained using the fit method of the Trainer class, where the training loader is provided as the training data.
5. **Accuracy Calculation on Test Set:**

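The script calculates the accuracy of the trained model on the test loader using the calculate_accuracy function. In this script the test loader reads the same image folder as the training loader, so the reported value reflects performance on images seen during training. The results are printed to the console.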
6. **Printing Ground Truth and Estimated Labels:**

The ground truth labels and the labels estimated by the model are printed to the console for visual inspection.

Overall, the main function orchestrates the entire training and evaluation process, from setting up the environment to training the model, calculating accuracy, and providing insights into the model's predictions on a test set. The design aims to be modular, readable, and organized, separating concerns into distinct sections for clarity.
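
Since the training, validation, and test loaders above all read the same folder, a held-out split would give a more reliable estimate of generalization. The following is a minimal sketch of one way to split the sample indices, assuming a 70/15/15 split and a fixed seed. Both the ratios and the helper name `split_indices` are assumptions, not part of the original program.

```python
import random


def split_indices(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle sample indices once and cut them into three disjoint lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_fraction)
    n_test = int(n_samples * test_fraction)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


# Hypothetical dataset of 100 images.
train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each index list could then be wrapped with `torch.utils.data.Subset` around a single ImageFolder dataset, with one DataLoader per subset, so that validation and test images are never used for training.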

## Parameters & Weightage

In this particular program, the parameters have been fine-tuned as follows:

1. Learning Rate: 0.003
2. Training Data Batch Size: 30
3. Validation Data Batch Size: 30
4. Test Data Batch Size: 100
5. Number of Epochs: 35

From monitoring performance through TensorBoard, the lowest losses were observed for learning rates from 0.0001 up to values below 1. Hence, the learning rate was adjusted further within this range to improve training speed and accuracy.

Both the training and validation batch sizes are set to 30 in an effort to optimize the training speed without jeopardizing the accuracy of the model.

The test data batch size is set to 100 in order to test the model against a larger amount of data at once.

Lastly, for the number of epochs, several tests were run with the same configuration as above. Here are the findings:

1. 1 Epoch : Test Accuracy: 12.73%
2. 5 Epochs : Test Accuracy: 45.54%
3. 10 Epochs : Test Accuracy: 70.79%
4. 20 Epochs : Test Accuracy: 85.51%
5. 25 Epochs : Test Accuracy: 89.12%
6. 30 Epochs : Test Accuracy: 92.52%
7. 35 Epochs : Test Accuracy: 95.07%
8. 40 Epochs : Test Accuracy: 93.99%

Each epoch takes roughly 35-40 s to complete on my system.
As the findings show, each increase in the number of epochs gives diminishing returns, and going from 35 to 40 epochs results in a loss in accuracy (probably due to overfitting). Hence, we can deduce that 35 epochs might be the optimal number of epochs to train the model.

Running with a batch size of 60 for the training and validation data at 35 epochs resulted in a **Test Accuracy of 92.67%**. The loss in accuracy does not justify the larger batch size, as the increase in training speed is minimal.

When we run 35 epochs with a learning rate of 0.03 instead of 0.003, we end up with a poor accuracy of **1.67%**, which is close to chance level for 62 classes (1/62 ≈ 1.6%). Hence, I believe that 0.003 is the best learning rate among those tested for this particular model.

Now, to fine-tune the batch size for training and validation, we try a lower batch size over several epoch counts.

1. 16 Batch Size, 1 Epoch : Test Accuracy: 12.61%
2. 16 Batch Size, 5 Epochs : Test Accuracy: 49.33%
3. 16 Batch Size, 10 Epochs : Test Accuracy: 66.28%
4. 16 Batch Size, 20 Epochs : Test Accuracy: 83.99%
5. 16 Batch Size, 35 Epochs : Test Accuracy: 91.38%

As we can see, a lower batch size did not result in better overall accuracy and slowed down training. From this we can conclude that a **batch size of 30** is ideal among those tested.
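
The selection described above can also be expressed as a small lookup over the recorded results. This sketch only uses the accuracies reported in this section; the variable names are hypothetical.

```python
# Test accuracies (%) reported in this section for lr=0.003, batch size 30.
accuracy_by_epochs = {1: 12.73, 5: 45.54, 10: 70.79, 20: 85.51,
                      25: 89.12, 30: 92.52, 35: 95.07, 40: 93.99}

# Test accuracies (%) reported for 35 epochs at different batch sizes.
accuracy_by_batch_size = {16: 91.38, 30: 95.07, 60: 92.67}

# Pick the setting with the highest reported accuracy in each table.
best_epochs = max(accuracy_by_epochs, key=accuracy_by_epochs.get)
best_batch_size = max(accuracy_by_batch_size, key=accuracy_by_batch_size.get)

print(f"Best number of epochs: {best_epochs}")
print(f"Best batch size: {best_batch_size}")
```

Keeping the results in one place like this makes it easy to add further runs (for example, other learning rates) and repeat the comparison.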

It is also interesting to note that a LeNet CNN model performs much better than the MLP model, achieving 97% test accuracy with just 11 epochs at a batch size of 30.

Final Parameters:

1. Learning Rate: 0.003
2. Training Data Batch Size: 30
3. Validation Data Batch Size: 30
4. Test Data Batch Size: 100
5. Number of Epochs: 35
