# Lab-8: Deep Learning with Lightning

This lab is design for investigating the basic architecture and parameter searching techniques in deep learning literature. We use a fundamental dataset, MNIST, with the help of PyTorch Lightning and design our model to handle image classification task.

### General Announcements

* The exercises on this sheet are graded by a maximum of **14 points**. You will be asked to implement several functions.
* Team work is not allowed! Everybody implements his/her own code. Discussing issues with others is fine, sharing code with others is not. 
* If you use any code fragments found on the Internet, make sure you reference them properly.
* You can send your questions via email to the TAs until the deadline.

### Suggestions

Please install pytorch lightning packages via conda or pip before starting the lab session. You can use the tutorials of [Pytorch Lightning](https://lightning.ai/docs/pytorch/stable/notebooks/lightning_examples/mnist-hello-world.html).

Please also check: [Tensorboard](https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html)

For installing lightning: [Link](https://lightning.ai/pytorch-lightning)

In [None]:
# Basic Machine Learning Modules
import pandas
import numpy
import sklearn

# Deep Learning Modules
import torch
import lightning.pytorch as pl

# Visualization Modules
import matplotlib.pyplot as plt

# Others
from torchvision.datasets import MNIST
from torchvision import transforms
from torchmetrics import Accuracy
from lightning.pytorch.loggers import CSVLogger, TensorBoardLogger
from lightning.pytorch.callbacks.early_stopping import EarlyStopping
from lightning.pytorch.callbacks import TQDMProgressBar
import torchmetrics
import warnings
warnings.filterwarnings("ignore")

from pytorch_lightning import Trainer
from tqdm.auto import tqdm
from pytorch_lightning.callbacks import EarlyStopping, PlotLossesCallback

DATASET_PATH = r'C:\Users\tunay\OneDrive\Desktop\github_repos\MachineLearningforSciences\Datasets'  # You can change them
EXPERIMENTS_PATH = r'C:\Users\tunay\OneDrive\Desktop\github_repos\MachineLearningforSciences\Datasets'  # You can change them

# 1) Dataset (3 Points)

- Design a DataModule for MNIST.
- Use 80/20 % split for train/val sets.

In [2]:
import multiprocessing

num_cpus = multiprocessing.cpu_count()
print("Number of CPUs:", num_cpus)

Number of CPUs: 20


In [3]:
class MNISTDataModule(pl.LightningDataModule):
    def __init__(self, data_folder: str = DATASET_PATH, batch_size: int = 64, num_cpu: int = 1):
        super().__init__()
        self.path = data_folder
        self.batch_size = batch_size
        self.num_cpu = num_cpu
        self.transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    
    def prepare_data(self) -> None:
        # Download MNIST Data in self.path

        MNIST(self.path, train=True, download=True)
        MNIST(self.path, train=False, download=True)
    
    def setup(self, stage: str = 'fit') -> None:
        if stage in ['fit', 'tune']:
            self.train_dataset = MNIST(self.path, train=True, transform=self.transform)
        
        if stage in ['fit', 'tune', 'validate']:
            self.val_dataset = MNIST(self.path, train=False, transform=self.transform) 
        
        elif stage in ['test', 'predict']:
            self.test_dataset = MNIST(self.path, train=False, transform=self.transform) 
        
        else:
            raise NotImplementedError('Unknown Stage: {}'.format(stage))
            
    def train_dataloader(self) -> torch.utils.data.DataLoader:
        return torch.utils.data.DataLoader(
            batch_size=self.batch_size, num_workers=self.num_cpu,  # DO NOT CHANGE IN VAL/TEST LOADERS
            dataset=self.train_dataset, shuffle=True,  # Could be changed in val/test loaders
        )
    def val_dataloader(self) -> torch.utils.data.DataLoader:
        # Insert your code
        return torch.utils.data.DataLoader(
            batch_size=self.batch_size, num_workers=self.num_cpu,  # DO NOT CHANGE IN VAL/TEST LOADERS
            dataset=self.val_dataset, shuffle=False,  # Could be changed in val/test loaders
        )

    
    def test_dataloader(self) -> torch.utils.data.DataLoader:
        # Insert your code
        return torch.utils.data.DataLoader(
            batch_size=self.batch_size, num_workers=self.num_cpu,  # DO NOT CHANGE IN VAL/TEST LOADERS
            dataset=self.test_dataset, shuffle=False,  # Could be changed in val/test loaders
        )
    
    def predict_dataloader(self) -> torch.utils.data.DataLoader:
        return test_dataloader()

data_module = MNISTDataModule(num_cpu=20)
data_module.prepare_data()
data_module.setup('fit')
dl = data_module.train_dataloader()
print(next(dl.__iter__()))

[tensor([[[[-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          ...,
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242]]],


        [[[-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          ...,
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242]]],


        [[[-0.4242, -0.4242, -0.4242,  ..., -0.4242, -0.4242, -0.4242],
          [-0.4242, -0.42

In [4]:
# Do not change! Only for checking
print('Shape of Images: [B x C x H x W] = ', next(dl.__iter__())[0].shape)
print('Shape of Labels: [B] = ', next(dl.__iter__())[1].shape)

Shape of Images: [B x C x H x W] =  torch.Size([64, 1, 28, 28])
Shape of Labels: [B] =  torch.Size([64])


## 2) Neural Network Architecture (2 Points)

- Design an model with 3 Convolutional layers and 1 fully-connected layer, in this order:
    - Convolution: Kernel size = 3x3, padding = 'same', number of filters = 4
    - Convolution: Kernel size = 3x3, padding = 'same', number of filters = 8
    - Convolution: Kernel size = 3x3, padding = 'same', number of filters = 4
    - Linear: No Bias, num_class = 10 in MNIST Dataset
- Use rectified linear unit (ReLU) for activation function. 
- Initialize all the weights with `xavier_uniform`.

In [5]:
class RawModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Insert your code
        self.conv1 = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding ='same')
        self.conv2 = torch.nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding ='same')
        self.conv3 = torch.nn.Conv2d(in_channels=8, out_channels=4, kernel_size=3, padding ='same')
        self.fc = torch.nn.Linear(in_features = 4*28*28, out_features=10, bias=False)
        
        self.apply(self.initialize_weights)

    
    @staticmethod
    def initialize_weights(module: torch.nn.Module) -> None:
        # Insert your code
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):   #https://stackoverflow.com/questions/49433936/how-do-i-initialize-weights-in-pytorch
            torch.nn.init.xavier_uniform_(module.weight)
        
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        """
        Input: torch.Tensor | dtype=torch.float | shape=[B, C, H, W]
        Output: torch.Tensor | dtype=torch.float | shape=[B, num_class]
        """
        # Insert your code
        x = self.conv1(images)
        x = torch.nn.functional.relu(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = self.conv3(x)
        x = torch.nn.functional.relu(x)
        x = x.view(x.size(0),-1)
        return self.fc(x)

model = RawModel()
with torch.no_grad():
    sample_image = torch.rand(size=(4, 1, 28, 28))
    output = model(sample_image)
    print(output.shape, output.dtype)


torch.Size([4, 10]) torch.float32


## 3) Experiments (4 Points)

- Define training/validation/test step and optimizer with cross-entropy loss and Adam optimizer.
- Use accuracy scores for monitoring the experiment. (multiclass accuracy from Lightning Metrics)

In [6]:
from pytorch_lightning.utilities.types import TRAIN_DATALOADERS
import torch
import torch.nn as nn
import torchmetrics
import pytorch_lightning as pl

class MNISTExperiment(pl.LightningModule):
    def __init__(self, learning_rate: float = 1e-3):
        super().__init__()
        self.model = RawModel()
        self.learning_rate = learning_rate

        self.train_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)  # Placeholder for training scores
        self.validation_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)   # Placeholder for validation scores
        self.test_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)   # Placeholder for test scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)

    def training_step(self, batch, batch_idx) -> torch.Tensor:
        x, y = batch
        y_hat = self(x)

        loss = nn.CrossEntropyLoss()(y_hat, y)
        self.log('train_loss', loss)

        preds = torch.argmax(y_hat, dim=1)
        self.train_scores(preds, y)

        self.log('train_accuracy', self.train_scores, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx) -> None:
        x, y = batch
        y_hat = self(x)

        preds = torch.argmax(y_hat, dim=1)
        self.validation_scores(preds, y)

    def test_step(self, batch, batch_idx) -> None:
        x, y = batch
        y_hat = self(x)

        preds = torch.argmax(y_hat, dim=1)
        self.test_scores(preds, y)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer
    
    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
        dataset = MNIST('data/', train=True, download=True, transform=transform)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
        return dataloader

experiment = MNISTExperiment()


- Define a trainer of lightning.
    - Maximum epoch = 20
    - accelerator = 'auto'
    - Use `CSVLogger` and `TensorBoardLogger`
    - Use `EarlyStopping` with patience epoch = 3
    - Use `TQDMProgressBar` with refresh rate = 10

In [None]:
plot_losses = PlotLossesCallback()

# Define the loggers
csv_logger = CSVLogger('logs', name='mnist_logs')
tensorboard_logger = TensorBoardLogger('logs', name='mnist_logs')

# Define the early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Define the trainer
trainer = Trainer(
    max_epochs=20,
    accelerator='auto',
    logger=[csv_logger, tensorboard_logger],
    callbacks=[early_stopping, plot_losses]
)


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


In [None]:
trainer = Trainer(max_epochs=20, accelerator='auto')
trainer.fit(experiment)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name              | Type               | Params
---------------------------------------------------------
0 | model             | RawModel           | 32.0 K
1 | train_scores      | MulticlassAccuracy | 0     
2 | validation_scores | MulticlassAccuracy | 0     
3 | test_scores       | MulticlassAccuracy | 0     
---------------------------------------------------------
32.0 K    Trainable params
0         Non-trainable params
32.0 K    Total params
0.128     Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]

`Trainer.fit` stopped: `max_epochs=20` reached.


## 4) Results (4 Points)
- Train and test the `RawModel` and plot the score and loss values versus epoch.

- Re-design the model with dropout layer in-between the convolutional layers and re-train the model. as like `RawModel` and create a new module as `ModelWithDropout`. Try:
    - dropout probability = 0.1
    - dropout probability = 0.5
    - dropout probability = 0.9

In [24]:
# Insert your code
class ModelWithDropout(torch.nn.Module):
    def __init__(self, dropout_prob):
        super().__init__()
        # Insert your code
        self.conv1 = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding ='same')
        self.drop_out1 = torch.nn.Dropout(dropout_prob)
        self.conv2 = torch.nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding ='same')
        self.drop_out2 = torch.nn.Dropout(dropout_prob)
        self.conv3 = torch.nn.Conv2d(in_channels=8, out_channels=4, kernel_size=3, padding ='same')
        self.drop_out3 = torch.nn.Dropout(dropout_prob)
        self.fc = torch.nn.Linear(in_features = 4*28*28, out_features=10, bias=False)
        
        self.apply(self.initialize_weights)

    
    @staticmethod
    def initialize_weights(module: torch.nn.Module) -> None:
        # Insert your code
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):   #https://stackoverflow.com/questions/49433936/how-do-i-initialize-weights-in-pytorch
            torch.nn.init.xavier_uniform_(module.weight)
        
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        """
        Input: torch.Tensor | dtype=torch.float | shape=[B, C, H, W]
        Output: torch.Tensor | dtype=torch.float | shape=[B, num_class]
        """
        # Insert your code
        x = self.conv1(images)
        x = torch.nn.functional.relu(x)
        x = self.drop_out1(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = self.drop_out2(x)
        x = self.conv3(x)
        x = torch.nn.functional.relu(x)
        x = self.drop_out3(x)
        x = x.view(x.size(0),-1)
        return self.fc(x)

model = RawModel()
with torch.no_grad():
    sample_image = torch.rand(size=(4, 1, 28, 28))
    output = model(sample_image)
    print(output.shape, output.dtype)

torch.Size([4, 10]) torch.float32


- Re-design the model with batchnorm layer in-between the convolutional layers and re-train the model.

In [25]:
class BatchNormModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Insert your code
        self.conv1 = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding='same')
        self.batch_norm1 = torch.nn.BatchNorm2d(4)
        self.conv2 = torch.nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding='same')
        self.batch_norm2 = torch.nn.BatchNorm2d(8)
        self.conv3 = torch.nn.Conv2d(in_channels=8, out_channels=4, kernel_size=3, padding='same')
        self.batch_norm3 = torch.nn.BatchNorm2d(4)
        self.fc = torch.nn.Linear(in_features=4 * 28 * 28, out_features=10, bias=False)
        
        self.apply(self.initialize_weights)

    
    @staticmethod
    def initialize_weights(module: torch.nn.Module) -> None:
        # Insert your code
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):   #https://stackoverflow.com/questions/49433936/how-do-i-initialize-weights-in-pytorch
            torch.nn.init.xavier_uniform_(module.weight)
        
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        """
        Input: torch.Tensor | dtype=torch.float | shape=[B, C, H, W]
        Output: torch.Tensor | dtype=torch.float | shape=[B, num_class]
        """
        # Insert your code
        x = self.conv1(images)
        x = torch.nn.functional.relu(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = self.conv3(x)
        x = torch.nn.functional.relu(x)
        x = x.view(x.size(0),-1)
        return self.fc(x)

model = RawModel()
with torch.no_grad():
    sample_image = torch.rand(size=(4, 1, 28, 28))
    output = model(sample_image)
    print(output.shape, output.dtype)


torch.Size([4, 10]) torch.float32


In [None]:
# MODEL WITH BATCH NORMALIZATION TRAINING

class MNISTExperiment(pl.LightningModule):
    def __init__(self, learning_rate: float = 1e-3):
        super().__init__()
        self.model = BatchNormModel()
        self.learning_rate = learning_rate

        self.train_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)  # Placeholder for training scores
        self.validation_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)   # Placeholder for validation scores
        self.test_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)   # Placeholder for test scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)

    def training_step(self, batch, batch_idx) -> torch.Tensor:
        x, y = batch
        y_hat = self(x)

        loss = nn.CrossEntropyLoss()(y_hat, y)
        self.log('train_loss', loss)

        preds = torch.argmax(y_hat, dim=1)
        self.train_scores(preds, y)

        self.log('train_accuracy', self.train_scores, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx) -> None:
        x, y = batch
        y_hat = self(x)

        preds = torch.argmax(y_hat, dim=1)
        self.validation_scores(preds, y)

    def test_step(self, batch, batch_idx) -> None:
        x, y = batch
        y_hat = self(x)

        preds = torch.argmax(y_hat, dim=1)
        self.test_scores(preds, y)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer
    
    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
        dataset = MNIST('data/', train=True, download=True, transform=transform)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
        return dataloader

experiment2 = MNISTExperiment()

In [None]:
plot_losses = PlotLossesCallback()

# Define the loggers
csv_logger = CSVLogger('logs', name='mnist_logs')
tensorboard_logger = TensorBoardLogger('logs', name='mnist_logs')

# Define the early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Define the trainer
trainer = Trainer(
    max_epochs=20,
    accelerator='auto',
    logger=[csv_logger, tensorboard_logger],
    callbacks=[early_stopping, plot_losses]
)

In [23]:
trainer = Trainer(max_epochs=20, accelerator='auto')
trainer.fit(experiment)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name              | Type               | Params
---------------------------------------------------------
0 | model             | RawModel           | 32.0 K
1 | train_scores      | MulticlassAccuracy | 0     
2 | validation_scores | MulticlassAccuracy | 0     
3 | test_scores       | MulticlassAccuracy | 0     
---------------------------------------------------------
32.0 K    Trainable params
0         Non-trainable params
32.0 K    Total params
0.128     Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]

In [31]:
# MODEL WITH DROPOUT TRAINING

class MNISTExperiment(pl.LightningModule):
    def __init__(self, learning_rate: float = 1e-3):
        super().__init__()
        self.model = ModelWithDropout(dropout_prob=0.1)
        self.learning_rate = learning_rate

        self.train_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)  # Placeholder for training scores
        self.validation_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)   # Placeholder for validation scores
        self.test_scores = torchmetrics.Accuracy(compute_on_step=False, task='multiclass', num_classes=10)   # Placeholder for test scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)

    def training_step(self, batch, batch_idx) -> torch.Tensor:
        x, y = batch
        y_hat = self(x)

        loss = nn.CrossEntropyLoss()(y_hat, y)
        self.log('train_loss', loss)

        preds = torch.argmax(y_hat, dim=1)
        self.train_scores(preds, y)

        self.log('train_accuracy', self.train_scores, on_epoch=True, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx) -> None:
        x, y = batch
        y_hat = self(x)

        preds = torch.argmax(y_hat, dim=1)
        self.validation_scores(preds, y)

    def test_step(self, batch, batch_idx) -> None:
        x, y = batch
        y_hat = self(x)

        preds = torch.argmax(y_hat, dim=1)
        self.test_scores(preds, y)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        return optimizer
    
    def train_dataloader(self):
        transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
        dataset = MNIST('data/', train=True, download=True, transform=transform)
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
        return dataloader

experiment2 = MNISTExperiment()

In [None]:
plot_losses = PlotLossesCallback()

# Define the loggers
csv_logger = CSVLogger('logs', name='mnist_logs')
tensorboard_logger = TensorBoardLogger('logs', name='mnist_logs')

# Define the early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Define the trainer
trainer = Trainer(
    max_epochs=20,
    accelerator='auto',
    logger=[csv_logger, tensorboard_logger],
    callbacks=[early_stopping, plot_losses]
)

In [None]:
trainer = Trainer(max_epochs=20, accelerator='auto')
trainer.fit(experiment)

## 5) Conclusion (1 Point)
Comment on your findings:
- Show the results in a table via Pandas
- Which method is better? Why?
- Are the results significant? If not, how can we get significant ones?

I had problems with training eventhough I had a powerful machine. I believe the versions did not match so pytorch could not work well with my machine. 

Bath normalization would speed up the process since it normalizes the inputs of each layer by scaling the activations and dropout temporarily drops out randomly chosen neurons which helps the neural network to overcome overfitting problem and enable the model to adjust better. 