# Deep Learning with Lightning

This lab is designed for investigating the basic architectures and parameter search techniques found in the deep learning literature. We use a fundamental dataset, MNIST, with the help of PyTorch Lightning, and design our model to handle an image classification task.

#### Instructions

* You can achieve up to **20 points** for this graded notebook. The points for each task are clearly declared in the task descriptions. Fill in the missing code fragments and answer questions whenever you see this symbol: &#x1F536;. Please do not change any of the provided code. Note that one symbol &#x1F536; does NOT mean one line of code: sometimes it can require several lines of code.

* Teamwork is not allowed! Everyone implements their own code. Discussing issues with others is fine, sharing code with others is not.

* If you use any code fragments found on the internet, make sure you reference them properly.

* The responsible TAs for this lab are **Yuchang** and **Sara**. If you have further questions, please reach out to them directly: **yuchang.jiang@uzh.ch** and **sara.zoccheddu@uzh.ch**.

* Since the lab sessions are specifically designed to answer your questions, please make sure to attend them and only reach out if further questions come up later.

* Hand in your solution via OLAT by <span style="color:#4ea373">**15.05.2025**</span>. Make sure that all cells are executed, as we will not rerun any code. Any cell that is not executed will automatically result in 0 points for that task.

#### Suggestions

Please install the PyTorch Lightning package via conda or pip before starting the lab session. The [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/notebooks/lightning_examples/mnist-hello-world.html) tutorials are a good starting point.

Please also check: [Tensorboard](https://pytorch.org/tutorials/recipes/recipes/tensorboard_with_pytorch.html)

For installing lightning: [Link](https://lightning.ai/pytorch-lightning)

Please keep in mind that this is a valuable opportunity to develop self-learning skills. When working with a new package like PyTorch Lightning, always refer to the official documentation or search for solutions online when encountering errors, rather than immediately asking a friend or TA. This habit will greatly enhance your ability to troubleshoot and learn independently.

#### Task Overview

------------------------------------------------------------------------------------------

1. **Datasets** <span style="color:#4ea373">**[3pt]**</span>
2. **NN Architectures** <span style="color:#4ea373">**[5pt]**</span>
3. **Experiments** <span style="color:#4ea373">**[5pt]**</span>
4. **Results** <span style="color:#4ea373">**[4pt]**</span>
5. **Conclusions** <span style="color:#4ea373">**[3pt]**</span>

In [4]:
# Basic Machine Learning Modules
import pandas
import numpy
import sklearn

# Deep Learning Modules
import torch
import lightning.pytorch as pl

# Visualization Modules
import matplotlib.pyplot as plt

# Others
from torchvision.datasets import MNIST
from torchvision import transforms
from torchmetrics import Accuracy
from lightning.pytorch.loggers import CSVLogger, TensorBoardLogger
from lightning.pytorch.callbacks.early_stopping import EarlyStopping
from lightning.pytorch.callbacks import TQDMProgressBar
import warnings
warnings.filterwarnings("ignore")  # You can comment out this line while debugging, but not in your submission!

numpy.random.seed(42)
torch.set_float32_matmul_precision('medium')
torch.manual_seed(42)
pl.seed_everything(42, workers=True)

DATASET_PATH = '/Users/merterol/Desktop/iMac27_github/uzh/Computational Science/Sem 4/PHY371/MNIST'  # You can change them
EXPERIMENTS_PATH = '/Users/merterol/Desktop/iMac27_github/uzh/Computational Science/Sem 4/PHY371/MNIST_exp'  # You can change them

Seed set to 42


## 1) Dataset (3 Points)

- Design a DataModule for MNIST.
- Use 80/20 % split for train/val sets.
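
As a rough illustration of an 80/20 split, the sketch below uses `torch.utils.data.random_split` on a synthetic stand-in dataset. The names `toy_dataset`, `train_subset`, and `val_subset` are hypothetical and not part of the lab template, and this is only one possible way to split a dataset, not the required solution.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with labels 0-9.
toy_dataset = TensorDataset(
    torch.rand(100, 1, 28, 28),
    torch.randint(0, 10, (100,)),
)

# Compute integer lengths so the two parts always add up to the full dataset.
n_train = int(0.8 * len(toy_dataset))
n_val = len(toy_dataset) - n_train

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(
    toy_dataset, [n_train, n_val], generator=generator
)

print(len(train_subset), len(val_subset))  # 80 20
```

The same call should work on any map-style dataset, since `random_split` only needs the dataset length and index access.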

In [None]:
class MNISTDataModule(pl.LightningDataModule):
    def __init__(self, data_folder: str = DATASET_PATH, batch_size: int = 64, num_cpu: int = 1):
        super().__init__()
        self.path = data_folder
        self.batch_size = batch_size
        self.num_cpu = num_cpu
        self.transform = transforms.Compose([
            transforms.ToTensor(), 
            transforms.Normalize((0.1307,), (0.3081,))
        ])
    
    def prepare_data(self) -> None:
        # 🔶 TODO: Download MNIST Data in self.path

    
    def setup(self, stage: str = 'fit') -> None:
        # 🔶 TODO: Insert your code 
        # HINT: how to split the whole dataset into train, val sets? 
        # and how should those sets be used in different stages?
        
        if stage in ['fit', 'tune']:
            self.train_dataset = ...  # 🔶 TODO: Insert your code 
        
        if stage in ['fit', 'tune', 'validate']:
            self.val_dataset = ...  # 🔶 TODO: Insert your code 
        
        elif stage in ['test', 'predict']:
            self.test_dataset = ...  # 🔶 TODO: Insert your code 
        
        else:
            raise NotImplementedError('Unknown Stage: {}'.format(stage))
            
    def train_dataloader(self) -> torch.utils.data.DataLoader:
        return torch.utils.data.DataLoader(
            batch_size=self.batch_size, num_workers=self.num_cpu,  # DO NOT CHANGE IN VAL/TEST LOADERS
            dataset=self.train_dataset, shuffle=True,  # Could be changed in val/test loaders
        )
    def val_dataloader(self) -> torch.utils.data.DataLoader:
        # 🔶 TODO: Insert your code
        # HINT: check docs of 'torch.utils.data.DataLoader'.
    
    def test_dataloader(self) -> torch.utils.data.DataLoader:
        # 🔶 TODO: Insert your code
        # HINT: what's the difference between 'val_dataloader' and 'test_dataloader'?
    
    def predict_dataloader(self) -> torch.utils.data.DataLoader:
        return self.test_dataloader()

data_module = MNISTDataModule()
data_module.prepare_data()
data_module.setup('fit')
dl = data_module.train_dataloader()
print(next(dl.__iter__()))  # DO NOT CHANGE | will be used for checking

IndentationError: expected an indented block after function definition on line 43 (2743504745.py, line 47)

In [None]:
# Do not change! Only for checking.
print('Shape of Images: [B x C x H x W] = ', next(dl.__iter__())[0].shape)
print('Shape of Labels: [B] = ', next(dl.__iter__())[1].shape)

## 2) Neural Network Architecture (5 Points)

- Design a model with 3 convolutional layers and 1 fully-connected layer, in this order:
    - Convolution: Kernel size = 3x3, padding = 'same', number of filters = 4
    - Convolution: Kernel size = 3x3, padding = 'same', number of filters = 8
    - Convolution: Kernel size = 3x3, padding = 'same', number of filters = 4
    - Linear: No Bias, num_class = 10 in MNIST Dataset
- Use the rectified linear unit (ReLU) as the activation function (where necessary).
- Initialize all the weights with `xavier_uniform`.
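
To make the layer specification more concrete, here is a minimal sketch of a single convolution with `padding='same'` followed by Xavier-uniform initialization. It is an isolated toy layer, not the requested model; the variable names are hypothetical.

```python
import torch

# One 3x3 convolution; padding='same' keeps the spatial size (H, W) unchanged.
conv = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding='same')

# Re-initialize the weights in place with Xavier (Glorot) uniform initialization.
torch.nn.init.xavier_uniform_(conv.weight)

x = torch.rand(2, 1, 28, 28)  # toy batch: [B, C, H, W]
out = conv(x)
print(out.shape)  # torch.Size([2, 4, 28, 28])

# Flattening everything except the batch dimension gives the feature vector
# that a fully-connected layer would receive.
flat = out.flatten(start_dim=1)
print(flat.shape)  # torch.Size([2, 3136])
```

With Xavier-uniform initialization the weights are drawn from a symmetric interval whose half-width is `sqrt(6 / (fan_in + fan_out))`; for this layer `fan_in = 9` and `fan_out = 36`.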

In [None]:
class RawModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # 🔶 TODO: Insert your code
        # HINT: define model layers based on the given model design (Conv2d...)
        
        self.model = ...
        self.model.apply(self.initialize_weights)
    
    @staticmethod
    def initialize_weights(module: torch.nn.Module) -> None:
        # 🔶 TODO: Insert your code
        
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        """
        Input: torch.Tensor | dtype=torch.float | shape=[B, C, H, W]
        Output: torch.Tensor | dtype=torch.float | shape=[B, num_class]
        """
        # 🔶 TODO: Insert your code

model = RawModel()
with torch.no_grad():  # DO NOT REMOVE THE LINES BELOW
    sample_image = torch.rand(size=(4, 1, 28, 28))
    output = model(sample_image)
    print(output.shape, output.dtype)  
    print(output)


## 3) Experiments (5 Points)

- Define the training/validation/test steps with cross-entropy loss, and configure an `AdamW` optimizer.
- Use accuracy scores for monitoring the experiment. (You can use multiclass accuracy from TorchMetrics.)
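
For intuition about the two quantities involved, the following sketch computes cross-entropy loss and accuracy by hand on made-up logits using plain `torch`. The tensors are hypothetical toy values; a multiclass accuracy metric object computes the same ratio while also accumulating it across batches.

```python
import torch
import torch.nn.functional as F

# Toy logits for 4 samples and 3 classes (made-up values, not model output).
logits = torch.tensor([
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 0.3],
    [0.1, 0.2, 3.0],
    [1.0, 0.9, 0.8],
])
targets = torch.tensor([0, 1, 2, 1])

# Cross-entropy takes raw logits; softmax is applied internally.
loss = F.cross_entropy(logits, targets)

# Accuracy: fraction of samples whose highest-scoring class equals the label.
predictions = logits.argmax(dim=1)
accuracy = (predictions == targets).float().mean()

print(predictions.tolist())  # [0, 1, 2, 0]
print(accuracy.item())       # 0.75
```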

In [None]:
class MNISTExperiment(pl.LightningModule):
    def __init__(self, learning_rate: float = 1e-3):
        super().__init__()
        self.model = RawModel()
        self.learning_rate = learning_rate
        
        self.train_scores = ...  # 🔶 TODO: Insert your code
        # HINT: define the metric used in this classification task.
        self.validation_scores = ...  # 🔶 TODO: Insert your code
        self.test_scores = ...  # 🔶 TODO: Insert your code

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)

    def training_step(self, batch, batch_idx) -> torch.Tensor:
        x, y = batch
        y_hat = self(x)
        
        # 🔶 TODO: Insert your code: Loss Calculation
        loss = ...
        # 🔶 TODO: Insert your code: Score Calculation
        
        self.log('train_loss', loss)
        self.log('train_accuracy', ...)
        return loss
    
    def validation_step(self, batch, batch_idx) -> None:
        # 🔶 TODO: Insert your code
        self.log('validation_accuracy', ...)
    
    def test_step(self, batch, batch_idx) -> None:
        # 🔶 TODO: Insert your code
        self.log('test_accuracy', ...)

    def configure_optimizers(self):
        optimizer = ...  # 🔶 TODO: Insert your code 
        return optimizer

experiment = MNISTExperiment()

- Define a lightning `Trainer`:
    - Maximum epoch = 20
    - accelerator = 'auto'
    - Use `CSVLogger` and `TensorBoardLogger`
    - Use `EarlyStopping` with patience epoch = 3
    - Use `TQDMProgressBar` with refresh rate = 10

In [None]:
trainer = pl.Trainer(
    ...  # 🔶 Insert your code
    # HINT: check docs of pytorch lightning Trainer, and understand its flags.
)

## 4) Results (4 Points)
- Train and test the `RawModel` and plot the score and loss values versus epoch.
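
One possible way to get per-epoch curves is to aggregate logged values by epoch before plotting. The sketch below assumes a log table with an `epoch` column and one column per logged metric, where rows that did not log a metric hold NaN; the DataFrame here is a hypothetical hand-written stand-in with made-up numbers, not real results, and the actual layout of your log file should be checked before relying on it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical log table: training rows and validation rows are interleaved,
# and each row only fills the metrics that were logged at that point.
metrics = pd.DataFrame({
    "epoch": [0, 0, 0, 1, 1, 1],
    "train_loss": [0.9, 0.7, None, 0.5, 0.3, None],
    "validation_accuracy": [None, None, 0.80, None, None, 0.90],
})

# Averaging per epoch skips the NaN entries, leaving one value per epoch.
per_epoch = metrics.groupby("epoch").mean()

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(8, 3))
ax_loss.plot(per_epoch.index, per_epoch["train_loss"], marker="o")
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("train_loss")
ax_acc.plot(per_epoch.index, per_epoch["validation_accuracy"], marker="o")
ax_acc.set_xlabel("epoch")
ax_acc.set_ylabel("validation_accuracy")
fig.tight_layout()  # in a notebook, the figure renders inline
```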

In [None]:
# 🔶 Insert your code
# HINT: trainer.fit(...) and trainer.test(...)

- Re-design the model with a `Dropout` layer between the convolutional layers: create a new module, `ModelWithDropout`, modeled on `RawModel`, and re-train it. Try:
    - dropout probability = 0.1
    - dropout probability = 0.5
    - dropout probability = 0.9
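
To get a feel for what these probabilities mean, the sketch below applies `torch.nn.Dropout` in training mode to a vector of ones and measures the fraction of zeroed entries. The names `activations` and `zero_fractions` are hypothetical; the snippet only illustrates the layer in isolation, not the requested model.

```python
import torch

torch.manual_seed(0)
activations = torch.ones(100_000)  # toy activations, all equal to 1

zero_fractions = {}
for p in (0.1, 0.5, 0.9):
    dropout = torch.nn.Dropout(p=p)
    dropout.train()  # make the training-mode behaviour explicit
    dropped = dropout(activations)
    # Each entry is zeroed independently with probability p;
    # surviving entries are rescaled by 1 / (1 - p).
    zero_fractions[p] = (dropped == 0).float().mean().item()
    print(f"p={p}: zeroed fraction={zero_fractions[p]:.3f}, "
          f"surviving value={dropped.max().item():.2f}")
```

At `p = 0.9` roughly nine out of ten activations are removed, which is worth keeping in mind when interpreting the corresponding training curves.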

In [None]:
# 🔶 Insert your code
# HINT: 
# re-create a new model class like 'class RawModel(torch.nn.Module)'. 
# add Dropout layers.
# then run trainer.fit(...) and trainer.test(...) with this new model architecture.

- Re-design the model with a `BatchNorm` layer between the convolutional layers and re-train the model.
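
As a small illustration of what such a layer does to its input, the sketch below passes a shifted and scaled random tensor through `torch.nn.BatchNorm2d` in training mode and checks the per-channel statistics. The tensor `features` is a hypothetical stand-in for a convolutional feature map.

```python
import torch

torch.manual_seed(0)
batch_norm = torch.nn.BatchNorm2d(num_features=4)  # one scale/shift pair per channel

# Toy feature map [B, C, H, W] with mean around 5 and standard deviation around 3.
features = torch.randn(16, 4, 28, 28) * 3.0 + 5.0

# In training mode, statistics are computed per channel over the batch
# and spatial dimensions (B, H, W).
normalized = batch_norm(features)

channel_mean = normalized.mean(dim=(0, 2, 3))
channel_std = normalized.std(dim=(0, 2, 3), unbiased=False)
print(channel_mean)  # each entry close to 0
print(channel_std)   # each entry close to 1
```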

In [None]:
# 🔶 Insert your code

## 5) Conclusion (3 Points)
Comment on your findings:
- Which method is better? Why?

ANSWER: ... # 🔶 TODO

- Are the results statistically significant? If not, how can we get significant ones?

ANSWER: ... # 🔶 TODO

* Comment on the different normalization techniques: `BatchNorm`, `LayerNorm`, `InstanceNorm` and `GroupNorm`. Explain what each of them is generally used for (e.g., for RGB datasets).

ANSWER: ... # 🔶 TODO

* Explain the operations of the `Dropout` layer in the training and testing phases. Is there any difference, or are they always the same?

ANSWER: ... # 🔶 TODO

* Explain the difference between `AdamW` and `Adam` optimizers in a few sentences.

Hint: https://arxiv.org/abs/1711.05101 for more information.

ANSWER: ... # 🔶 TODO

* Compare the Kaiming (He) and Xavier (Glorot) Initialization techniques and explain their differences in a few sentences. 

Hint: You can find more information from their original papers.

ANSWER: ... # 🔶 TODO