This notebook was motivated by

[2] Kaiming He et al. ‘Deep Residual Learning for Image Recognition’. In: CoRR abs/1512.03385 (2015). arXiv: 1512.03385.
url: http://arxiv.org/abs/1512.03385.

Implementation: Oleh Bakumenko, University of Duisburg-Essen

# Imports

In [1]:
import sys
sys.path.append("../")
import os
import numpy as np
import time
import matplotlib.pyplot as plt
import torch, torch.nn as nn
import torchvision, torchvision.transforms as tt
from torchsummary import summary
from torch.multiprocessing import Manager
torch.multiprocessing.set_sharing_strategy("file_system")
from pathlib import Path

from utility import utils as uu
from utility.eval import evaluate_classifier_model
from utility.confusion_matrix import calculate_confusion_matrix
from utility.trainLoopClassifier import *
from utility.plotImageModel import *

# Data augmentations

Data augmentation is a technique used to artificially increase the size of a dataset by transforming existing data points to create new, similar instances. This can help prevent overfitting in machine learning models, as well as improve their ability to generalize to unseen data. Common types of data augmentation include flipping, rotation, scaling, and adding noise to images.
We can build the augmentation pipeline with the `torchvision.transforms` module:

In [None]:
data_augments = torchvision.transforms.Compose([ 
    torchvision.transforms.RandomHorizontalFlip(p = .5),
    torchvision.transforms.RandomVerticalFlip(p = .5),
    torchvision.transforms.ColorJitter(brightness=(0.5,1.5), contrast=(1), hue=(-0.1,0.1)),
    #torchvision.transforms.RandomCrop((224, 224)),
    ])


Load the dataset from utils

In [None]:
cur_path = Path("plots_and_graphs.ipynb")
parent_dir = cur_path.parent.absolute()
masterThesis_folder = str(parent_dir.parent.absolute())+'/'
data_dir = masterThesis_folder+"data/Clean_LiTS/"

cache_me = False
if cache_me is True:
    cache_mgr = Manager()
    cache_mgr.data = cache_mgr.dict()
    cache_mgr.cached = cache_mgr.dict()
    for k in ["train", "val", "test"]:
        cache_mgr.data[k] = cache_mgr.dict()
        cache_mgr.cached[k] = False
# function from utils, credit: Institute for Artificial Intelligence in Medicine. url: https://mml.ikim.nrw/
# dataset outputs a tensor image (dimensions [1,256,256]) and a tensor target (0, 1 or 2)

ds = uu.LiTS_Classification_Dataset(
    data_dir=data_dir,
    transforms=data_augments,
    verbose=True,
    cache_data=cache_me,
    cache_mgr=(cache_mgr if cache_me is True else None),
    debug=True,
)

# Hyperparameters

In [None]:
# Default settings
batch_size = 32
learning_rate = 1e-4
weight_decay = 1e-5
epochs = 15
run_name = "ResNet34"
device = ("cuda" if torch.cuda.is_available() else "cpu")
time_me  = True

`torch.utils.data.DataLoader` is a PyTorch utility class that handles loading and batching of data for training. We specify the dataset, the batch size (here 32), and whether the data should be reshuffled at every epoch. Further parameters control how loading is parallelized: `num_workers` sets the number of worker processes, `pin_memory` places batches in page-locked memory to speed up transfer to the GPU, and `prefetch_factor` sets how many batches each worker loads in advance.

In [None]:
# Dataloader
dl = torch.utils.data.DataLoader(
    dataset = ds, 
    batch_size = batch_size, 
    num_workers = 4, 
    shuffle = True, 
    drop_last = False, 
    pin_memory = True,
    persistent_workers = (not cache_me),
    prefetch_factor = 1
    )
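
The batching behaviour can be checked on a small stand-in dataset. This is a minimal sketch under stated assumptions: `toy_ds` is a hypothetical `TensorDataset` of random tensors, not the LiTS dataset, and worker processes are left at their default of zero so that it runs anywhere.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 100 single-channel 8x8 images, labels in {0, 1, 2}
images = torch.randn(100, 1, 8, 8)
labels = torch.randint(0, 3, (100,))
toy_ds = TensorDataset(images, labels)

# Same batch size and drop_last setting as above, but no worker processes
toy_dl = DataLoader(toy_ds, batch_size=32, shuffle=True, drop_last=False)

# With drop_last=False the final, smaller batch is kept
batch_sizes = [x.shape[0] for x, y in toy_dl]
print(batch_sizes)
print(len(toy_dl) == math.ceil(100 / 32))
```

With `drop_last = True` the incomplete final batch would be discarded instead.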

ResNet (Residual Network) is a deep neural network architecture introduced in 2015 in [2]. It was designed to address the issue of vanishing gradients in very deep networks. ResNet is named as such because it utilizes residual connections (skip connections), which enable the flow of gradients from earlier layers to later layers, even in very deep networks.

The residual connections in ResNet involve adding the input of a layer to the output of a layer that is several layers deeper. This allows the network to more easily learn identity functions. This design helps prevent the issue of vanishing gradients and enables ResNet to train much deeper networks than was previously possible. This architecture has shown significant improvements in benchmarks compared to the earlier AlexNet model.
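
The claim that a residual block can easily represent the identity can be made concrete. In the sketch below, `TinyResidual` is a hypothetical, stripped-down block (one convolution, no batch norm), not the block used later in this notebook. Setting the convolution weights to zero makes the residual branch vanish, so the block passes its input through unchanged.

```python
import torch
import torch.nn as nn

class TinyResidual(nn.Module):
    """Hypothetical minimal block: out = F(x) + x, with F a single convolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(x) + x  # skip connection

block = TinyResidual(channels=4)
nn.init.zeros_(block.conv.weight)  # the residual branch now outputs zeros

x = torch.randn(2, 4, 8, 8)
with torch.no_grad():
    y = block(x)

# The block reduces to the identity mapping
print(torch.equal(x, y))
```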

The original ResNet was used in the ImageNet Challenge to classify 1000 classes. However, in our exercise, we only use 3 classes:
0: Image does not include the liver.
1: Liver is visible.
2: Liver is visible and a lesion is visible.

# ResNet 34

We highly recommend cross-referencing Table 1 on page 5 and Figure 5 on page 6 of reference [2] simultaneously.

To implement the normal ResNet Block, we use the following sequence: [conv -> batch_norm -> activation] * 2.

At the beginning of each new layer (as shown in Table 1, left), the image size is reduced using convolution with a kernel size of 1 and a stride of 2 (known as projection). This feature was generalized in the implementation of ResNet 50. As an example, we have decided to include both variations.

First, we start by building the blocks. Please note the downsampling operation in the ResBlockDimsReduction, as the input image $x$ has different dimensions than the output.
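
The shape bookkeeping behind the projection can be sketched in a few lines. The input size of 62x62 with 64 channels is an assumption taken from the model summary further below; the two convolutions mirror the main path and the shortcut of the dimension-reducing block.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 62, 62)  # assumed feature map entering a new stage

# Main path: 3x3 convolution with stride 2 halves height and width
main = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
# Shortcut: 1x1 convolution with stride 2 projects x to the same shape
proj = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

with torch.no_grad():
    print(main(x).shape, proj(x).shape)
    out = main(x) + proj(x)  # shapes agree, so the sum is well defined
```

Without the projection, adding `x` of shape [1, 64, 62, 62] to the main path output would fail because both the channel count and the spatial size differ.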

The class ResNetMLMed34 inherits from torch.nn.Module, so we need to implement the `__init__()` and `forward()` methods. Using Table 1 and Figure 5 from [2], we define each part of resblocks2-5. The indexing follows the same convention as in Table 1, allowing for easy comparison of block numbers, kernel sizes, and number of channels.

The DimsReduction block is the first block in resblocks3-5, as it performs the downsampling. resblocks2 needs no such block because the number of channels stays at 64.

A few words about the torch.nn.init part:
By default, PyTorch draws Conv weights from a uniform distribution, while recent PyTorch versions set batch norm weights to 1 and biases to 0. Here we instead initialize the Conv weights with a Kaiming normal distribution and set the batch norm weights to 0.5 and the biases to 0, which helps the model backpropagate gradients in early epochs.

Tests were conducted on smaller models with 18, 34, and 50 layers, indicating that for adaptive optimizers, weight and bias initialization has minimal effect on model performance or convergence.

In contrast, the uniformly initialized ResNet 152 model exhibited poor convergence after 15 epochs, with very high error and low accuracy rates. Although the custom initialization improved performance, the model still required hyperparameter tuning and a better optimizer.
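
The effect of `torch.nn.init.kaiming_normal_` can be inspected directly. The sketch below is a small check, not part of the model: it assumes a 64-channel 3x3 convolution like the ones in the first residual stage and compares the empirical standard deviation of the weights with the value sqrt(2 / fan_in) that Kaiming initialization targets for ReLU.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(64)

# Inspect the batch norm defaults before any custom initialization
print(bn.weight.detach().unique(), bn.bias.detach().unique())

# Kaiming normal: zero mean, std = sqrt(2 / fan_in) for ReLU
nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")
fan_in = 64 * 3 * 3  # in_channels * kernel_height * kernel_width
expected_std = math.sqrt(2.0 / fan_in)
actual_std = conv.weight.std().item()
print(expected_std, actual_std)
```

The two standard deviations should agree closely, since the layer has 36,864 weights to average over.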

In [2]:
# ResBlock Class
#       - constructs a block [conv -> batch_norm -> activation] * 2, which we will stack in the network
# Input:    int: num_chans - number of channels
# Output:   nn.Module block (input and output have the same shape)

class ResBlock(nn.Module):
    def __init__(self, num_chans):
        super().__init__()
        self.conv1 = nn.Conv2d(num_chans, num_chans, kernel_size=3, padding=1, bias= False)
        self.batch_norm1 = nn.BatchNorm2d(num_features=num_chans)
        self.relu = torch.nn.ReLU()
        self.conv2 = nn.Conv2d(num_chans, num_chans, kernel_size=3, padding=1, bias= False)
        self.batch_norm2 = nn.BatchNorm2d(num_features=num_chans)

        torch.nn.init.kaiming_normal_(self.conv1.weight,
                                      nonlinearity='relu')
        torch.nn.init.kaiming_normal_(self.conv2.weight,
                                      nonlinearity='relu')

        torch.nn.init.constant_(self.batch_norm1.weight, 0.5)
        torch.nn.init.zeros_(self.batch_norm1.bias)

        torch.nn.init.constant_(self.batch_norm2.weight, 0.5)
        torch.nn.init.zeros_(self.batch_norm2.bias)

    def forward(self, x):
        out = self.conv1(x)
        out = self.batch_norm1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.batch_norm2(out)
        out = self.relu(out)
        return out + x  # this sum realizes the skip connection


# ResBlockDimsReduction Class
#       - constructs the first block of a stage
#       - [conv -> batch_norm -> activation] * 2
#       - downsampling performed with stride 2 in the first convolution
# Input:    int: num_chans_in; int: num_chans_out
# Output:   nn.Module block (halves height and width, changes the channel count)

class ResBlockDimsReduction(nn.Module):
    def __init__(self, num_chans_in, num_chans_out):
        super().__init__()
        self.conv1 = nn.Conv2d(num_chans_in, num_chans_out, kernel_size=3, stride=2,padding=1,bias= False)
        self.batch_norm1 = nn.BatchNorm2d(num_features=num_chans_out)
        self.relu = torch.nn.ReLU()
        self.conv2 = nn.Conv2d(num_chans_out, num_chans_out, kernel_size=3, padding=1, bias= False)
        self.batch_norm2 = nn.BatchNorm2d(num_features=num_chans_out)

        torch.nn.init.kaiming_normal_(self.conv1.weight,
                                      nonlinearity='relu')
        torch.nn.init.kaiming_normal_(self.conv2.weight,
                                      nonlinearity='relu')
        torch.nn.init.constant_(self.batch_norm1.weight, 0.5)
        torch.nn.init.zeros_(self.batch_norm1.bias)
        torch.nn.init.constant_(self.batch_norm2.weight, 0.5)
        torch.nn.init.zeros_(self.batch_norm2.bias)

        self.downsample = nn.Sequential(
            nn.Conv2d(num_chans_in, num_chans_out, kernel_size=1, stride=2,bias= False),
            nn.BatchNorm2d(num_features=num_chans_out),
            nn.ReLU()
        )


    def forward(self, x):
        out = self.conv1(x)
        out = self.batch_norm1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.batch_norm2(out)
        out = self.relu(out)
        # input and output dimensions do not match, so we project x to the dimensions of out
        x = self.downsample(x)
        return out + x

# ResNetMLMed34 Class
#       - constructs a ResNet34 as described in [2, Table 1].
# Input:    Tensor: [Batch,1,Height,Width]
# Output:   Tensor: [Batch,3] (one logit per class)
class ResNetMLMed34(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(in_channels = 1, out_channels = 64, kernel_size =7, stride =2, padding=1, bias= False)
        self.batch_norm1 = nn.BatchNorm2d(num_features=64)
        self.pool2 = torch.nn.MaxPool2d(kernel_size = 3, stride = 2)
        self.relu = torch.nn.ReLU()
        # Each block is constructed separately: multiplying a list (n * [ResBlock(...)])
        # would repeat one module instance, so all repeats would share the same weights.
        self.resblocks2 = nn.Sequential(
            *[ResBlock(num_chans=64) for _ in range(3)])
        self.resblocks3 = nn.Sequential(
            ResBlockDimsReduction(num_chans_in=64, num_chans_out=128),
            *[ResBlock(num_chans=128) for _ in range(3)])
        self.resblocks4 = nn.Sequential(
            ResBlockDimsReduction(num_chans_in=128, num_chans_out=256),
            *[ResBlock(num_chans=256) for _ in range(5)])
        self.resblocks5 = nn.Sequential(
            ResBlockDimsReduction(num_chans_in=256, num_chans_out=512),
            *[ResBlock(num_chans=512) for _ in range(2)])
        self.avgpool6 = nn.AdaptiveAvgPool2d(output_size=(1, 1))
        self.fc = nn.Linear(in_features=512, out_features=3, bias=True)


    def forward(self, x):

        out_1 = self.conv1(x)
        out_1 = self.batch_norm1(out_1)
        out_1 = self.relu(out_1)

        out_1 = self.pool2(out_1)

        out_2 = self.resblocks2(out_1)

        out_3 = self.resblocks3(out_2)

        out_4 = self.resblocks4(out_3)

        out_5 = self.resblocks5(out_4)

        out_6 = self.avgpool6(out_5)

        out_6= self.fc(torch.flatten(out_6, start_dim=1))

        return out_6
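
When several blocks are stacked inside `nn.Sequential`, it matters how the list of blocks is built. The sketch below uses small `nn.Linear` layers as hypothetical stand-ins for the residual blocks. It shows that multiplying a one-element list repeats the same module object, so its weights are shared, whereas a list comprehension creates independent modules.

```python
import torch.nn as nn

def count_params(module):
    # parameters() yields each shared tensor only once
    return sum(p.numel() for p in module.parameters())

# List multiplication repeats the SAME layer object three times
shared = nn.Sequential(*(3 * [nn.Linear(4, 4)]))
# A comprehension constructs three separate layers
distinct = nn.Sequential(*[nn.Linear(4, 4) for _ in range(3)])

print(shared[0] is shared[1])      # one object appearing at several positions
print(distinct[0] is distinct[1])  # separate objects
print(count_params(shared), count_params(distinct))
```

A network built from shared blocks has fewer trainable parameters than the layer count suggests, which is worth keeping in mind when comparing against the parameter counts of the reference architecture.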

In [3]:
model = ResNetMLMed34()
summary(model, (1, 256, 256))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 126, 126]           3,136
       BatchNorm2d-2         [-1, 64, 126, 126]             128
              ReLU-3         [-1, 64, 126, 126]               0
         MaxPool2d-4           [-1, 64, 62, 62]               0
            Conv2d-5           [-1, 64, 62, 62]          36,864
       BatchNorm2d-6           [-1, 64, 62, 62]             128
              ReLU-7           [-1, 64, 62, 62]               0
            Conv2d-8           [-1, 64, 62, 62]          36,864
       BatchNorm2d-9           [-1, 64, 62, 62]             128
             ReLU-10           [-1, 64, 62, 62]               0
         ResBlock-11           [-1, 64, 62, 62]               0
           Conv2d-12           [-1, 64, 62, 62]          36,864
      BatchNorm2d-13           [-1, 64, 62, 62]             128
             ReLU-14           [-1, 64,
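
The first rows of the summary can be reproduced by hand. The helper `conv_out_size` below is a hypothetical name introduced for this sketch; it implements the standard output size formula for convolution and pooling layers without dilation.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# conv1: 7x7 kernel, stride 2, padding 1 on a 256x256 input
after_conv1 = conv_out_size(256, kernel=7, stride=2, padding=1)
# pool2: 3x3 max pooling, stride 2, no padding
after_pool = conv_out_size(after_conv1, kernel=3, stride=2)

# Parameter counts of bias-free convolutions: in_channels * out_channels * k * k
conv1_params = 1 * 64 * 7 * 7
block_conv_params = 64 * 64 * 3 * 3

print(after_conv1, after_pool, conv1_params, block_conv_params)
# → 126 62 3136 36864
```

These values match the output shapes and parameter counts listed for Conv2d-1, MaxPool2d-4, and Conv2d-5.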

In [None]:
# Sanity check: pass one batch through the model and inspect the output shape
data, targets = next(iter(dl))
data, targets = data.to(device), targets.to(device)
model = model.to(device)
model(data).shape  # one row of 3 class logits per image in the batch

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate, weight_decay = weight_decay)
criterion = nn.CrossEntropyLoss()
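
`nn.CrossEntropyLoss` expects raw, unnormalized logits of shape [Batch, Classes] and integer class indices of shape [Batch], which is why the model ends in a plain linear layer without softmax. The sketch below uses made-up logits and targets to show that the loss equals the mean negative log-softmax of the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical model outputs for two images and three classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # class indices, dtype int64

loss = nn.CrossEntropyLoss()(logits, targets)

# Same value computed by hand: mean negative log-softmax of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()
print(loss.item(), manual.item())
```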

In [None]:
training_loop_conf_matr(
    epochs = epochs,
    optimizer = optimizer,
    model = model,
    criterion = criterion,
    ds = ds,
    dl = dl,
    batch_size = batch_size,
    run_name = run_name,
    cache_me = cache_me,
    device = device,
    time_me = time_me,
    time=time)
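
The name `training_loop_conf_matr` suggests that the loop also tracks a confusion matrix; its implementation lives in the `utility` package and is not shown here. As an illustration only, and not the utility's actual code, a confusion matrix for the three classes could be computed as follows. The function name `confusion_matrix` and the example tensors are assumptions made for this sketch.

```python
import torch

def confusion_matrix(targets, preds, num_classes=3):
    """Rows are true classes, columns are predicted classes."""
    # Encode each (true, predicted) pair as a single index and count occurrences
    idx = targets * num_classes + preds
    counts = torch.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

# Made-up labels and predictions for five images
targets = torch.tensor([0, 1, 2, 2, 1])
preds = torch.tensor([0, 2, 2, 2, 1])

cm = confusion_matrix(targets, preds)
# Correct predictions lie on the diagonal
accuracy = cm.diagonal().sum().item() / cm.sum().item()
print(cm)
print(accuracy)
```

In practice `preds` would come from `model(data).argmax(dim=1)` on a validation batch.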