<a href="https://colab.research.google.com/github/zrghassabi/OCT-image-segmentation/blob/main/Catalyst%2C_Albumentations%2C_Pytorch_Toolbelt_showcase_Semantic_Segmentation_CamVid.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This notebook is a quick-start guide to semantic segmentation with the Catalyst deep learning library. It starts with a baseline training pipeline and then moves on to more advanced details. Topics covered:


*  Baseline training with reducing LR on plateau
*  Baseline training with class balancing
*  Visualization of batches during training using Catalyst callbacks
*  Fine-tuning part of the segmentation model
*  Fine-tuning model with different loss functions
*  Fine-tuning model with multiple learning rates

In this tutorial we use a reduced version of the CamVid dataset from https://github.com/alexgkendall/SegNet-Tutorial.

Disclaimer: In this notebook we use a relatively simple model, which does not reach SOTA results (about 0.81 according to https://paperswithcode.com/sota/semantic-segmentation-on-camvid). Yet it is sufficient to demonstrate the key concepts of Catalyst.

## Install dependencies

In [1]:
# Install required libs
! pip install --quiet torch torchvision torchcontrib
! pip install --quiet git+https://github.com/albu/albumentations
! pip install --quiet catalyst==19.7.4 pytorch_toolbelt==0.1.3

  Building wheel for torchcontrib (setup.py) ... done
  Building wheel for albumentations (setup.py) ... done
  Building wheel for pytorch-toolbelt (setup.py) ... done
  Building wheel for torchnet (setup.py) ... done
  Building wheel for visdom (setup.py) ... done
  Building wheel for torchfile (setup.py) ...

Catalyst supports logging to TensorBoard, so let's install TensorBoard support for notebooks to follow the training progress.

In [3]:
!pip install -q tf-nightly-2.0-preview
# Load the TensorBoard notebook extension
%load_ext tensorboard

ERROR: Could not find a version that satisfies the requirement tf-nightly-2.0-preview (from versions: none)
ERROR: No matching distribution found for tf-nightly-2.0-preview


##  Mixed precision training support

The bigger our model and the higher the spatial resolution of the training images, the more GPU memory we need. As the memory footprint per image grows, the batch size must be reduced to stay within the available GPU memory.
However, a small batch size can make BatchNormalization unstable and slow down convergence. To address this, we can employ mixed precision training, which is supported by Catalyst.

Under the hood, Catalyst uses the NVIDIA Apex library, which must be installed separately.
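
To get a feeling for the numbers, here is a back-of-the-envelope sketch of how much memory a single activation tensor takes in full and in half precision. The tensor shape (batch of 32, 64 channels, 160x160) is an illustrative assumption, and the helper name is hypothetical. Real mixed precision training saves less than a clean factor of two, because some tensors (for example master weights) are typically kept in float32.

```python
# Rough activation-memory arithmetic; the shape below is an illustrative assumption.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def activation_mebibytes(batch_size, channels, height, width, dtype):
    """Memory needed to store one activation tensor of shape (B, C, H, W)."""
    elements = batch_size * channels * height * width
    return elements * BYTES_PER_ELEMENT[dtype] / 2**20


fp32 = activation_mebibytes(32, 64, 160, 160, "float32")
fp16 = activation_mebibytes(32, 64, 160, 160, "float16")
print(f"float32: {fp32:.0f} MiB, float16: {fp16:.0f} MiB")
# prints "float32: 200 MiB, float16: 100 MiB"
```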

In [5]:
# Install Apex for mixed-precision training
! rm -rf apex
! git clone https://github.com/NVIDIA/apex && cd apex && pip install --quiet -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

Cloning into 'apex'...
remote: Enumerating objects: 8815, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 8815 (delta 20), reused 20 (delta 4), pack-reused 8767
Receiving objects: 100% (8815/8815), 14.48 MiB | 4.35 MiB/s, done.
Resolving deltas: 100% (6001/6001), done.
  cmdoptions.check_install_build_global(options)
Processing /content/apex
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Skipping wheel build for apex, due to binaries being disabled for it.
Installing collected packages: apex
  Attempting uninstall: apex
    Found existing installation: apex

# Preparation

## Clone dataset

We start by cloning the dataset locally.

In [1]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

DATA_DIR = './data/CamVid/'

# clone the repo with the data if it does not exist yet
if not os.path.exists(DATA_DIR):
    print('Loading data...')
    os.system('git clone https://github.com/alexgkendall/SegNet-Tutorial ./data')
    print('Done!')
    
x_train_dir = os.path.join(DATA_DIR, 'train')
y_train_dir = os.path.join(DATA_DIR, 'trainannot')

x_valid_dir = os.path.join(DATA_DIR, 'val')
y_valid_dir = os.path.join(DATA_DIR, 'valannot')

x_test_dir = os.path.join(DATA_DIR, 'test')
y_test_dir = os.path.join(DATA_DIR, 'testannot')


CLASS_NAMES = ['sky', 'building', 'pole', 'road', 'pavement',
               'tree', 'signsymbol', 'fence', 'car',
               'pedestrian', 'bicyclist', 'unlabelled']

CLASS_COLORS = [(128, 128, 128), (128, 0, 0), (192, 192, 128), (128, 64, 128), (0, 0, 192),
                (128, 128, 0), (192, 128, 128), (64, 64, 128), (64, 0, 128),
                (64, 64, 0), (0, 128, 192), (0, 0, 0)]

num_classes = len(CLASS_NAMES)

Loading data...
Done!
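
The annotation masks store one class index per pixel, and CLASS_COLORS maps each index to an RGB color for visualization. As a minimal sketch of that mapping (the helper `colorize_mask` and the toy mask are hypothetical and use plain Python lists instead of image arrays):

```python
# Same palette as in the cell above, repeated so the sketch is self-contained.
CLASS_COLORS = [(128, 128, 128), (128, 0, 0), (192, 192, 128), (128, 64, 128), (0, 0, 192),
                (128, 128, 0), (192, 128, 128), (64, 64, 128), (64, 0, 128),
                (64, 64, 0), (0, 128, 192), (0, 0, 0)]


def colorize_mask(mask, colors):
    """Replace every class index in a 2D mask with its RGB color."""
    return [[colors[class_index] for class_index in row] for row in mask]


# Toy 2x3 mask: sky (0), road (3), car (8) and unlabelled (11) pixels.
toy_mask = [[0, 0, 3],
            [3, 8, 11]]
rgb = colorize_mask(toy_mask, CLASS_COLORS)
print(rgb[1][1])  # color of the 'car' pixel: (64, 0, 128)
```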


## Imports

In [4]:
import os
from collections import OrderedDict
from functools import partial
import numpy as np
import albumentations as A
import torch
import torch.nn as nn
import cv2
from catalyst.contrib.schedulers import MultiStepLR, ReduceLROnPlateau
from catalyst.dl import SupervisedRunner, EarlyStoppingCallback, SchedulerCallback
from catalyst.utils import unpack_checkpoint, load_checkpoint
from pytorch_toolbelt.inference.functional import pad_image_tensor, unpad_image_tensor
from pytorch_toolbelt.optimization.functional import get_lr_decay_parameters
from pytorch_toolbelt.utils import fs
from pytorch_toolbelt.utils.catalyst import *
from pytorch_toolbelt.utils.fs import id_from_fname
from pytorch_toolbelt.utils.torch_utils import tensor_from_rgb_image
from pytorch_toolbelt.utils.random import set_manual_seed
from pytorch_toolbelt.losses import JointLoss, MulticlassDiceLoss, MulticlassJaccardLoss
from torch.optim import Adam, SGD, ASGD, RMSprop, LBFGS
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision.models import resnet34
from datetime import datetime

ModuleNotFoundError: ignored

## Dataset

In [None]:
class CamVidDataset(Dataset):
    """CamVid Dataset. Read images, apply augmentation and preprocessing transformations.

    Args:
        images_dir (str): path to images folder
        masks_dir (str): path to segmentation masks folder
        transform (A.BasicTransform or A.Compose): data transformation pipeline
            (e.g. flip, scale, etc.) applied to both image and mask
    """

    def __init__(
            self,
            images_dir,
            masks_dir,
            transform=A.Normalize()
    ):
        self.ids = os.listdir(images_dir)
        self.images_fps = [os.path.join(images_dir, image_id) for image_id in self.ids]
        self.masks_fps = [os.path.join(masks_dir, image_id) for image_id in self.ids]
        self.transform = transform

    def __getitem__(self, i):
        # read data
        image = fs.read_rgb_image(self.images_fps[i])
        mask = fs.read_image_as_is(self.masks_fps[i])
        assert mask.max() < len(CLASS_NAMES)

        # apply augmentations
        sample = self.transform(image=image, mask=mask)
        image, mask = sample['image'], sample['mask']

        return {
            "image_id": id_from_fname(self.images_fps[i]),
            "features": tensor_from_rgb_image(image),
            "targets": torch.from_numpy(mask).long()
        }

    def __len__(self):
        return len(self.ids)



## Augmentations

In [None]:
def get_training_augmentation(blur=True, weather=True):
    return A.Compose([
        A.PadIfNeeded(384, 384, border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=11),

        A.OneOf([
#             A.GridDistortion(border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=11),
#             A.ElasticTransform(alpha_affine=10, border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=11),
            A.ShiftScaleRotate(shift_limit=0, scale_limit=0, rotate_limit=10,
                               border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=11),
            A.OpticalDistortion(border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=11),
            A.NoOp(p=0.6)
        ]),

        A.OneOf([
            A.CLAHE(),
            A.RandomBrightnessContrast(),
            A.RandomGamma(),
            A.HueSaturationValue(),
            A.NoOp()
        ]),

        A.Compose([
            A.OneOf([
                A.IAASharpen(),
                A.Blur(blur_limit=3),
                A.MotionBlur(blur_limit=3),
                A.ISONoise(),
                A.NoOp()
            ]),
        ], p=float(blur)),

        A.Compose([
            A.OneOf([
                A.RandomFog(),
                A.RandomSunFlare(src_radius=100),
                A.RandomRain(),
                A.RandomSnow(),
                A.NoOp()
            ]),
        ]) if weather else A.NoOp(),

        A.RandomSizedCrop(min_max_height=(300, 360), height=320, width=320, always_apply=True),
        A.HorizontalFlip(p=0.5),
        A.Cutout(),
        A.Normalize(),
    ])


def get_validation_augmentation():
    return A.Compose([
        A.PadIfNeeded(384, 384, border_mode=cv2.BORDER_CONSTANT, value=0, mask_value=11),
        A.Normalize()
    ])
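
It is easy to misread how often each transform inside a OneOf block actually fires. As far as we understand Albumentations, OneOf first decides whether to run at all (with its own probability p), then picks exactly one member with probability proportional to that member's p. The sketch below works this out for the geometric block above, assuming the library defaults of p=0.5 for OneOf, ShiftScaleRotate and OpticalDistortion; the helper name is hypothetical.

```python
def one_of_probabilities(member_ps, block_p=0.5):
    """Effective per-sample probability of each member of a OneOf block.

    member_ps: mapping from transform name to its own `p`
    block_p:   probability that the OneOf block runs at all
    """
    total = sum(member_ps.values())
    return {name: block_p * p / total for name, p in member_ps.items()}


geometric = one_of_probabilities({
    "ShiftScaleRotate": 0.5,   # assumed library default
    "OpticalDistortion": 0.5,  # assumed library default
    "NoOp": 0.6,               # set explicitly in the pipeline above
})
for name, p in geometric.items():
    print(f"{name}: {p:.3f}")
```

Under these assumptions each geometric distortion is applied to roughly 16% of the samples, so most training images pass through this block unchanged.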

In [None]:
train_ds = CamVidDataset(x_train_dir, y_train_dir, transform=get_training_augmentation())
valid_ds = CamVidDataset(x_valid_dir, y_valid_dir, transform=get_validation_augmentation())

print('Train dataset size', len(train_ds))
print('Valid dataset size', len(valid_ds))

# Define callbacks

iou_score = JaccardScoreCallback(mode='multiclass',
                             # We exclude last 'unlabeled' class from the evaluation
                             class_names=CLASS_NAMES[:num_classes-1],
                             classes_of_interest=np.arange(num_classes-1),
                             prefix='iou')

visualize_predictions = partial(draw_semantic_segmentation_predictions,
                                mode='side-by-side',
                                class_colors=CLASS_COLORS)

show_batches = ShowPolarBatchesCallback(visualize_predictions,
                                 targets=['tensorboard'],
                                 metric='iou',
                                 minimize=True)

Train dataset size 367
Valid dataset size 101
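
The callbacks above report per-class IoU and their mean, with the 'unlabelled' class left out. The following is a minimal sketch of that metric on flattened label lists; it is not the pytorch_toolbelt implementation, and details such as how pixels with an ignored ground-truth label are treated may differ there.

```python
def per_class_iou(y_true, y_pred, classes):
    """Intersection over union for every class that occurs in y_true or y_pred."""
    scores = {}
    for c in classes:
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union > 0:  # skip classes that are absent from both lists
            scores[c] = intersection / union
    return scores


# Toy example with 8 pixels; label 11 plays the role of 'unlabelled'.
y_true = [0, 0, 1, 1, 2, 2, 11, 11]
y_pred = [0, 1, 1, 1, 2, 0, 0, 11]

scores = per_class_iou(y_true, y_pred, classes=range(11))  # classes 0..10 only
mean_iou = sum(scores.values()) / len(scores)
print(scores)
print("mIoU:", round(mean_iou, 4))
```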


## Define a model

For demonstration purposes, we use a LinkNet model with a ResNet34 encoder. Of course, one can use a more advanced model, but LinkNet34 still gives decent results in a short amount of time.

## LinkNet34

In [None]:
class DecoderBlockLinkNet(nn.Module):
    def __init__(self, in_channels, n_filters):
        super().__init__()

        self.relu = nn.ReLU(inplace=True)

        # B, C, H, W -> B, C/4, H, W
        self.conv1 = nn.Conv2d(in_channels, in_channels // 4, 1)
        self.norm1 = nn.BatchNorm2d(in_channels // 4)

        # B, C/4, H, W -> B, C/4, 2 * H, 2 * W
        self.deconv2 = nn.ConvTranspose2d(in_channels // 4, in_channels // 4, kernel_size=4,
                                          stride=2, padding=1, output_padding=0)
        self.norm2 = nn.BatchNorm2d(in_channels // 4)

        # B, C/4, 2 * H, 2 * W -> B, n_filters, 2 * H, 2 * W
        self.conv3 = nn.Conv2d(in_channels // 4, n_filters, 1)
        self.norm3 = nn.BatchNorm2d(n_filters)

    def forward(self, x):
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
        x = self.deconv2(x)
        x = self.norm2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.norm3(x)
        x = self.relu(x)
        return x


class LinkNet34(nn.Module):
    def __init__(self, num_classes=1, num_channels=3, pretrained=True):
        super().__init__()
        assert num_channels == 3
        self.num_classes = num_classes
        filters = [64, 128, 256, 512]
        resnet = resnet34(pretrained=pretrained)

        self.firstconv = resnet.conv1
        self.firstbn = resnet.bn1
        self.firstrelu = resnet.relu
        self.firstmaxpool = resnet.maxpool
        self.encoder1 = resnet.layer1
        self.encoder2 = resnet.layer2
        self.encoder3 = resnet.layer3
        self.encoder4 = resnet.layer4

        # Decoder
        self.decoder4 = DecoderBlockLinkNet(filters[3], filters[2])
        self.decoder3 = DecoderBlockLinkNet(filters[2], filters[1])
        self.decoder2 = DecoderBlockLinkNet(filters[1], filters[0])
        self.decoder1 = DecoderBlockLinkNet(filters[0], filters[0])

        # Final Classifier
        self.finaldeconv1 = nn.ConvTranspose2d(filters[0], 32, 3, stride=2)
        self.finalrelu1 = nn.ReLU(inplace=True)
        self.finalconv2 = nn.Conv2d(32, 32, 3)
        self.finalrelu2 = nn.ReLU(inplace=True)
        self.finalconv3 = nn.Conv2d(32, num_classes, 2, padding=1)

    def forward(self, x):
      
        # Encoder
        x = self.firstconv(x)
        x = self.firstbn(x)
        x = self.firstrelu(x)
        x = self.firstmaxpool(x)
        e1 = self.encoder1(x)
        e2 = self.encoder2(e1)
        e3 = self.encoder3(e2)
        e4 = self.encoder4(e3)

        # Decoder with Skip Connections
        d4 = self.decoder4(e4) + e3
        d3 = self.decoder3(d4) + e2
        d2 = self.decoder2(d3) + e1
        d1 = self.decoder1(d2)

        # Final Classification
        f1 = self.finaldeconv1(d1)
        f2 = self.finalrelu1(f1)
        f3 = self.finalconv2(f2)
        f4 = self.finalrelu2(f3)
        f5 = self.finalconv3(f4)
        
        return f5 
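
Because the decoder adds its output to encoder feature maps, the spatial sizes have to line up at every skip connection. The sketch below traces one spatial dimension through the layers defined above using the standard output-size formulas for convolutions and transposed convolutions; the kernel, stride and padding values for the ResNet34 stem and stages are taken from the usual torchvision architecture, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of Conv2d / MaxPool2d along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of ConvTranspose2d along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def linknet34_output_size(size):
    """Trace one spatial dimension through the LinkNet34 defined above."""
    x = conv_out(size, kernel=7, stride=2, padding=3)   # resnet.conv1
    x = conv_out(x, kernel=3, stride=2, padding=1)      # resnet.maxpool
    e1 = x                                              # layer1 keeps the resolution
    e2 = conv_out(e1, kernel=3, stride=2, padding=1)    # layer2 halves it
    e3 = conv_out(e2, kernel=3, stride=2, padding=1)    # layer3
    e4 = conv_out(e3, kernel=3, stride=2, padding=1)    # layer4

    x = e4
    for skip in (e3, e2, e1):
        x = deconv_out(x, kernel=4, stride=2, padding=1)  # decoder block doubles the size
        if x != skip:
            raise ValueError(f"skip connection mismatch: {x} vs {skip}")
    x = deconv_out(x, kernel=4, stride=2, padding=1)    # decoder1, no skip connection

    x = deconv_out(x, kernel=3, stride=2)               # finaldeconv1
    x = conv_out(x, kernel=3)                           # finalconv2
    x = conv_out(x, kernel=2, padding=1)                # finalconv3
    return x


for size in (320, 384):
    print(size, "->", linknet34_output_size(size))
```

Sizes that are multiples of 32 (such as the 320 crop and the 384 padding used in the augmentations) come out unchanged, while a size like 360 raises the mismatch error at the first skip connection.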

## Helper functions

In [None]:
def restore_from_checkpoint(checkpoint_file, model, optimizer=None):
  try:
    checkpoint = load_checkpoint(checkpoint_file)
  except FileNotFoundError:
    print('Checkpoint not found', checkpoint_file)
    return
  
  epoch = checkpoint['epoch']
  valid_metrics = checkpoint['valid_metrics']

  try:
    unpack_checkpoint(checkpoint, model=model)     
    print('Loaded model weights from epoch', epoch, 'Validation mIoU', valid_metrics['iou'])
  except Exception as e:
    print('Failed to restore model from checkpoint', checkpoint_file)
    print(e)
    
  try:
    if optimizer is not None:
      unpack_checkpoint(checkpoint, optimizer=optimizer)
  except Exception as e:
    print('Failed to restore optimizer state from checkpoint', checkpoint_file)
    print(e)

# Training

In [None]:
os.makedirs('runs', exist_ok=True)
%tensorboard --logdir runs

## Regular training

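We start by training a LinkNet model with a pre-trained encoder, using Adam with a learning rate of 1e-3 for all layers. The learning rate is multiplied by 0.2 whenever the validation IoU stops improving (ReduceLROnPlateau with a patience of 5 epochs), and training stops early after 20 epochs without improvement. We train the network for up to 50 epochs with the default CE loss; each epoch draws five times the training set size through the weighted sampler (mul_factor = 5).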
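
The training cell in this section configures ReduceLROnPlateau with mode='max', patience=5 and factor=0.2. The following is a simplified simulation of that rule, not the PyTorch implementation: it ignores the improvement threshold and cooldown options and treats any strictly higher metric as an improvement. The class name is hypothetical.

```python
class PlateauSchedule:
    """Simplified 'reduce on plateau' rule for a metric that should increase."""

    def __init__(self, lr, factor=0.2, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce only once the number of bad epochs exceeds the patience.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


schedule = PlateauSchedule(lr=1e-3)
val_iou = [0.50, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40]  # no improvement after epoch 0
history = [schedule.step(m) for m in val_iou]
print(history)
```

In this toy run the learning rate stays at 1e-3 for five stagnating epochs and drops to about 2e-4 on the sixth.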

In [None]:
set_manual_seed(42)

mul_factor = 5
num_epochs = 50

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=32, num_workers=8, pin_memory=True, drop_last=True,
                                   sampler=WeightedRandomSampler(np.ones(len(train_ds)), len(train_ds) * mul_factor))
data_loaders['valid'] = DataLoader(valid_ds, batch_size=32, num_workers=4, pin_memory=True, drop_last=False)

model = LinkNet34(num_classes).cuda()

optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = ReduceLROnPlateau(optimizer, mode='max', patience=5, factor=0.2)
early_stopping = EarlyStoppingCallback(patience=20, metric='iou', minimize=False)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optimizer,
    callbacks=[
        iou_score, 
        early_stopping,
        show_batches,
        SchedulerCallback(reduce_metric='iou'),
    ],
    logdir='runs/linknet34/baseline',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders 

Using manual seed: 42


Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /root/.cache/torch/checkpoints/resnet34-333f7ec4.pth
100%|██████████| 87306240/87306240 [00:01<00:00, 74680550.83it/s]


[2019-08-19 07:05:15,580] 
0/50 * Epoch 0 (train): _base/lr=0.0010 | _base/momentum=0.9000 | _timers/_fps=1128.2311 | _timers/batch_time=0.2502 | _timers/data_time=0.1322 | _timers/model_time=0.1179 | iou=0.1113 | iou_bicyclist=0.0002 | iou_building=0.2208 | iou_car=0.0003 | iou_fence=0.0015 | iou_pavement=0.000000E+00 | iou_pedestrian=0.000000E+00 | iou_pole=0.000000E+00 | iou_road=0.3692 | iou_signsymbol=0.000000E+00 | iou_sky=0.4618 | iou_tree=0.0136 | loss=1.4589
0/50 * Epoch 0 (valid): _base/lr=0.0010 | _base/momentum=0.9000 | _timers/_fps=924.1296 | _timers/batch_time=2.4868 | _timers/data_time=0.4168 | _timers/model_time=2.0698 | iou=0.1830 | iou_bicyclist=0.000000E+00 | iou_building=0.4897 | iou_car=0.000000E+00 | iou_fence=0.000000E+00 | iou_pavement=0.000000E+00 | iou_pedestrian=0.000000E+00 | iou_pole=0.000000E+00 | iou_road=0.6213 | iou_signsymbol=0.000000E+00 | iou_sky=0.8866 | iou_tree=0.0055 | loss=1.2750
[2019-08-19 07:07:05,258] 
1/50 * Epoch 1 (train): _base/lr=0.0010

Let's check the mIoU score on the validation dataset for the best checkpoint.

In [None]:
baseline = load_checkpoint('runs/linknet34/baseline/checkpoints/best.pth')
baseline_epoch = baseline['epoch']
baseline_valid_metrics = baseline['valid_metrics']

del baseline

print('Baseline result mIoU:', baseline_valid_metrics['iou'], 'after', baseline_epoch, 'epochs')

## Class balancing

In [None]:
from sklearn.utils import compute_sample_weight

def get_balanced_weights(dataset:CamVidDataset):
    labels=[]
    for mask in dataset.masks_fps:
      mask = fs.read_image_as_is(mask)
      unique_labels = np.unique(mask)
      labels.append(''.join([str(int(i)) for i in unique_labels]))

    weights = compute_sample_weight('balanced', labels)
    return weights

  
set_manual_seed(43)

mul_factor = 5
num_epochs = 50

train_sampler = WeightedRandomSampler(get_balanced_weights(train_ds), len(train_ds) * mul_factor)

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=32, num_workers=8, 
                                   pin_memory=True, 
                                   drop_last=True,
                                   sampler=train_sampler)

data_loaders['valid'] = DataLoader(valid_ds, batch_size=32, num_workers=4, 
                                   pin_memory=True,
                                   drop_last=False)

model = LinkNet34(num_classes).cuda()
optimizer = Adam(model.parameters(), lr=1e-3)

# Use baseline model as starting point
restore_from_checkpoint('runs/linknet34/baseline/checkpoints/best.pth', model, optimizer)

scheduler = ReduceLROnPlateau(optimizer, mode='max', patience=5, factor=0.2)
early_stopping = EarlyStoppingCallback(patience=20, metric='iou', minimize=False)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optimizer,
    callbacks=[
        iou_score, 
        early_stopping,
        show_batches,
        SchedulerCallback(reduce_metric='iou'),
    ],
    logdir='runs/linknet34/baseline_balanced',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders 
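
In the cell above, get_balanced_weights groups images by the set of classes present in their mask and asks scikit-learn for 'balanced' weights. The weighting rule is n_samples / (n_groups * group_count), so images with a rare class combination are sampled more often. A minimal pure-Python sketch of that formula, with made-up label strings and a hypothetical helper name:

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Per-sample weights that are inversely proportional to group frequency."""
    counts = Counter(labels)
    n_samples, n_groups = len(labels), len(counts)
    return [n_samples / (n_groups * counts[label]) for label in labels]


# Three images share the same class combination, one image also contains a car (8).
labels = ["0134", "0134", "0134", "01348"]
weights = balanced_sample_weights(labels)
print(weights)
```

The single image with the rare combination gets weight 2.0, three times the weight of each common image, so the sampler draws both groups about equally often.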

## Training with CE + Lovász loss

In [None]:
set_manual_seed(42)

model = LinkNet34(num_classes).cuda()

baseline_checkpoint = load_checkpoint('runs/linknet34/baseline/checkpoints/best.pth')
unpack_checkpoint(baseline_checkpoint, model=model)
print('Loaded model weights from baseline')

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=32, num_workers=8, pin_memory=True, drop_last=True,
                                   sampler=WeightedRandomSampler(np.ones(len(train_ds)), len(train_ds) * mul_factor))
data_loaders['valid'] = DataLoader(valid_ds, batch_size=32, num_workers=4, pin_memory=True, drop_last=False)

criterion = JointLoss(nn.CrossEntropyLoss(), LovashLoss(classes=np.arange(11)), 1.0, 0.5)
optimizer = Adam(model.parameters(), lr=1e-4)
scheduler = ReduceLROnPlateau(optimizer, patience=5)
early_stopping = EarlyStoppingCallback(patience=12, metric='iou', minimize=False)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    callbacks=[iou_score, early_stopping],
    logdir='runs/linknet34/bce_dice',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders 

In [None]:
set_manual_seed(42)

model = LinkNet34(num_classes).cuda()

baseline_checkpoint = load_checkpoint('runs/linknet34/baseline/checkpoints/best.pth')
unpack_checkpoint(baseline_checkpoint, model=model)
print('Loaded model weights from baseline')

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=32, num_workers=8, pin_memory=True, drop_last=True,
                                   sampler=WeightedRandomSampler(np.ones(len(train_ds)), len(train_ds) * mul_factor))
data_loaders['valid'] = DataLoader(valid_ds, batch_size=32, num_workers=4, pin_memory=True, drop_last=False)

criterion = JointLoss(nn.CrossEntropyLoss(), MulticlassJaccardLoss(classes=np.arange(11)), 1.0, 0.5)
optimizer = Adam(model.parameters(), lr=1e-4)
scheduler = ReduceLROnPlateau(optimizer, patience=5)
early_stopping = EarlyStoppingCallback(patience=12, metric='iou', minimize=False)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    callbacks=[iou_score, early_stopping],
    logdir='runs/linknet34/bce_jaccard',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders 
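
The two cells above combine cross-entropy with a second, region-based loss through JointLoss, using weights 1.0 and 0.5. As a rough sketch of the idea for a single class (not the pytorch_toolbelt code; all function names here are hypothetical), the combined objective is a weighted sum of a per-pixel term and a soft Jaccard term:

```python
import math


def binary_cross_entropy(probs, targets):
    """Mean per-pixel cross-entropy for foreground probabilities."""
    losses = [-math.log(p if t == 1 else 1.0 - p) for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)


def soft_jaccard_loss(probs, targets, eps=1e-7):
    """1 - soft IoU, computed from probabilities instead of hard labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


def joint_loss(first, second, first_weight=1.0, second_weight=0.5):
    """Weighted sum of two loss values."""
    return first_weight * first + second_weight * second


probs = [0.9, 0.8, 0.1, 0.2]   # predicted foreground probability per pixel
targets = [1, 1, 0, 0]         # ground truth per pixel

ce = binary_cross_entropy(probs, targets)
jaccard = soft_jaccard_loss(probs, targets)
print("CE:", round(ce, 4), "Jaccard:", round(jaccard, 4), "joint:", round(joint_loss(ce, jaccard), 4))
```

The cross-entropy term treats every pixel independently, while the Jaccard term directly rewards overlap, which is closer to the IoU metric we evaluate.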


# Fine-tuning model

## Fine-tuning decoder

In [None]:
from pytorch_toolbelt.utils.torch_utils import set_trainable
from pytorch_toolbelt.utils.torch_utils import maybe_cuda, count_parameters

set_manual_seed(42)

mul_factor = 5
num_epochs = 50

model = LinkNet34(num_classes).cuda()

# Freeze encoder
set_trainable(model.firstconv, trainable=False, freeze_bn=True)
set_trainable(model.firstbn, trainable=False, freeze_bn=True)
set_trainable(model.encoder1, trainable=False, freeze_bn=True)
set_trainable(model.encoder2, trainable=False, freeze_bn=True)
set_trainable(model.encoder3, trainable=False, freeze_bn=True)
set_trainable(model.encoder4, trainable=False, freeze_bn=True)

baseline_checkpoint = load_checkpoint('runs/linknet34/baseline/checkpoints/best.pth')
unpack_checkpoint(baseline_checkpoint, model=model)
print('Loaded model weights from baseline')

print(count_parameters(model))

train_ds = CamVidDataset(x_train_dir, y_train_dir, transform=get_training_augmentation())
valid_ds = CamVidDataset(x_valid_dir, y_valid_dir, transform=get_validation_augmentation())

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=32, num_workers=8, pin_memory=True, drop_last=True,
                                   sampler=WeightedRandomSampler(np.ones(len(train_ds)), len(train_ds) * mul_factor))
data_loaders['valid'] = DataLoader(valid_ds, batch_size=32, num_workers=4, pin_memory=True, drop_last=False)

optimizer = SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-5, momentum=0.9, weight_decay=1e-4)
scheduler = ReduceLROnPlateau(optimizer, mode='max', patience=5, factor=0.2)
early_stopping = EarlyStoppingCallback(patience=12, metric='iou', minimize=False)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optimizer,
    callbacks=[
        iou_score, 
        early_stopping,
        show_batches,
        SchedulerCallback(reduce_metric='iou'),
    ],
    logdir='runs/linknet34/finetune',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders  

## Fine-tuning with FocalLoss

In [None]:
from catalyst.utils import unpack_checkpoint, load_checkpoint
from pytorch_toolbelt.losses.functional import sigmoid_focal_loss, reduced_focal_loss
from torch.nn.modules.loss import _Loss

set_manual_seed(42)

class FocalLoss(_Loss):
    def __init__(self, alpha=0.5, gamma=2, ignore=None):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.ignore = ignore

    def forward(self, label_input, label_target):
        """Compute focal loss for multi-class problem.
        Pixels whose target label equals `self.ignore` are excluded.
        """
        num_classes = label_input.size(1)
        loss = 0

        # Exclude pixels labelled with the ignore value from the loss computation
        if self.ignore is not None:
            not_ignored = label_target != self.ignore
            
        for cls in range(num_classes):
            cls_label_target = (label_target == cls).long()
            cls_label_input = label_input[:, cls, ...]

            if self.ignore is not None:
                cls_label_target = cls_label_target[not_ignored]
                cls_label_input = cls_label_input[not_ignored]

            loss += sigmoid_focal_loss(cls_label_input, cls_label_target, gamma=self.gamma, alpha=self.alpha)
        return loss

      
mul_factor = 5
num_epochs = 50

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=32, num_workers=8, pin_memory=True, drop_last=True,
                                   sampler=WeightedRandomSampler(np.ones(len(train_ds)), len(train_ds) * mul_factor))
data_loaders['valid'] = DataLoader(valid_ds, batch_size=32, num_workers=4, pin_memory=True, drop_last=False)

num_classes = len(CLASS_NAMES)
model = LinkNet34(num_classes).cuda()

try:
  baseline_checkpoint = load_checkpoint('runs/linknet34/finetune/checkpoints/best.pth')
  unpack_checkpoint(baseline_checkpoint, model=model)
  print('Loaded model weights from finetune')
except Exception:
  print('Failed to load previous state. Training from scratch')
  
# model runner
runner = SupervisedRunner()


optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10,20,30,40], gamma=0.5)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=FocalLoss(alpha=None),
    optimizer=optimizer,
    callbacks=[
        iou_score, 
        early_stopping,
        show_batches
    ],
    logdir='runs/linknet34/focal',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders  
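
To see why focal loss helps with easy, dominant classes, it is useful to look at a single pixel. The sketch below implements the usual sigmoid focal loss formula, -(1 - p_t)^gamma * log(p_t) with an optional alpha weight, in plain Python. It is a stand-alone illustration, not the sigmoid_focal_loss function from pytorch_toolbelt, whose reduction and normalization may differ.

```python
import math


def sigmoid_focal(logit, target, gamma=2.0, alpha=None):
    """Focal loss for one logit and a binary target (0 or 1)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    p_t = p if target == 1 else 1.0 - p          # probability of the true class
    loss = -((1.0 - p_t) ** gamma) * math.log(p_t)
    if alpha is not None:                        # optional class balancing weight
        loss *= alpha if target == 1 else 1.0 - alpha
    return loss


easy_bce = sigmoid_focal(3.0, 1, gamma=0.0)    # gamma=0 reduces to plain BCE
easy_focal = sigmoid_focal(3.0, 1, gamma=2.0)
hard_bce = sigmoid_focal(-3.0, 1, gamma=0.0)
hard_focal = sigmoid_focal(-3.0, 1, gamma=2.0)

print("easy pixel: focal/BCE =", round(easy_focal / easy_bce, 4))
print("hard pixel: focal/BCE =", round(hard_focal / hard_bce, 4))
```

A confidently correct pixel contributes almost nothing, while a badly misclassified pixel keeps most of its cross-entropy loss.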

## Train and test with TTA

In [None]:
from pytorch_toolbelt.inference import tta
      
set_manual_seed(42)

mul_factor = 5
num_epochs = 50

model = LinkNet34(num_classes).cuda()

try:
  baseline_checkpoint = load_checkpoint('runs/linknet34/baseline/checkpoints/best.pth')
  unpack_checkpoint(baseline_checkpoint, model=model)
  print('Loaded model weights from baseline')
except Exception:
  print('Failed to load previous state. Training from scratch')

model = tta.MultiscaleTTAWrapper(model, [0.8, 1.0, 1.5])
# model = tta.TTAWrapper(model, tta.fliplr_image2mask)

print(count_parameters(model))

train_ds = CamVidDataset(x_train_dir, y_train_dir, transform=get_training_augmentation())
valid_ds = CamVidDataset(x_valid_dir, y_valid_dir, transform=get_validation_augmentation())

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=8, num_workers=8, pin_memory=True, drop_last=True,
                                   sampler=WeightedRandomSampler(np.ones(len(train_ds)), len(train_ds) * mul_factor))
data_loaders['valid'] = DataLoader(valid_ds, batch_size=8, num_workers=4, pin_memory=True, drop_last=False)

optimizer = Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
scheduler = ReduceLROnPlateau(optimizer, patience=5)
early_stopping = EarlyStoppingCallback(patience=12, metric='iou', minimize=False)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optimizer,
    callbacks=[iou_score, early_stopping],
    logdir='runs/linknet34/tta',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders 

In [None]:
baseline = load_checkpoint('runs/linknet34/tta/checkpoints/best.pth')
baseline_epoch = baseline['epoch']
baseline_valid_metrics = baseline['valid_metrics']

del baseline

print('TTA result mIoU:', baseline_valid_metrics['iou'],'after',baseline_epoch,'epochs')
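
The training cell above wraps the model in a multi-scale TTA wrapper. The general recipe behind test-time augmentation is the same for any transform: predict on a transformed copy, undo the transform on the prediction, and average with the direct prediction. The sketch below shows this for the simpler horizontal-flip variant (the commented-out alternative in the cell), using nested lists and a toy stand-in model; all names are hypothetical.

```python
def fliplr(grid):
    """Mirror a 2D grid (list of rows) horizontally."""
    return [row[::-1] for row in grid]


def toy_model(image):
    """Stand-in for a network: running sum along each row (not flip-symmetric)."""
    output = []
    for row in image:
        acc, out_row = 0.0, []
        for value in row:
            acc += value
            out_row.append(acc)
        output.append(out_row)
    return output


def predict_with_flip_tta(model, image):
    """Average the direct prediction with the un-flipped prediction on the mirrored input."""
    direct = model(image)
    mirrored = fliplr(model(fliplr(image)))
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(direct, mirrored)]


image = [[1.0, 2.0, 3.0]]
print(toy_model(image))                         # [[1.0, 3.0, 6.0]]
print(predict_with_flip_tta(toy_model, image))  # [[3.5, 4.0, 4.5]]
```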

## Train with deep supervision

In [None]:
# Cleanup after ourselves to free up GPU memory
for name in ('scheduler', 'optimizer', 'runner', 'model', 'data_loaders'):
    globals().pop(name, None)

In [None]:
class DecoderBlockLinkNet(nn.Module):
    def __init__(self, in_channels, n_filters):
        super().__init__()

        self.relu = nn.ReLU(inplace=True)

        # B, C, H, W -> B, C/4, H, W
        self.conv1 = nn.Conv2d(in_channels, in_channels // 4, 1)
        self.norm1 = nn.BatchNorm2d(in_channels // 4)

        # B, C/4, H, W -> B, C/4, 2 * H, 2 * W
        self.deconv2 = nn.ConvTranspose2d(in_channels // 4, in_channels // 4, kernel_size=4,
                                          stride=2, padding=1, output_padding=0)
        self.norm2 = nn.BatchNorm2d(in_channels // 4)

        # B, C/4, 2 * H, 2 * W -> B, n_filters, 2 * H, 2 * W
        self.conv3 = nn.Conv2d(in_channels // 4, n_filters, 1)
        self.norm3 = nn.BatchNorm2d(n_filters)

    def forward(self, x):
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
        x = self.deconv2(x)
        x = self.norm2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.norm3(x)
        x = self.relu(x)
        return x
      
class LinkNet34(nn.Module):
    def __init__(self, num_classes=1, num_channels=3, pretrained=True):
        super().__init__()
        assert num_channels == 3
        self.num_classes = num_classes
        filters = [64, 128, 256, 512]
        resnet = resnet34(pretrained=pretrained)

        self.firstconv = resnet.conv1
        self.firstbn = resnet.bn1
        self.firstrelu = resnet.relu
        self.firstmaxpool = resnet.maxpool
        self.encoder1 = resnet.layer1
        self.encoder2 = resnet.layer2
        self.encoder3 = resnet.layer3
        self.encoder4 = resnet.layer4

        # Decoder
        self.decoder4 = DecoderBlockLinkNet(filters[3], filters[2])
        self.decoder3 = DecoderBlockLinkNet(filters[2], filters[1])
        self.decoder2 = DecoderBlockLinkNet(filters[1], filters[0])
        self.decoder1 = DecoderBlockLinkNet(filters[0], filters[0])
      
        self.center_logits = nn.Conv2d(filters[3], num_classes, kernel_size=1)
        self.coarse_logits = nn.Conv2d(filters[0], num_classes, kernel_size=1)
        
        # Final Classifier
        self.finaldeconv1 = nn.ConvTranspose2d(filters[0], 32, 3, stride=2)
        self.finalrelu1 = nn.ReLU(inplace=True)
        self.finalconv2 = nn.Conv2d(32, 32, 3)
        self.finalrelu2 = nn.ReLU(inplace=True)
        self.finalconv3 = nn.Conv2d(32, num_classes, 2, padding=1)

    def forward(self, x):
      
        # Encoder
        x = self.firstconv(x)
        x = self.firstbn(x)
        x = self.firstrelu(x)
        x = self.firstmaxpool(x)
        e1 = self.encoder1(x)
        e2 = self.encoder2(e1)
        e3 = self.encoder3(e2)
        e4 = self.encoder4(e3)

        # Decoder with Skip Connections
        d4 = self.decoder4(e4) + e3
        d3 = self.decoder3(d4) + e2
        d2 = self.decoder2(d3) + e1
        d1 = self.decoder1(d2)

        # Final Classification
        f1 = self.finaldeconv1(d1)
        f2 = self.finalrelu1(f1)
        f3 = self.finalconv2(f2)
        f4 = self.finalrelu2(f3)
        f5 = self.finalconv3(f4)
                
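        # Main output at full resolution plus two auxiliary outputs for deep supervision
        return {
            'logits': f5,
            # 1/32 resolution, predicted from the deepest encoder features
            'center_logits': self.center_logits(e4),
            # 1/2 resolution, predicted from the last decoder block
            'coarse_logits': self.coarse_logits(d1)
        }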
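
The three outputs of `LinkNet34` live at different resolutions, and the auxiliary targets have to match them. A rough way to check this without running the network is to trace one spatial dimension through the layers using the output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` (dilation 1). The helpers `conv_out`, `deconv_out` and `trace_sizes` are hypothetical names introduced for this sketch. The stem and stage strides are those of the standard torchvision ResNet-34.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of nn.Conv2d / nn.MaxPool2d along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output size of nn.ConvTranspose2d along one dimension (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_sizes(size):
    """Trace one spatial dimension through the LinkNet34 defined above."""
    # ResNet-34 stem: 7x7 conv (stride 2, padding 3), then 3x3 max pool (stride 2, padding 1)
    x = conv_out(size, kernel=7, stride=2, padding=3)
    e1 = conv_out(x, kernel=3, stride=2, padding=1)  # layer1 keeps the size
    # layer2..layer4 each halve the size once (3x3 conv, stride 2, padding 1)
    e2 = conv_out(e1, 3, 2, 1)
    e3 = conv_out(e2, 3, 2, 1)
    e4 = conv_out(e3, 3, 2, 1)
    # Each decoder block doubles the size: ConvTranspose2d(kernel 4, stride 2, padding 1)
    d4 = deconv_out(e4, 4, 2, 1)
    d3 = deconv_out(d4, 4, 2, 1)
    d2 = deconv_out(d3, 4, 2, 1)
    d1 = deconv_out(d2, 4, 2, 1)
    # Final classifier: deconv (k=3, s=2), conv (k=3), conv (k=2, padding=1)
    f1 = deconv_out(d1, 3, 2)
    f3 = conv_out(f1, 3)
    f5 = conv_out(f3, 2, padding=1)
    return {
        'e1': e1, 'e2': e2, 'e3': e3, 'e4': e4,
        'd4': d4, 'd3': d3, 'd2': d2, 'd1': d1,
        'logits': f5, 'center_logits': e4, 'coarse_logits': d1,
    }


sizes = trace_sizes(384)
# logits: 384, coarse_logits: 192, center_logits: 12
print(sizes['logits'], sizes['coarse_logits'], sizes['center_logits'])
```

For a side of 384 the outputs come out at full, 1/2 and 1/32 resolution. For a side that is not a multiple of 32, such as 360, the trace gives `d4 = 24` but `e3 = 23`, so the skip addition `self.decoder4(e4) + e3` would fail on such inputs.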


In [None]:
from catalyst.dl import CriterionCallback
import cv2

class DSVCamVidDataset(CamVidDataset):
  
    def __getitem__(self, i):
        image = fs.read_rgb_image(self.images_fps[i])
        mask = fs.read_image_as_is(self.masks_fps[i])
        assert mask.max() < len(CLASS_NAMES)

        # apply augmentations
        sample = self.transform(image=image, mask=mask)
        image, mask = sample['image'], sample['mask']
        
        h, w = mask.shape[:2]
        
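        # Auxiliary targets: coarse targets have a stride of 2 and center targets a stride of 32,
        # matching 'coarse_logits' and 'center_logits'. Nearest-neighbor resizing keeps class ids intact.
        coarse_targets = cv2.resize(mask, (w // 2, h // 2), interpolation=cv2.INTER_NEAREST)
        center_targets = cv2.resize(coarse_targets, (w // 32, h // 32), interpolation=cv2.INTER_NEAREST)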
        
        return {
            "image_id": id_from_fname(self.images_fps[i]),
            "features": tensor_from_rgb_image(image),
            "targets": torch.from_numpy(mask).long(),
            "coarse_targets": torch.from_numpy(coarse_targets).long(),
            "center_targets": torch.from_numpy(center_targets).long(),
        }



set_manual_seed(42)

mul_factor = 5    # each epoch draws len(train_ds) * mul_factor samples from the training set
num_epochs = 150

train_ds = DSVCamVidDataset(x_train_dir, y_train_dir, transform=get_training_augmentation())
valid_ds = DSVCamVidDataset(x_valid_dir, y_valid_dir, transform=get_validation_augmentation())

# Sanity check: the three targets should be at full, 1/2 and 1/32 resolution
sample = train_ds[0]
print(sample['targets'].size(), sample['coarse_targets'].size(), sample['center_targets'].size())

data_loaders = OrderedDict()
data_loaders['train'] = DataLoader(train_ds, batch_size=16, num_workers=8, pin_memory=True, drop_last=True,
                                   sampler=WeightedRandomSampler(np.ones(len(train_ds)), len(train_ds) * mul_factor))
data_loaders['valid'] = DataLoader(valid_ds, batch_size=16, num_workers=4, pin_memory=True, drop_last=False)

model = LinkNet34(num_classes).cuda()

optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = ReduceLROnPlateau(optimizer, patience=5)
early_stopping = EarlyStoppingCallback(patience=12, metric='iou', minimize=False)

# model training
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optimizer,
    callbacks=[
      # Main loss on the full-resolution logits
      CriterionCallback(input_key='targets', output_key='logits', prefix='loss'),
      # Auxiliary deep-supervision losses, down-weighted via `multiplier`
      CriterionCallback(input_key='coarse_targets', output_key='coarse_logits', prefix='coarse_loss', multiplier=0.5),
      CriterionCallback(input_key='center_targets', output_key='center_logits', prefix='center_loss', multiplier=0.25),
      iou_score,
      early_stopping
    ],
    logdir='runs/linknet34/dsv',
    loaders=data_loaders,
    num_epochs=num_epochs,
    scheduler=scheduler,
    verbose=False,
    main_metric='iou',
    minimize_metric=False
)

# Cleanup after ourselves to free up GPU memory
del scheduler, optimizer, runner, model, data_loaders
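
The auxiliary targets in `DSVCamVidDataset` are resized with `cv2.INTER_NEAREST`. The reason is that a segmentation mask stores class ids, not intensities, so any interpolation that blends neighbouring pixels can produce values that are not valid labels. The pure-Python sketch below illustrates the idea on a tiny made-up mask. `downsample_nearest` and `downsample_mean` are hypothetical helpers, and the exact pixel positions sampled by OpenCV may differ from this simplified version.

```python
def downsample_nearest(mask, stride):
    """Keep every `stride`-th pixel, so every output value is an existing class id."""
    return [row[::stride] for row in mask[::stride]]


def downsample_mean(mask, stride):
    """Average each stride x stride block; shown only as a counter-example."""
    height, width = len(mask), len(mask[0])
    return [
        [
            sum(mask[y + dy][x + dx] for dy in range(stride) for dx in range(stride)) / stride ** 2
            for x in range(0, width, stride)
        ]
        for y in range(0, height, stride)
    ]


# Made-up 4x4 mask; the top-left 2x2 block mixes class 0 and class 11
mask = [
    [0, 11, 2, 2],
    [0, 11, 2, 2],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]

print(downsample_nearest(mask, 2))  # [[0, 2], [5, 7]]
print(downsample_mean(mask, 2)[0][0])  # 5.5, which is not a class id
```

Averaging turns the boundary between class 0 and class 11 into 5.5, and rounding it would yield a class that is not present in that block at all, whereas nearest-neighbor sampling always returns one of the original labels.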