### DeepLab basic experiment

#### Input
- Full-sized 256x256 Sentinel-2 and Sentinel-1 images from the summer subset
- 12 bands: B2, B3, B4, B8, DVV, DVH, B5, B6, B7, B8a, B11, and B12.


#### Label
- LCCS land use images
- 8 classes instead of 11: 20 and 25 combined; 30, 35, and 36 combined
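
The class merging described above can be sketched as a lookup-table remap. This is a minimal illustration, not the dataset loader's actual code: the function name `merge_and_reindex` and the list `ASSUMED_CODES` are hypothetical, and the eleven raw codes are an assumption based on the MODIS LCCS land use layer (only 20, 25, 30, 35, and 36 are named in the list above).

```python
import numpy as np

# Assumed raw LCCS land use codes (hypothetical; only 20, 25, 30, 35, 36 come from the text).
ASSUMED_CODES = [1, 2, 3, 9, 10, 20, 25, 30, 35, 36, 40]

# Groups of raw codes that are merged into a single class.
MERGE_GROUPS = [(20, 25), (30, 35, 36)]


def merge_and_reindex(label, codes=ASSUMED_CODES, groups=MERGE_GROUPS):
    """Merge grouped codes, then map the remaining codes to indices 0..K-1."""
    # Every code starts as its own representative.
    representative = {c: c for c in codes}
    # Codes inside a group share the first code of the group.
    for group in groups:
        for c in group:
            representative[c] = group[0]

    kept = sorted(set(representative.values()))
    index_of = {c: i for i, c in enumerate(kept)}

    # Lookup table: raw code -> contiguous class index (-1 for unknown codes).
    lut = np.full(max(codes) + 1, -1, dtype=np.int64)
    for c in codes:
        lut[c] = index_of[representative[c]]

    return lut[label], len(kept)


raw = np.array([[20, 25], [36, 1]])
merged, n_classes = merge_and_reindex(raw)
```

Contiguous indices matter here because `nn.CrossEntropyLoss` expects targets in the range `0..num_classes-1`.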

#### Training parameters
- Categorical cross-entropy loss
- Adam optimizer with 0.0001 starting learning rate
- ReduceLROnPlateau learning rate scheduler
- Batch size: 4
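
As a rough sketch of how these settings fit together in plain PyTorch, the snippet below wires up the loss, optimizer, and scheduler around a stand-in model. The 1x1 convolution is a hypothetical placeholder for the real network, the random tensors stand in for a batch, and the scheduler uses PyTorch's defaults (`factor=0.1`, `patience=10`), which is an assumption about the intended configuration.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the segmentation network: 12 input bands, 8 classes.
model = nn.Conv2d(12, 8, kernel_size=1)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Default settings: the learning rate is multiplied by 0.1 once the monitored
# value has not improved for more than 10 consecutive scheduler steps.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)

# One training step on a random batch of 4 images with per-pixel labels.
x = torch.randn(4, 12, 256, 256)
y = torch.randint(0, 8, (4, 256, 256))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Simulate a validation loss that never improves.
for _ in range(12):
    scheduler.step(1.0)

lr_after = optimizer.param_groups[0]['lr']
```

The scheduler only reacts to the value passed to `step`, so it has to be fed the validation loss once per epoch.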



In [1]:
import pathlib
import numpy as np
import matplotlib.pyplot as plt

import torch
import torchmetrics
import torch.nn as nn

import pytorch_lightning as pl

import models
from utils import SEN12MSDataset
from utils import sen12ms_dataLoader as sen12ms

In [2]:
DATASET_PATH = pathlib.Path('/home/dubrovin/Projects/Data/SEN12MS/')
SEASON = sen12ms.Seasons.SUMMER

assert DATASET_PATH.is_dir(), 'Incorrect location for the dataset'

In [3]:
class DeepLab_base(pl.LightningModule):
    
    def __init__(self):
        super().__init__()
        
        self.net = models.DeepLab(backbone='resnet', pretrained_backbone=True, output_stride=16, sync_bn=False, n_in=12, num_classes=8)
            
        self.criterion = nn.CrossEntropyLoss()
        self.accuracy = torchmetrics.Accuracy()
    
    def forward(self, x):
        x = self.net(x)
        return x
    
    def setup(self, stage):
        dataset = SEN12MSDataset(DATASET_PATH, SEASON)
        n_val_examples = int(len(dataset) * 0.1)
        splits = [len(dataset) - n_val_examples, n_val_examples]
        self.train_data, self.val_data = torch.utils.data.random_split(dataset, splits)
    
    def train_dataloader(self):
        dataloader = torch.utils.data.DataLoader(self.train_data, batch_size=4, shuffle=True, num_workers=12, pin_memory=True)
        return dataloader
    
    def val_dataloader(self):
        dataloader = torch.utils.data.DataLoader(self.val_data, batch_size=4, num_workers=12, pin_memory=True)
        return dataloader
    
    def training_step(self, batch, batch_idx):
        x, y = batch
#         x = x[:, 2:]
        pred = self(x)
        loss = self.criterion(pred, y)
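        try:
            accuracy = self.accuracy(pred.softmax(dim=1), y)
            # remember the last successfully computed accuracy as a fallback
            self.pred_accuracy = accuracy
        except Exception:
            # reuse the previous value if the metric fails on this batch
            accuracy = self.pred_accuracy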
        
        batch_dict = {
            'loss': loss,
            'accuracy': accuracy,
        }
        
        return batch_dict
    
    def training_epoch_end(self, train_step_outputs):
        average_loss = torch.tensor([x['loss'] for x in train_step_outputs]).mean()
        average_accuracy = torch.tensor([x['accuracy'] for x in train_step_outputs]).mean()
        
        # log to TensorBoard
        self.logger.experiment.add_scalar('Loss/train', average_loss, self.current_epoch)
        self.logger.experiment.add_scalar('Accuracy/train', average_accuracy, self.current_epoch)
    
    def validation_step(self, batch, batch_idx):
        batch_dict = self.training_step(batch, batch_idx)
        return batch_dict
    
    def validation_epoch_end(self, val_step_outputs):
        average_loss = torch.tensor([x['loss'] for x in val_step_outputs]).mean()
        average_accuracy = torch.tensor([x['accuracy'] for x in val_step_outputs]).mean()
        
        # log to TensorBoard
        self.logger.experiment.add_scalar('Loss/validation', average_loss, self.current_epoch)
        self.logger.experiment.add_scalar('Accuracy/validation', average_accuracy, self.current_epoch)
        
        # log to the system for ReduceLROnPlateau and EarlyStopping / ModelCheckpoint
        self.log('system/val_loss', average_loss)
        self.log('system/val_acc', average_accuracy)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.0001)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
        # Lightning reads the scheduler from the 'lr_scheduler' key
        return {'optimizer': optimizer, 'lr_scheduler': scheduler, 'monitor': 'system/val_loss'}
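
The accuracy logged by `training_step` can be illustrated with a plain PyTorch sketch. It assumes the metric is averaged over all pixels of the batch; the helper name `pixel_accuracy` is hypothetical and is not part of `torchmetrics`.

```python
import torch


def pixel_accuracy(scores, target):
    """Fraction of pixels whose highest-scoring class matches the label.

    scores: (N, C, H, W) logits or probabilities
    target: (N, H, W) integer class indices
    """
    pred = scores.argmax(dim=1)
    return (pred == target).float().mean()


# Toy batch: one 2x2 image, three classes, class 1 always scores highest.
logits = torch.zeros(1, 3, 2, 2)
logits[0, 1] = 1.0
target = torch.tensor([[[1, 1], [1, 0]]])

acc_logits = pixel_accuracy(logits, target)
acc_probs = pixel_accuracy(logits.softmax(dim=1), target)
```

Softmax does not change which class has the highest score, so both calls give the same accuracy.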

In [4]:
stop_early = pl.callbacks.EarlyStopping(
    monitor='system/val_loss',
    patience=4,
    mode='min',
)

checkpoint_acc = pl.callbacks.ModelCheckpoint(
    monitor='system/val_acc',
    mode='max',
    every_n_val_epochs=1,
    dirpath='./best_models/',
    filename=r'deeplab_base_v0_val_acc={system/val_acc:.2f}',
    auto_insert_metric_name=False,
    save_weights_only=False,
)

In [5]:
model = DeepLab_base()
logger = pl.loggers.TensorBoardLogger('runs', 'deeplab_base', default_hp_metric=False)

trainer = pl.Trainer(
    logger=logger,
    gpus=0, 
    callbacks=[stop_early, checkpoint_acc],
    profiler='simple',
    num_sanity_val_steps=0,
    fast_dev_run=True,
)

GPU available: True, used: False
TPU available: False, using: 0 TPU cores
Running in fast_dev_run mode: will run a full train, val and test loop using 1 batch(es).


In [6]:
trainer.fit(model)


  | Name      | Type             | Params
-----------------------------------------------
0 | densenet  | FCDenseNet103    | 9.3 M 
1 | criterion | CrossEntropyLoss | 0     
2 | accuracy  | Accuracy         | 0     
-----------------------------------------------
9.3 M     Trainable params
0         Non-trainable params
9.3 M     Total params
37.291    Total estimated model params size (MB)


Epoch 0:  50%|█████     | 1/2 [00:21<00:21, 21.21s/it, loss=2.18, v_num=]
Validating:   0%|          | 0/1 [00:00<?, ?it/s]
Epoch 0: 100%|██████████| 2/2 [00:30<00:00, 15.15s/it, loss=2.18, v_num=]

FIT Profiler Report

Action                             	|  Mean duration (s)	|Num calls      	|  Total time (s) 	|  Percentage %   	|
--------------------------------------------------------------------------------------------------------------------------------------
Total                              	|  -              	|_              	|  30.545         	|  100 %          	|
--------------------------------------------------------------------------------------------------------------------------------------
run_training_epoch                 	|  30.308         	|1              	|  30.308         	|  99.225         	|
run_training_batch                 	|  18.638         	|1              	|  18.638         	|  61.017         	|
optimizer_step_and_closure_0       	|  18.637         	|1              	|  18.637         	|  61.014         	|
training_step_and_backward         	|  18.453         	|1              	|  18.453         	|  60.414         	|
backward                           


