## Dark side of the volume
*https://thinkonward.com/app/c/challenges/dark-side*  

## Contents

1. Approach
2. Hardware requirements and Time
3. Software requirements and Installation
4. Inference
5. Training

## 1. Approach

#### 1.1 Summary
For this challenge I trained an ensemble of five 2D U-Net models with large EfficientNet encoders (two B8, one L2, one V2-L, and one V2-XL) using the [segmentation_models_pytorch](https://github.com/qubvel/segmentation_models.pytorch) framework. As training data I used volume slices along axis 0 and axis 1 (i.e. 1-channel images of size 300 x 1259). I used `smp.losses.DiceLoss()` and the Adam optimizer. Depending on the model, it took 10 to 20 epochs to converge. I used the `ReduceLROnPlateau` schedule and mixed precision training. I created a 10-fold split of the data and trained only the first fold for each model. Folds were created with `GroupKFold` at the volume level, i.e. validation is performed on completely unseen volumes. As the final ensemble I averaged predictions from 5 models, 2 axes, and 2 image orientations (original and up-down flip).
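
To make the slicing step concrete, here is a minimal sketch of how 2D images can be taken from a volume along axis 0 and axis 1. It assumes a volume of shape 300 x 300 x 1259 (implied by the 300 x 1259 slice size); the helper name `slice_volume` is hypothetical and is not taken from `data.py`. A small dummy array is used so the example runs quickly.

```python
import numpy as np


def slice_volume(volume: np.ndarray):
    """Yield (axis, index, image) for every 2D slice along axes 0 and 1."""
    for axis in (0, 1):
        for idx in range(volume.shape[axis]):
            # np.take drops the sliced axis and keeps the other two in order
            yield axis, idx, np.take(volume, idx, axis=axis)


# Hypothetical small stand-in for a real 300 x 300 x 1259 volume
volume = np.zeros((4, 4, 7), dtype=np.float32)
slices = list(slice_volume(volume))
first_axis, first_idx, first_image = slices[0]
```

With a real volume this would yield 300 images per axis, each of size 300 x 1259.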

#### 1.2 Details
Because the images (slices) in this task have a high resolution, I wanted to study the effect of encoder size. Scaling worked very well here: moving from EfficientNet-B0 up to EfficientNet-B8, each larger encoder gave a better score. Eventually B8 gave the best single-model score of 0.9167. I also trained EfficientNet-L2, which stands out for its size (0.5B parameters), but its score of 0.9098 did not beat B8. More careful hyperparameter tuning could possibly push the score higher.

All models except one were trained on horizontally oriented images; the remaining one was trained on vertically oriented images and gave a comparable score. It is worth mentioning that the same architecture (B8) took less VRAM and trained faster on vertical images than on horizontal ones.

I tried `smp.losses.DiceLoss()` and `smp.losses.JaccardLoss()`; both gave almost the same scores for the same architecture. All final models were trained using Dice loss in the multiclass setting, i.e. `smp.losses.DiceLoss(mode='multiclass')`, treating the binary task as a multiclass task with 2 classes.

I compared two approaches to image selection. In the first, I trained only on images whose mask has at least one positive pixel (value 1). In the second, I trained on all images, including those whose mask is empty (all zeros). The second approach was significantly better.
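
For intuition about the two-class Dice formulation, below is a simplified NumPy sketch of a soft Dice loss. It is an illustration only and not the `smp.losses.DiceLoss` implementation, which works on logits and has additional options. The function name `soft_dice_loss` and the `eps` smoothing value are assumptions.

```python
import numpy as np


def soft_dice_loss(probs: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """probs: (N, C, H, W) class probabilities; target: (N, H, W) integer labels."""
    n_classes = probs.shape[1]
    # One-hot encode the target: (N, H, W) -> (N, H, W, C) -> (N, C, H, W)
    one_hot = np.eye(n_classes, dtype=probs.dtype)[target].transpose(0, 3, 1, 2)
    # Reduce over batch and spatial dims, keeping one value per class
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(axis=dims)
    cardinality = (probs + one_hot).sum(axis=dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    # Loss is one minus the Dice score averaged over classes
    return float(1.0 - dice.mean())


# Tiny 2-class example: a perfect prediction and a fully inverted one
target = np.array([[[0, 1], [1, 1]]])                # shape (1, 2, 2)
perfect = np.eye(2)[target].transpose(0, 3, 1, 2)     # shape (1, 2, 2, 2)
loss_perfect = soft_dice_loss(perfect, target)
loss_inverted = soft_dice_loss(1.0 - perfect, target)
```

A perfect prediction gives a loss close to 0, while a fully inverted prediction gives a loss close to 1.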

The table below lists the Dice loss and leaderboard (LB) score for each model.


| Fold | Encoder ID                               | Dice loss | Leaderboard | Image orientation |
|------|------------------------------------------|-----------|-------------|-------------------|
| 0    | `tu-tf_efficientnet_l2.ns_jft_in1k`      | 0.0633    | 0.9098      | hor               |
| 1    | `timm-efficientnet-b8`                   | 0.0606    | 0.9167      | hor               |
| 2    | `tu-tf_efficientnet_b8.ap_in1k`          | 0.0587    | 0.9150      | ver               |
| 3    | `tu-tf_efficientnetv2_l.in1k`            | 0.0621    | 0.9155      | hor               |
| 4    | `tu-tf_efficientnetv2_xl.in21k_ft_in1k`  | 0.0576    | 0.9074      | hor               |
|      |                                          |           |             |                   |
|      |  `ensemble`                              |           | 0.9247      |                   |
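
The ensemble score in the last row comes from averaging over models, both slicing axes, and the up-down flip. A minimal sketch of that averaging is shown below. It assumes each model is a callable that maps a 2D image to a 2D probability map of the same shape, and that probabilities (rather than logits) are averaged; neither detail is confirmed by the text, and the function names are hypothetical rather than taken from `infer.py`.

```python
import numpy as np


def predict_with_flip(model, image):
    """Average the prediction on the image and on its up-down flipped copy."""
    p = model(image)
    # Flip the input, predict, then flip the prediction back before averaging
    p_flip = np.flipud(model(np.flipud(image)))
    return (p + p_flip) / 2.0


def ensemble_slice(models, image):
    """Average flip-augmented predictions of all models for one 2D image."""
    return np.mean([predict_with_flip(m, image) for m in models], axis=0)


def predict_volume(models, volume):
    """Average per-voxel predictions over models, flips, and slicing axes 0 and 1."""
    per_axis = []
    for axis in (0, 1):
        preds = [ensemble_slice(models, np.take(volume, i, axis=axis))
                 for i in range(volume.shape[axis])]
        # Re-stack the 2D predictions along the axis they were sliced from
        per_axis.append(np.stack(preds, axis=axis))
    return np.mean(per_axis, axis=0)


# Dummy constant "models" standing in for the trained networks
models = [lambda x: np.full_like(x, 0.2), lambda x: np.full_like(x, 0.6)]
volume = np.zeros((3, 3, 5), dtype=np.float32)
probs = predict_volume(models, volume)
```

The result has the same shape as the input volume, so it can be thresholded per voxel afterwards.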


**Directory structure:**
```
solution
|
|-- models
|
|-- test
|   |-- 2023-10-05_01b243eb
|   |-- 2023-10-05_023d576f
|   |-- ...
|
|-- train
|   |-- 2023-10-05_0283ecc5
|   |-- 2023-10-05_03b796af
|   |-- ...
|
|-- data.py
|-- infer.py
|-- LICENSE.txt
|-- requirements.txt
|-- solution.ipynb
|-- train.py
|-- utils.py
```

## 2. Hardware requirements and Time

Hardware:

* 12x CPU
* 32 GB RAM
* 1x RTX-3090-24GB GPU
* 500 GB SSD

Time:

* Training data creation: **4 hours**
* Training time: **240 hours**
* Test data creation: **0.5 hours**
* Inference time: **3 hours**

## 3. Software requirements and Installation

* Ubuntu 22.04
* Python: 3.10.12 (Conda)
* CUDA 12.4

**Dataset setup**  

The solution package ships with empty `train` and `test` directories. Please extract the train and/or test data into the corresponding locations so that the layout matches the default directory structure outlined in section 1 above. Alternatively, point the `--input_dir` parameter of the scripts to any other data location.

In [None]:
%cd ~/solution
!pip install -r requirements.txt

## 4. Inference

#### 4.1 Create test data

Please replace the `--input_dir` value with the path to the directory containing the holdout volumes.  
The same structure is expected, i.e. each `.npy` file resides inside its own subdirectory.
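
Before running `data.py`, it may help to check that the input directory follows this layout. The sketch below lists every `.npy` file that sits exactly one level below the input directory. The helper `find_volumes` and the file name `volume.npy` are hypothetical; the subdirectory name is taken from the directory structure in section 1.

```python
import tempfile
from pathlib import Path


def find_volumes(input_dir):
    """Return sorted .npy paths located exactly one level below input_dir."""
    return sorted(Path(input_dir).glob("*/*.npy"))


# Self-contained demo on a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "2023-10-05_01b243eb"
    sub.mkdir()
    (sub / "volume.npy").touch()        # hypothetical file name
    (Path(tmp) / "stray.npy").touch()   # not inside a subdirectory, so ignored
    found = [p.relative_to(tmp).as_posix() for p in find_volumes(tmp)]
```

Files placed directly in the input directory are not matched by the pattern, which makes misplaced volumes easy to spot.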

In [None]:
!python data.py \
--input_dir=test \
--output_dir_axis_0=test_img_axis_0 \
--output_dir_axis_1=test_img_axis_1 \
--has_label=0

#### 4.2 Run inference 

The input directories in the following command are the output directories created by the `data.py` script in the previous step.

In [None]:
!python infer.py \
--input_dir_axis_0=test_img_axis_0 \
--input_dir_axis_1=test_img_axis_1 \
--batch_size=16 \
--model_dir=models \
--submission_path=submission.npz

## 5. Training

#### 5.1 Create training data
***Notes.*** 
1. If a mask has no positive pixels, it is not saved as a `.png` file. Instead it is created on the fly during training.
2. I excluded from the training data 8 volumes whose masks are completely empty. They are listed at the bottom of `data.py`. This was an initial heuristic that carried over into the final solution. These volumes could be used for training, although they most probably wouldn't have a significant effect.
3. The 12 volumes that have `2024` in their names need a transposition, `np.transpose(volume, [1, 0, 2])`, to match their corresponding masks.
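
Notes 1 and 3 can be sketched as two small helpers. This is only an illustration under stated assumptions: the function names are hypothetical, and a plain dictionary stands in for the `.png` masks stored on disk; the actual logic lives in `data.py` and `train.py`.

```python
import numpy as np


def align_volume(name: str, volume: np.ndarray) -> np.ndarray:
    """Swap the first two axes for volumes whose name contains '2024' (note 3)."""
    if "2024" in name:
        return np.transpose(volume, [1, 0, 2])
    return volume


def load_mask(stored_masks: dict, key: str, shape: tuple) -> np.ndarray:
    """Return the stored mask, or an all-zero mask if none was saved (note 1)."""
    mask = stored_masks.get(key)
    if mask is None:
        # Empty masks are not written to disk, so build one on the fly
        return np.zeros(shape, dtype=np.uint8)
    return mask


dummy = np.zeros((2, 3, 4), dtype=np.float32)
shape_2024 = align_volume("2024-hypothetical-name", dummy).shape
shape_2023 = align_volume("2023-10-05_0283ecc5", dummy).shape
empty_mask = load_mask({}, "missing-slice", (3, 4))
```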

In [None]:
!python data.py \
--input_dir=train \
--output_dir_axis_0=train_img_axis_0 \
--output_dir_axis_1=train_img_axis_1 \
--has_label=1

#### 5.2 Run training

***Notes.***

1. All models were trained on images from both axes. If you are interested in an axis-specific model, just set
   `--input_dir=train_img_axis_0` or `--input_dir=train_img_axis_1`.

2. The training script saves a `model-*.bin` file after each complete epoch and a `ckpt-*.bin` file every hour. After training is complete, a single `model-*.bin` file remains, which is the final model. Each checkpoint contains all states (model, optimizer, etc.), so training can be resumed after an interruption. To resume, set the corresponding fold index (specified in the checkpoint filename) and the last available checkpoint file, for example `--initial_fold=0` and `--ckpt=checkpoints/model-f0-e005-0.0775.bin`. All other parameters remain the same.

3. The inference script uses model files that contain weights only. To export a model after training, run this short snippet (set the `file` variable to the actual final model name):
```
import torch

file = 'model.bin'  # set to the actual final model file name
# Load the full checkpoint on CPU and keep only the model weights
state_dict = torch.load(file, map_location=torch.device('cpu'))['state_dict']
torch.save(state_dict, file.replace('.bin', '-ready.bin'))
```

4. To avoid saving intermediate time-based checkpoints, set the `--save_seconds` argument to a large number.
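
When resuming as described in note 2, the fold index and the most recent checkpoint can be read from the filenames. The sketch below assumes that every per-epoch file follows the pattern of the example above, `model-f<fold>-e<epoch>-<value>.bin`, and that the trailing number is a loss value; both are assumptions based on that single example, and the helper names are hypothetical.

```python
import re

# Pattern inferred from the example 'model-f0-e005-0.0775.bin' (assumption)
CKPT_RE = re.compile(r"model-f(\d+)-e(\d+)-(\d+\.\d+)\.bin$")


def parse_checkpoint(path: str):
    """Return fold, epoch, and trailing value parsed from a checkpoint path."""
    m = CKPT_RE.search(path)
    if m is None:
        return None
    return {"fold": int(m.group(1)), "epoch": int(m.group(2)), "loss": float(m.group(3))}


def latest_checkpoint(paths):
    """Return the path with the highest epoch number, or None if nothing matches."""
    parsed = [(parse_checkpoint(p), p) for p in paths]
    parsed = [(info, p) for info, p in parsed if info is not None]
    if not parsed:
        return None
    return max(parsed, key=lambda item: item[0]["epoch"])[1]


info = parse_checkpoint("checkpoints/model-f0-e005-0.0775.bin")
latest = latest_checkpoint([
    "checkpoints/model-f0-e004-0.0801.bin",   # hypothetical earlier epoch
    "checkpoints/model-f0-e005-0.0775.bin",
    "checkpoints/notes.txt",
])
```

The parsed fold would go to `--initial_fold` and the selected path to `--ckpt`.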

#### EfficientNet-L2

In [None]:
!python train.py \
--input_dir=train_img_axis_* \
--output_dir=checkpoints_1 \
--encoder_name=tu-tf_efficientnet_l2.ns_jft_in1k \
--n_epochs=7 \
--batch_size=2 \
--accum=20 \
--lr=1e-4 \
--vertical=0 \
--aug=1 \
--save_seconds=3600

#### EfficientNet-B8

In [None]:
!python train.py \
--input_dir=train_img_axis_* \
--output_dir=checkpoints_2 \
--encoder_name=timm-efficientnet-b8 \
--n_epochs=16 \
--batch_size=5 \
--accum=8 \
--lr=1e-3 \
--vertical=0 \
--aug=2 \
--save_seconds=3600

#### EfficientNet-B8 (vertical images)

In [None]:
!python train.py \
--input_dir=train_img_axis_* \
--output_dir=checkpoints_3 \
--encoder_name=tu-tf_efficientnet_b8.ap_in1k \
--n_epochs=20 \
--batch_size=7 \
--accum=6 \
--lr=1e-3 \
--vertical=1 \
--aug=2 \
--save_seconds=3600

#### EfficientNet-v2-L

In [None]:
!python train.py \
--input_dir=train_img_axis_* \
--output_dir=checkpoints_4 \
--encoder_name=tu-tf_efficientnetv2_l.in1k \
--n_epochs=9 \
--batch_size=10 \
--accum=4 \
--lr=1e-3 \
--vertical=0 \
--aug=2 \
--save_seconds=3600

#### EfficientNet-v2-XL

In [None]:
!python train.py \
--input_dir=train_img_axis_* \
--output_dir=checkpoints_5 \
--encoder_name=tu-tf_efficientnetv2_xl.in21k_ft_in1k \
--n_epochs=11 \
--batch_size=6 \
--accum=8 \
--lr=1e-3 \
--vertical=0 \
--aug=2 \
--save_seconds=3600

In [None]:
# END