# Train a Mask-RCNN

Uses `detectron2` to train a Mask-RCNN to segment synthetic yeast cells. A default data set is downloaded, but one can also use a data set created with the `create-synthetic-dataset-for-training` notebook, whose annotations are stored in `labels.json` / `labels.umsgpack`.

Be very careful when running the code: creating a model needs almost all of Colab's 12 GB of RAM, so rerunning cells several times may cause out-of-memory crashes. Training takes about 8 hours for 20,000 iterations, although performance should have converged at around 4,000 iterations.
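As a rough sanity check, the arithmetic below turns these figures into time per iteration, time until convergence, and number of epochs. It assumes that one iteration processes one batch of `batch_size=2` images (the value passed to `create_model` in the training cell) and uses the 20,000 training images reported in the training log; how `create_model` maps its arguments onto detectron2 is an assumption here.

```python
# Back-of-the-envelope training budget. The 8 h / 20,000 iterations and the
# 4,000-iteration convergence point come from the text above; BATCH_SIZE and
# NUM_IMAGES are assumptions taken from the training cell and training log.

HOURS_FOR_FULL_RUN = 8
FULL_RUN_ITERS = 20_000
CONVERGED_ITERS = 4_000
BATCH_SIZE = 2        # assumed: one iteration = one batch of 2 images
NUM_IMAGES = 20_000   # "20000 images left" in the training log

seconds_per_iter = HOURS_FOR_FULL_RUN * 3600 / FULL_RUN_ITERS
hours_to_converge = CONVERGED_ITERS * seconds_per_iter / 3600
epochs_full_run = FULL_RUN_ITERS * BATCH_SIZE / NUM_IMAGES

print(f"{seconds_per_iter:.2f} s per iteration")
print(f"about {hours_to_converge:.1f} h until convergence")
print(f"{epochs_full_run:.0f} passes over the data in a full run")
```

Under these assumptions a full run sees each image only about twice, which is one reason stopping early (for example by lowering `max_iter`) costs little.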

This notebook was tested on Google Colab.

## Install and import

Installs the appropriate libraries, mainly `detectron2`.

<font color='black' size='6'>Make sure to </font><font color='red' size='6'>**restart the runtime**</font><font color='black' size='6'> on Colab after everything has been installed successfully, so that all packages are imported correctly.</font>

In [1]:
!pip3 install -U Pillow

%load_ext tensorboard
import os
import warnings  # needed for warnings.warn below
import numpy

# Install detectron2 based on the installed version of torch,
# we assume torchvision is already installed as is the case on Google Colab.
try:
  import detectron2
except ImportError:
  import torch
  torch_version, cuda_version = torch.__version__.split('+cu')
  torch_version = '.'.join(torch_version.split('.')[:2])
  if (torch_version, cuda_version) not in {('1.8', '101')}:
    warnings.warn(
        f'Untested version combination: cuda ({cuda_version}), torch ({torch_version})\n'
        'Check https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md\n'
        'and https://pytorch.org/\n'
        'on how to install detectron2 with adequate torch and torchvision '
        'if installation fails.'
    )

  !pip3 install -U pyyaml
  !pip3 install detectron2 -f "https://dl.fbaipublicfiles.com/detectron2/wheels/cu{cuda_version}/torch{torch_version}/index.html"
  import detectron2

try:
  import umsgpack
  from download import download
except ImportError:
  !pip3 install umsgpack download
  import umsgpack
  from download import download

# Install the yeastcells-detection-maskrcnn from github if unavailable.
try:
  # raise ImportError()  # uncomment to force a fresh install from GitHub
  from yeastcells.train import create_model, train
except ImportError:
  !test -e yeastcells-detection-maskrcnn || git clone https://github.com/ymzayek/yeastcells-detection-maskrcnn.git
  !cd yeastcells-detection-maskrcnn; git pull origin main
  !pip3 install ./yeastcells-detection-maskrcnn
  from yeastcells.train import create_model, train

from google.colab import files

Requirement already up-to-date: Pillow in /usr/local/lib/python3.7/dist-packages (8.1.2)
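The install cell derives the detectron2 wheel index from `torch.__version__`. The sketch below mirrors that parsing in plain Python so it can be checked without a GPU runtime; the helper names `parse_torch_version` and `wheel_index_url` are hypothetical, and the URL pattern is copied from the cell above (it does not guarantee that a wheel exists for every combination).

```python
from typing import Optional, Tuple

# URL pattern used by the install cell above.
WHEEL_INDEX = ("https://dl.fbaipublicfiles.com/detectron2/wheels/"
               "cu{cuda}/torch{torch}/index.html")


def parse_torch_version(version: str) -> Optional[Tuple[str, str]]:
    """Split e.g. '1.8.0+cu101' into ('1.8', '101').

    Returns None for builds without a '+cu' suffix (such as CPU-only builds);
    the install cell's tuple unpacking would raise a ValueError there.
    """
    if "+cu" not in version:
        return None
    torch_part, cuda_part = version.split("+cu", 1)
    major_minor = ".".join(torch_part.split(".")[:2])
    return major_minor, cuda_part


def wheel_index_url(version: str) -> Optional[str]:
    """Build the wheel index URL for a torch version string, if it has CUDA."""
    parsed = parse_torch_version(version)
    if parsed is None:
        return None
    torch_mm, cuda = parsed
    return WHEEL_INDEX.format(cuda=cuda, torch=torch_mm)


print(wheel_index_url("1.8.0+cu101"))
```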


## Load data

A default data set is downloaded, but you could also use one created with the `create-synthetic-dataset-for-training` notebook, for example by mounting your Google Drive in both notebooks. The default file is reasonably large, so the download might time out. If it aborts midway, rerun the cell: the download continues where it left off.

In [None]:
data_path = '/content/synthetic-yeast-cells-data'

download(
    'https://datascience.web.rug.nl/synthetic-yeast-cells-data-v10.zip',
    'synthetic-yeast-cells-data-v10.zip')

# Unzip (again) if there are fewer than 1000 files in the data path (heuristic).
# Patient users may simply unzip every time.
# If the download was cut off, just rerun the cell and it will continue.
file_count_estimate = !ls '{data_path}/'* | cat
if len(file_count_estimate) < 1000:
  os.makedirs(data_path, exist_ok=True)
  # -o: overwrite existing files without prompting, so a rerun cannot hang
  !cd '{data_path}' && unzip -o '/content/synthetic-yeast-cells-data-v10.zip'
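The same "unzip unless enough files are present" heuristic can be written in pure Python with the standard `zipfile` module, which avoids shell quoting and never prompts for input. This is a sketch: `count_files` and `unzip_if_needed` are hypothetical helpers, and counting files recursively only approximates the `ls | cat` line count used above.

```python
import zipfile
from pathlib import Path


def count_files(folder: Path) -> int:
    """Count regular files below `folder`, recursively."""
    return sum(1 for path in folder.rglob("*") if path.is_file())


def unzip_if_needed(zip_path: str, data_path: str, min_files: int = 1000) -> bool:
    """Extract `zip_path` into `data_path` unless it already holds at least
    `min_files` files (the same threshold as the shell cell above).

    Returns True when extraction ran. Existing files are overwritten.
    """
    folder = Path(data_path)
    if folder.is_dir() and count_files(folder) >= min_files:
        return False
    folder.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(folder)
    return True
```

In the notebook this would be called as `unzip_if_needed('/content/synthetic-yeast-cells-data-v10.zip', data_path)`. A truncated download makes `zipfile` raise `BadZipFile`, which is a clearer signal to rerun the download than a partial extraction.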

## Tensorboard

Monitor the learning curves in TensorBoard's Time Series tab.

In [4]:
%tensorboard --logdir /content/tensorboard/

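The `lr` curve should rise linearly at the start of training. The `lr` values in the training log (4.9953e-06 at iteration 19, 9.9902e-06 at 39, 1.4985e-05 at 59) are consistent with detectron2's default linear warmup applied to `learning_rate=0.00025`. The sketch below reproduces that warmup; it assumes `create_model` keeps detectron2's default warmup settings and does not model any later step decay.

```python
def warmup_lr(iteration: int,
              base_lr: float = 0.00025,
              warmup_iters: int = 1000,
              warmup_factor: float = 0.001) -> float:
    """Learning rate during a linear warmup.

    Defaults assume detectron2's standard solver settings
    (linear warmup, WARMUP_FACTOR 1/1000, WARMUP_ITERS 1000)
    and the learning_rate=0.00025 passed to create_model.
    """
    if iteration >= warmup_iters:
        # Warmup is over; any later step decay (SOLVER.STEPS) is not modelled.
        return base_lr
    alpha = iteration / warmup_iters
    # Interpolate the multiplier from warmup_factor (at iter 0) to 1.0.
    return base_lr * (warmup_factor * (1 - alpha) + alpha)


for it in (19, 39, 59):
    print(it, f"{warmup_lr(it):.4e}")
```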

## Training

In [5]:
#set path to model_final.pth
version = 'v1'
run = 1
model_path = f'/content/model-{version}'

# Create the model configuration (the pretrained COCO weights are downloaded when training starts)
config = create_model(
    model_path,
    device='cuda:0',
    data_workers=2,
    batch_size=2,
    learning_rate=0.00025,
    max_iter=20000,
    # max_iter=2000,
    pretrained="COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
    tensorboard=f'/content/tensorboard/yeast-cells-mask-rcnn-run-{run}'
)
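Because the pretrained weights come from a COCO model (80 classes) while this data set has a single class (`yeast_cell`), the training output contains "Skip loading parameter" warnings for the final predictor layers; these are expected. The sketch below derives the shapes quoted in those warnings, assuming class-specific box regression (detectron2's default) and the 1024-dimensional box-head features shown in the log; `predictor_shapes` is a hypothetical helper.

```python
from typing import Dict, Tuple


def predictor_shapes(num_classes: int, fc_dim: int = 1024) -> Dict[str, Tuple[int, ...]]:
    """Parameter shapes of the final box-predictor layers of a Mask R-CNN
    with class-specific box regression."""
    return {
        # one score per foreground class plus one for background
        "cls_score.weight": (num_classes + 1, fc_dim),
        "cls_score.bias": (num_classes + 1,),
        # four box-regression deltas per foreground class
        "bbox_pred.weight": (4 * num_classes, fc_dim),
        "bbox_pred.bias": (4 * num_classes,),
    }


coco = predictor_shapes(80)  # COCO has 80 object classes
yeast = predictor_shapes(1)  # this data set only has "yeast_cell"
for name in coco:
    print(f"{name}: {coco[name]} in the checkpoint, {yeast[name]} in the model")
```

These layers are therefore freshly initialised, while the backbone and the rest of the network start from the COCO weights.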

In [None]:
trainer = train(
    config,
    data_path
)

[32m[03/18 14:55:31 d2.engine.defaults]: [0mModel:
GeneralizedRCNN(
  ... removed model details manually ...
)
[32m[03/18 14:55:31 d2.data.build]: [0mRemoved 0 images with no usable annotations. 20000 images left.
[32m[03/18 14:55:33 d2.data.build]: [0mDistribution of instances among all 1 categories:
[36m|  category  | #instances   |
|:----------:|:-------------|
| yeast_cell | 1977043      |
|            |              |[0m
[32m[03/18 14:55:33 d2.data.dataset_mapper]: [0m[DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[32m[03/18 14:55:33 d2.data.build]: [0mUsing training sampler TrainingSampler
[32m[03/18 14:55:33 d2.data.common]: [0mSerializing 20000 elements to byte tensors and concatenating them all ...
[32m[03/18 14:55:40 d2.data.common]: [0mSerialized dataset takes 1055.97 MiB


model_final_f10217.pkl: 178MB [00:17, 10.4MB/s]                           
Skip loading parameter 'roi_heads.box_predictor.cls_score.weight' to the model due to incompatible shapes: (81, 1024) in the checkpoint but (2, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.cls_score.bias' to the model due to incompatible shapes: (81,) in the checkpoint but (2,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.weight' to the model due to incompatible shapes: (320, 1024) in the checkpoint but (4, 1024) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.box_predictor.bbox_pred.bias' to the model due to incompatible shapes: (320,) in the checkpoint but (4,) in the model! You might want to double check if this is expected.
Skip loading parameter 'roi_heads.mask_head.predictor.weight' to the model due to i

[32m[03/18 14:56:00 d2.engine.train_loop]: [0mStarting training from iteration 0
[32m[03/18 14:56:27 d2.utils.events]: [0m eta: 7:31:05  iter: 19  total_loss: 9.604  loss_cls: 0.6163  loss_box_reg: 0.4482  loss_mask: 0.6877  loss_rpn_cls: 6.81  loss_rpn_loc: 0.9876  time: 1.3529  data_time: 0.0360  lr: 4.9953e-06  max_mem: 2563M
[32m[03/18 14:56:55 d2.utils.events]: [0m eta: 7:28:17  iter: 39  total_loss: 4.303  loss_cls: 0.6114  loss_box_reg: 0.5155  loss_mask: 0.68  loss_rpn_cls: 1.654  loss_rpn_loc: 0.8095  time: 1.3548  data_time: 0.0089  lr: 9.9902e-06  max_mem: 2563M
[32m[03/18 14:57:20 d2.utils.events]: [0m eta: 7:22:46  iter: 59  total_loss: 2.838  loss_cls: 0.6135  loss_box_reg: 0.5202  loss_mask: 0.6668  loss_rpn_cls: 0.3622  loss_rpn_loc: 0.7236  time: 1.3223  data_time: 0.0085  lr: 1.4985e-05  max_mem: 2563M
... Removed many steps manually ...
[32m[03/18 19:12:37 d2.utils.events]: [0m eta: 3:11:50  iter: 11519  total_loss: 0.9401  loss_cls: 0.09366  loss_box_reg: 
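If TensorBoard is not available, loss curves can also be recovered from the text log above. The sketch below assumes the `name: value` layout of the `d2.utils.events` lines shown here; it is a hypothetical helper, not a detectron2 API, and it also strips leftover colour codes such as `[32m` that appear when the log is copied from the notebook output.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# ANSI colour codes such as "\x1b[32m"; the ESC byte is optional because
# copied logs often keep only the "[32m" part.
ANSI_CODES = re.compile(r"\x1b?\[[0-9;]*m")
ITER_FIELD = re.compile(r"\biter: (\d+)")
# "name: number" pairs, including scientific notation such as 4.9953e-06.
METRIC_FIELD = re.compile(r"(\w+): ([0-9.]+(?:e[-+]?\d+)?)")
NON_METRICS = {"eta", "iter"}  # "eta: 7:31:05" is a duration, not a metric


def parse_event_line(line: str) -> Optional[Tuple[int, Dict[str, float]]]:
    """Return (iteration, metrics) for a d2.utils.events line, else None."""
    line = ANSI_CODES.sub("", line)
    match = ITER_FIELD.search(line)
    if match is None:
        return None
    metrics = {name: float(value)
               for name, value in METRIC_FIELD.findall(line)
               if name not in NON_METRICS}
    return int(match.group(1)), metrics


def loss_curve(lines: Iterable[str], key: str = "total_loss") -> List[Tuple[int, float]]:
    """Collect (iteration, value) pairs for one metric from a text log."""
    curve = []
    for line in lines:
        parsed = parse_event_line(line)
        if parsed is not None and key in parsed[1]:
            curve.append((parsed[0], parsed[1][key]))
    return curve
```

Lines without an `iter:` field are skipped, and a truncated final line still yields the metrics that are complete.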

## Download

Download the resulting `model_final.pth`.

In [None]:
files.download(f'/content/tensorboard/yeast-cells-mask-rcnn-run-{run}/model_final.pth')