# "Simple Does It" Grabcut Training for Instance Segmentation

This notebook trains the SDI Grabcut weakly supervised model for **instance segmentation**. Following Section 6, "Instance Segmentation Results", of the "Simple Does It" paper, we use the Berkeley-augmented Pascal VOC segmentation dataset, which provides per-instance segmentation masks for VOC2012 data.

The Berkeley-augmented dataset can be downloaded from [here](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz).
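
As a minimal sketch, the archive could be fetched and unpacked from Python using only the standard library. The helper name `download_and_extract` and the destination directory are illustrative assumptions, not part of this repository; the `benchmark_RELEASE/dataset` layout mentioned in the comment is inferred from the dataset paths used later in this notebook.

```python
import tarfile
import urllib.request
from pathlib import Path

BENCHMARK_URL = ("http://www.eecs.berkeley.edu/Research/Projects/CS/vision/"
                 "grouping/semantic_contours/benchmark.tgz")


def download_and_extract(url, dest_dir):
    """Download a .tgz archive into dest_dir (if not already there) and unpack it.

    Hypothetical helper for illustration only.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / url.rsplit("/", 1)[-1]

    # Skip the download when the archive is already present
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))

    # Use the safer "data" extraction filter when this Python version provides it
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(dest_dir), **extra)
    return dest_dir


# Example call (assumed destination; the notebook later expects
# <dest>/benchmark_RELEASE/dataset to exist):
# download_and_extract(BENCHMARK_URL, "/media/EDrive/datasets/bk-voc")
```

The example call is left commented out so that running the block does not start a download by accident.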

The SDI Grabcut training is done using a **4-channel input** VGG16 network pre-trained on ImageNet, so make sure to run the [`VGG16 Net Surgery`](net_surgery.ipynb) notebook first!

To monitor training, run:
```
# On Windows
tensorboard --logdir E:\repos\tf-wss\tfwss\models\vgg_16_4chan_weak
# On Ubuntu
tensorboard --logdir /media/EDrive/repos/tf-wss/tfwss/models/vgg_16_4chan_weak
# Then browse to http://<hostname>:6006
```

In [1]:
"""
model_train.ipynb

SDI Grabcut weakly supervised model trainer (instance segmentation)

Written by Phil Ferriere

Licensed under the MIT License (see LICENSE for details)

Based on:
  - https://github.com/scaelles/OSVOS-TensorFlow/blob/master/osvos_parent_demo.py
    Written by Sergi Caelles (scaelles@vision.ee.ethz.ch)
    This file is part of the OSVOS paper presented in:
      Sergi Caelles, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Laura Leal-Taixe, Daniel Cremers, Luc Van Gool
      One-Shot Video Object Segmentation
      CVPR 2017
    Unknown code license
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys
import tensorflow as tf
slim = tf.contrib.slim

In [2]:
# Import model files
import model
from dataset import BKVOCDataset

## Configuration

In [3]:
# Model paths
# 4-channel checkpoint created by net_surgery.ipynb from the pre-trained VGG_16
# downloaded from http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
imagenet_ckpt = 'models/vgg_16_4chan/vgg_16_4chan.ckpt'
segnet_stream = 'weak'
ckpt_name = 'vgg_16_4chan_' + segnet_stream
logs_path = 'models/' + ckpt_name

# Training parameters
gpu_id = 0
iter_mean_grad = 10
# Global step at which each of the three training runs below stops
max_training_iters_1 = 15000
max_training_iters_2 = 30000
max_training_iters_3 = 50000
save_step = 5000
test_image = None
display_step = 100
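ini_lr = 1e-8
# Piecewise-constant learning rate schedule: values[i] is used until global_step passes boundaries[i]
boundaries = [10000, 15000, 25000, 30000, 40000]
values = [ini_lr, ini_lr * 0.1, ini_lr, ini_lr * 0.1, ini_lr, ini_lr * 0.1]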
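
To see which learning rate this schedule yields at a given iteration, the lookup performed by `tf.train.piecewise_constant` can be mimicked in plain Python. This is a sketch for inspection only: the helper `lr_at` is a hypothetical name, and it assumes the documented TensorFlow rule that `values[0]` applies while `step <= boundaries[0]` and the last value applies once `step` exceeds the last boundary.

```python
from bisect import bisect_left

ini_lr = 1e-8
boundaries = [10000, 15000, 25000, 30000, 40000]
values = [ini_lr, ini_lr * 0.1, ini_lr, ini_lr * 0.1, ini_lr, ini_lr * 0.1]


def lr_at(step, boundaries, values):
    """Return the learning rate a piecewise-constant schedule assigns to `step`.

    Hypothetical stand-in for inspecting the schedule without building a graph.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    # bisect_left counts the boundaries strictly below `step`,
    # so a step equal to a boundary still gets the earlier value
    return values[bisect_left(boundaries, step)]


# Print the rate just before and just after each boundary
for step in (0, 10000, 10001, 15001, 25001, 30001, 40001, 50000):
    print(step, lr_at(step, boundaries, values))
```

With the settings above, each of the three training runs (ending at 15000, 30000, and 50000 iterations) first trains at `ini_lr` and then drops to `ini_lr * 0.1` for its remaining iterations.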

## Dataset load

In [4]:
# Load the Berkeley-augmented Pascal VOC 2012 segmentation dataset
if sys.platform.startswith("win"):
    dataset_root = "E:/datasets/bk-voc/benchmark_RELEASE/dataset"
else:
    dataset_root = '/media/EDrive/datasets/bk-voc/benchmark_RELEASE/dataset'
dataset = BKVOCDataset(phase='train', dataset_root=dataset_root)

In [5]:
# Display dataset configuration
dataset.print_config()


Configuration:
  in_memory            False
  data_aug             False
  use_cache            False
  use_grabcut_labels   True
  phase                train
  samples              20172


## Offline Training

In [6]:
# Train the network with strong side outputs supervision
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(0, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 1, learning_rate, logs_path, max_training_iters_1, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, test_image_path=test_image,
                           ckpt_name=ckpt_name)

Network Layers:
   name = weak/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = weak/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = weak/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = weak/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = weak/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = weak/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = weak/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = weak/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = weak/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = weak/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = weak/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = weak/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_3/Relu:0, shape = (1

In [7]:
# Train the network with weak side outputs supervision,
# resuming the previous run at global step max_training_iters_1
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(max_training_iters_1, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 2, learning_rate, logs_path, max_training_iters_2, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, resume_training=True,
                           test_image_path=test_image, ckpt_name=ckpt_name)

Network Layers:
   name = weak/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = weak/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = weak/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = weak/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = weak/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = weak/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = weak/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = weak/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = weak/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = weak/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = weak/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = weak/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_3/Relu:0, shape = (1

In [8]:
# Train the network without side outputs supervision,
# resuming the previous run at global step max_training_iters_2
with tf.Graph().as_default():
    with tf.device('/gpu:' + str(gpu_id)):
        global_step = tf.Variable(max_training_iters_2, name='global_step', trainable=False)
        learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
        model.train_parent(dataset, imagenet_ckpt, 3, learning_rate, logs_path, max_training_iters_3, save_step,
                           display_step, global_step, segnet_stream, iter_mean_grad=iter_mean_grad, resume_training=True,
                           test_image_path=test_image, ckpt_name=ckpt_name)

Network Layers:
   name = weak/conv1/conv1_1/Relu:0, shape = (1, ?, ?, 64)
   name = weak/conv1/conv1_2/Relu:0, shape = (1, ?, ?, 64)
   name = weak/pool1/MaxPool:0, shape = (1, ?, ?, 64)
   name = weak/conv2/conv2_1/Relu:0, shape = (1, ?, ?, 128)
   name = weak/conv2/conv2_2/Relu:0, shape = (1, ?, ?, 128)
   name = weak/pool2/MaxPool:0, shape = (1, ?, ?, 128)
   name = weak/conv3/conv3_1/Relu:0, shape = (1, ?, ?, 256)
   name = weak/conv3/conv3_2/Relu:0, shape = (1, ?, ?, 256)
   name = weak/conv3/conv3_3/Relu:0, shape = (1, ?, ?, 256)
   name = weak/pool3/MaxPool:0, shape = (1, ?, ?, 256)
   name = weak/conv4/conv4_1/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv4/conv4_2/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv4/conv4_3/Relu:0, shape = (1, ?, ?, 512)
   name = weak/pool4/MaxPool:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_1/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_2/Relu:0, shape = (1, ?, ?, 512)
   name = weak/conv5/conv5_3/Relu:0, shape = (1

2018-02-10 17:35:19.317685 Iter 45600: Training Loss = 7019.5088
2018-02-10 17:38:03.406183 Iter 45700: Training Loss = 2725.4863
2018-02-10 17:40:47.390936 Iter 45800: Training Loss = 1581.2068
2018-02-10 17:43:32.264228 Iter 45900: Training Loss = 140.3235
2018-02-10 17:46:16.494487 Iter 46000: Training Loss = 1638.9476
2018-02-10 17:49:00.334807 Iter 46100: Training Loss = 1850.7594
2018-02-10 17:51:44.084685 Iter 46200: Training Loss = 0.4077
2018-02-10 17:54:28.920501 Iter 46300: Training Loss = 1457.1787
2018-02-10 17:57:12.672795 Iter 46400: Training Loss = 1645.0813
2018-02-10 17:59:57.259274 Iter 46500: Training Loss = 0.4077
2018-02-10 18:02:41.196020 Iter 46600: Training Loss = 85.9722
2018-02-10 18:05:25.667725 Iter 46700: Training Loss = 37406.7969
2018-02-10 18:08:01.146626 Iter 46800: Training Loss = 0.4077
2018-02-10 18:10:32.594511 Iter 46900: Training Loss = 0.4077
2018-02-10 18:13:04.378672 Iter 47000: Training Loss = 49.4253
2018-02-10 18:15:35.880831 Iter 47100: Tr

## Training losses & learning rate
You should see training curves similar to the following:
![](img/vgg_16_4chan_weak_dsn2_loss.png)

![](img/vgg_16_4chan_weak_dsn3_loss.png)

![](img/vgg_16_4chan_weak_dsn4_loss.png)

![](img/vgg_16_4chan_weak_dsn5_loss.png)

![](img/vgg_16_4chan_weak_main_loss.png)

![](img/vgg_16_4chan_weak_total_loss.png)

![](img/vgg_16_4chan_weak_learning_rate.png)
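
The losses printed every `display_step` iterations vary widely from one report to the next (the log excerpt above ranges from 0.4077 to 37406.7969), so a smoothed view can be easier to read. The sketch below parses log lines of the form shown earlier and computes a trailing moving average; `parse_losses` and `moving_average` are hypothetical helper names, not functions from this repository.

```python
import re

# Matches lines such as "... Iter 45600: Training Loss = 7019.5088"
LOG_LINE = re.compile(r"Iter (\d+): Training Loss = ([0-9.eE+-]+)")


def parse_losses(log_text):
    """Return a list of (iteration, loss) pairs found in the training log text."""
    return [(int(m.group(1)), float(m.group(2))) for m in LOG_LINE.finditer(log_text)]


def moving_average(values, window):
    """Trailing mean over at most `window` most recent values."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Four lines copied from the training output above
sample_log = """\
2018-02-10 17:35:19.317685 Iter 45600: Training Loss = 7019.5088
2018-02-10 17:38:03.406183 Iter 45700: Training Loss = 2725.4863
2018-02-10 17:40:47.390936 Iter 45800: Training Loss = 1581.2068
2018-02-10 17:43:32.264228 Iter 45900: Training Loss = 140.3235
"""

pairs = parse_losses(sample_log)
losses = [loss for _, loss in pairs]
for (iteration, loss), avg in zip(pairs, moving_average(losses, window=3)):
    print(iteration, loss, avg)
```

The window size of 3 is arbitrary here; a longer window gives a smoother curve at the cost of reacting more slowly to real changes in the loss.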