# Augmenting a dataset for semantic segmentation

In this notebook, we illustrate how CLODSA can be used to augment a dataset of images for semantic segmentation. In particular, we use the dataset provided for the [2018 Data Science Bowl](https://www.kaggle.com/c/data-science-bowl-2018), which is devoted to finding the nuclei in divergent images to advance medical discovery. From now on, we call this dataset the Nuclei dataset.

The Nuclei training dataset consists of 670 nuclei images and their corresponding labels. For illustration purposes, we take a subset of the Nuclei dataset containing 100 images. Such a subset can be downloaded by executing the following command.

We can check the number of images in each of the folders.

In [None]:
!git config --global --unset http.proxy 
!git config --global --unset https.proxy
!git config --global user.name "zebak12"
!git config --global user.email "zebakarin@gmail.com"
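# Credentials are not embedded in the URL; for a private repository, supply them through a credential helper or an access token.
!git clone https://github.com/zebak12/oral-cancer-data.git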

In [None]:
!pip3 install clodsa

In [None]:
print("Number of images")
!ls ./oral-cancer-data/train/images/ | wc -l
print("Number of masks")
!ls ./oral-cancer-data/train/labels | wc -l
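
Counting files shows whether the two folders have the same size, but not whether every image has its own mask. The sketch below is one possible check, assuming that an image and its mask share the same file name (as with `images/0.jpg` and `labels/0.jpg` in this notebook). The helper `unmatched_files` is a hypothetical name, and the demonstration runs on a temporary directory rather than on the real dataset.

```python
from pathlib import Path
import tempfile


def unmatched_files(images_dir, labels_dir):
    """Return (images without a mask, masks without an image), compared by file name."""
    image_names = {p.name for p in Path(images_dir).iterdir() if p.is_file()}
    label_names = {p.name for p in Path(labels_dir).iterdir() if p.is_file()}
    return sorted(image_names - label_names), sorted(label_names - image_names)


# Self-contained demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    images = Path(root, "images")
    labels = Path(root, "labels")
    images.mkdir()
    labels.mkdir()
    for name in ("0.jpg", "1.jpg"):
        (images / name).touch()
    (labels / "0.jpg").touch()  # the mask for 1.jpg is deliberately missing
    missing_masks, missing_images = unmatched_files(images, labels)

print(missing_masks)   # images that have no mask
print(missing_images)  # masks that have no image
```

On the real data, the same function could be called with `./oral-cancer-data/train/images` and `./oral-cancer-data/train/labels`.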

## Augmentation techniques

For this example, we consider the augmentation techniques applied in the work ["U-Net: Convolutional Networks for Biomedical Image Segmentation"](https://arxiv.org/abs/1505.04597), which presents a network and training strategy that relies on heavy use of data augmentation to use the available annotated samples more efficiently. Using this approach, the authors won the [ISBI challenge for segmentation of neuronal structures in electron microscopic stacks](http://brainiac2.mit.edu/isbi_challenge/home).

The augmentation techniques applied in that work are:
- Shifting.
- Rotation.
- Elastic deformations.

In addition, we apply flipping, cropping, and gamma correction. Elastic deformations are not applied in the cells of this notebook.

## Loading the necessary libraries

The first step in the pipeline consists of loading the libraries needed to apply the data augmentation techniques in CLODSA.

In [None]:
from matplotlib import pyplot as plt
from clodsa.augmentors.augmentorFactory import createAugmentor
from clodsa.transformers.transformerFactory import transformerGenerator
from clodsa.techniques.techniqueFactory import createTechnique
import cv2
import numpy as np
%matplotlib inline

## Creating the augmentor object

As explained in the documentation of CLODSA, we need to specify some parameters for the augmentation process, and use them to create an augmentor object.  

_The kind of problem_. In this case, we are working on a semantic segmentation problem.

In [None]:
PROBLEM = "semantic_segmentation"

_The annotation mode_. The annotation is provided by the name of the folder containing the image. 

In [None]:
ANNOTATION_MODE = "folders"

_The input path_. The path to the folder containing the images.

In [None]:
# parent_path = '/home/ritesh/Desktop/Codes/ER/datasets/er_unet/train/'

In [None]:
INPUT_PATH = 'oral-cancer-data/test/'

_The generation mode_. In this case, linear, that is, all the augmentation techniques are applied to all the images of the original dataset. 

In [None]:
GENERATION_MODE = "linear"
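
To make the linear mode concrete, the sketch below pairs every image with every technique, as the description above states. It is not CLODSA's implementation: the function `linear_generation` and the file and technique names are hypothetical and serve only to show how the number of generated images grows.

```python
def linear_generation(image_names, technique_names):
    """Pair every image with every technique, as the linear mode is described above."""
    return [
        (technique, image)
        for image in image_names
        for technique in technique_names
    ]


# Hypothetical names, used only to count the outputs.
images = ["0.jpg", "1.jpg", "2.jpg"]
techniques = ["rotate", "flip", "none"]
outputs = linear_generation(images, techniques)

# The output size is len(images) * len(techniques).
print(len(outputs))
```

Under this reading, a dataset of N images augmented with T techniques yields N × T images, which is a useful sanity check for the output folder.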

_The output mode_. The generated images will be stored in a new folder called augmented_images_nuclei.  

In [None]:
OUTPUT_MODE = "folders"
OUTPUT_PATH= "augmented_images_nuclei/"
LABELS_EXTENSION = ".jpg"

Using the above information, we can create our augmentor object. 

In [None]:
augmentor = createAugmentor(PROBLEM,ANNOTATION_MODE,OUTPUT_MODE,GENERATION_MODE,INPUT_PATH,{"outputPath":OUTPUT_PATH,"labelsExtension":LABELS_EXTENSION})

## Adding the augmentation techniques

Now, we define the techniques that will be applied in our augmentation process and add them to our augmentor object. To illustrate the transformations, we will use the following image of the dataset. 



In [None]:
img = cv2.imread("oral-cancer-data/train/images/0.jpg")
label = cv2.imread("oral-cancer-data/train/labels/0.jpg")
print(type(img))
# Convert from OpenCV's BGR channel order to RGB for matplotlib
plt.figure()
plt.imshow(img[:,:,::-1])
plt.title("Nuclei")
plt.figure()
plt.imshow(label[:,:,::-1])
plt.title("Label")
dst = cv2.addWeighted(img,0.7,label,0.3,0)
plt.figure()
plt.imshow(dst[:,:,::-1])
plt.title("Blending")
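
The blended view comes from `cv2.addWeighted`, which computes a saturated weighted sum of its two inputs plus a constant offset. The NumPy sketch below reproduces that formula on tiny synthetic arrays so the arithmetic can be inspected. The function name `blend` is hypothetical, and the rounding may differ from OpenCV's in exact tie cases.

```python
import numpy as np


def blend(first, second, alpha, beta, offset=0.0):
    """Weighted sum first*alpha + second*beta + offset, rounded and clipped to uint8."""
    mixed = first.astype(np.float64) * alpha + second.astype(np.float64) * beta + offset
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


# Synthetic 2x2 three-channel image and mask.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.zeros((2, 2, 3), dtype=np.uint8)
mask[0, 0] = 100  # a single labelled pixel

blended = blend(image, mask, 0.7, 0.3)
print(blended[:, :, 0])
```

With weights 0.7 and 0.3, the image dominates and the mask appears as a faint overlay, which is why both arrays must have the same shape and number of channels.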

First of all, we must define a transformer generator.

In [None]:
transformer = transformerGenerator(PROBLEM)

#### Rotations

In [None]:
rotateRandom = createTechnique("rotate",{})
augmentor.addTransformer(transformer(rotateRandom))
for angle in [90]:
    rotate = createTechnique("rotate", {"angle" : angle})
    augmentor.addTransformer(transformer(rotate))

Showing the result of applying the transformation.

In [None]:
rotationGenerator = transformer(rotateRandom)
rotateImg,rotateLabel = rotationGenerator.transform(img,label)
plt.figure()
plt.imshow(rotateImg[:,:,::-1])
plt.title("Nuclei")
plt.figure()
plt.imshow(rotateLabel[:,:,::-1])
plt.title("Label")
dst = cv2.addWeighted(rotateImg,0.7,rotateLabel,0.3,0)
plt.figure()
plt.imshow(dst[:,:,::-1])
plt.title("Blending")
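
In semantic segmentation, a geometric transformation has to be applied identically to the image and to its mask, otherwise the labels no longer line up with the pixels. The sketch below illustrates this for the 90-degree case using `np.rot90`. It is not CLODSA's code, the helper `rotate_pair_90` is a hypothetical name, and the counterclockwise direction is NumPy's convention, which may not match the direction CLODSA uses.

```python
import numpy as np


def rotate_pair_90(image, mask, quarter_turns=1):
    """Rotate image and mask together by multiples of 90 degrees (counterclockwise)."""
    rotated_image = np.rot90(image, quarter_turns, axes=(0, 1)).copy()
    rotated_mask = np.rot90(mask, quarter_turns, axes=(0, 1)).copy()
    return rotated_image, rotated_mask


# Synthetic 2x3 colour image and a mask with one labelled pixel in the top-right corner.
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
mask = np.zeros((2, 3), dtype=np.uint8)
mask[0, 2] = 1

rotated_image, rotated_mask = rotate_pair_90(image, mask)
print(rotated_mask)
```

After the rotation, the labelled pixel and the image pixel it annotates end up at the same new position, which is the property the transformer has to preserve.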

#### Flipping

In [None]:
for flip_tp in [1]:
    flip = createTechnique("flip", {"flip" : flip_tp})
    augmentor.addTransformer(transformer(flip))
# hflip = createTechnique("flip",{"flip":1})
# augmentor.addTransformer(transformer(hflip))

In [None]:
flipGenerator = transformer(flip)
flipImg,flipLabel = flipGenerator.transform(img,label)
plt.figure()
plt.imshow(flipImg[:,:,::-1])
plt.title("Nuclei")
plt.figure()
plt.imshow(flipLabel[:,:,::-1]*60)
plt.title("Label")
dst = cv2.addWeighted(flipImg,0.7,flipLabel,0.3,0)
plt.figure()
plt.imshow(dst[:,:,::-1])
plt.title("Blending")
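
The value `1` passed as `flip` corresponds to a horizontal flip, as the commented-out `hflip` lines suggest. The sketch below spells out OpenCV's flip-code convention (positive: horizontal, zero: vertical, negative: both) with NumPy slicing, applied to an image and its mask together. That CLODSA follows exactly this convention is an assumption, and `flip_pair` is a hypothetical helper.

```python
import numpy as np


def flip_pair(image, mask, flip_code):
    """Flip image and mask together using OpenCV-style flip codes."""
    def flip(array):
        if flip_code > 0:        # horizontal flip: mirror the columns
            return array[:, ::-1].copy()
        if flip_code == 0:       # vertical flip: mirror the rows
            return array[::-1].copy()
        return array[::-1, ::-1].copy()  # negative code: both axes
    return flip(image), flip(mask)


image = np.arange(2 * 2 * 3, dtype=np.uint8).reshape(2, 2, 3)
mask = np.array([[1, 0], [0, 0]], dtype=np.uint8)

flipped_image, flipped_mask = flip_pair(image, mask, 1)
print(flipped_mask)
```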

#### Shifting

In [None]:
translation = createTechnique("translation", {"x":15,"y":-5})
augmentor.addTransformer(transformer(translation))

Showing the result of applying the transformation.

In [None]:
translationGenerator = transformer(translation)
translationImg,translationLabel = translationGenerator.transform(img,label)
plt.figure()
plt.imshow(translationImg[:,:,::-1])
plt.title("Nuclei")
plt.figure()
plt.imshow(translationLabel[:,:,::-1])
plt.title("Label")
dst = cv2.addWeighted(translationImg,0.7,translationLabel,0.3,0)
plt.figure()
plt.imshow(dst[:,:,::-1])
plt.title("Blending")
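
A translation moves every pixel by a fixed offset and leaves the uncovered border empty. The sketch below shows one way to do this in NumPy, assuming that positive `x` shifts to the right, positive `y` shifts downwards, and the border is filled with zeros. These conventions and the helper name `translate` are assumptions for illustration, not a description of CLODSA's internals.

```python
import numpy as np


def translate(array, shift_x, shift_y):
    """Shift an array by (shift_x, shift_y) pixels, filling uncovered areas with zeros."""
    height, width = array.shape[:2]
    result = np.zeros_like(array)
    # Part of the source that stays inside the frame after the shift.
    src_x0, src_x1 = max(0, -shift_x), min(width, width - shift_x)
    src_y0, src_y1 = max(0, -shift_y), min(height, height - shift_y)
    if src_x0 < src_x1 and src_y0 < src_y1:
        result[src_y0 + shift_y:src_y1 + shift_y,
               src_x0 + shift_x:src_x1 + shift_x] = array[src_y0:src_y1, src_x0:src_x1]
    return result


# A 4x4 mask with a single labelled pixel at row 2, column 1.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[2, 1] = 1

shifted_mask = translate(mask, 1, -1)  # one pixel right, one pixel up
print(shifted_mask)
```

Applying the same call to the image keeps image and mask aligned, and pixels pushed outside the frame are discarded.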

#### Cropping

In [None]:
crop = createTechnique("crop", {"percentage":0.8})
augmentor.addTransformer(transformer(crop))

In [None]:
# Preview the crop transformation on the sample image and its label
cropGenerator = transformer(crop)
cropImg,cropLabel = cropGenerator.transform(img,label)
plt.figure()
plt.imshow(cropImg[:,:,::-1])
plt.title("Nuclei")
plt.figure()
plt.imshow(cropLabel[:,:,::-1])
plt.title("Label")
dst = cv2.addWeighted(cropImg,0.7,cropLabel,0.3,0)
plt.figure()
plt.imshow(dst[:,:,::-1])
plt.title("Blending")
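
The `percentage` parameter of the crop keeps a fraction of the original image. The sketch below is one interpretation in NumPy, assuming a centered window whose sides are `percentage` times the original sides. Where CLODSA anchors the window, and whether it resizes the result, is not covered here, so treat `center_crop_pair` as a hypothetical illustration.

```python
import numpy as np


def center_crop_pair(image, mask, percentage):
    """Crop the same centered window from image and mask."""
    height, width = image.shape[:2]
    new_height = max(1, int(round(height * percentage)))
    new_width = max(1, int(round(width * percentage)))
    top = (height - new_height) // 2
    left = (width - new_width) // 2
    window = (slice(top, top + new_height), slice(left, left + new_width))
    return image[window].copy(), mask[window].copy()


image = np.zeros((10, 10, 3), dtype=np.uint8)
mask = np.zeros((10, 10), dtype=np.uint8)
mask[1, 1] = 1  # becomes the top-left pixel of the cropped window

cropped_image, cropped_mask = center_crop_pair(image, mask, 0.8)
print(cropped_image.shape, cropped_mask.shape)
```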

#### Gamma correction

In [None]:
gamma = createTechnique("gamma",{"gamma":1.5})
augmentor.addTransformer(transformer(gamma))

In [None]:
gammaGenerator = transformer(gamma)
gammaImg,gammaLabel = gammaGenerator.transform(img,label)
plt.figure()
plt.imshow(gammaImg[:,:,::-1])
plt.title("Nuclei")
plt.figure()
plt.imshow(gammaLabel[:,:,::-1])
plt.title("Label")
dst = cv2.addWeighted(gammaImg,0.7,gammaLabel,0.3,0)
plt.figure()
plt.imshow(dst[:,:,::-1])
plt.title("Blending")
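
Gamma correction remaps pixel intensities through a power law and is usually implemented with a 256-entry lookup table. The sketch below uses the common formulation `output = 255 * (input / 255) ** (1 / gamma)`. Whether CLODSA uses the exponent `1 / gamma` or `gamma` is an assumption here, and both function names are hypothetical.

```python
import numpy as np


def gamma_lookup_table(gamma):
    """Build a 256-entry table mapping each intensity to its gamma-corrected value."""
    levels = np.arange(256, dtype=np.float64) / 255.0
    corrected = 255.0 * levels ** (1.0 / gamma)
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)


def adjust_gamma(image, gamma):
    """Apply the lookup table to every pixel of a uint8 image."""
    return gamma_lookup_table(gamma)[image]


image = np.array([[0, 64], [128, 255]], dtype=np.uint8)
brighter = adjust_gamma(image, 1.5)
print(brighter)
```

Under this convention, a gamma above 1 brightens mid-tones while leaving pure black and pure white unchanged. Since it only alters intensities, such a change would normally be applied to the image and not to the mask.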

#### None
(to also keep the original image)

In [None]:
none = createTechnique("none",{})
augmentor.addTransformer(transformer(none))

## Applying the augmentation process

Finally, we apply the augmentation process (this might take some time depending on the number of images of the original dataset and the number of transformations that will be applied). 

In [None]:
augmentor.applyAugmentation()

We can then check the number of images in the output folder.

In [None]:
print("Number of images in augmented nuclei folder")
!ls augmented_images_nuclei/images/ | wc -l
print("Number of images in augmented nuclei label folder")
!ls augmented_images_nuclei/labels/ | wc -l

In [None]:
!zip -r augmented_images_nuclei.zip augmented_images_nuclei
from google.colab import files
files.download('augmented_images_nuclei.zip')