# Semantic Segmentation using SegNet 
## Introduction
This notebook summarizes a personal project demonstrating how convolutional neural networks (CNNs) can be used for image segmentation. Specifically, it describes semantic labelling of specific targets within a dataset. Semantic segmentation is a common segmentation technique in which objects are distinguished from each other by class only; individual instances within a class are not delineated.

<br>
<img src="typesofseg.png" width=1000 height=1000 />


##  Dataset 
I've been particularly interested in how supervised learning can be used in digital pathology and other medical/research imaging modalities. For this project, I used a **darkfield microscopy dataset** populated with labelled red blood cells (RBCs) and Spirochaeta bacteria. My goal was to construct and test a CNN architecture from the ground up which could distinguish RBCs from bacteria with reasonable accuracy. For me, automating the labelling of bacterial species seemed like a great starting point for exploring the capabilities of supervised learning in a diagnostic capacity.

I later discuss the results of my implementation in PyTorch.

### Images and mask labels

A few examples from the dataset: colorized cells and bacteria (left) and the corresponding labelled masks used for training (right).
<br>
<img src="1.png" width=400 height=400 />
<img src="2.png" width=400 height=400 />
<img src="3.png" width=400 height=400 />

### Overview
<img src="overview.png" width=700 height=700 />

## SegNet 
SegNet is a CNN architecture originally developed at the University of Cambridge for semantic segmentation. At the time of its publication, SegNet introduced a rather novel *encoder-decoder architecture* compared to its contemporaries, which used fully connected layers (and consequently had many more parameters to train). I chose this architecture out of personal interest in its implementation.

### Architecture
The encoder-decoder strategy aims to reduce the number of learnable parameters by successively reducing spatial resolution in the encoder stage, then restoring this resolution for the final label projection. We trade spatial resolution for feature map depth at each encoder stage to encourage the network to infer more complex abstractions (i.e. patterns in the image data). The decoder network inverts this process by restoring resolution and collapsing feature map depth to project a final prediction.

The complete architecture is illustrated below from the original SegNet paper: 
<img src="segnet_architecture.png" width=800 height=800 />
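
The trade between spatial resolution and feature map depth can be made concrete with a single encoder stage and a single decoder stage. The block below is only a minimal sketch, not the full SegNet and not necessarily how my implementation is organized: the class name `MiniSegNet`, the 16 feature maps, the 64x64 input, and the three output classes are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class MiniSegNet(nn.Module):
    """One encoder stage and one decoder stage (illustrative only, not the full SegNet)."""

    def __init__(self, in_channels=3, features=16, num_classes=3):
        super().__init__()
        # Encoder: the convolution deepens the feature maps, pooling halves the resolution.
        self.enc = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1),
            nn.BatchNorm2d(features),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
        # Decoder: unpooling restores the resolution, the convolution collapses
        # the depth to one channel per class.
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.dec = nn.Conv2d(features, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.enc(x)
        size = x.size()                # remember the pre-pooling shape
        x, indices = self.pool(x)      # keep the locations of the maxima
        x = self.unpool(x, indices, output_size=size)
        return self.dec(x)


model = MiniSegNet()
images = torch.randn(2, 3, 64, 64)             # hypothetical batch of two RGB images
encoded, _ = model.pool(model.enc(images))     # output of the encoder stage alone
logits = model(images)                         # full encoder-decoder pass

print(encoded.shape)  # torch.Size([2, 16, 32, 32]): deeper, but half the resolution
print(logits.shape)   # torch.Size([2, 3, 64, 64]): one channel per class at full resolution
```

The real network stacks several such stages, so the resolution keeps shrinking and the depth keeps growing before the decoder reverses the process.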

- **Convolutional layer**: 
A layer where the input image is convolved with a filter (kernel) of size K, using stride S and padding P. Convolutions are essentially linear combination operations with the filter providing the weights. 
<img src="05_convolutions_example.gif" width="500" align="center"> _Stride_ refers to how many units the kernel moves vertically or horizontally along the input image between convolution operations. _Padding_ refers to how many rows and columns of zeros are added around the input matrix prior to convolution. Since an unpadded convolution with K > 1 reduces the input size, we can pad the input when we want the output to keep the same dimensions. The filter weights are learnable parameters, meaning they are updated as part of the network's learning process (backpropagation).


- **Batch normalization layer**: 
A layer which normalizes the inputs provided to it. It typically accepts a 'mini-batch' of images (or feature maps) and normalizes each channel using the mean and standard deviation computed over that batch. This normalization keeps the feature values in a comparable range; without it, we run a higher risk of encountering *vanishing and exploding gradients* in the backpropagation step (i.e. gradients become too small or too large, and training slows down or effectively stops).


- **ReLU**: 
Short for rectified linear unit. It is an element-wise activation function which applies a very simple rule to its input, $f(x) = \max(0, x)$: <img src="relu.png" width=300 height=300 />
Each element in a layer fed into a ReLU is subject to this activation function. The function is only piecewise linear (note the elbow at the origin). ReLU adds a non-linearity to the network, which allows the network to learn non-linear relationships in the data.


- **Max Pooling and Max Unpooling**: 
_Max Pooling_ is a downsampling operation where the maximum element is taken within a sliding kernel of size K, with stride S and padding P (of the input image). Max pooling is used in the encoder stage of SegNet to reduce spatial resolution. _Max Unpooling_ approximately inverts this operation: the location of each maximum is stored during pooling, and the decoder places each value back at its stored location while filling the remaining positions with zeros. This is how the decoder stage restores spatial resolution prior to our pixel-wise class predictions. <img src="pooling.png" width="500" align="center">


- **Softmax**:  
A function which turns an input vector of $n$ values into values that add up to 1. The softmax is best demonstrated using a simple example. Assume some vector $K = [a, b, c]$:

$$\mathrm{Softmax}(K) = \left[\frac{e^a}{e^a + e^b + e^c},\ \frac{e^b}{e^a + e^b + e^c},\ \frac{e^c}{e^a + e^b + e^c}\right]$$

The softmax function scales values into a probability distribution, where the smallest (e.g. negative) values receive small probabilities and the largest values receive large ones. As such, we can use it to produce classification decisions for any number of mutually exclusive classes. Since the output of SegNet contains one channel per class, we apply the softmax across the channels at every pixel, so that each channel holds the probability that the pixel belongs to its class. Once we obtain this, we simply label each pixel with the class whose channel assigns it the highest probability.
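
The layer descriptions above can be checked on toy tensors. The following is a small sketch under stated assumptions: the helper `conv_output_size` is a name I introduce here for the standard output-size formula, and the tensor sizes and the three-class output are arbitrary illustrative choices rather than values taken from the project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_output_size(n, k, s=1, p=0):
    """Spatial output size of a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# Convolution: a 3x3 kernel with stride 1 and padding 1 preserves a 64x64 input.
x = torch.randn(1, 3, 64, 64)
conv = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1)
conv_width = conv(x).shape[-1]
expected_width = conv_output_size(64, k=3, s=1, p=1)  # 64

# ReLU: negative entries become zero, positive entries pass through unchanged.
relu_out = F.relu(torch.tensor([-2.0, 0.0, 3.0]))  # tensor([0., 0., 3.])

# Max pooling keeps the maximum of each window and remembers where it was;
# max unpooling puts it back at that location and fills the rest with zeros.
patch = torch.tensor([[[[1.0, 5.0],
                        [3.0, 2.0]]]])
pooled, indices = F.max_pool2d(patch, kernel_size=2, return_indices=True)
restored = F.max_unpool2d(pooled, indices, kernel_size=2)  # [[0., 5.], [0., 0.]]

# Softmax on a vector K = [a, b, c] = [1, 2, 3]: the outputs sum to 1.
probs = torch.softmax(torch.tensor([1.0, 2.0, 3.0]), dim=0)  # approx. [0.090, 0.245, 0.665]

# Per-pixel labelling: softmax across the class channels (dim=1), then argmax.
logits = torch.randn(1, 3, 4, 4)            # (batch, classes, height, width)
pixel_probs = torch.softmax(logits, dim=1)  # class probabilities sum to 1 at every pixel
labels = pixel_probs.argmax(dim=1)          # (batch, height, width) map of class indices
```

Note that `restored` is not identical to `patch`: unpooling recovers the position of the maximum but not the values that pooling discarded, which is why the decoder follows it with further convolutions.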

## Training and validation using PyTorch
The complete code can be found under the SegNet folder. Here, I describe the complete workflow of my PyTorch implementation, including the transformations used on the training data, the performance of the model, and the decisions I made in tuning the model.

#### PyTorch overview
Our PyTorch SegNet project consists of several key components: 
1. Dataset
2. DataLoaders
3. Data transformations
4. Model architecture
5. Loss function and optimizer
6. Hyperparameters
7. Validating accuracy

#### 1. Dataset
I wrote a custom dataset class named *SegNetDataSet*, inheriting from the *Dataset* class, to load my images and target masks. The code can be found here. The class accepts separate transform arguments for the images and the target masks, since some transformations should not be applied to the masks (e.g. no normalization of the target mask).

```python
# Load custom dataset
dataset = SegNetDataSet(r'C:\Users\vajra\Documents\GitHub\ML_playground\PyTorch\segnet\archive', 
                        data_transforms=data_transforms, target_transforms=target_transforms)
```
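
To illustrate the idea of separate image and mask transforms, here is a minimal sketch of such a dataset. It is not the actual *SegNetDataSet*: the class name `PairedImageMaskDataset` is hypothetical, and to stay self-contained it holds in-memory tensors, whereas a real version would read image and mask files from the dataset folder.

```python
import torch
from torch.utils.data import Dataset


class PairedImageMaskDataset(Dataset):
    """Hypothetical stand-in for SegNetDataSet: yields (image, mask) pairs."""

    def __init__(self, images, masks, data_transforms=None, target_transforms=None):
        assert len(images) == len(masks), "each image needs exactly one mask"
        self.images = images
        self.masks = masks
        self.data_transforms = data_transforms      # applied to images only
        self.target_transforms = target_transforms  # applied to masks only

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image, mask = self.images[idx], self.masks[idx]
        if self.data_transforms is not None:
            image = self.data_transforms(image)
        if self.target_transforms is not None:
            mask = self.target_transforms(mask)
        return image, mask


def normalize(img):
    """Map pixel values from [0, 1] to [-1, 1]; only meaningful for images."""
    return (img - 0.5) / 0.5


# Toy data: four RGB "images" and integer class masks (0, 1 or 2 per pixel).
toy_images = [torch.rand(3, 8, 8) for _ in range(4)]
toy_masks = [torch.randint(0, 3, (8, 8)) for _ in range(4)]

toy_dataset = PairedImageMaskDataset(toy_images, toy_masks, data_transforms=normalize)
image, mask = toy_dataset[0]  # image is normalized, mask keeps its class indices
```

Keeping the two transform pipelines apart matters because the mask stores class indices: rescaling or normalizing it would corrupt the labels.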

#### 2. DataLoaders
I use the standard DataLoader class to create two loaders, namely a training and test loader for the training and test sets respectively. 

```python
import torch
from torch.utils.data import DataLoader

# Produce train and test sets
train_set, test_set = torch.utils.data.random_split(dataset, [329, 37])  # roughly 90% / 10% split between train and test

# batch_size is assumed to be set alongside the other hyperparameters
train_loader = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(dataset=test_set, batch_size=batch_size, shuffle=False)
```
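
The split sizes above are hard-coded for a dataset of 366 samples. As a hedged alternative, the sketch below derives them from a test fraction and seeds the split so it is reproducible. The `TensorDataset` of random tensors, the batch size of 8, and the seed are placeholders of my own, and shuffling the training loader is a common option rather than what the code above does.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder dataset with the same number of samples as the split above (329 + 37).
toy_dataset = TensorDataset(
    torch.randn(366, 3, 8, 8),                   # stand-in images
    torch.zeros(366, 8, 8, dtype=torch.long),    # stand-in masks
)

# Derive the split sizes from a fraction instead of hard-coding them.
test_fraction = 0.1
n_test = round(test_fraction * len(toy_dataset))  # 37
n_train = len(toy_dataset) - n_test               # 329

# A seeded generator makes the random split reproducible between runs.
generator = torch.Generator().manual_seed(0)
train_set, test_set = random_split(toy_dataset, [n_train, n_test], generator=generator)

# Shuffling the training data each epoch is optional but common; the test order is kept fixed.
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
test_loader = DataLoader(test_set, batch_size=8, shuffle=False)
```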

#### 3. Data Transformations
