# Dealing with Images (Enhancing and Segmenting) [Notebook 3]

## Introduction

This project dives into Encoders-Decoders and how these models are used to edit and generate full images. It covers how they can be adapted to a wider range of applications, such as image denoising or object and instance segmentation. The project also introduces new layer types, namely Unpooling, Transposed Convolution and Atrous Convolution layers, and explains their usefulness for high-dimensional data. One application of Encoders-Decoders is semantic segmentation for driverless cars, where the model helps identify the objects surrounding the vehicle, such as roads, other vehicles, people or trees.

## Breakdown of this Project:
- Introduction to Encoders-Decoders. (Notebook 1)
- Encoders-Decoders trained for pixel-level prediction. (Notebook 1)
- Layers such as Unpooling, Transposed and Atrous Convolutions to output high-dimensional data. (Notebook 2)
- FCN and U-Net Architectures for semantic segmentation. (Notebook 3)
- Instance segmentation (extension of Faster-RCNN with Mask-RCNN) (Notebook 4)

## Requirements:
1) TensorFlow 2.0 (preferably the GPU build) \
2) CV2 (OpenCV) \
3) Cython \
4) Eigen \
5) PyDenseCRF

For "PyDenseCRF" on Windows, see: https://github.com/lucasb-eyer/pydensecrf

It can be installed directly with the following in command prompt or terminal-equivalent: __conda install -c conda-forge pydensecrf__.

If Conda-Forge __does not work__, try the following:
- Go to: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pydensecrf
- Download: pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl
- The "cp37" in the filename refers to Python 3.7; make sure you download the file that matches your Python version.
- Place the downloaded "pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl" file in your working directory.
- Open Command Prompt and run: pip install pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl
- Or, if you placed the file in a different folder: pip install <FILEPATH>\pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl

## Dataset:
    
The dataset can be obtained from: https://www.cityscapes-dataset.com/dataset-overview/.

Quoted from the website: "The Cityscapes Dataset focuses on semantic understanding of urban street scenes." It consists of >5,000 images with fine-grained semantic labels and 20,000 images with coarser annotations, all shot from the viewpoint of a car driving around different cities in Germany.


### Import the required libraries:

In [1]:
%matplotlib inline

import tensorflow as tf
import numpy as np
import math
import timeit
import time
import os
import matplotlib.pyplot as plt

# Run on GPU:
os.environ["CUDA_VISIBLE_DEVICES"]= "0" 
tf.config.list_physical_devices('GPU')

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [2]:
# Set the random seed number for reproducibility.
Seed_nb = 42

# Flag to run or skip a code block, used for code examples only (0 = run the code, 1 = don't run the code).
dont_run = 0
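
A seed only helps reproducibility if it is actually passed to every random number generator in use. The sketch below assumes only Python's `random` module and NumPy; the helper name `set_global_seed` is hypothetical and not part of this notebook. In TensorFlow 2.x the same value would additionally be passed to `tf.random.set_seed`.

```python
import random

import numpy as np

Seed_nb = 42  # same value as in the notebook cell


def set_global_seed(seed):
    """Hypothetical helper: seed the Python and NumPy generators."""
    random.seed(seed)
    np.random.seed(seed)


# Seeding twice with the same value should give identical draws.
set_global_seed(Seed_nb)
first_draw = np.random.rand(3)

set_global_seed(Seed_nb)
second_draw = np.random.rand(3)

same_draws = np.array_equal(first_draw, second_draw)
print(same_draws)
```

Re-seeding before each experiment, rather than once per session, keeps individual cells repeatable even when they are run out of order.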

### GPU Information:

In [3]:
# sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
# devices = sess.list_devices()
# devices

### Use RTX_GPU Tensor Cores for faster compute: FOR TENSORFLOW ONLY

Automatic Mixed Precision training in TensorFlow. Requires the NVIDIA Docker image of TensorFlow.

Sources:
- https://developer.nvidia.com/automatic-mixed-precision
- https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#framework

When enabled, automatic mixed precision will do two things:

- Insert the appropriate cast operations into your TensorFlow graph to use float16 execution and storage where appropriate (this enables the use of Tensor Cores, along with savings in memory storage and bandwidth).
- Turn on automatic loss scaling inside the training Optimizer object.
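
To illustrate why loss scaling goes hand in hand with float16, the following minimal NumPy sketch mimics the idea by hand. The gradient value and the scale factor of 1024 are illustrative assumptions only; with automatic mixed precision the scaling is handled by the optimizer.

```python
import numpy as np

# A small gradient value, chosen for illustration only.
grad = np.float32(1e-8)

# Cast directly to float16: the value is too small and underflows to zero.
naive_fp16 = np.float16(grad)

# Loss scaling: multiply before the cast, then divide again in float32.
loss_scale = np.float32(1024.0)  # assumed scale factor
scaled_fp16 = np.float16(grad * loss_scale)
recovered = np.float32(scaled_fp16) / loss_scale

print("direct cast:", naive_fp16)
print("scaled, then unscaled:", recovered)
```

The scaled value survives the float16 cast, so the update computed from it stays close to the original gradient instead of vanishing.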

In [4]:
# os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

EXAMPLE CODE: 

In [5]:
# # Graph-based example:
# opt = tf.train.AdamOptimizer()
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# train_op = opt.minimize(loss)

# # Keras-based example:
# opt = tf.keras.optimizers.Adam()
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# model.compile(loss=loss, optimizer=opt)
# model.fit(...)

### Use RTX_GPU Tensor Cores for faster compute: FOR KERAS API

Source:
- https://www.tensorflow.org/guide/keras/mixed_precision
- https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/experimental/Policy

In [6]:
# from tensorflow.keras.mixed_precision import experimental as mixed_precision

In [7]:
# # Set for MIXED PRECISION:
# policy = mixed_precision.Policy('mixed_float16')
# mixed_precision.set_policy(policy)

# print('Compute dtype: %s' % policy.compute_dtype)
# print('Variable dtype: %s' % policy.variable_dtype)

### To run this notebook without errors, the GPU will have to be set accordingly:

In [8]:
# physical_devices = tf.config.list_physical_devices('GPU') 
# physical_devices
# tf.config.experimental.set_memory_growth(physical_devices[0], True) 

## 1 - What is Semantic Segmentation?:

__Semantic Segmentation__ is the task of segmenting images into meaningful parts; it covers the segmentation of both objects and instances. This task differs from image classification or object detection in that it requires dense, pixel-level predictions: a label is assigned to every pixel of the input image.

## 2 - Encoders-Decoders for Object Segmentation:

Segmenting the objects in a scene can be described as mapping an image from a colour domain to a class domain. The model assigns one of the target classes to each pixel and returns a label map of the same height and width. Performing this operation with an Encoder-Decoder is not as straightforward and requires some further considerations.

## 2.1 - Decoding as label maps:

If an Encoder-Decoder network were built to output label maps directly, where each pixel value is a class index (e.g. 1 for house or 2 for car), the model would produce very poor results. A better approach is to output categorical values instead. Previously, for image classification with "N" categories, the final layer of the network would output "N" logits, one for each class; these scores were converted to probabilities with the Softmax function, and the class with the largest probability was picked with the Argmax function. The same mechanism can be applied to Semantic Segmentation, at the pixel level rather than at the level of the whole image.
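
The per-pixel version of this Softmax and Argmax mechanism can be sketched with NumPy. The random logits stand in for the output of a network's final layer, and the tensor sizes and the `softmax` helper are assumptions made for this illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(42)
height, width, num_classes = 4, 6, 3  # illustrative sizes

# Stand-in for the network output: N logits for every pixel.
logits = rng.normal(size=(height, width, num_classes))

# Convert the logits of each pixel to class probabilities.
probs = softmax(logits, axis=-1)

# Pick the most probable class per pixel to obtain the label map.
label_map = probs.argmax(axis=-1)

print(probs.shape, label_map.shape)
```

The only difference from image classification is the axis: the Softmax and Argmax run over the channel axis of every pixel instead of over a single vector per image.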

##### The image below represents the task of Image Segmentation:

<img src="Description Images/Semantic_Segmentation_Overall.PNG" width="750">

The diagram above shows the Encoder-Decoder model taking an input image and outputting the predicted label maps. This process can be broken into three parts. Note that the example shown here uses a low-resolution prediction map for illustration; in practice, the predicted segmentation label map should have the same resolution as the original input.

##### The image below represents part 1 of Image Segmentation:

<img src="Description Images/Semantic_Segmentation_1.PNG" width="750">

Image Ref -> https://www.jeremyjordan.me/semantic-segmentation/

First, the main goal of the model is to take an input, such as an RGB colour image tensor of shape (Height x Width x 3) or a greyscale tensor of shape (Height x Width x 1), and to output a segmentation label map of shape (Height x Width x 1), in which each pixel holds a class label represented as an integer.

##### The image below represents part 2 of Image Segmentation:

<img src="Description Images/Semantic_Segmentation_2.PNG" width="750">

Image Ref -> https://www.jeremyjordan.me/semantic-segmentation/

Second, the diagram shows an intermediate stage composed of individual masks, one for each class label. Setting the number of output channels equal to the number of classes, N, lets the Encoder-Decoder model produce such an output tensor, which also means that it can be trained as a classifier. The loss is the cross-entropy between the per-pixel softmax values and the one-hot-encoded ground truth label maps. The (Height x Width x N) predictions can then be transformed into per-pixel labels by selecting the highest value along the channel axis. In other words, the output prediction image is formed by collapsing the segmentation map with an argmax along the channel axis (the depth-wise pixel vector).
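
As a rough illustration of this stage, the NumPy sketch below one-hot encodes a tiny label map and computes the pixel-wise cross-entropy for two sets of predictions. The helper names `one_hot` and `pixel_cross_entropy`, the 2 x 2 label map and the probability values are all assumptions made for the example; during training a framework loss function would normally be used instead.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn an (H, W) integer label map into (H, W, N) per-class masks."""
    return np.eye(num_classes, dtype=np.float32)[labels]


def pixel_cross_entropy(probs, labels_one_hot, eps=1e-7):
    """Mean cross-entropy over all pixels; eps guards against log(0)."""
    per_pixel = -np.sum(labels_one_hot * np.log(probs + eps), axis=-1)
    return per_pixel.mean()


num_classes = 3
labels = np.array([[0, 1],
                   [2, 1]])             # ground truth label map (H=2, W=2)
target = one_hot(labels, num_classes)   # shape (2, 2, 3)

# Confident, correct prediction: 0.98 on the true class, 0.01 elsewhere.
confident = target * 0.97 + 0.01
# Uninformative prediction: equal probability for every class.
uniform = np.full((2, 2, num_classes), 1.0 / num_classes)

loss_confident = pixel_cross_entropy(confident, target)
loss_uniform = pixel_cross_entropy(uniform, target)

# Collapsing the channel axis with argmax recovers the label map.
predicted_labels = confident.argmax(axis=-1)

print(loss_confident, loss_uniform)
```

The confident prediction yields the lower loss, and its argmax along the channel axis reproduces the original label map.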

##### The image below represents part 3 of Image Segmentation:

<img src="Description Images/Semantic_Segmentation_3.PNG" width="750">

Image Ref -> https://www.jeremyjordan.me/semantic-segmentation/

Third, collapsing these predictions into a single channel forms the target prediction image, referred to as the __mask__, in which each class is highlighted over the corresponding regions of the image.
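
One way such a highlighted mask can be produced is to map every class index to a colour and blend the result with the input image. The sketch below assumes a three-class palette, a uniform grey stand-in image and a blending factor of 0.5; none of these values come from the notebook.

```python
import numpy as np

# Assumed colour palette: one RGB colour per class index.
palette = np.array([[0, 0, 0],      # class 0: black
                    [255, 0, 0],    # class 1: red
                    [0, 0, 255]],   # class 2: blue
                   dtype=np.uint8)

# Single-channel label map, e.g. the result of the per-pixel argmax.
label_map = np.array([[0, 1, 1],
                      [2, 2, 0]])

# Index the palette with the label map to get an (H, W, 3) colour mask.
colour_mask = palette[label_map]

# Stand-in for the input image: a uniform grey RGB image.
image = np.full((2, 3, 3), 200, dtype=np.uint8)

# Blend the image and the mask so each class is highlighted over its region.
alpha = 0.5
overlay = ((1 - alpha) * image + alpha * colour_mask).astype(np.uint8)

print(colour_mask.shape, overlay.shape)
```

Indexing the palette with the label map is a plain lookup, so adding a class only requires one more row in the palette.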





## 2.2 - Training the model with segmentation losses and metrics:



