# Dealing with Images (Enhancing and Segmenting)

## Introduction

This project dives into Encoder-Decoder networks, models that are used to edit and generate full images, and shows how they can be adapted to a wider range of applications such as image denoising and semantic or instance segmentation. It also introduces new layers, namely unpooling, transposed convolutions and atrous (dilated) convolutions, and explains their utility for producing high-dimensional outputs. One application is semantic segmentation for driverless cars, where an Encoder-Decoder helps identify the objects surrounding the vehicle, such as roads, other vehicles, people or trees.

## Breakdown of this Notebook:
- Introduction to Encoders-Decoders.
- Encoders-Decoders trained for pixel-level prediction.
- Layers such as Unpooling, Transposed Convolutions and Atrous Convolutions for outputting high-dimensional data.
- FCN and U-Net Architectures for semantic segmentation.
- Instance segmentation with Mask R-CNN (an extension of Faster R-CNN).

## Requirements:
1) TensorFlow 2.0 (preferably the GPU version) \
2) CV2 (OpenCV) \
3) Cython \
4) Eigen \
5) PyDenseCRF

For "PyDenseCRF" on Windows, see: https://github.com/lucasb-eyer/pydensecrf

It can be installed directly by running the following in a command prompt or terminal: __conda install -c conda-forge pydensecrf__.

If Conda-Forge __does not work__, try: 
- Go to: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pydensecrf
- Download: pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl
- The "cp37" in the filename refers to Python 3.7, so make sure you download the wheel that matches your Python version.
- Place the downloaded "pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl" file in your working directory.
- Open Command Prompt and type in: pip install pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl
- Or if you placed it in a folder or different location: pip install <FILEPATH>\pydensecrf-1.0rc2-cp37-cp37m-win_amd64.whl

## Dataset:


### Import the required libraries:

In [1]:
%matplotlib inline

import tensorflow as tf
import numpy as np
import timeit
import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2

In [2]:
import os
# from IPython.display import display, Image
import matplotlib.pyplot as plt

# %matplotlib inline

# Set up the working directory for the images:
image_folderName = 'Description Images'
image_path = os.path.abspath(image_folderName) + '/'

In [3]:
# Set the random seed number: for reproducibility.
Seed_nb = 42

# Toggle for the example-only code blocks guarded by `if dont_run == 1:` (0 = skip them, 1 = run them)
dont_run = 0
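`Seed_nb` is defined above but not yet passed to any random number generator. The sketch below shows one possible way to use it, assuming NumPy 1.17 or later for `np.random.default_rng`; the helper name `make_rng` is hypothetical. For TensorFlow, `tf.random.set_seed(Seed_nb)` sets the global seed; it is left as a comment so the block runs without TensorFlow.

```python
import random

import numpy as np

Seed_nb = 42  # same value as in the cell above


def make_rng(seed):
    """Hypothetical helper: seed Python's `random` module and return a NumPy Generator."""
    random.seed(seed)                   # Python's built-in RNG
    return np.random.default_rng(seed)  # independent, seeded NumPy Generator

# For TensorFlow (not executed here): tf.random.set_seed(Seed_nb)

rng_a = make_rng(Seed_nb)
rng_b = make_rng(Seed_nb)

# Two generators created from the same seed produce identical draws.
print(np.array_equal(rng_a.normal(size=3), rng_b.normal(size=3)))  # True
```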

### GPU Information:

In [4]:
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
devices = sess.list_devices()
devices

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:09:00.0, compute capability: 7.5



[_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 1884025239219475482),
 _DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 6586313605, 13382654521099684224)]

### Use RTX_GPU Tensor Cores for faster compute: FOR TENSORFLOW ONLY

Automatic Mixed Precision (AMP) training in TensorFlow. Requires NVIDIA's Docker container of TensorFlow.

Sources:
- https://developer.nvidia.com/automatic-mixed-precision
- https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#framework

When enabled, automatic mixed precision will do two things:

- Insert the appropriate cast operations into your TensorFlow graph to use float16 execution and storage where appropriate (this enables the use of Tensor Cores along with memory storage and bandwidth savings).
- Turn on automatic loss scaling inside the training Optimizer object.
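Loss scaling matters because float16 has a much smaller range than float32: very small gradients round to zero. The NumPy sketch below illustrates the idea with a fixed scale of 1024, chosen arbitrarily for the example; AMP itself uses dynamic loss scaling, which adjusts the scale during training, so treat this only as an illustration of the principle.

```python
import numpy as np

# float16 cannot represent values much below ~6e-8 (its smallest subnormal).
small_grad = 1e-8
print(np.float16(small_grad))  # 0.0: the gradient underflows and is lost

# Loss scaling: multiply the loss (and therefore every gradient) by a constant
# before the float16 backward pass, then divide it back out in float32.
loss_scale = 1024.0
scaled_grad = np.float16(small_grad * loss_scale)  # ~1.0e-05, representable in float16
recovered = np.float32(scaled_grad) / loss_scale   # unscale in float32

print(recovered)  # close to 1e-08 instead of 0
```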

In [5]:
# os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

EXAMPLE CODE: 

In [6]:
# # Graph-based example (TF1-style optimizer, accessed via tf.compat.v1 in TF 2.x):
# opt = tf.compat.v1.train.AdamOptimizer()
# opt = tf.compat.v1.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# train_op = opt.minimize(loss)

# # Keras-based example:
# opt = tf.keras.optimizers.Adam()
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# model.compile(loss=loss, optimizer=opt)
# model.fit(...)

### Use RTX_GPU Tensor Cores for faster compute: FOR KERAS API

Source:
- https://www.tensorflow.org/guide/keras/mixed_precision
- https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/experimental/Policy

In [7]:
from tensorflow.keras.mixed_precision import experimental as mixed_precision

In [8]:
# # Set for MIXED PRECISION:
# policy = mixed_precision.Policy('mixed_float16')
# mixed_precision.set_policy(policy)

# print('Compute dtype: %s' % policy.compute_dtype)
# print('Variable dtype: %s' % policy.variable_dtype)

## 1 - Introduction to Encoders-Decoders:

The Encoder-Decoder architecture is composed, as its name suggests, of an encoder in one half and a decoder in the other half. The Encoder is essentially a function that maps the input data into a latent space: a structured set of values defined by the encoder. The Decoder takes elements from the latent space and maps them into the predefined target domain. Encoder-Decoders appear in many fields, such as communications (transmitters and receivers) and electronics, where they largely serve as information converters. In Machine Learning, these networks have been used for text translation: the Encoder ingests a sentence in the source language (e.g. Spanish) and learns to project it into a latent space, encoding the sentence's meaning as a feature vector (a.k.a. code); the Decoder, trained alongside the Encoder, then converts these encoded feature vectors into the target language (e.g. English).

The figure below shows an example of an Encoder-Decoder network:

<img src="Description Images/EncoderDecoderNetwork.png" width="750">

Image Ref -> https://www.pyimagesearch.com/2020/02/24/denoising-autoencoders-with-keras-tensorflow-and-deep-learning/ and https://www.researchgate.net/figure/t-SNE-visualizations-for-clustering-on-MNIST-dataset-in-a-Original-pixel-space-b_fig1_322674846

Here, the Encoder is trained on the MNIST digits dataset, converting each 28x28 image into a latent vector (code) of 32 values, while the Decoder is trained to recover the images from these codes. Plotting the codes with the t-SNE method and colouring them by class label reveals the similarities and structure inherent in the dataset: this is the semantic information referred to earlier. The Encoder extracts this semantic information from the data, and the Decoder then decompresses it back into images.
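To make the shapes in this MNIST example concrete, the NumPy sketch below maps a batch of 28x28 images to 32-value codes and back. The weights are random and untrained, and the input batch is random stand-in data rather than MNIST, so only the shapes and value ranges are meaningful; the single linear layer with ReLU (encoder) and sigmoid (decoder) is an assumption for illustration, not the architecture used in the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

img_height, img_width, code_size = 28, 28, 32

# Hypothetical, untrained weights: a single linear layer in each direction.
W_enc = rng.normal(scale=0.01, size=(img_height * img_width, code_size))
W_dec = rng.normal(scale=0.01, size=(code_size, img_height * img_width))


def encode(images):
    """Flatten (N, 28, 28) images and project them to (N, 32) codes."""
    flat = images.reshape(len(images), -1)
    return np.maximum(flat @ W_enc, 0.0)  # ReLU activation


def decode(codes):
    """Map (N, 32) codes back to (N, 28, 28) images with values in (0, 1)."""
    logits = codes @ W_dec
    pixels = 1.0 / (1.0 + np.exp(-logits))  # sigmoid activation
    return pixels.reshape(-1, img_height, img_width)


batch = rng.random((5, img_height, img_width))  # stand-in for 5 MNIST digits
codes = encode(batch)
recon = decode(codes)
print(codes.shape, recon.shape)  # (5, 32) (5, 28, 28)
```

In a trained model, `W_enc` and `W_dec` would be learned so that `recon` matches `batch`; the 32-value `codes` are what the t-SNE plot visualises.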

## 2 - Auto-Encoding:

Auto-Encoders are a special kind of Encoder-Decoder for tasks where the input and target domains are the same, such as images: the Auto-Encoder should encode and then decode these images with as little loss in quality as possible, despite the bottleneck inherent in its design. The input is first compressed into a latent representation by the Encoder network, and this feature vector (compressed representation) is then reconstructed by the Decoder network. The loss to be minimised is usually the distance between the input and output data. Training Auto-Encoders is simpler than training many other models because no ground-truth labels are required: the input images themselves serve as the ground truth. Models trained this way are typically called __Self-Supervised Models__.
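A simple way to see the bottleneck and the self-supervised target at work is a *linear* Auto-Encoder. For a squared-error loss, the best linear encoder/decoder pair projects the data onto its top principal directions, which an SVD gives in closed form; the sketch below uses that closed-form solution as a stand-in for gradient-based training. The synthetic 10-D data, which secretly lies on a 3-D subspace, and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 200 samples in 10-D that actually lie on a 3-D subspace.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
x = latent @ mixing


def fit_linear_autoencoder(data, code_size):
    """Return (mean, encoder, decoder) for an optimal linear autoencoder.

    No labels are used: the data itself is the reconstruction target.
    """
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    components = vt[:code_size]            # (code_size, n_features)
    return mean, components.T, components  # encoder: (10, k), decoder: (k, 10)


def reconstruct(data, mean, enc, dec):
    codes = (data - mean) @ enc  # compress into the bottleneck
    return codes @ dec + mean    # decompress back to the input space


errors = {}
for k in (1, 3):
    mean, enc, dec = fit_linear_autoencoder(x, k)
    errors[k] = np.mean((x - reconstruct(x, mean, enc, dec)) ** 2)

# The 3-value code reconstructs the data almost perfectly; the 1-value code cannot.
print(errors)
```

A too-narrow bottleneck forces the model to discard information, which is why the reconstruction loss is what drives the Encoder to keep the most important structure of the data.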

JPEG compression is a useful analogy: like an Auto-Encoder, it first encodes an image into a compact representation and then decodes it while retaining as much quality as possible (although JPEG's transforms are hand-designed rather than learned). For images, the loss is typically a distance between the input and the reconstructed images, such as the cross-entropy loss, the L1 (Manhattan) loss or the L2 (Euclidean) loss.
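The three reconstruction losses mentioned above can be written in a few lines of NumPy. This is a sketch computed per pixel and averaged; the tiny 2x2 "image" and its "reconstruction" are made-up values, and the clipping constant `eps` is an assumption used to avoid `log(0)`.

```python
import numpy as np


def l1_loss(target, pred):
    """Mean absolute (Manhattan) distance per pixel."""
    return np.mean(np.abs(target - pred))


def l2_loss(target, pred):
    """Mean squared (Euclidean) distance per pixel."""
    return np.mean((target - pred) ** 2)


def binary_cross_entropy(target, pred, eps=1e-7):
    """Per-pixel binary cross-entropy for pixel values in [0, 1]."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))


target = np.array([[0.0, 1.0], [1.0, 0.0]])  # tiny 2x2 "image"
pred = np.array([[0.1, 0.9], [0.8, 0.2]])    # its reconstruction

print(l1_loss(target, pred))
print(l2_loss(target, pred))
print(binary_cross_entropy(target, pred))
```

Binary cross-entropy assumes pixel values in [0, 1], which is why a Sigmoid output layer is paired with it in the Keras example that follows.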

## 2.1 - Usage of Auto-Encoders:

List of Auto-Encoder usage:

1) __Depth Regression__: Estimating the distance between the camera and the image content at the pixel level. This is an important operation for applications such as Augmented Reality, as it allows the construction of a 3D representation of the surroundings and of interactions with the environment.

2) __Semantic Segmentation__: One of the more common use cases, where the model is trained to return the estimated class of each pixel in the image. This is highly important for driverless cars, as it can be used to identify objects of interest such as trees, people and vehicles.

3) __Artistic Tasks__: Encoder-Decoders can transform images into pseudo-realistic output images, for example estimating what underwater objects look like without the blue colour of the sea water, or estimating the daytime appearance of a scene from photos taken at night.

4) __Generative Tasks__: AutoEncoders can also be configured for generative tasks, where the latent space is structured during training so that a feature vector picked from it can be decoded into a picture. This idea leads to more advanced generative models such as Generative Adversarial Networks (GANs).

5) __Denoising AEs__: These AutoEncoders take corrupted (lossy) inputs and return the original versions. Because the inputs are lossy, the models are trained to undo the corruption and recover the missing information in order to output the original images. Applications include upscaling, or super-resolution, of images, with the added benefit of avoiding the artifacts typically caused by bilinear interpolation.
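For semantic segmentation, the decoder outputs one score per class for every pixel, and the predicted label map keeps the most likely class at each location. The NumPy sketch below shows that last step; the class list, the 4x6 image size and the random scores standing in for a trained network's output are all assumptions for illustration.

```python
import numpy as np

# Hypothetical class list for a driving scene.
CLASSES = ["road", "vehicle", "person", "tree"]

rng = np.random.default_rng(0)
height, width = 4, 6

# Stand-in for a decoder's output: one score (logit) per class for every pixel.
scores = rng.normal(size=(height, width, len(CLASSES)))

# Softmax over the class axis gives per-pixel class probabilities.
exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# The segmentation map keeps the index of the most likely class for each pixel.
label_map = probs.argmax(axis=-1)  # shape (4, 6), values in 0..3

print(label_map)
print([CLASSES[i] for i in label_map[0]])  # class names along the first row
```

The output has the same height and width as the input image, which is why segmentation networks need a decoder (with layers such as unpooling or transposed convolutions) to bring the compressed features back up to full resolution.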

## 2.2 - Image Denoising Example:

This section demonstrates how easily an Encoder-Decoder network can be built with Keras. The architecture here is a symmetrical Encoder-Decoder with a lower-dimensional bottleneck. Both the inputs and the targets are set to "x_train", and the Sigmoid function is used in the last layer to output values between 0 and 1.

In [11]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input

In [14]:
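if dont_run == 1:
    
    # Example data (assumption): MNIST digits, flattened and scaled to [0, 1]
    # so they match the Sigmoid output and the binary cross-entropy loss.
    (x_train, _), _ = tf.keras.datasets.mnist.load_data()
    img_height, img_width = x_train.shape[1], x_train.shape[2]
    x_train = x_train.reshape(-1, img_height * img_width).astype('float32') / 255.
    
    # Example Code of Encoder-Decoder:
    inputs = Input( shape = [img_height * img_width] )
    
    # Encoder layers:
    encoder_1 = Dense(units = 128, activation = 'relu')(inputs)
    code = Dense(units = 64, activation = 'relu')(encoder_1)
    
    # Decoder Layers: mirror the encoder, then restore the original input size.
    decoder_1 = Dense(units = 128, activation = 'relu')(code)
    preds = Dense(units = img_height * img_width, activation = 'sigmoid')(decoder_1)
    
    # Model:
    autoEncoder = Model(inputs, preds)
    
    # Training phase: notice, the x_train are both input and targets.
    autoEncoder.compile(optimizer = 'adam', loss = 'binary_crossentropy')
    autoEncoder.fit(x_train, x_train)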

To denoise images, the model above can be trained by passing noisy copies of the training images as inputs while keeping the clean images as targets.

In [15]:
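if dont_run == 1:
    # Noisy input images: add Gaussian noise, then clip back to the valid [0, 1] pixel range.
    x_noisy = x_train + np.random.normal(loc = .0,
                                         scale = .5,
                                         size = x_train.shape)
    x_noisy = np.clip(x_noisy, 0., 1.)

    # Fit the model: noisy images as inputs, clean images as targets.
    autoEncoder.fit(x_noisy, x_train)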