# Object Detection Models

## Introduction

This project walks through the techniques used to detect objects in a scene or an image. The two techniques explored here are __You Only Look Once (YOLO)__ and __Regions with Convolutional Neural Networks (R-CNN)__.

Object detection is the process of detecting objects in an image or video stream together with their bounding boxes. It is also called object localisation. A bounding box is the smallest rectangle that fully surrounds the object of interest. The input to the algorithm is usually an image, and the output is a list of bounding boxes with their object classes/labels. For each bounding box, the model should output the predicted class/label and its confidence that the guess is correct.
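To make the expected output concrete, here is a minimal sketch of how a list of detections could be represented. The `Detection` class, the `box_area` helper, the `(x_min, y_min, x_max, y_max)` corner convention and the sample values are all illustrative assumptions, not the output format of YOLO or Faster R-CNN.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One predicted object (hypothetical container for illustration)."""
    # Assumed convention: pixel corners (x_min, y_min, x_max, y_max).
    box: Tuple[float, float, float, float]
    label: str          # predicted class/label
    confidence: float   # confidence in [0, 1] that the guess is correct


def box_area(box: Tuple[float, float, float, float]) -> float:
    """Area of a box in square pixels (0 if the box is degenerate)."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


# Made-up output of a detector for a single street image.
detections: List[Detection] = [
    Detection(box=(34.0, 50.0, 120.0, 210.0), label="pedestrian", confidence=0.91),
    Detection(box=(200.0, 80.0, 420.0, 230.0), label="car", confidence=0.78),
]

for det in detections:
    print(f"{det.label}: confidence={det.confidence:.2f}, "
          f"area={box_area(det.box):.0f} px^2")
```

Real detectors often return boxes as arrays in other conventions (for example centre, width and height, or coordinates normalised to the image size), so the convention should always be checked before comparing boxes.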

Object detection is widely used in industry. For example, these models can be used in the following areas:
1. Self-driving cars - for perceiving vehicles and pedestrians.
2. Content moderation - to locate forbidden objects in the scene and their respective sizes.
3. Healthcare - detecting tumors or dangerous unwanted tissues from radiographs.
4. Manufacturing - used in assembly robots of the manufacturing chain to put together or repair products.
5. Security - to detect threats and trespassers, or to count people.
6. Wildlife Conservation - to monitor the population of animals.

## Breakdown of this Notebook:
- History of the object detection techniques.
- The main approaches in object detection.
- Implementing the YOLO architecture for fast object detection.
- Improving upon YOLO with the Faster R-CNN architecture.
- Utilising the Faster R-CNN with the TensorFlow Object Detection API.

## Dataset:







### Import the required libraries:

In [None]:
%matplotlib inline

import tensorflow as tf
import numpy as np
import timeit
import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2

In [None]:
import os
from IPython.display import display, Image
import matplotlib.pyplot as plt

# %matplotlib inline

# Set up the working directory for the images:
image_folderName = 'Description Images'
image_path = os.path.abspath(image_folderName) + '/'

In [None]:
# Set the random seed for reproducibility.
Seed_nb = 42

### GPU Information:

In [None]:
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
devices = sess.list_devices()
devices

### Use RTX_GPU Tensor Cores for faster compute: FOR TENSORFLOW ONLY

Automatic mixed precision training in TensorFlow. Requires the NVIDIA Docker container for TensorFlow.

Sources:
- https://developer.nvidia.com/automatic-mixed-precision
- https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#framework

When enabled, automatic mixed precision will do two things:

- Insert the appropriate cast operations into your TensorFlow graph to use float16 execution and storage where appropriate (this enables the use of Tensor Cores along with memory storage and bandwidth savings).
- Turn on automatic loss scaling inside the training Optimizer object.

In [6]:
# os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

EXAMPLE CODE: 

In [7]:
# # Graph-based example:
# opt = tf.train.AdamOptimizer()
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# train_op = opt.minimize(loss)

# # Keras-based example:
# opt = tf.keras.optimizers.Adam()
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# model.compile(loss=loss, optimizer=opt)
# model.fit(...)

### Use RTX_GPU Tensor Cores for faster compute: FOR KERAS API

Source:
- https://www.tensorflow.org/guide/keras/mixed_precision
- https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/experimental/Policy

In [8]:
from tensorflow.keras.mixed_precision import experimental as mixed_precision

In [9]:
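# Set the global policy for MIXED PRECISION.
# Note: from TensorFlow 2.4 onwards this experimental API is superseded by
# tf.keras.mixed_precision.set_global_policy('mixed_float16').
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)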

print('Compute dtype: %s' % policy.compute_dtype)
print('Variable dtype: %s' % policy.variable_dtype)

Compute dtype: float16
Variable dtype: float32


## 1 - History of Object Detection:

In the past, classical computer vision techniques for object detection used image descriptors. To detect an object such as a bike, the approach started from several images of that object. Descriptors corresponding to the bike were extracted from those images, with each descriptor representing a different part of the bike. When searching for the object, the algorithm then tried to find those descriptors in the target images.

The most common technique at the time was the floating window (also known as a sliding window). Small rectangular areas of the image were examined one by one, and the area that best matched the descriptors was considered to contain the object of interest. This technique had a few advantages: it was robust to rotation and colour changes in the images, it did not require many training samples, and it worked for most objects. The drawback was that its accuracy was not good enough, and neural networks soon outpaced this traditional technique.

Modern algorithms perform better on the following criteria:
1. Bounding Box Precision - the model provides a correct bounding box that is neither too large nor too narrow.
2. Recall - the model finds all the objects.
3. Class Prediction - the model outputs the correct class/label for each object found.
4. Speed of the algorithm - models are getting faster at computing results, so that they can be used in real time (taken here as at least 5 FPS for computer vision tasks).

## 2 - Evaluating the Object Detection model's Performance:

Before diving into YOLO's architecture, it is important to cover some basics of model evaluation that apply to the YOLO model. Evaluating an object detection model requires more than the accuracy of the predictions against the ground truth. The extra metrics introduced here are __Precision__ and __Recall__, which serve as the basis for computing other metrics that are important in object detection.

## 2.1 - Precision and Recall Metrics:

To begin with, here are the formulas for these metrics:

$$ precision = \frac{TP}{TP + FP} $$

and

$$ recall = \frac{TP}{TP + FN} $$

where:

- TP = Number of True Positives (how many predictions match a ground truth of the same class)
- FP = Number of False Positives (how many predictions do not match any ground truth of the same class)
- FN = Number of False Negatives (how many ground truths have no matching prediction)

Additionally, the following term does not appear in the formulas above but is used in other areas/topics:
- TN = Number of True Negatives (how many negative examples are correctly predicted as negative)

If the predictions match the ground truths exactly, both FP and FN are zero. Following the formulas, both precision and recall then equal 1, which is a perfect score.

When a model's predictions rely on non-robust features learned during training, its precision decreases because the number of false positives rises. In contrast, if the model is too strict and detects an object only when precise conditions are met, its recall decreases because the number of false negatives rises.
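As a small worked example, the sketch below computes both metrics from raw counts using the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name `precision_recall` and the counts are made up for illustration, and returning 0 when a denominator is zero is an assumed convention.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Return (precision, recall) from true/false positive and false negative counts."""
    # Assumed convention: a metric with an empty denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Predictions match the ground truths exactly: perfect score.
print(precision_recall(tp=10, fp=0, fn=0))   # (1.0, 1.0)

# Model fires on non-robust features: many false positives, precision drops.
print(precision_recall(tp=8, fp=8, fn=2))    # (0.5, 0.8)

# Model is too strict: many false negatives, recall drops.
print(precision_recall(tp=4, fp=0, fn=6))    # (1.0, 0.4)
```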

## 2.2 - Precision and Recall Curve:

Precision-Recall curves summarise the trade-off between the true positive rate (recall) and the positive predictive value (precision) of a model across different probability thresholds. The ideal model has both high precision and high recall, but most models trade one off against the other. One way to tell how well a model performs is to compare the area under the curve (AUC): the larger the area, the better the classifier. The chart below compares two models, where the BLUE line model is better than the GREEN line model.

Source: 
- https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
- https://www.geeksforgeeks.org/precision-recall-curve-ml/

<img src="Description Images/Precision Recall Curve.png" width="350">

Image Ref -> https://www.geeksforgeeks.org/precision-recall-curve-ml/

The general idea is to visualise the performance (precision and recall) at each confidence threshold. For every output bounding box, the model also outputs a confidence between 0 and 1, where 1 is the most confident. In practice, it is a good idea to remove the less confident predictions. This is done by setting a threshold (for example T = 0.48 or 0.4) so that any prediction with a confidence below this number is removed.

Changing the value of the threshold affects the model's precision and recall:
1. __If T is close to 1__: The Precision will be high and the Recall will be low. Only the most confident predictions are kept, so many objects are filtered out and missed by the model.
2. __If T is close to 0__: The Recall will be high and the Precision will be low. Most of the predictions are kept, so there are few false negatives. However, keeping the predictions the model is less confident about leads to a rise in false positives.

The choice of threshold also relies heavily on the problem at hand. For example, a model that detects pedestrians should ideally have a high recall so as not to miss any passers-by, even if the car has to stop for no reason. A model that detects investment opportunities should ideally have a high precision so that it avoids choosing the wrong opportunities, even if that means missing some good ones.
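The effect of the threshold can be sketched with a toy example. Each prediction is assumed to be a `(confidence, is_true_positive)` pair that has already been matched against the ground truth. The function name, the sample data and the convention that precision is 1 when no prediction is kept are all assumptions made for illustration.

```python
def precision_recall_at_threshold(scored, num_ground_truths, threshold):
    """Precision and recall after dropping predictions below `threshold`.

    `scored` is a list of (confidence, is_true_positive) pairs.
    """
    kept = [is_tp for confidence, is_tp in scored if confidence >= threshold]
    tp = sum(kept)                 # kept predictions that match a ground truth
    fp = len(kept) - tp            # kept predictions that match nothing
    fn = num_ground_truths - tp    # ground truths left without a prediction
    # Assumed convention: with no predictions kept, precision is reported as 1.
    precision = tp / (tp + fp) if kept else 1.0
    recall = tp / (tp + fn) if num_ground_truths > 0 else 0.0
    return precision, recall


# Made-up predictions for one class; the image set contains 5 ground truths.
scored = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
          (0.55, False), (0.40, True), (0.30, False), (0.20, False)]

for t in (0.9, 0.5, 0.1):
    p, r = precision_recall_at_threshold(scored, num_ground_truths=5, threshold=t)
    print(f"T={t}: precision={p:.2f}, recall={r:.2f}")
```

With this data, a high threshold keeps only two correct predictions (high precision, low recall), while a low threshold keeps everything (more objects found, but half of the kept predictions are false positives).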

## 2.3 - Average Precision and Mean Average Precision:

In practice, it is often convenient to have a single number that describes the performance. There are two terms to consider:

1. __Average Precision (AP)__: the area under the Precision-Recall curve, which is always between 0 and 1. This is the area under the curve mentioned earlier. It is a good indicator of the model's performance for a __single__ class.
2. __Mean Average Precision (mAP)__: the mean of the average precision over __each__ of the classes. It provides an overall __global score__.
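The sketch below shows one simple way to compute these two numbers. It uses the non-interpolated form of AP (the sum of the precision values at each true positive, divided by the number of ground truths). Benchmarks such as PASCAL VOC and COCO use interpolated variants, so this is an illustration of the idea rather than the exact metric of any benchmark. The function names, class names and data are made up.

```python
def average_precision(scored, num_ground_truths):
    """Non-interpolated AP for one class.

    `scored` is a list of (confidence, is_true_positive) pairs.
    """
    if num_ground_truths == 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
            # Precision at this rank; recall rises by 1/num_ground_truths here.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truths


def mean_average_precision(per_class):
    """Mean of the per-class APs. `per_class` maps class -> (scored, num_ground_truths)."""
    aps = [average_precision(scored, n) for scored, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up matched predictions for two classes.
per_class = {
    "pedestrian": ([(0.95, True), (0.90, True), (0.80, False), (0.70, True),
                    (0.55, False), (0.40, True), (0.30, False), (0.20, False)], 5),
    "car": ([(0.90, True), (0.60, False), (0.50, True)], 2),
}

for name, (scored, n) in per_class.items():
    print(f"AP[{name}] = {average_precision(scored, n):.3f}")
print(f"mAP = {mean_average_precision(per_class):.3f}")
```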

## 2.4 - Average Precision Threshold: Jaccard Index (or Intersection over Union, IoU)

The __Jaccard Index__ or __Intersection over Union (IoU)__ is the common metric used to decide when a prediction and a ground truth match. It therefore also determines the TP and FP counts introduced earlier. The IoU is defined by the following equation:

$$ IoU(A, B) = \frac{\lvert{A \cap B}\rvert}{\lvert{A \cup B}\rvert} = \frac{\lvert{A \cap B}\rvert}{\lvert{A}\rvert + \lvert{B}\rvert - \lvert{A \cap B}\rvert} $$

where:
- $\lvert{A}\rvert$ is the cardinality of set A, that is, the number of elements A contains.
- $\lvert{B}\rvert$ is the cardinality of set B, that is, the number of elements B contains.
- $\lvert{A \cap B}\rvert$ is the numerator: the number of elements that A and B have in common. Here $A \cap B$ is the intersection of the two sets.
- $\lvert{A \cup B}\rvert$ is the denominator: the total number of elements that sets A and B cover. Here $A \cup B$ is the union of the two sets.
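The set definition can be checked on a small example. The sketch below computes the Jaccard Index of two Python sets, counting the union as the two cardinalities added together minus the intersection, so that shared elements are not counted twice. The function name and the choice of returning 0 for two empty sets are assumptions for illustration.

```python
def jaccard_index(a, b):
    """Jaccard Index (IoU) of two collections treated as sets."""
    a, b = set(a), set(b)
    intersection = len(a & b)
    # Shared elements are counted in both len(a) and len(b), so subtract them once.
    union = len(a) + len(b) - intersection
    # Assumed convention: two empty sets give 0.
    return intersection / union if union > 0 else 0.0


A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# The inclusion-exclusion count agrees with the size of the actual union.
assert len(A) + len(B) - len(A & B) == len(A | B)

print(jaccard_index(A, B))  # 2 shared elements out of 6 covered
```

For bounding boxes, the "elements" are the pixels covered by each box, so the cardinalities become areas.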

The following diagram shows Precision, Recall and IoU:

<img src="Description Images/IoU intersection.PNG" width="600">

Image Ref -> http://www.gabormelli.com/RKB/Bounding_Box_Intersection_over_Union_(IoU)_Measure

Generally, the IoU is used rather than the raw intersection because the intersection is an absolute value, not a relative one: two big boxes would have more overlapping pixels than two small boxes with the same relative overlap. The ratio is used instead because it is relative. The IoU is always between 0 and 1, where it is 0 if the two boxes do not overlap and 1 if the boxes overlap completely.
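A minimal sketch of the IoU for two axis-aligned boxes follows. It assumes boxes are given as `(x_min, y_min, x_max, y_max)` corners. The helper names and the matching threshold of 0.5 are illustrative assumptions (0.5 is a common choice, but the right value depends on the benchmark).

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred_box, truth_box, iou_threshold=0.5):
    """Treat a prediction as matching a ground truth if IoU reaches the threshold."""
    return box_iou(pred_box, truth_box) >= iou_threshold


# Two big boxes and two small boxes with the same relative overlap:
big = box_iou((0, 0, 100, 100), (50, 0, 150, 100))    # intersection = 5000 px
small = box_iou((0, 0, 10, 10), (5, 0, 15, 10))       # intersection = 50 px
print(big, small)  # both are one third, despite very different intersections

print(box_iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes -> 1.0
print(box_iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes  -> 0.0
```

The first two calls illustrate the point made above: the raw intersections differ by a factor of 100, yet the IoU is the same because it is a ratio.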

## 3 - YOLO Detection Algorithm: 