# Object Detection Models

## Introduction

The aim of this project is to explore techniques for detecting objects in an image or scene. The two families of techniques covered here are __You Only Look Once (YOLO)__ and __Regions with Convolutional Neural Networks (R-CNN)__.

Object detection is the task of finding the objects in an image or video stream together with their bounding boxes. It is closely related to, and sometimes called, object localisation. A bounding box is a tight rectangle that surrounds the object of interest. The input to the algorithm is usually an image, and the output is a list of bounding boxes with their object classes/labels: for each bounding box, the model should output the predicted class/label and its confidence that the prediction is correct.
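The input/output contract described above can be made concrete with a small sketch. The `Detection` class, the `filter_detections` helper, the `(x_min, y_min, x_max, y_max)` pixel convention for boxes and the example values are all hypothetical, chosen only to illustrate the shape of a detector's output; real models and libraries use their own output formats.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical container for one detector output. The box convention
# (x_min, y_min, x_max, y_max) in pixels is an assumption for illustration.
@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: str                              # predicted class/label
    score: float                            # confidence in [0, 1]

def filter_detections(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence reaches the threshold, most confident first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)

# Example output for one image: a list of boxes, each with a label and a confidence.
detections = [
    Detection(box=(48, 30, 210, 180), label="car", score=0.92),
    Detection(box=(220, 60, 260, 170), label="pedestrian", score=0.81),
    Detection(box=(5, 5, 40, 35), label="bicycle", score=0.12),
]

for d in filter_detections(detections):
    print(f"{d.label:<10} {d.score:.2f} {d.box}")
```

Thresholding on the confidence score is a simple way to trade missed objects against false alarms; the 0.5 default here is an arbitrary assumption.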

Object detection models are widely used in industry. For example, they can be used for:
1. Self-driving cars - perceiving vehicles and pedestrians.
2. Content moderation - locating forbidden objects in a scene and estimating their size.
3. Healthcare - detecting tumors or other dangerous, unwanted tissue in radiographs.
4. Manufacturing - guiding assembly robots on the production line to put together or repair products.
5. Security - detecting threats and trespassers, or counting people.
6. Wildlife conservation - monitoring animal populations.

## Breakdown of this Notebook:
- History of the object detection techniques.
- The main approaches in object detection.
- Implementing the YOLO architecture for fast object detection.
- Improving upon YOLO with the Faster R-CNN architecture.
- Utilising the Faster R-CNN with the TensorFlow Object Detection API.

## Dataset:



### Import the required libraries:

In [None]:
%matplotlib inline

import tensorflow as tf
import numpy as np
import timeit
import time
from absl import app, flags, logging
from absl.flags import FLAGS
import cv2

In [None]:
import os
from IPython.display import display, Image
import matplotlib.pyplot as plt

# Set up the working directory for the images:
image_folderName = 'Description Images'
image_path = os.path.abspath(image_folderName) + '/'

In [None]:
# Set the random seed for reproducibility (NumPy and TensorFlow).
Seed_nb = 42
np.random.seed(Seed_nb)
tf.random.set_seed(Seed_nb)

### GPU Information:

In [None]:
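# List the CPU/GPU devices visible to TensorFlow (TF1-style session API via tf.compat.v1).
# log_device_placement=True also logs which device each operation is placed on.
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))
devices = sess.list_devices()
devices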

### Use RTX_GPU Tensor Cores for faster compute: FOR TENSORFLOW ONLY

Automatic Mixed Precision (AMP) training in TensorFlow. Requires NVIDIA Docker.

Sources:
- https://developer.nvidia.com/automatic-mixed-precision
- https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#framework

When enabled, automatic mixed precision will do two things:

- Insert the appropriate cast operations into your TensorFlow graph to use float16 execution and storage where appropriate (this enables the use of Tensor Cores along with memory storage and bandwidth savings).
- Turn on automatic loss scaling inside the training Optimizer object.
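The loss-scaling idea in the second bullet can be sketched in pure Python. This is an illustrative simulation, not TensorFlow's implementation: `to_float16` uses the `struct` module's half-precision `'e'` format to mimic float16 rounding, and `DynamicLossScaler` is a hypothetical helper whose defaults (initial scale `2**15`, growth every 2000 finite steps, factor 2) are assumptions resembling common dynamic loss-scaling setups. Scaling the loss shifts small gradients out of float16's underflow range; when the scaled gradients overflow, the step is skipped and the scale is reduced.

```python
import math
import struct

def to_float16(x: float) -> float:
    """Round x to IEEE 754 half precision; values too large for float16 become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

class DynamicLossScaler:
    """Minimal dynamic loss scaling (hypothetical sketch, not TensorFlow's code)."""

    def __init__(self, init_scale: float = 2.0 ** 15, growth_interval: int = 2000,
                 factor: float = 2.0):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.factor = factor
        self.good_steps = 0

    def unscale(self, scaled_grads):
        """Return gradients divided by the scale, or None if this step must be skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow in float16: skip this update and shrink the scale.
            self.scale /= self.factor
            self.good_steps = 0
            return None
        grads = [g / self.scale for g in scaled_grads]
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            # A long run of finite steps: try a larger scale.
            self.scale *= self.factor
            self.good_steps = 0
        return grads

# A tiny gradient underflows to zero in float16 ...
print(to_float16(1e-8))  # 0.0
# ... but survives when the loss (and hence the gradient) is scaled first.
scaler = DynamicLossScaler()
scaled = [to_float16(1e-8 * scaler.scale)]  # simulated float16 backward pass
print(scaler.unscale(scaled))
```

Halving the scale on overflow and growing it only after a long run of finite steps keeps it close to the largest value that does not overflow.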

In [6]:
# os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

EXAMPLE CODE: 

In [7]:
# # Graph-based example (TensorFlow 1.x API):
# opt = tf.train.AdamOptimizer()
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# train_op = opt.minimize(loss)

# # Keras-based example:
# opt = tf.keras.optimizers.Adam()
# opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
# model.compile(loss=loss, optimizer=opt)
# model.fit(...)

### Use RTX_GPU Tensor Cores for faster compute: FOR KERAS API

Source:
- https://www.tensorflow.org/guide/keras/mixed_precision
- https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/experimental/Policy

In [8]:
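# Experimental API (deprecated since TF 2.4; newer versions use
# tf.keras.mixed_precision.set_global_policy('mixed_float16') instead).
from tensorflow.keras.mixed_precision import experimental as mixed_precision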

In [9]:
# Set for MIXED PRECISION:
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_policy(policy)

print('Compute dtype: %s' % policy.compute_dtype)
print('Variable dtype: %s' % policy.variable_dtype)

Compute dtype: float16
Variable dtype: float32
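The output above shows the split made by the `mixed_float16` policy: computations run in float16 while variables stay in float32. The sketch below illustrates one reason for keeping variables in float32: a small weight update can be lost entirely in float16. It uses `struct` to round values through half (`'e'`) and single (`'f'`) precision; the `round_to` helper and the weight and update values are illustrative assumptions.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round a Python float through a packed binary format ('e' = float16, 'f' = float32)."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, x))[0]

weight = 1.0
update = 1e-4  # a small optimizer step (illustrative value)

w16 = round_to("e", weight + update)  # weight stored in float16
w32 = round_to("f", weight + update)  # weight stored in float32

print(w16)  # 1.0 -> the update is lost: float16 spacing near 1.0 is 2**-10
print(w32)  # slightly above 1.0 -> the update is kept
```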


## 1 - History of Object Detection:

In the past, classical computer vision techniques for object detection relied on image descriptors. Detecting an object such as a bike required several images of that object, from which descriptors were extracted; each descriptor represents a distinctive part of the bike. When searching for the bike, the algorithm tries to find these descriptors in the target image.

The most common technique at the time was the sliding (or "floating") window: small rectangular areas of the image were examined one by one, and the area that best matched the descriptors was considered to contain the object of interest. The technique had a few advantages at the time: it was robust to rotation and colour changes in the images, it did not require a lot of training data, and overall it worked for most objects. Its drawback was that its accuracy was not good enough, and neural networks soon outpaced this traditional technique.
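A minimal sketch of the sliding-window search described above, assuming a toy grayscale image stored as nested lists. For simplicity, the descriptor-matching step is replaced by comparing raw pixels against a template with a sum of absolute differences; `window_score`, `sliding_window_search` and the `stride` parameter are hypothetical names for illustration, not a historical implementation.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]

def window_score(image: Grid, template: Grid, top: int, left: int) -> float:
    """Sum of absolute differences between the template and one image window (lower = better)."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )

def sliding_window_search(image: Grid, template: Grid,
                          stride: int = 1) -> Optional[Tuple[int, int, float]]:
    """Examine every window of the template's size; return the best (top, left, score)."""
    h, w = len(template), len(template[0])
    best = None
    for top in range(0, len(image) - h + 1, stride):
        for left in range(0, len(image[0]) - w + 1, stride):
            score = window_score(image, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best

# Toy 5x6 grayscale "image" containing a bright 2x2 patch at row 2, column 3.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 9, 0],
    [0, 0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [[9, 9], [9, 9]]

print(sliding_window_search(image, template))  # (2, 3, 0)
```

A larger `stride` examines fewer windows and is faster, but it can step over the exact location of the object: with `stride=2`, the best window found here is `(2, 2)` with a score of 18 rather than the exact match at `(2, 3)`.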

Modern algorithms perform better in the following respects:
1. Bounding box precision - the predicted bounding box fits the object, being neither too large nor too narrow.
2. Recall - the model finds all of the objects.
3. Class prediction - the model outputs the correct class/label for each detected object.
4. Speed - models keep getting faster at computing their results, so they can be used in real time (for computer vision tasks, real time is taken here to mean at least 5 FPS).
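The first, second and fourth criteria can be quantified with a few lines of code. The sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` form and uses Intersection over Union (IoU), a common measure of how well a predicted box fits the true one, with an assumed matching threshold of 0.5. The `recall` helper is a simplification that does not enforce one-to-one matching between predictions and objects, and `is_real_time` uses the 5 FPS figure above as its threshold. All helper names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed convention

def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes (1.0 = perfect overlap)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def recall(ground_truth: List[Box], predictions: List[Box], threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by some prediction with IoU >= threshold."""
    found = sum(any(iou(gt, p) >= threshold for p in predictions) for gt in ground_truth)
    return found / len(ground_truth) if ground_truth else 0.0

def is_real_time(frames: int, seconds: float, min_fps: float = 5.0) -> bool:
    """True if at least min_fps frames are processed per second."""
    return frames / seconds >= min_fps

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 11, 11)]  # overlaps the first object only

print(round(iou(gt[0], pred[0]), 3))          # 0.681
print(recall(gt, pred))                       # 0.5
print(is_real_time(frames=60, seconds=10.0))  # True (6 FPS)
```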

## 2 - Evaluating the Object Detection model's Performance:




