# Object Detection Using YOLO
- Environment used: TensorFlow 2.6, Python 3.8, CPU

## Problem Statement:
You are given a YOLOv3 model pre-trained on the MS COCO dataset. Your task is to build an object detection program that uses this model to detect the object classes in that dataset.

## Step 1: Importing the Required Packages

In [None]:
import numpy as np
from numpy import expand_dims
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from matplotlib import pyplot
from matplotlib.patches import Rectangle

## Step 2: Creating Bounding Boxes
- Store the bounding box's corner coordinates (**xmin**, **ymin**, **xmax**, **ymax**)
- Accept the optional parameters objness (objectness score) and classes (list or array of class probabilities)
- Initialize the label and score attributes to **-1** to mark them as not yet computed
- In get_label, compute the label as the index of the highest class probability if it has not been computed yet, then return it
- In get_score, look up the probability of that label if the score has not been computed yet, then return it

In [None]:
class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness=None, classes=None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.objness = objness
        self.classes = classes
        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)

        return self.label

    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]

        return self.score

## Step 3: Performing Bounding Box Decoding for YOLO Object Detection

- Define a function that applies the sigmoid activation function to a given value
- Define a function that decodes the network output and generates bounding boxes
- Get the grid dimensions of the network output
- Reshape the network output so that each grid cell holds one vector per anchor box
- Apply the sigmoid activation to the x, y, objectness, and class elements of the network output
- Iterate over each grid cell and anchor box
- Read the objectness score of the bounding box and skip the box if the score is at or below the threshold
- Retrieve the x, y, w, and h values of the bounding box
- Compute the box center as a fraction of the image width and height
- Compute the box width and height as fractions of the image width and height
- Retrieve the class probabilities
- Create a bounding box object from the computed values and append it to the list
- Define helper functions that rescale the boxes to the original image, compute the intersection over union (IoU) of two boxes, and apply non-maximum suppression (NMS)


In [None]:
def _sigmoid(x):
    return 1. / (1. + np.exp(-x))

In [None]:
def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))
    boxes = []
    # squash the x, y offsets, objectness, and class scores into (0, 1)
    netout[..., :2] = _sigmoid(netout[..., :2])
    netout[..., 4:] = _sigmoid(netout[..., 4:])
    # class confidence = objectness * class probability; zero out weak scores
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh

    for i in range(grid_h * grid_w):
        row = i // grid_w
        col = i % grid_w
        for b in range(nb_box):
            # the element at index 4 is the objectness score
            objectness = netout[row][col][b][4]
            if objectness <= obj_thresh:
                continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[row][col][b][:4]
            x = (col + x) / grid_w # center position, unit: image width
            y = (row + y) / grid_h # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
            # last elements are class probabilities
            classes = netout[row][col][b][5:]
            box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
            boxes.append(box)
    return boxes
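
To make the decoding arithmetic concrete, here is a minimal sketch that decodes a single prediction by hand. The grid size, cell position, and raw outputs are hypothetical values chosen for illustration; the anchor is the first pair from the anchor list used later in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical values for one prediction on a 13x13 grid with a 416x416 input.
grid_w = grid_h = 13
net_w = net_h = 416
row, col = 6, 6                        # grid cell that owns the prediction
anchor_w, anchor_h = 116, 90           # anchor box size in input pixels
tx, ty, tw, th = 0.0, 0.0, 0.0, 0.0    # raw network outputs (assumed)

# Center: cell index plus a sigmoid offset, normalized by the grid size.
x = (col + sigmoid(tx)) / grid_w
y = (row + sigmoid(ty)) / grid_h
# Size: anchor scaled by exp of the raw output, normalized by the input size.
w = anchor_w * np.exp(tw) / net_w
h = anchor_h * np.exp(th) / net_h

# Convert center/size to corner coordinates, as BoundBox expects.
xmin, ymin, xmax, ymax = x - w / 2, y - h / 2, x + w / 2, y + h / 2
print(round(x, 3), round(y, 3), round(w, 3), round(h, 3))
```

With all raw outputs at zero, the box is centered in its cell (x = y = 0.5 for the middle cell) and has exactly the anchor's size relative to the input (w ≈ 0.279, h ≈ 0.216).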

In [None]:
def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
    # the image was resized to the full network input, so there is no padding
    new_w, new_h = net_w, net_h
    # the offsets and scales are the same for every box, so compute them once
    x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
    y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
    for box in boxes:
        # convert normalized coordinates to pixels in the original image
        box.xmin = int((box.xmin - x_offset) / x_scale * image_w)
        box.xmax = int((box.xmax - x_offset) / x_scale * image_w)
        box.ymin = int((box.ymin - y_offset) / y_scale * image_h)
        box.ymax = int((box.ymax - y_offset) / y_scale * image_h)
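
Because `new_w` and `new_h` are set equal to `net_w` and `net_h`, the offsets are zero and the scales are one, so the correction reduces to multiplying the normalized coordinates by the original image size. The sketch below shows that arithmetic on a hypothetical box and image size.

```python
# Hypothetical normalized box (fractions of the image) and original image size.
xmin, ymin, xmax, ymax = 0.25, 0.40, 0.75, 0.90
image_w, image_h = 640, 386

# With zero offset and unit scale, each coordinate is just scaled to pixels.
pixel_box = (
    int(xmin * image_w),
    int(ymin * image_h),
    int(xmax * image_w),
    int(ymax * image_h),
)
print(pixel_box)
```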

In [None]:
def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b
    if x3 < x1:
        if x4 < x1:
            return 0
        else:
            return min(x2,x4) - x1
    else:
        if x2 < x3:
            return 0
        else:
            return min(x2,x4) - x3

In [None]:
def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
    intersect = intersect_w * intersect_h
    w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
    w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
    union = w1*h1 + w2*h2 - intersect
    # two zero-area boxes would otherwise cause a division by zero
    if union == 0:
        return 0.0
    return float(intersect) / union
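
As a worked example of intersection over union, the sketch below computes the IoU of two overlapping squares. It uses plain `(xmin, ymin, xmax, ymax)` tuples instead of `BoundBox` objects and a `max`/`min` formulation of the interval overlap, which is an equivalent but more compact alternative to the branching version above; the helper names are illustrative.

```python
def interval_overlap(a, b):
    """Length of the overlap between two 1-D intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def iou(box1, box2):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = interval_overlap((box1[0], box1[2]), (box2[0], box2[2]))
    inter_h = interval_overlap((box1[1], box1[3]), (box2[1], box2[3]))
    intersect = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersect
    return intersect / union if union > 0 else 0.0

a = (0, 0, 10, 10)
b = (5, 5, 15, 15)
# Intersection is 5 * 5 = 25, union is 100 + 100 - 25 = 175.
print(iou(a, b))
```

Identical boxes give an IoU of 1, and disjoint boxes give 0.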

In [None]:
def do_nms(boxes, nms_thresh):
    if len(boxes) > 0:
        nb_class = len(boxes[0].classes)
    else:
        return
    for c in range(nb_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])
        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            if boxes[index_i].classes[c] == 0: continue
            for j in range(i+1, len(sorted_indices)):
                index_j = sorted_indices[j]
                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0
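
The effect of non-maximum suppression is easiest to see on a single class. The following sketch, which uses plain tuples and hypothetical scores rather than `BoundBox` objects, follows the same idea as `do_nms`: visit boxes from highest to lowest score and zero the score of any later box that overlaps a kept box by at least the threshold.

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersect = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersect
    return intersect / union if union > 0 else 0.0

def nms_single_class(boxes, scores, nms_thresh):
    """Return a copy of scores with suppressed entries set to 0."""
    scores = list(scores)
    # Indices sorted by descending score.
    order = sorted(range(len(boxes)), key=lambda k: -scores[k])
    for pos, i in enumerate(order):
        if scores[i] == 0:
            continue  # already suppressed, cannot suppress others
        for j in order[pos + 1:]:
            if iou(boxes[i], boxes[j]) >= nms_thresh:
                scores[j] = 0
    return scores

# Two heavily overlapping boxes and one far away (hypothetical values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms_single_class(boxes, scores, 0.5)
print(kept)
```

The second box overlaps the first with an IoU of about 0.68, so its score is zeroed, while the distant third box is untouched. As in `do_nms`, boxes are never removed from the list; only their scores change.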

## Step 4: Loading and Preparing an Image
- Use the **load_img** function to read the image specified by the filename
- Record the original width and height of the image
- Load the image again, resized to the shape the model expects
- Use the **img_to_array** function to convert the image into a NumPy array
- Convert the image data to float32 and scale the pixel values to the range [0, 1] by dividing by 255
- Add a leading batch dimension so that the array represents one sample
- Return the processed image, ready for model input, along with the original width and height

In [None]:
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height
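
The numeric part of this preprocessing can be checked without an image file. The sketch below uses a randomly generated array as a stand-in for a resized RGB image (an assumption for illustration; a real run would obtain the pixels from `load_img` and `img_to_array`) and applies the same scaling and batching steps.

```python
import numpy as np

# Synthetic stand-in for a 416x416 RGB image with 8-bit pixel values.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(416, 416, 3), dtype=np.uint8)

# Scale to [0, 1] as float32, then add a leading batch dimension.
image = pixels.astype('float32') / 255.0
batch = np.expand_dims(image, 0)

print(batch.shape, batch.dtype)
```

The result has shape `(1, 416, 416, 3)`: one sample, height, width, and three color channels.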

## Step 5: Applying a Threshold to Filter the Results

- Define a function that filters boxes by a threshold and returns the boxes, labels, and scores that pass
- Enumerate all boxes
- Enumerate all possible labels
- Check whether the box's score for the label exceeds the threshold
- Append the box, the label, and the score (as a percentage) to the respective lists


In [None]:
def get_boxes(boxes, labels, thresh):
    v_boxes, v_labels, v_scores = list(), list(), list()
    # enumerate all boxes
    for box in boxes:
        # enumerate all possible labels
        for i in range(len(labels)):
            # check whether the score for this label exceeds the threshold
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(labels[i])
                v_scores.append(box.classes[i]*100)
                # don't break, many labels may trigger for one box
    return v_boxes, v_labels, v_scores
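
Since the loop does not break after the first match, one box can contribute several labels. The sketch below illustrates this with a shortened, hypothetical label list and `SimpleNamespace` objects standing in for `BoundBox` instances.

```python
from types import SimpleNamespace

labels = ["person", "horse", "zebra"]  # hypothetical shortened label list
boxes = [
    SimpleNamespace(classes=[0.05, 0.70, 0.65]),  # passes for two labels
    SimpleNamespace(classes=[0.10, 0.20, 0.30]),  # passes for none
]
thresh = 0.6

# Same filtering rule as get_boxes: keep every (box, label) pair above thresh.
hits = [
    (labels[i], round(score * 100, 1))
    for box in boxes
    for i, score in enumerate(box.classes)
    if score > thresh
]
print(hits)
```

Here the first box is reported twice, once as "horse" and once as "zebra", and the second box is dropped entirely.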

## Step 6: Visualizing All Results
- Load and plot the image
- Get the context for drawing boxes
- Plot each box
- Get coordinates and calculate width and height of the box
- Create the shape
- Draw the box
- Draw text and score in the top-left corner
- Show the plot

In [None]:
def draw_boxes(filename, v_boxes, v_labels, v_scores):
    data = pyplot.imread(filename)
    # plot the image
    pyplot.imshow(data)
    # get the context for drawing boxes
    ax = pyplot.gca()
    # plot each box
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        # get coordinates
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        # calculate width and height of the box
        width, height = x2 - x1, y2 - y1
        # create the shape
        rect = Rectangle((x1, y1), width, height, fill=False, color='white')
        # draw the box
        ax.add_patch(rect)
        # draw text and score in top left corner
        label = "%s (%.3f)" % (v_labels[i], v_scores[i])
        pyplot.text(x1, y1, label, color='white')
    # show the plot
    pyplot.show()

## Step 7: Loading the Pretrained YOLO v3 Model
- Utilize the **load_model** function from the Keras library
- Load the saved YOLOv3 model from the file **yolov3.h5** into a model object

In [None]:
model = load_model('yolov3.h5')
model.summary()

## Step 8: Defining the Expected Input Shape for the Model

- Define the expected input width (input_w) and height (input_h) for the model. For YOLOv3, it is often set to 416x416.
- Specify the path and filename of the image you want to process
- Use the load_image_pixels function (defined earlier) to resize and normalize the image
- Use the predict method of the loaded YOLOv3 model to make predictions on the processed image. This returns a list of arrays, one per detection scale.
- Print the shapes of the returned arrays using a list comprehension
- Each shape shows the grid size at that scale and the length of the per-cell output vector, which packs the box coordinates, objectness score, and class probabilities for every anchor box.

In [None]:
input_w, input_h = 416, 416

In [None]:
photo_filename = 'zebra.jpg'

In [None]:
# load_img expects target_size as (height, width)
image, image_w, image_h = load_image_pixels(photo_filename, (input_h, input_w))

In [None]:
yhat = model.predict(image)

In [None]:
print([a.shape for a in yhat])
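
The shapes to expect can be worked out in advance. The sketch below assumes the standard YOLOv3 layout: three output scales with strides 32, 16, and 8, three anchor boxes per cell, 80 COCO classes, and channels-last arrays ordered from the coarsest grid to the finest. The actual shapes depend on the model file, so treat this as a sanity check rather than a guarantee.

```python
input_h, input_w = 416, 416
num_anchors, num_classes = 3, 80   # assumed: 3 anchors per cell, 80 COCO classes
strides = (32, 16, 8)              # assumed: standard YOLOv3 downsampling factors

# Each anchor predicts x, y, w, h, objectness, and one score per class.
channels = num_anchors * (5 + num_classes)
expected = [(1, input_h // s, input_w // s, channels) for s in strides]
print(expected)
```

Under these assumptions the three arrays have grids of 13x13, 26x26, and 52x52 cells, each cell holding 255 values.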

## Step 9: Defining the Anchors
- **anchors** is a list of three sublists, one for each scale at which the YOLOv3 model makes detections, in the same order as the model's outputs.
- Each sublist is a flat sequence of width and height values, read in pairs, for the three anchor boxes at that scale.

In [None]:
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
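
The flat layout is what `decode_netout` indexes with `anchors[2 * b + 0]` and `anchors[2 * b + 1]`. As a quick illustration, the sketch below regroups each sublist into explicit (width, height) pairs; the `paired` name is introduced here only for the example.

```python
anchors = [[116, 90, 156, 198, 373, 326],
           [30, 61, 62, 45, 59, 119],
           [10, 13, 16, 30, 33, 23]]

# Even positions are widths, odd positions are heights.
paired = [list(zip(flat[0::2], flat[1::2])) for flat in anchors]
for scale, pairs in enumerate(paired):
    print(scale, pairs)
```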

## Step 10: Defining Probability Threshold for Detected Objects
- Set a class detection threshold at 0.6
- Initialize an empty list for storing bounding boxes
- Iterate over network outputs, decode bounding boxes using anchors, and append to the list

In [None]:
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
    # decode the output of the network
    boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)


## Step 11: Adjusting Bounding Box Dimensions
- Invoke the **correct_yolo_boxes** function
- Pass the bounding boxes, original image dimensions, and expected input dimensions
- Adjust the bounding box coordinates to align with the original image shape

In [None]:
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)

## Step 12: Eliminating Redundant Bounding Boxes
- Invoke the do_nms function
- Provide the list of bounding boxes and an IoU threshold of 0.5
- For each class, keep the highest-scoring box and zero the class score of any box that overlaps it by at least the threshold, so that only the most relevant predictions remain

In [None]:
do_nms(boxes, 0.5)

## Step 13: Establishing Object Labels for Prediction Results
- List all potential object names that the YOLO model can identify
- Assign this list to the variable named **labels**


In [None]:
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
    "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
    "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]

## Step 14: Obtaining Details of Detected Objects
- Invoke the get_boxes function to filter and retrieve boxes, associated labels, and confidence scores based on the defined threshold
- Store these details in v_boxes, v_labels, and v_scores for visualization and interpretation

In [None]:
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)

## Step 15: Displaying Detected Objects and Confidence Scores
- Iterate over the list of detected objects
- Print the label and confidence score of each recognized object to summarize the findings

In [None]:
for i in range(len(v_boxes)):
    print(v_labels[i], v_scores[i])

In [None]:
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

**Observation**
- The code detects objects in the image, assigns them a confidence score, and labels them with the appropriate name.