# Object Detection with SSD MobileNet V2 on Pascal VOC 2007

In this notebook, we move from image classification to object detection using a subset of the Pascal VOC 2007 dataset. We use an SSD MobileNet V2 model from TensorFlow Hub that was pre-trained on COCO, and we run it as-is, without fine-tuning.

## Setup and Installation

First, we need to install the necessary libraries.


!pip install tensorflow tensorflow-hub tensorflow-datasets matplotlib



## Import Libraries

Let's import the libraries we will use.


In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches



## Load and Explore the Pascal VOC 2007 Dataset

We will load a subset of the dataset and explore some examples.


In [None]:
def load_data(split='train'):
    dataset, info = tfds.load('voc/2007', split=split, shuffle_files=True, with_info=True)
    return dataset, info

# Load a small subset of the training and validation datasets
train_dataset, train_info = load_data('train[:10%]')
validation_dataset, validation_info = load_data('validation[:10%]')

# Get class names
class_names = train_info.features["objects"]["label"].names
print("Class names:", class_names)


Downloading and preparing dataset 868.85 MiB (download: 868.85 MiB, generated: Unknown size, total: 868.85 MiB) to /root/tensorflow_datasets/voc/2007/4.0.0...



## Display Example Images with Ground Truth Bounding Boxes

Let's visualize some examples from the dataset with their ground truth bounding boxes.

In [None]:
def display_examples(dataset, n=3):
    for example in dataset.take(n):
        image = example["image"]
        plt.figure(figsize=(5, 5))
        plt.imshow(image)
        plt.title("Image with Ground Truth Bounding Boxes")

        # Draw ground truth boxes
        for box in example["objects"]["bbox"]:
            ymin, xmin, ymax, xmax = box
            rect = patches.Rectangle((xmin * image.shape[1], ymin * image.shape[0]),
                                    (xmax - xmin) * image.shape[1], (ymax - ymin) * image.shape[0],
                                    linewidth=1, edgecolor='g', facecolor='none')
            plt.gca().add_patch(rect)

        plt.show()

display_examples(train_dataset)
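
The drawing code relies on the fact that TFDS stores each VOC box as normalized `[ymin, xmin, ymax, xmax]`, while `patches.Rectangle` expects a pixel-space corner plus a width and height. A minimal sketch of that conversion as a standalone helper follows; `to_pixel_rect` is a hypothetical name introduced here for illustration and is not part of the notebook.

```python
def to_pixel_rect(box, image_height, image_width):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to (x, y, width, height) in pixels."""
    ymin, xmin, ymax, xmax = box
    x = xmin * image_width                # left edge in pixels
    y = ymin * image_height               # top edge in pixels
    width = (xmax - xmin) * image_width
    height = (ymax - ymin) * image_height
    return x, y, width, height


# Example: a box covering the central half of a 200 x 400 (height x width) image
print(to_pixel_rect([0.25, 0.25, 0.75, 0.75], image_height=200, image_width=400))
# → (100.0, 50.0, 200.0, 100.0)
```

The returned tuple can be passed to `patches.Rectangle((x, y), width, height, ...)`.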


## Load the Pre-trained SSD MobileNet V2 Model

We will load a pre-trained SSD MobileNet V2 model from TensorFlow Hub.


In [None]:
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

## Run Object Detection and Visualize Results

We will run the object detection model on the images and visualize the results along with the ground truth bounding boxes.

In [None]:
def run_detector_and_visualize(example):
    image = example["image"]
    ground_truth_boxes = example["objects"]["bbox"]

    # Preprocess and run detection
    converted_img = tf.image.convert_image_dtype(image, tf.uint8)[tf.newaxis, ...]
    result = detector(converted_img)
    result = {key: value.numpy() for key, value in result.items()}

    # Visualize results
    plt.figure(figsize=(10, 7))
    plt.imshow(image)

    # Ground truth boxes
    for box in ground_truth_boxes:
        ymin, xmin, ymax, xmax = box
        rect = patches.Rectangle((xmin * image.shape[1], ymin * image.shape[0]),
                                (xmax - xmin) * image.shape[1], (ymax - ymin) * image.shape[0],
                                linewidth=1, edgecolor='g', facecolor='none', label='Ground Truth')
        plt.gca().add_patch(rect)

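    # The TF Hub detector was trained on COCO, so detection_classes holds
    # COCO category IDs (1-based), not indices into the VOC class_names list.
    # Map the COCO IDs that have a Pascal VOC counterpart to the VOC name.
    coco_id_to_voc_name = {
        1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorbike', 5: 'aeroplane',
        6: 'bus', 7: 'train', 9: 'boat', 16: 'bird', 17: 'cat', 18: 'dog',
        19: 'horse', 20: 'sheep', 21: 'cow', 44: 'bottle', 62: 'chair',
        63: 'sofa', 64: 'pottedplant', 67: 'diningtable', 72: 'tvmonitor',
    }

    # Predicted boxes
    for i, score in enumerate(result['detection_scores'][0]):
        if score > 0.5:  # Confidence threshold
            ymin, xmin, ymax, xmax = result['detection_boxes'][0][i]
            class_id = int(result['detection_classes'][0][i])

            # Fall back to the raw ID for COCO classes with no VOC equivalent,
            # so that label is always defined
            label = coco_id_to_voc_name.get(class_id, f'COCO id {class_id}')

            rect = patches.Rectangle((xmin * image.shape[1], ymin * image.shape[0]),
                                    (xmax - xmin) * image.shape[1], (ymax - ymin) * image.shape[0],
                                    linewidth=1, edgecolor='r', facecolor='none', label='Predicted')
            plt.gca().add_patch(rect)
            plt.text(xmin * image.shape[1], ymin * image.shape[0] - 5, f'{label}: {score:.2f}', color='white', backgroundcolor='r')

    # Show one legend entry per label instead of one per box
    handles, labels = plt.gca().get_legend_handles_labels()
    unique = dict(zip(labels, handles))
    plt.legend(list(unique.values()), list(unique.keys()))
    plt.show()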

# Run and visualize detection on a few images
for example in train_dataset.take(2):  # Process 2 images
    run_detector_and_visualize(example)


## Evaluate Model Performance

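We will evaluate the model by computing precision and recall. A detection counts as a true positive when its IoU (intersection over union) with a ground truth box exceeds the threshold and the class labels agree; all other detections are false positives, and ground truth boxes left unmatched are false negatives.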

In [None]:
def calculate_iou(box1, box2):
    ymin1, xmin1, ymax1, xmax1 = box1
    ymin2, xmin2, ymax2, xmax2 = box2

    # Calculate intersection area
    intersect_ymin = max(ymin1, ymin2)
    intersect_xmin = max(xmin1, xmin2)
    intersect_ymax = min(ymax1, ymax2)
    intersect_xmax = min(xmax1, xmax2)
    intersect_area = max(0, intersect_ymax - intersect_ymin) * max(0, intersect_xmax - intersect_xmin)

    # Calculate union area
    box1_area = (ymax1 - ymin1) * (xmax1 - xmin1)
    box2_area = (ymax2 - ymin2) * (xmax2 - xmin2)
    union_area = box1_area + box2_area - intersect_area

    # Guard against degenerate (zero-area) boxes
    return intersect_area / union_area if union_area > 0 else 0.0

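# The TF Hub detector was trained on COCO, so it reports COCO category IDs.
# Map the COCO IDs that have a Pascal VOC counterpart to the VOC class name.
COCO_ID_TO_VOC_NAME = {
    1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorbike', 5: 'aeroplane',
    6: 'bus', 7: 'train', 9: 'boat', 16: 'bird', 17: 'cat', 18: 'dog',
    19: 'horse', 20: 'sheep', 21: 'cow', 44: 'bottle', 62: 'chair',
    63: 'sofa', 64: 'pottedplant', 67: 'diningtable', 72: 'tvmonitor',
}

def evaluate_model_performance(dataset, detector, iou_threshold=0.5, num_samples=100):
    true_positives = 0
    false_positives = 0
    false_negatives = 0

    for example in dataset.take(num_samples):
        image = example["image"].numpy()
        gt_boxes = example["objects"]["bbox"].numpy()
        gt_labels = example["objects"]["label"].numpy()

        converted_img = tf.image.convert_image_dtype(image, tf.uint8)[tf.newaxis, ...]
        result = detector(converted_img)
        result = {key: value.numpy() for key, value in result.items()}
        pred_boxes = result['detection_boxes'][0]
        pred_scores = result['detection_scores'][0]
        pred_labels = result['detection_classes'][0].astype(int)

        matched_gt = set()  # Ground truth boxes already claimed by a detection

        for i, score in enumerate(pred_scores):
            if score < 0.5:  # Confidence threshold
                continue

            # Detections of COCO classes with no VOC counterpart cannot be
            # scored against VOC annotations, so skip them
            pred_name = COCO_ID_TO_VOC_NAME.get(int(pred_labels[i]))
            if pred_name is None:
                continue

            # Both the detector and TFDS use normalized [ymin, xmin, ymax, xmax],
            # so the boxes can be compared directly without reordering
            pred_box = pred_boxes[i]

            best_iou = 0
            best_index = None
            for j, gt_box in enumerate(gt_boxes):
                if j in matched_gt:
                    continue
                iou = calculate_iou(gt_box, pred_box)
                if iou > best_iou:
                    best_iou = iou
                    best_index = j

            if (best_index is not None and best_iou > iou_threshold
                    and pred_name == class_names[gt_labels[best_index]]):
                true_positives += 1
                matched_gt.add(best_index)
            else:
                false_positives += 1

        # Count only this image's unmatched ground truth boxes
        false_negatives += len(gt_boxes) - len(matched_gt)

    precision = true_positives / (true_positives + false_positives) if true_positives + false_positives > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if true_positives + false_negatives > 0 else 0

    print(f"Model Performance (IoU Threshold = {iou_threshold:.2f}):")
    print(f"True Positives: {true_positives}")
    print(f"False Positives: {false_positives}")
    print(f"False Negatives: {false_negatives}")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")

# Evaluate model performance on the training dataset
evaluate_model_performance(train_dataset, detector)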
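
To make the metrics concrete, here is a small worked example that is independent of the dataset and the detector. It recomputes IoU for two hand-picked box pairs and then derives precision = TP / (TP + FP) and recall = TP / (TP + FN) from made-up counts. The helper names `iou` and `precision_recall` and all numbers are illustrative assumptions, not results from this notebook.

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Two half-height boxes shifted by a quarter: overlap 0.25, union 0.75
print(iou([0.0, 0.0, 0.5, 1.0], [0.25, 0.0, 0.75, 1.0]))  # about 0.33, below a 0.5 threshold

# A full box versus one covering three quarters of it
print(iou([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.75]))   # 0.75, above a 0.5 threshold

# Made-up counts: 6 correct detections, 2 spurious ones, 4 missed objects
print(precision_recall(tp=6, fp=2, fn=4))                 # precision 0.75, recall 0.6
```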


## Conclusion

In this notebook, we demonstrated how to adapt an image classification task to an object detection task using a pre-trained SSD MobileNet V2 model. We visualized example images with ground truth and predicted bounding boxes, and evaluated the model's performance using precision and recall metrics.




**Difference between image classification and object detection?**
- Image classification: assigns a single label to the whole image.
- Object detection: locates multiple objects in an image and labels each one with a bounding box.

**Why choose SSD MobileNet V2?**
- Advantages: fast inference with low compute and power requirements.
- Limitations: typically less accurate than larger, more complex detectors.

## Code Interpretation

**Role of the `find_images_with_classes` function?**
It selects, from a large dataset, the images that contain specific classes.
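
The function itself is not shown in this notebook, so the following is only a sketch of what such a helper might look like. It assumes each example is a dictionary shaped like a TFDS VOC record (`example["objects"]["label"]` holds integer class indices) and that an image qualifies when it contains at least one of the wanted classes. The signature and the `max_images` parameter are assumptions.

```python
def find_images_with_classes(examples, wanted_labels, max_images=None):
    """Return the examples whose object labels include at least one wanted label."""
    wanted = set(wanted_labels)
    matches = []
    for example in examples:
        labels = {int(label) for label in example["objects"]["label"]}
        if wanted & labels:  # at least one wanted class is present
            matches.append(example)
            if max_images is not None and len(matches) >= max_images:
                break
    return matches


# Toy records standing in for dataset examples (label indices are illustrative)
toy_examples = [
    {"id": "a", "objects": {"label": [14, 8]}},
    {"id": "b", "objects": {"label": [6]}},
    {"id": "c", "objects": {"label": [11, 14]}},
]
print([e["id"] for e in find_images_with_classes(toy_examples, wanted_labels=[11, 6])])
# → ['b', 'c']
```

With a real `tf.data.Dataset`, the examples could be supplied by iterating over `tfds.as_numpy(train_dataset)`.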

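**Effect of the threshold in the `plot_detections` function?**
A higher confidence threshold keeps only high-scoring detections, so fewer boxes are drawn and more objects may be missed; a lower threshold draws more boxes, including more false positives.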
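
A minimal sketch of this effect, using made-up confidence scores rather than real detector output. `count_detections` is a hypothetical helper, and the strict `>` comparison mirrors the `score > 0.5` check used earlier in the notebook.

```python
def count_detections(scores, threshold):
    """Count how many detection scores pass the confidence threshold."""
    return sum(1 for score in scores if score > threshold)


# Illustrative scores only, sorted from most to least confident
scores = [0.95, 0.80, 0.62, 0.41, 0.30, 0.12]

for threshold in (0.3, 0.5, 0.7):
    print(f"threshold={threshold}: {count_detections(scores, threshold)} boxes drawn")
# threshold=0.3: 4 boxes drawn
# threshold=0.5: 3 boxes drawn
# threshold=0.7: 2 boxes drawn
```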

**Purpose of the heatmap visualization?**
It uses color intensity to show how confident the model is in each of its detections.
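
The heatmap code is not included in this notebook, so the sketch below shows one possible way to build such a map, not necessarily the original approach. It assumes normalized `[ymin, xmin, ymax, xmax]` boxes and records, for each pixel, the highest score of any box covering it. `confidence_heatmap` is a hypothetical name.

```python
import numpy as np


def confidence_heatmap(boxes, scores, height, width):
    """Per-pixel maximum detection score for normalized [ymin, xmin, ymax, xmax] boxes."""
    heatmap = np.zeros((height, width), dtype=np.float32)
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        top, left = int(ymin * height), int(xmin * width)
        bottom = int(np.ceil(ymax * height))
        right = int(np.ceil(xmax * width))
        region = heatmap[top:bottom, left:right]  # view into heatmap
        np.maximum(region, score, out=region)     # keep the highest score per pixel
    return heatmap


# Two overlapping toy boxes on a 4 x 4 grid
heatmap = confidence_heatmap(
    boxes=[[0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 1.0, 1.0]],
    scores=[0.9, 0.6],
    height=4,
    width=4,
)
print(heatmap)
```

The result could then be overlaid on the image, for example with `plt.imshow(heatmap, cmap='hot', alpha=0.5)`.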

## Observing Results and Limitations

**Which objects does the model detect well?**
Large, clearly visible objects are detected most reliably. Small or partially hidden objects are missed more often.

**Are there inaccuracies in bounding boxes?**
Yes. Boxes are sometimes missing or misplaced, for example when objects overlap.

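**Accuracy with the full Pascal VOC 2007 dataset?**
The model is pre-trained and is not fine-tuned in this notebook, so evaluating on the full dataset would not change the model itself; it would give a more reliable estimate of precision and recall than the 10% subset does. Fine-tuning on more VOC data would likely improve accuracy.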

## Critical Thinking

**Detect specific objects like animals or vehicles?**
Filter the results so that only detections of those specific classes are shown.
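
A minimal sketch of such a filter. It assumes the detector reports COCO category IDs, which holds for the COCO-trained TF Hub model loaded in this notebook; `filter_detections` and the toy detections are hypothetical and serve only as illustration.

```python
# COCO category IDs for the animal classes that also exist in Pascal VOC
ANIMAL_IDS = {16, 17, 18, 19, 20, 21}   # bird, cat, dog, horse, sheep, cow
# COCO category IDs for the vehicle classes that also exist in Pascal VOC
VEHICLE_IDS = {2, 3, 4, 5, 6, 7, 9}     # bicycle, car, motorcycle, airplane, bus, train, boat


def filter_detections(boxes, scores, class_ids, wanted_ids, min_score=0.5):
    """Keep (box, score, class_id) triples for wanted classes above min_score."""
    wanted = set(wanted_ids)
    return [
        (box, score, int(class_id))
        for box, score, class_id in zip(boxes, scores, class_ids)
        if int(class_id) in wanted and score > min_score
    ]


# Toy detections: a confident dog, a confident car, and a low-score cat
boxes = [[0.1, 0.1, 0.5, 0.5], [0.2, 0.6, 0.8, 0.9], [0.0, 0.0, 0.3, 0.3]]
scores = [0.91, 0.77, 0.35]
class_ids = [18, 3, 17]

animals = filter_detections(boxes, scores, class_ids, ANIMAL_IDS)
print([class_id for _, _, class_id in animals])  # → [18]
```

With real detector output, `boxes`, `scores`, and `class_ids` would come from `result['detection_boxes'][0]`, `result['detection_scores'][0]`, and `result['detection_classes'][0]`.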

**Steps to train your own object detection model?**
Collect and annotate data, choose a model architecture, train it, evaluate it on held-out data, and adjust as needed.

**Real-world uses for this model despite limitations?**
It is useful wherever detection must be quick and efficient with limited computational power.
