<a href="https://colab.research.google.com/github/umeshrawat/AI_Math_Vedas/blob/master/CV_2_P_2_Faster_R_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

IMPORTANT: It is highly recommended that this notebook be run on Google Colab itself as it contains CUDA usage, which may not be supported in the local environment of some computers.

# Problem to be solved
In this notebook, our goal is to **add artificial intelligence into a traffic system**. This traffic system has to monitor the vehicles that pass through the intersection where it is deployed. Specifically, it has to find all vehicles, identify their category (ambulance, bus, car, motorcycle or truck), and understand the vehicle characteristics (size, color, license plate, etc.). The system has a camera on a slowly rotating motor and can take images of the roads from different viewpoints. Some sample images from the system are shown below.

<figure>
<table>
<tr>
<td>
<img src='https://drive.google.com/uc?export=view&id=1a31C5GzvbXyVsPLO9E6pdeeRnO5HN341' />
</td>
<td>
<img src='https://drive.google.com/uc?export=view&id=1eVjslBgj9j1l55KVJCcZxa-ADm35Di7Q' />
</td>
</tr>
</table>
</figure>

As an ML engineer, you are asked to find all the vehicles, their category and size in the images from the traffic system.

# Our Solution
The solution we will adopt for this problem is to train an object detection model which can predict the different vehicle categories. We will be training the Faster RCNN and YOLO models we learned in the Computer Vision 2 lecture in this exercise.



# Dataset used
To train the object detection models, we need to either collect images with bounding boxes using human annotators or use publicly available datasets. In this exercise, we follow the latter approach and use the Open Images Vehicles dataset, which contains all the categories we are interested in. This dataset is available as a zip file `Vehicles-OpenImages.v1-416x416.coco.zip`.



# Requirements
Before trying this notebook, ensure that the following requirements are met.
1. Copy the dataset file `Vehicles-OpenImages.v1-416x416.coco.zip` to your Google Drive at the path `CV-2/assets/P3`

# Preparing the Data

## Mounting the dataset directory
This step is required for all notebooks. We assume that the datasets are stored in Google Drive and mount the Drive folder into the Colab file system, so that the data can be read as though it were on the machine running the notebook.

In [None]:
import os

from google.colab import drive
drive.mount('/content/drive', force_remount=True)
assets_dir = '/content/drive/MyDrive/CV-2/assets/P3/'

## Importing common python packages

In [None]:
!pip install -q torch_snippets
from torch_snippets import Glob, Report
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import models
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import numpy as np
device = 'cuda' if torch.cuda.is_available() else 'cpu'

## Unzipping the dataset

We unzip the dataset and generate labels for each image.

In [None]:
source_path = assets_dir + "Vehicles-OpenImages.v1-416x416.coco.zip"
destination_dir = "/content/Vehicles/"
!unzip "{source_path}" -d {destination_dir}

## Visualizing the annotations

In [None]:
import json

with open('/content/Vehicles/train/_annotations.coco.json',) as f:
    annotations = json.load(f)
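
A COCO annotation file links three top-level lists (`images`, `categories`, `annotations`) through ids. The sketch below illustrates that linkage on a made-up miniature file; `toy_annotations` and `group_boxes_by_file` are hypothetical names for illustration, and the values are not taken from the Vehicles dataset.

```python
from collections import defaultdict

# Hypothetical miniature of a COCO annotation file (all values are made up).
toy_annotations = {
    "images": [
        {"id": 0, "file_name": "a.jpg"},
        {"id": 1, "file_name": "b.jpg"},
    ],
    "categories": [
        {"id": 1, "name": "Ambulance"},
        {"id": 2, "name": "Bus"},
    ],
    "annotations": [
        {"image_id": 0, "category_id": 1, "bbox": [10, 20, 100, 50]},
        {"image_id": 0, "category_id": 2, "bbox": [200, 40, 80, 120]},
        {"image_id": 1, "category_id": 2, "bbox": [5, 5, 30, 60]},
    ],
}

def group_boxes_by_file(coco):
    """Map each file name to the list of (category name, bbox) pairs drawn on it."""
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[id_to_file[ann["image_id"]]].append(
            (id_to_name[ann["category_id"]], ann["bbox"])
        )
    return dict(grouped)

boxes_by_file = group_boxes_by_file(toy_annotations)
for file_name, file_boxes in boxes_by_file.items():
    print(file_name, file_boxes)
```

Building such a lookup once avoids scanning every annotation each time a single image is requested.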

## Load label info

In [None]:
targets2label = {}
for cat in annotations['categories']:
    targets2label[cat['id']] = cat['name']

label2targets = {target: label for label, target in targets2label.items()}
num_classes = len(targets2label)

## Populate image tensors
In the code snippet below, the image is first converted to a PyTorch tensor using `torch.tensor`, and its dimensions are then reordered using `permute(2, 0, 1)`. This changes the layout from (height, width, channels) to (channels, height, width), which is the layout PyTorch vision models expect.
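
To make the reordering concrete, here is a small pure-Python sketch of what `permute(2, 0, 1)` does to the layout, written on nested lists so that it runs without PyTorch. The helper `hwc_to_chw` and the toy image are hypothetical and only serve as an illustration.

```python
def hwc_to_chw(img):
    """Reorder a nested list from (height, width, channels) to (channels, height, width)."""
    height, width, channels = len(img), len(img[0]), len(img[0][0])
    return [
        [[img[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]

# A toy 2x2 RGB image: each pixel is [R, G, B].
toy_img = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

chw = hwc_to_chw(toy_img)
print(chw[0])  # red plane: [[1, 4], [7, 10]]
print(chw[1])  # green plane: [[2, 5], [8, 11]]
```

Each output plane collects one channel from every pixel, which a plain `reshape` would not do because `reshape` keeps the elements in their original memory order.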

In [None]:
import numpy as np
import pandas as pd
from tqdm import tqdm

from matplotlib import image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

import cv2

import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

In [None]:
def preprocess_img(img):
    img = torch.tensor(img).permute(2, 0 ,1)
    return img.to(device).float()

### PyTorch Dataset

Datasets are the collections of your training, validation, and test data. They consist of input samples and their corresponding target labels (for supervised learning). In PyTorch, datasets are typically created using custom classes inheriting from `torch.utils.data.Dataset`. You load your data into this class, allowing easy access during training. PyTorch provides built-in datasets like MNIST, CIFAR-10, and ImageNet, but custom datasets can also be created to work with specific data.
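
As a minimal sketch of the idea, a map-style dataset only has to report its length and return the sample at a given index. The toy class below is hypothetical and deliberately omits the `torch.utils.data.Dataset` base class so that it runs without PyTorch; a real dataset would inherit from it.

```python
class SquaresDataset:
    """Toy map-style dataset: sample i is the pair (input=i, target=i * i)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of samples in the dataset.
        return self.size

    def __getitem__(self, idx):
        # Return one (input, target) pair; reject out-of-range indices.
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        return idx, idx * idx

toy_dataset = SquaresDataset(5)
print(len(toy_dataset))  # 5
print(toy_dataset[3])    # (3, 9)
```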

### Creating a custom dataset class

A custom dataset class keeps data loading, pre-processing and augmentation in one place, so the rest of the training pipeline only sees ready-to-use samples. This is especially useful for data that does not follow a conventional layout, such as medical images, audio data, or custom annotation formats, because the loading logic can be tailored to the data and to the input format the chosen model expects.

Custom dataset classes plug directly into PyTorch's DataLoader, which adds batching, shuffling and parallel data loading. They also make it easy to expose separate data splits (e.g., train, validation, test) through the same interface.

Below we have a custom dataset class to handle the Vehicles Dataset we have unzipped above.

In [None]:
class VehicleDetectionDataset(torch.utils.data.Dataset):
    def __init__(self, images_path, std=False):
        super(VehicleDetectionDataset, self).__init__()
        self.images_path = Glob(images_path+"*jpg")
        self.std = std
        with open(images_path+'_annotations.coco.json',) as f:
            self.annotations = json.load(f)

    def __len__(self):
        return len(self.images_path)

    def __getitem__(self, idx):
        # First get the path, file name and id of the idx-th image.
        file_path = str(self.images_path[idx])
        # Use the base name instead of a hard-coded position in the path.
        file_name = os.path.basename(file_path)
        img_id = None
        for img_info in self.annotations['images']:
            if img_info['file_name'] == file_name:
                img_id = img_info['id']
                break

        # Load the box information of the idx th image.
        bbox, areas, iscrowd, labels = [], [], [], []
        for box in self.annotations['annotations']:
            if box['image_id'] == img_id:
                bbox.append(box['bbox'])
                areas.append(box['area'])
                iscrowd.append(box['iscrowd'])
                labels.append(box['category_id'])

        # Load the image.
        img = cv2.imread(str(file_path), cv2.IMREAD_UNCHANGED)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)

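        # Convert the annotations to NumPy arrays and then to Torch tensors.
        # reshape(-1, 4) keeps the array 2-D even when an image has no boxes.
        bbox = np.array(bbox, dtype=np.float32).reshape(-1, 4)
        areas = np.array(areas)
        iscrowd = np.array(iscrowd)
        labels = np.array(labels)
        # COCO boxes are [x, y, width, height]; torchvision expects [x1, y1, x2, y2].
        bbox[:, 2] = bbox[:, 0] + bbox[:, 2]
        bbox[:, 3] = bbox[:, 1] + bbox[:, 3]
        # torchvision detection models document target boxes as a FloatTensor[N, 4].
        bbox = torch.as_tensor(bbox, dtype=torch.float32)
        if self.std:
            img = img/255.0  # Scale pixel values to [0, 1].
        target = {}
        labels = torch.tensor(labels, dtype=torch.int64)
        iscrowd = torch.tensor(iscrowd, dtype=torch.int64)
        image_id = torch.tensor([idx])
        # np.float was removed in NumPy 1.24; np.float64 is the explicit equivalent.
        areas = torch.as_tensor(areas.astype(np.float64), dtype=torch.double)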

        target["boxes"] = bbox
        target["labels"] = labels
        target["area"] = areas
        target["iscrowd"] = iscrowd
        target["image_id"] = image_id

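        img = np.array(img)
        # NOTE: reshape reinterprets the (height, width, 3) buffer as (3, height, width)
        # without moving the channel axis; the plotting cells undo it with the inverse
        # reshape. img.transpose(2, 0, 1) is the conventional HWC -> CHW conversion.
        if img.shape[0] != 3:
            img = img.reshape(3, img.shape[0], img.shape[1])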

        img = torch.from_numpy(img)
        img = torch.as_tensor(img, dtype=torch.double)

        return img, target
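
The dataset class above converts each COCO box from `[x, y, width, height]` to the `[x1, y1, x2, y2]` corner format by adding the width and height to the top-left corner. A minimal sketch of that conversion for a single box, using a hypothetical helper `xywh_to_xyxy` and made-up coordinates:

```python
def xywh_to_xyxy(box):
    """Convert a COCO-style [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

coco_box = [10, 20, 100, 50]  # made-up box: top-left (10, 20), 100 wide, 50 tall
corner_box = xywh_to_xyxy(coco_box)
print(corner_box)  # [10, 20, 110, 70]
```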

In [None]:
dataset = VehicleDetectionDataset("/content/Vehicles/train/")

# Visualize the 300th sample.
img, target = dataset.__getitem__(300)

img = img.reshape(img.shape[1], img.shape[2], 3)
img = torch.as_tensor(img, dtype=torch.int)
print(img.shape, target["boxes"], target["labels"])

fig, ax = plt.subplots(figsize=(16,8))
ax.imshow(img)
for lab, label in zip(target['boxes'], target['labels']):
    rect = patches.Rectangle((lab[0], lab[1]), lab[2]-lab[0], lab[3]-lab[1],
                             linewidth=1, edgecolor='r', facecolor='none')
    # Annotate each box with its own label rather than the label of the first box.
    ax.annotate(targets2label[int(label)],(lab[0], lab[1]),
                color='red', fontsize=15,backgroundcolor="w")

    ax.add_patch(rect)

plt.show()

### Creating the Torch Dataset and Dataloader.

Dataloaders are utilities that enable efficient data loading and batching. They take a dataset as input and let users define the batch size, shuffle the data, and load samples in parallel. Dataloaders are especially useful when dealing with large datasets, as they let the model process data in small batches, which reduces memory requirements.



In [None]:
def collate_fn(batch):
    return tuple(zip(*batch))

train_dataset = VehicleDetectionDataset("/content/Vehicles/train/", std=True)
val_dataset = VehicleDetectionDataset("/content/Vehicles/valid/", std=True)
test_dataset = VehicleDetectionDataset("/content/Vehicles/test/", std=True)

train_dataloader = torch.utils.data.DataLoader(
        train_dataset, batch_size=4, shuffle=True, num_workers=0,
        collate_fn = collate_fn)

val_dataloader = torch.utils.data.DataLoader(
        val_dataset, batch_size=1, shuffle=False, num_workers=0,
        collate_fn = collate_fn)

test_dataloader = torch.utils.data.DataLoader(
        test_dataset, batch_size=1, shuffle=False, num_workers=0,
        collate_fn = collate_fn)
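
Each image can have a different number of boxes, so the default collation, which tries to stack samples into one tensor, does not fit detection targets. The `collate_fn` above instead transposes a list of `(image, target)` pairs into a tuple of images and a tuple of targets. The sketch below shows this on stand-in values; the strings and box lists are made up and merely take the place of real tensors.

```python
def collate_fn(batch):
    # Transpose [(img0, tgt0), (img1, tgt1), ...] into ((img0, img1, ...), (tgt0, tgt1, ...)).
    return tuple(zip(*batch))

# Stand-in samples: strings replace image tensors, lists replace box tensors.
toy_batch = [
    ("img0", {"boxes": [[0, 0, 10, 10]]}),
    ("img1", {"boxes": [[5, 5, 20, 20], [1, 1, 4, 4]]}),
]

toy_imgs, toy_targets = collate_fn(toy_batch)
print(toy_imgs)                      # ('img0', 'img1')
print(len(toy_targets[1]["boxes"]))  # 2
```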

# Defining the Faster R-CNN Model
We will not be implementing the Faster RCNN model from scratch. Instead, we will load the architecture definition provided by PyTorch in the torchvision library.

In [None]:
def get_model():
    """
    Returns a pre-trained Faster R-CNN model based on ResNet-50 backbone with a modified box predictor for the specified number of classes.

    Returns:
    torch.nn.Module: A pre-trained Faster R-CNN model based on ResNet-50 backbone with the specified number of classes.
    """
    model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True) # Load a pre-trained Faster R-CNN model with ResNet-50 backbone

    # Replace the existing box_predictor, since we have a different number of categories.
    # To do this, first read the number of input features of the current predictor,
    # then define our own box_predictor on top of those features.
    # torchvision reserves label 0 for the background, hence the extra output (+1).
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes+1)
    return model

In [None]:
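# Sanity-check the model on one training batch
imgs, targets = next(iter(train_dataloader))

imgs = list(img.to(device).float() for img in imgs)
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

model = get_model().to(device).float()
# In eval mode the model ignores the targets and returns predicted boxes, labels and scores.
model.eval()
with torch.no_grad():
    predictions = model(imgs, targets)
predictions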

In [None]:
# Define the model and optimizer
faster_rcnn_model = get_model().to(device) # Get the pre-trained Faster R-CNN model and move it to the specified device

### Optimizer

Optimizers are algorithms that adjust the model's parameters during training to minimize the loss function. Common optimizers include SGD (Stochastic Gradient Descent), Adam, and RMSprop.

In [None]:
faster_rcnn_optimizer = torch.optim.SGD(faster_rcnn_model.parameters(), lr=0.005, weight_decay=5e-4, momentum=0.9) # Define the optimizer with SGD

### Scheduler

A scheduler adjusts the learning rate during training according to a predefined schedule.

Cosine Annealing: The learning rate starts at the optimizer's initial value and is annealed down to a minimum value (`eta_min`, 0 by default) following a cosine curve. It helps the model explore the search space broadly at the beginning of training and then refine its parameters as it converges.

T_max: This parameter defines the number of scheduler steps over which the learning rate goes from its initial value to the minimum, i.e. half a period of the cosine. In this notebook `faster_rcnn_scheduler.step()` is called once per epoch, so `T_max = 200` corresponds to 200 epochs. `CosineAnnealingLR` implements only the annealing part and performs no warm restarts; for periodic restarts PyTorch provides `CosineAnnealingWarmRestarts`.

Here's a conceptual explanation:

- At the start of training, the learning rate is relatively high, allowing the model to explore a larger area of the loss landscape.
- As training progresses (over the T_max steps), the learning rate decreases following a cosine curve.
- When T_max steps are completed, the learning rate is at its minimum.

This approach often helps models converge more efficiently by first exploring broadly and then refining their parameters as training progresses.
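
The cosine curve described above can be written in closed form as `eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * step / T_max))`. The following pure-Python sketch evaluates that formula with the values used in this notebook (`lr=0.005`, `T_max=200`) and assumes the default minimum of 0; the helper name `cosine_annealing_lr` is hypothetical and is not part of PyTorch.

```python
import math

def cosine_annealing_lr(step, base_lr=0.005, t_max=200, eta_min=0.0):
    """Closed-form cosine-annealed learning rate for steps 0..t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

# Learning rate at the start, at the quarter points, and at the end of the schedule.
for step in (0, 50, 100, 150, 200):
    print(step, round(cosine_annealing_lr(step), 6))
```

The rate falls slowly at first, fastest around the midpoint, and flattens out again as it approaches the minimum.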

In [None]:
faster_rcnn_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(faster_rcnn_optimizer, T_max = 200)

### Train and evaluation methods

The `train_batch` and `validate_batch` functions each run one full pass (one epoch) over the training and validation dataloaders respectively, processing the data batch by batch.

In [None]:
def train_batch(epoch, model, optim, log):
    """
    Train the model for one epoch over train_dataloader.

    Parameters:
        epoch (int): Index of the current epoch, used to position entries in the log.
        model (torch.nn.Module): The neural network model to be trained.
        optim (torch.optim.Optimizer): The optimizer used for updating the model's parameters.
        log: The log object used to generate the Report for the training and validation phase
    """
    print("epoch ", epoch)
    model.train()
    for i, batch in enumerate(train_dataloader):
        N = len(train_dataloader) # Number of batches in the training dataloader
        imgs, targets = batch # Unpack the input images and their corresponding target annotations from the batch
        imgs = list(img.to(device).float() for img in imgs) # Move the input images to the specified device
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets] # Move the target annotations to the specified device
        optim.zero_grad() # Zero the gradients in the optimizer
        losses = model(imgs, targets) # Forward pass: Get the output losses from the model
        loss = sum(loss for loss in losses.values()) # Compute the total loss by summing individual losses
        loss.backward() # Backward pass: Compute gradients of the total loss with respect to model parameters
        optim.step() # Update the model parameters using the optimizer
        # Extract individual losses from the total loss for logging
        classifier_loss, regression_loss, loss_objectness, loss_rpn_box_reg = [losses[k] for k in
                                                                  ['loss_classifier', 'loss_box_reg', 'loss_objectness', 'loss_rpn_box_reg']]
        # Record the training loss and individual losses to the log
        log.record(epoch + (i+1)/N, train_loss=loss.item(), train_classifier_loss=classifier_loss.item(),
                   train_regression_loss=regression_loss.item(), train_loss_objectness=loss_objectness.item(),
                   train_loss_rpn_box_reg = loss_rpn_box_reg.item())

In [None]:
@torch.no_grad()
def validate_batch(epoch, model, log):
    """
    Compute the validation losses over val_dataloader without performing gradient updates.

    Parameters:
        epoch (int): Index of the current epoch, used to position entries in the log.
        model (torch.nn.Module): The neural network model to be validated.
        log: The log object used to generate the Report for the training and validation phase
    """
    # torchvision detection models return losses only in training mode, so we keep
    # train() here; @torch.no_grad() ensures that no gradients are computed.
    model.train()
    for i, batch in enumerate(val_dataloader):
        N = len(val_dataloader)
        imgs, targets = batch
        imgs = list(img.to(device).float() for img in imgs)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        losses = model(imgs, targets)

        loss = sum(loss for loss in losses.values())
        # Extract individual losses from the total loss for logging
        classifier_loss, regression_loss, loss_objectness, loss_rpn_box_reg = [losses[k] for k in
                                                                  ['loss_classifier', 'loss_box_reg',
                                                                   'loss_objectness', 'loss_rpn_box_reg']]
        # Record the validation loss and individual losses to the log
        log.record(epoch + (i+1)/N, val_loss=loss.item(), val_classifier_loss=classifier_loss.item(),
                   val_regression_loss=regression_loss.item(), val_loss_objectness=loss_objectness.item(),
                   val_loss_rpn_box_reg = loss_rpn_box_reg.item())


### Training the Model

In Faster R-CNN object detection, the model computes and returns four losses during training:

- **Classification Loss** (`loss_classifier`): Measures the accuracy of the predicted class labels for each object region. It is usually computed using a classification loss function such as cross-entropy loss.
- **Localization Loss** (`loss_box_reg`): Measures the accuracy of the predicted bounding box coordinates for each object region. It is commonly calculated using a regression loss function such as smooth L1 loss.
- **Objectness Loss** (`loss_objectness`): Specific to the region proposal network (RPN) used in the Faster R-CNN framework. It measures the accuracy of the predicted objectness scores, which indicate the likelihood of a region containing an object.
- **RPN Localization Loss** (`loss_rpn_box_reg`): Also specific to the RPN. It measures the accuracy of the predicted bounding box coordinates for the proposed regions.

These losses are summed and optimized jointly during training to guide the model in learning accurate object detection and localization.
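
As a small illustration of how the four terms are combined, the sketch below sums a dictionary keyed by the loss names used in this notebook's training code. The numeric values are made up; in a real forward pass each entry is a tensor rather than a float.

```python
# Hypothetical loss values for one batch (a real model returns tensors).
toy_losses = {
    "loss_classifier": 0.40,
    "loss_box_reg": 0.25,
    "loss_objectness": 0.10,
    "loss_rpn_box_reg": 0.05,
}

# The total loss that is backpropagated is the plain sum of the four terms.
total_loss = sum(toy_losses.values())
print(round(total_loss, 2))  # 0.8
```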

Note: The below code will take a long time to run on the basic GPU and so in order to save time you can load the model weights and play around with it.

In [None]:
n_epochs = 1
log = Report(n_epochs) # Create a Report object to store and visualize training progress
for e in range(n_epochs):
    # Training phase
    train_batch(epoch=e, model=faster_rcnn_model.float(), optim=faster_rcnn_optimizer, log=log) # Train the model on a single epoch
    # Validation phase
    validate_batch(epoch=e, model=faster_rcnn_model.float(), log=log)
    faster_rcnn_scheduler.step()
    log.report_avgs(e+1) # Report the average losses for the current epoch

## Computing validation loss on a previously trained model

Since training takes a long time, we load a previously trained model and evaluate it instead.

In [None]:

file_path = assets_dir +'faster_rcnn_model_v2.pt'

# Rebuild the same architecture that was saved (get_model() sizes the box
# predictor for num_classes+1 outputs), then load the saved weights into it.
loaded_model = get_model()
saved_state_dict = torch.load(file_path, map_location=device)
loaded_model.load_state_dict(saved_state_dict)
loaded_model.to(device)

n_epochs = 2
log = Report(n_epochs) # Create a Report object to store and visualize validation progress
for e in range(n_epochs):
    # Validation phase, run on the loaded model
    validate_batch(epoch=e, model=loaded_model.float(), log=log)
    log.report_avgs(e+1)

# Plot the validation loss curve
log.plot_epochs(['val_loss'])

## Saving a trained model

In [None]:
model_save_path = assets_dir + "faster_rcnn_model_v2.pt"
torch.save(faster_rcnn_model.state_dict(), model_save_path)

### Validating the model

After training the model, we can run it on held-out data and see how it performs. The code below uses the first 10 images of the test set. You can change this number and compare the results.

In [None]:
from torchvision.ops import nms
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

In [None]:
@torch.no_grad()
def sample_prediction(model, idx):
    """Run the model on the idx-th test image and return the image with its predictions."""
    model.eval()
    cpu = torch.device("cpu")
    model.to(cpu)
    # Fetch the sample once; it holds both the image and its ground-truth target.
    sample_image, target = test_dataset[idx]
    groundtruth_boxes = target["boxes"]
    # Add a batch dimension; .to() is not in-place, so keep the returned tensor.
    sample_image = torch.unsqueeze(sample_image, 0).to(cpu)
    outputs = model(sample_image)
    boxes = outputs[0]["boxes"].detach().numpy()
    scores = outputs[0]["scores"].detach().numpy()
    labels = outputs[0]["labels"].detach().numpy()
    # Undo the dataset's reshape so the image is (height, width, 3) for plotting.
    sample_image = sample_image.reshape(sample_image.shape[2],sample_image.shape[3], 3)
    return sample_image, boxes, labels, scores, groundtruth_boxes



In [None]:
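def nms():
    # Reads the module-level `boxes`, `scores` and `labels` set by the prediction loop.
    # A box is dropped when its IoU with a higher-scoring box exceeds this threshold.
    iou_threshold = 0.01
    keep = torchvision.ops.nms(torch.tensor(boxes), torch.tensor(scores), iou_threshold)
    best_boxes, best_scores, best_labels = [], [], []
    for keep_idx in keep:
        best_boxes.append(boxes[keep_idx])
        best_scores.append(scores[keep_idx])
        best_labels.append(labels[keep_idx])
    return best_boxes, best_scores, best_labels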
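
To show what non-maximum suppression does, here is a minimal pure-Python sketch of the greedy procedure: keep the highest-scoring box, drop every remaining box whose IoU with it exceeds the threshold, and repeat. The helpers `iou` and `greedy_nms` and the toy boxes are hypothetical; this is an illustration of the technique, not the torchvision implementation.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Two heavily overlapping boxes and one separate box (made-up coordinates).
toy_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
toy_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(toy_boxes, toy_scores, iou_threshold=0.5)
print(kept)  # [0, 2]
```

The first two boxes overlap with an IoU of 81/119 (about 0.68), so only the higher-scoring one survives at a threshold of 0.5. A lower threshold suppresses more aggressively.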

In [None]:
loaded_model.eval()
loaded_model.double()
for i in range(10):
    sample_image, boxes, labels, scores, groundtruth_boxes = sample_prediction(loaded_model, i)
    best_boxes, best_scores, best_labels = nms()
    fig, ax = plt.subplots(figsize=(12,18))
    ax.imshow(sample_image)
    for box, score, label in zip(best_boxes, best_scores, best_labels):
        rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1],
                                linewidth=1, edgecolor='r', facecolor='none')
        ann_text = targets2label[int(label)]+" "+str(score)[:4]
        ax.annotate(ann_text,(box[0], box[1]), color='red', fontsize=15,backgroundcolor="w")
        ax.add_patch(rect)
    plt.show()