<h1><center>Detection of Objects in Drone Data</center></h1>

# About the Data

We have a drone dataset of about <b>8000 images</b> and one annotation file in JSON format. The annotations have a `Class` attribute with 5 values: Head, Nut, Thread, Pin and Washer.

# Objective

The aim of this project is to detect these classes in our image dataset using Mask R-CNN. First, we split the images into two parts, training and validation, and split the annotations into two JSON files accordingly. One annotation file is used to train the model; the best resulting model is then validated on the other set of images using the second annotation file.
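Splitting the annotations to match an image split can be scripted. The sketch below is a minimal, hypothetical version: it assumes the VIA export is a dict of per-image entries that each carry a `filename` key (which is what `load_custom` reads later), and the helper names, the fixed seed and the 0.6 training fraction (a 3:2 split) are our own illustration choices.

```python
import json
import random


def split_filenames(filenames, train_fraction=0.6, seed=0):
    """Shuffle file names reproducibly and cut them into train/val lists (0.6 = 3:2)."""
    names = sorted(filenames)
    random.Random(seed).shuffle(names)
    cut = int(round(len(names) * train_fraction))
    return names[:cut], names[cut:]


def split_via_annotations(annotations, train_filenames):
    """Split a VIA-style dict into (train, val) dicts using each entry's 'filename'."""
    train_set = set(train_filenames)
    train, val = {}, {}
    for key, entry in annotations.items():
        target = train if entry["filename"] in train_set else val
        target[key] = entry
    return train, val


def write_json(data, path):
    """Write one annotation dict to disk, e.g. as <subset>/newann.json."""
    with open(path, "w") as f:
        json.dump(data, f)
```

Writing each returned dict with `write_json` into the `train` and `val` folders would give the per-subset `newann.json` files that `load_custom` expects.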

# About MaskRCNN

Mask R-CNN is an object detection model based on deep convolutional neural networks (CNNs).

The model can return both the bounding box and a mask for each detected object in an image.

The model was originally developed in Python using the Caffe2 deep learning library. The original source code is available on GitHub.

To make Mask R-CNN available in more widely used libraries such as TensorFlow, the popular open-source project Mask_RCNN provides an implementation based on Keras and TensorFlow 1.x; this notebook uses TensorFlow 1.14.

Mask R-CNN is a deep neural network designed to solve the instance segmentation problem in computer vision.

In other words, it can separate different objects in an image or a video: given an image, it returns the bounding box, class and mask of each object.

When using Mask R-CNN we can work with its individual components, such as the feature pyramid layers that learn features at different scales, the anchors and RoIAlign, instead of treating the network as a black box.
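RoIAlign avoids rounding region-of-interest coordinates to the feature grid; instead it samples the feature map at fractional positions with bilinear interpolation. The NumPy sketch below illustrates that idea on a single-channel feature map. It is a simplification (one sample at the centre of each output bin, rather than aggregating several samples per bin as described in the Mask R-CNN paper), the function names are hypothetical, and it is not the Mask_RCNN library's implementation.

```python
import numpy as np


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at fractional (y, x) using bilinear interpolation."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom


def roi_align_single(feature, box, out_size=2):
    """Pool box = (y1, x1, y2, x2) to out_size x out_size without rounding coordinates.

    Simplified: one bilinear sample at the centre of each output bin.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cy = y1 + (i + 0.5) * bin_h
            cx = x1 + (j + 0.5) * bin_w
            out[i, j] = bilinear_sample(feature, cy, cx)
    return out
```

Because the box corners and bin centres stay fractional, small objects keep sub-pixel alignment between the feature map and the predicted mask.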


<h2>Platforms Used</h2>

Google Colab: the entire project was built on Google Colab. We used the `tensorflow-gpu` package to get faster computation. We chose Google Colab because it is easy to use and provides a GPU.

VIA annotation: all images were annotated with the VIA (VGG Image Annotator) tool, using polygons to outline the objects.
There are 6 classes in total, including the background: Nut, Thread, Pin, Washer and Head are the 5 object classes, and each of them was annotated in VIA as a polygon.
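To make the annotation format concrete, the sketch below shows a single VIA-style image entry and how its region labels map to class ids, with 0 reserved for the background. The keys (`filename`, `regions`, `shape_attributes`, `all_points_x`, `all_points_y`, `region_attributes`, `Class`) are the ones read later by `load_custom`; the file name and coordinates are made-up placeholders, and `regions` is assumed to be a list.

```python
# Hypothetical VIA-style entry; values are placeholders, keys match load_custom.
via_entry = {
    "filename": "example_tile.jpg",
    "regions": [
        {"shape_attributes": {"name": "polygon",
                              "all_points_x": [10, 40, 40, 10],
                              "all_points_y": [10, 10, 30, 30]},
         "region_attributes": {"Class": "Nut"}},
        {"shape_attributes": {"name": "polygon",
                              "all_points_x": [50, 90, 70],
                              "all_points_y": [60, 60, 95]},
         "region_attributes": {"Class": "Thread"}},
    ],
}

# Class ids used in this notebook; id 0 is implicitly the background.
CLASS_IDS = {"Nut": 1, "Pin": 2, "Thread": 3, "Washer": 4, "Head": 5}
NUM_CLASSES = 1 + len(CLASS_IDS)  # background + 5 objects = 6


def region_class_ids(entry):
    """Return the class id of every polygon region in one VIA entry."""
    return [CLASS_IDS[r["region_attributes"]["Class"]] for r in entry["regions"]]


print(region_class_ids(via_entry))  # [1, 3]
```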

In [None]:
from google.colab import drive 
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%cd /content/drive/My\ Drive/project/main/Mask_RCNN

/content/drive/My Drive/project/main/Mask_RCNN


In [None]:
!pip uninstall -y tensorflow
!pip install tensorflow-gpu==1.14
!pip install keras==2.2.4



# Load Libraries

In [None]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Num GPUs Available:  1


In [None]:
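import os
import sys
import json
import datetime
import numpy as np
import skimage.draw
import skimage.io  # used by skimage.io.imread in CustomDataset
import cv2
import matplotlib.pyplot as plt

ROOT_DIR = os.path.abspath("/content/drive/MyDrive/project")

# Make the project folder importable before importing the mrcnn package
sys.path.append(ROOT_DIR)
from mrcnn.config import Config
from mrcnn import model as modellib, utils
from mrcnn.visualize import display_instances

# Pre-trained COCO weights used to initialise the model
COCO_WEIGHTS_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Directory for logs and model checkpoints
DEFAULT_LOGS_DIR = os.path.join(ROOT_DIR, "logs")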

# Customizing the configuration
The default Mask R-CNN configuration does not fit our data, so we override several parameters. For example, our data has 5 object classes, so `NUM_CLASSES` is 1 + 5, where the extra 1 represents the background class. The other parameters are adjusted to our data in the same way. In particular, we halved the RPN anchor scales because the objects we are detecting are small.
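To see what halving the anchor scales does, the sketch below computes anchor heights and widths for each scale and aspect ratio, keeping the area at `scale**2` and treating the ratio as width/height. This mirrors the convention of the Mask_RCNN `utils.generate_anchors` helper as we understand it, but `anchor_shapes` is our own illustration, and the ratios (0.5, 1, 2) are assumed to be the library's default `RPN_ANCHOR_RATIOS`.

```python
import numpy as np


def anchor_shapes(scales, ratios=(0.5, 1, 2)):
    """Return an array of (height, width) for every scale/ratio combination.

    ratio = width / height, and every anchor keeps an area of scale**2.
    """
    shapes = []
    for scale in scales:
        for ratio in ratios:
            height = scale / np.sqrt(ratio)
            width = scale * np.sqrt(ratio)
            shapes.append((height, width))
    return np.array(shapes)


default_scales = (32, 64, 128, 256, 512)  # Mask_RCNN default RPN_ANCHOR_SCALES
custom_scales = (16, 32, 64, 128, 256)    # value used in CustomConfig

# The smallest square anchor shrinks from 32x32 to 16x16 pixels.
print(anchor_shapes(default_scales[:1], ratios=(1,)))  # [[32. 32.]]
print(anchor_shapes(custom_scales[:1], ratios=(1,)))   # [[16. 16.]]
```

In the Mask_RCNN implementation each scale is paired with one level of the feature pyramid (`BACKBONE_STRIDES`), so halving the scales makes the anchors on every pyramid level smaller.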



In [None]:
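class CustomConfig(Config):
    """Mask R-CNN settings for the 5-class drone dataset."""
    NAME = "object"
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 5  # background + Nut, Pin, Thread, Washer, Head
    STEPS_PER_EPOCH = 5000  # training steps (batches) per epoch
    DETECTION_MIN_CONFIDENCE = 0.7  # discard detections below 70% confidence
    DETECTION_NMS_THRESHOLD = 0.4  # NMS IoU threshold for final detections
    ROI_POSITIVE_RATIO = 0.6  # fraction of positive ROIs used to train the heads
    RPN_ANCHOR_SCALES = (16, 32, 64, 128, 256)  # half of the default (32, 64, 128, 256, 512)
    RPN_NMS_THRESHOLD = 0.6  # NMS IoU threshold for RPN proposals
    RPN_TRAIN_ANCHORS_PER_IMAGE = 512  # anchors per image used to train the RPN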


# Customizing Dataset
Here we load the dataset, register the class names and their ids, and define the training routine that uses the customised configuration. Because the original images are very large, we first split them into smaller images using a VIA annotation splitter; the resulting dataset was then divided manually into training and validation sets in a 3:2 ratio.
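The tiling step itself is not shown in the notebook. As a rough illustration only, the sketch below computes the pixel windows that would cut a large image into fixed-size tiles, with the last row and column of tiles shifted back so they stay inside the image. The tile size of 1024 (chosen to match `IMAGE_MAX_DIM`), the optional overlap and the function names are assumptions; a real splitter would also have to shift and clip each polygon into tile coordinates, which this sketch does not do.

```python
def _tile_starts(length, tile, step):
    """Start offsets along one axis; the last tile is placed flush with the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_windows(height, width, tile=1024, overlap=0):
    """Return (y1, x1, y2, x2) windows that cover a height x width image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap
    return [(y, x, min(y + tile, height), min(x + tile, width))
            for y in _tile_starts(height, tile, step)
            for x in _tile_starts(width, tile, step)]
```

For example, a 2048 x 3000 image yields six 1024 x 1024 windows, the last column starting at x = 1976.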



In [None]:
class CustomDataset(utils.Dataset):

    def load_custom(self, dataset_dir, subset):
        """Register the classes and the annotated images of one subset ("train" or "val")."""
        self.add_class("object", 1, "Nut")
        self.add_class("object", 2, "Pin")
        self.add_class("object", 3, "Thread")
        self.add_class("object", 4, "Washer")
        self.add_class("object", 5, "Head")
        name_dict = {"Nut": 1, "Pin": 2, "Thread": 3, "Washer": 4, "Head": 5}

        assert subset in ["train", "val"]
        dataset_dir = os.path.join(dataset_dir, subset)

        # VIA export: a dict of per-image entries; keep only images that have regions
        with open(os.path.join(dataset_dir, "newann.json")) as f:
            annotations = list(json.load(f).values())
        annotations = [a for a in annotations if a['regions']]

        for a in annotations:
            # Polygon outlines and class names of every region in this image
            polygons = [r['shape_attributes'] for r in a['regions']]
            objects = [r['region_attributes']['Class'] for r in a['regions']]
            print("objects:", objects)
            num_ids = [name_dict[o] for o in objects]
            print("numids", num_ids)

            # load_mask() needs the image size, so each image is read once here
            image_path = os.path.join(dataset_dir, a['filename'])
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]

            self.add_image(
                "object",
                image_id=a['filename'],
                path=image_path,
                width=width, height=height,
                polygons=polygons,
                num_ids=num_ids)

    def load_mask(self, image_id):
        """Return a bool mask array [height, width, instances] and the class ids."""
        info = self.image_info[image_id]
        if info["source"] != "object":
            return super().load_mask(image_id)
        mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                        dtype=np.uint8)
        for i, p in enumerate(info["polygons"]):
            # Rasterise the polygon; shape= clips points that fall outside the image
            rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'],
                                          shape=mask.shape[:2])
            mask[rr, cc, i] = 1
        num_ids = np.array(info['num_ids'], dtype=np.int32)
        return mask.astype(bool), num_ids

    def image_reference(self, image_id):
        """Return the image path for images from this dataset."""
        info = self.image_info[image_id]
        if info["source"] == "object":
            return info["path"]
        return super().image_reference(image_id)


def train(model):
    """Train the network heads; `dataset` and `config` are defined in the next cell."""
    dataset_train = CustomDataset()
    dataset_train.load_custom(dataset, "train")
    dataset_train.prepare()

    dataset_val = CustomDataset()
    dataset_val.load_custom(dataset, "val")
    dataset_val.prepare()

    # layers='heads' trains only the RPN, classifier and mask heads; the backbone stays frozen
    print("Training network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=3,
                layers='heads')
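`load_mask` relies on `skimage.draw.polygon` to turn each polygon into an instance mask. For intuition, the NumPy-only sketch below does the same job with the even-odd (ray casting) rule, testing integer pixel coordinates. It is a hypothetical illustration, not scikit-image's implementation, and its handling of pixels lying exactly on the outline may differ. Stacking one such mask per region along the last axis gives the `[height, width, instances]` boolean array that `load_mask` returns.

```python
import numpy as np


def polygon_to_mask(xs, ys, height, width):
    """Rasterise one polygon into a (height, width) bool mask with the even-odd rule.

    A pixel (row y, column x) is inside if a ray cast from it towards +x
    crosses the polygon outline an odd number of times.
    """
    yy, xx = np.mgrid[0:height, 0:width].astype(float)
    inside = np.zeros((height, width), dtype=bool)
    n = len(xs)
    j = n - 1  # previous vertex, so edge (j, i) closes the polygon when i = 0
    for i in range(n):
        xi, yi, xj, yj = float(xs[i]), float(ys[i]), float(xs[j]), float(ys[j])
        # Edge spans the pixel's row (half-open test avoids double-counting vertices)
        spans = (yi > yy) != (yj > yy)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x where the edge crosses the row; inf/nan only occur where spans is False
            x_cross = (xj - xi) * (yy - yi) / (yj - yi) + xi
            inside ^= spans & (xx < x_cross)
        j = i
    return inside
```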

# Load Training Data 
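Here we set the dataset path, create a `CustomConfig`, build the model in training mode and, most importantly, load the pre-trained COCO weights with `load_weights`. The layers excluded from loading are the class-specific heads, whose shapes depend on the number of classes.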

We set `STEPS_PER_EPOCH` to 5000 so that the training images are covered thoroughly. The training set contains about 5000 images, so with a batch size of 1 each epoch processes roughly the whole training set once (`STEPS_PER_EPOCH` counts batches, not images). Over 3 epochs this amounts to about 15k training samples, so each image is seen about three times on average. More epochs would probably give better results, but the free tier of Google Colab limits GPU usage, so we stopped at 3 epochs.
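The training budget follows from three numbers: in the Mask_RCNN `Config`, `BATCH_SIZE` is `IMAGES_PER_GPU * GPU_COUNT`, each epoch runs `STEPS_PER_EPOCH` batches, and `model.train` runs up to the given epoch. The hypothetical helper below does that arithmetic, using the approximate training-set size of 5000 quoted above. Note that the `config.display()` output reports `BATCH_SIZE 1`, while `CustomConfig` sets `IMAGES_PER_GPU = 2`, so the totals depend on which value was actually in effect.

```python
def training_budget(steps_per_epoch, images_per_gpu, gpu_count, epochs, train_images):
    """Return (batch_size, images_per_epoch, total_images, passes_per_image)."""
    batch_size = images_per_gpu * gpu_count          # how Config derives BATCH_SIZE
    images_per_epoch = steps_per_epoch * batch_size  # one step = one batch
    total_images = images_per_epoch * epochs
    passes_per_image = total_images / train_images   # average visits per training image
    return batch_size, images_per_epoch, total_images, passes_per_image


# BATCH_SIZE 1, as in the config.display() output
print(training_budget(5000, 1, 1, 3, 5000))  # (1, 5000, 15000, 3.0)
# IMAGES_PER_GPU = 2, as written in CustomConfig
print(training_budget(5000, 2, 1, 3, 5000))  # (2, 10000, 30000, 6.0)
```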

In [None]:

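# Root folder containing the train/ and val/ subsets
dataset = "/content/drive/MyDrive/project/dataset"

config = CustomConfig()

# Build the model in training mode; checkpoints are written under model_dir
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir="/content/drive/MyDrive/project")
# Load the COCO weights, skipping the head layers whose shapes depend on NUM_CLASSES
model.load_weights(COCO_WEIGHTS_PATH, by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])
train(model)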



Streaming output truncated to the last 5000 lines.
objects: ['Head', 'Thread', 'Nut', 'Thread']
numids [5, 3, 1, 3]
objects: ['Nut', 'Nut', 'Nut', 'Thread', 'Nut', 'Nut', 'Nut', 'Thread', 'Nut', 'Nut', 'Nut', 'Nut', 'Nut', 'Nut', 'Nut', 'Thread', 'Nut', 'Nut', 'Thread', 'Nut', 'Thread', 'Thread', 'Thread', 'Thread', 'Nut', 'Nut', 'Thread', 'Nut', 'Thread']
numids [1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 3, 3, 3, 3, 1, 1, 3, 1, 3]
objects: ['Thread', 'Nut', 'Nut', 'Thread', 'Nut', 'Thread', 'Nut', 'Nut', 'Head', 'Nut', 'Thread', 'Nut', 'Thread', 'Nut', 'Head', 'Nut', 'Nut', 'Head', 'Nut', 'Nut', 'Thread', 'Nut']
numids [3, 1, 1, 3, 1, 3, 1, 1, 5, 1, 3, 1, 3, 1, 5, 1, 1, 5, 1, 1, 3, 1]
objects: ['Nut']
numids [1]
objects: ['Head', 'Thread', 'Nut', 'Thread']
numids [5, 3, 1, 3]
objects: ['Nut', 'Nut', 'Nut', 'Nut', 'Nut', 'Nut', 'Thread']
numids [1, 1, 1, 1, 1, 1, 3]
objects: ['Thread', 'Nut']
numids [3, 1]
objects: ['Nut', 'Nut', 'Washer', 'Thread', 'Was

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Epoch 1/3
Epoch 2/3
Epoch 3/3


Below is the configuration we used to train the model:

In [None]:
config.display()


Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                18
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE         