# Introduction

The model created in this test task uses the Mask R-CNN API from <a href=https://github.com/matterport/Mask_RCNN>here</a> and transfer learning to detect safety cones in the given images.

Images were extracted from the videos using VLC media player, due to issues with OpenCV for Python, and labeled using labelImg from <a href=https://github.com/tzutalin/labelImg>here</a>.

Annotations created using labelImg were saved to an annotations folder in this notebook's directory.

# Part 1 - Dataset preparation
All folders and libraries must be set up before running the code in this notebook.

The required libraries are listed in requirements.txt in <a href=https://github.com/witcher346/Object-Detection>this</a> repository.

## Part 1.1 - Import libraries and split images into train/test sets
### Import libraries for Mask R-CNN API preparation
The Mask R-CNN code and the COCO weights must also be either cloned with git from <a href=https://github.com/matterport/Mask_RCNN>here</a> or downloaded and unzipped into this notebook's folder for the code to run.

In [7]:
from os import listdir, makedirs
from shutil import copy
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from random import shuffle
from Mask_RCNN.mrcnn.utils import Dataset
from Mask_RCNN.mrcnn.utils import extract_bboxes
from Mask_RCNN.mrcnn.visualize import display_instances
from Mask_RCNN.mrcnn.config import Config
from Mask_RCNN.mrcnn.model import MaskRCNN

### Method to split the images into train/test sets

In [3]:
def train_test_split(images_path='images', 
                     train_set_dst='images/train', 
                     test_set_dst='images/test', 
                     train_set_size=22):
    # create the destination directories if they do not exist yet
    makedirs(train_set_dst, exist_ok=True)
    makedirs(test_set_dst, exist_ok=True)

    # skip if a split already exists, so that repeated calls do not
    # reshuffle and copy the same image into both sets
    if listdir(train_set_dst) or listdir(test_set_dst):
        print('Split already exists. Skipping.')
        return

    # only include images from images folder
    images = [img for img in listdir(images_path) if img.endswith('.jpg')]
    # randomly shuffle the images
    shuffle(images)

    # create train/test sets from random shuffle
    train_set = images[:train_set_size]
    test_set = images[train_set_size:]

    # distribute the images
    for image in train_set:
        copy(f'{images_path}/{image}', train_set_dst)
    for image in test_set:
        copy(f'{images_path}/{image}', test_set_dst)

    print('Split successful.')
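
Because shuffle draws a new random order on every call, two runs of the split generally produce different train/test sets. The following is a minimal sketch, assuming a fixed seed is acceptable for this task, of a split over file names that is reproducible and can be checked for overlap. The names split_filenames and seed, and the sample file names, are hypothetical and not part of the notebook.

```python
from random import Random


def split_filenames(filenames, train_set_size=22, seed=0):
    """Return (train, test) lists that are disjoint and stable across calls."""
    # sort first so the result does not depend on directory listing order
    ordered = sorted(filenames)
    # a private Random instance leaves the global random state untouched
    Random(seed).shuffle(ordered)
    return ordered[:train_set_size], ordered[train_set_size:]


# hypothetical file names standing in for the extracted frames
names = [f'frame_{i:03d}.jpg' for i in range(30)]
train, test = split_filenames(names)
train_again, test_again = split_filenames(names)
```

Calling the helper twice with the same seed gives the same split, which makes reported scores easier to reproduce.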

## Part 1.2 - Dataset class interpretation
### Use inheritance to redefine Dataset class from Mask R-CNN API

In [None]:
# class that defines and loads the safety cones dataset
class SafetyCones(Dataset):
    ROOT = 'C:/Users/User/Jupyter Notebooks/Object Detection/'
    
    # load the dataset definitions
    def load_dataset(self, is_train=True):
        # create train/test split
        train_test_split()
        # define one class
        self.add_class("dataset", 1, "safety_cone")
        # define data locations depending on is_train flag
        images_dir = 'images/'
        if is_train:
            images_dir += 'train/'
        else:
            images_dir += 'test/'
            
        annotations_dir = 'annotations/'
        # find all images
        for filename in listdir(images_dir):
            # extract image id
            if not filename.endswith('.xml'):
                image_id = filename[:-4]
                img_path = images_dir + filename
                # create annotation path
                ann_path = annotations_dir + image_id + '.xml'
                # add to dataset
                self.add_image('dataset', image_id=image_id, 
                               path=self.ROOT+img_path, 
                               annotation=self.ROOT+ann_path)
 
    # extract bounding boxes from an annotation file
    def extract_boxes(self, filename):
        # load and parse the file
        tree = ElementTree.parse(filename)
        # get the root of the document
        root = tree.getroot()
        # extract each bounding box coordinates on an image
        boxes = []
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            coors = [xmin, ymin, xmax, ymax]
            boxes.append(coors)
        # extract image dimensions
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return boxes, width, height
 
    # load the masks for an image
    def load_mask(self, image_id):
        # get details of image
        info = self.image_info[image_id]
        # define box file location
        path = info['annotation']
        # load XML
        boxes, w, h = self.extract_boxes(path)
        # create one array for all masks, each on a different channel
        masks = zeros([h, w, len(boxes)], dtype='uint8')
        # create masks
        class_ids = []
        for i in range(len(boxes)):
            box = boxes[i]
            row_s, row_e = box[1], box[3]
            col_s, col_e = box[0], box[2]
            masks[row_s:row_e, col_s:col_e, i] = 1
            class_ids.append(self.class_names.index('safety_cone'))
        return masks, asarray(class_ids, dtype='int32')
 
    # load an image reference
    def image_reference(self, image_id):
        info = self.image_info[image_id]
        return info['path']

# define a configuration for the model
class SafetyConeConfig(Config):
    # Give the configuration a recognizable name
    NAME = "cone_cfg"
    # Number of classes (background + cone)
    NUM_CLASSES = 1 + 1
    # Number of training steps per epoch
    STEPS_PER_EPOCH = 100
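
To make the annotation handling above easier to follow, here is a small standalone sketch of the same parsing logic applied to an inline annotation. The XML string is a hypothetical example in the Pascal VOC layout, and parse_annotation and mask_pixel_count are illustrative names, not part of the Mask R-CNN API. It also shows that a slice such as masks[ymin:ymax, xmin:xmax] excludes the end index, so a box covers (ymax - ymin) * (xmax - xmin) pixels.

```python
from xml.etree import ElementTree

# hypothetical annotation in the Pascal VOC layout
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>360</height><depth>3</depth></size>
  <object>
    <name>safety_cone</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>140</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>safety_cone</name>
    <bndbox><xmin>300</xmin><ymin>200</ymin><xmax>330</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""


def parse_annotation(xml_text):
    """Return ([(xmin, ymin, xmax, ymax), ...], width, height)."""
    root = ElementTree.fromstring(xml_text.strip())
    boxes = [
        tuple(int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        for box in root.findall('.//bndbox')
    ]
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height


def mask_pixel_count(box):
    """Pixels set by masks[ymin:ymax, xmin:xmax] = 1 (end index excluded)."""
    xmin, ymin, xmax, ymax = box
    return (ymax - ymin) * (xmax - xmin)


boxes, width, height = parse_annotation(SAMPLE_XML)
pixel_counts = [mask_pixel_count(box) for box in boxes]
```

Each box becomes one rectangular mask channel, so the masks here are only as precise as the labeled bounding boxes.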

### Load the train/test set

In [None]:
train_set = SafetyCones()
train_set.load_dataset(is_train=True)
train_set.prepare()

In [None]:
test_set = SafetyCones()
test_set.load_dataset(is_train=False)
test_set.prepare()

### Display the image with masks and boxes

In [None]:
# define image id
image_id = 1
# load the image
image = train_set.load_image(image_id)
# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
# extract bounding boxes from the masks
bbox = extract_bboxes(mask)
# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

### Train the model for 3 epochs of 100 steps each

In [None]:
# prepare config
config = SafetyConeConfig()
# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)
# load weights and exclude the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True, 
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
# train weights (output layers or 'heads')
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=3, layers='heads')

# Part 2 - Evaluating the model
Mean average precision (mAP) is used to evaluate the model. Here it is the mean of the average precision (AP) values computed for each image in a dataset.

The Mask R-CNN library provides mrcnn.utils.compute_ap to calculate the AP.
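
As an illustration of what an AP value measures, the sketch below computes a VOC-style interpolated AP for one image in plain Python and then averages several images into an mAP. It is a simplified stand-in written for this explanation, not the library's code: it assumes detections are already sorted by descending score and already matched against ground truth, and the function name and sample inputs are hypothetical.

```python
def average_precision(matches, num_gt):
    """VOC-style interpolated AP for one image.

    matches: one bool per detection, sorted by descending score;
             True means the detection matched a ground-truth box.
    num_gt:  number of ground-truth boxes in the image (assumed > 0).
    """
    precisions, recalls = [0.0], [0.0]
    true_positives = 0
    for rank, matched in enumerate(matches, start=1):
        true_positives += int(matched)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)
    # sentinel values close the precision-recall curve
    precisions.append(0.0)
    recalls.append(1.0)
    # make precision non-increasing from left to right
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # sum precision over every step where recall increases
    return sum(
        (recalls[i] - recalls[i - 1]) * precisions[i]
        for i in range(1, len(recalls))
        if recalls[i] != recalls[i - 1]
    )


# hypothetical per-image match results
per_image_ap = [
    average_precision([True, True], num_gt=2),
    average_precision([True, False, True], num_gt=2),
    average_precision([], num_gt=2),
]
mean_ap = sum(per_image_ap) / len(per_image_ap)
```

An image where every detection is correct scores 1.0, a false positive ranked above a true positive lowers the score, and an image with no detections scores 0.0.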

In [None]:
from numpy import expand_dims
from numpy import mean
from Mask_RCNN.mrcnn.utils import compute_ap
from Mask_RCNN.mrcnn.model import load_image_gt
from Mask_RCNN.mrcnn.model import mold_image

In [None]:
# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
    APs = []
    print('Starting evaluation...')
    for image_id in dataset.image_ids:
        # load image, bounding boxes and masks for the image id
        image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
        # convert pixel values (e.g. center)
        scaled_image = mold_image(image, cfg)
        # convert image into one sample
        sample = expand_dims(scaled_image, 0)
        # make prediction
        yhat = model.detect(sample, verbose=0)
        # extract results for first sample
        r = yhat[0]
        # calculate statistics, including AP
        AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
        # store
        APs.append(AP)
    # calculate the mean AP across all images
    mAP = mean(APs)
    return mAP

### The config is changed so that the model predicts one image at a time

In [None]:
# simplify GPU config
config.GPU_COUNT = 1
config.IMAGES_PER_GPU = 1
config.BATCH_SIZE = 1
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=config)
# load trained model weights
model.load_weights('cone_cfg20200710T1702/mask_rcnn_cone_cfg_0003.h5', by_name=True)

**The code cell below should print results close to the following (exact values depend on the random train/test split)**
- Train mAP: 0.943
- Test mAP: 0.970

In [None]:
# evaluate model on train dataset
train_mAP = evaluate_model(train_set, model, config)
print("Train mAP: %.3f" % train_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model(test_set, model, config)
print("Test mAP: %.3f" % test_mAP)

# Part 3 - Predicting new images
Each image in the test-imgs folder is one frame of a 50-second clip provided for demonstration purposes.

Here the new images are loaded, run through the model, and saved to the predictions folder so they can be compiled into a video.

In [8]:
import os
import time
import skimage.io
from PIL import Image, ImageDraw

def prediction(_dir, model, cfg):
    ROOT = 'C:/Users/User/Jupyter Notebooks/Object Detection/'
    for img in os.listdir(ROOT+_dir):
        # start timing this image
        time_spent = time.time()
        # load the image via skimage
        path_to_img = ROOT+_dir+img
        image = skimage.io.imread(path_to_img)
        # open the same image in PIL to draw a bounding box if the model predicts one
        tmp_image = Image.open(path_to_img)
        # convert pixel values (e.g. center)
        scaled_image = mold_image(image, cfg)
        # convert image into one sample
        sample = expand_dims(scaled_image, 0)
        # make prediction
        yhat = model.detect(sample, verbose=0)[0]
        # draw a rectangle around each detection
        draw = ImageDraw.Draw(tmp_image)
        for box in yhat['rois']:
            # rois are ordered (y1, x1, y2, x2), PIL expects (x, y) corners
            y1, x1, y2, x2 = box
            draw.rectangle(((x1, y1), (x2, y2)), outline='red')
        # save the image with box
        tmp_image.save(ROOT+'predictions/'+img)
        print(f'Predicted {img} in {round(time.time() - time_spent)}s and saved it to predictions folder')

In [None]:
prediction('test-imgs/', model, config)
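
The prediction function unpacks each entry of 'rois' as (y1, x1, y2, x2) and passes (x, y) corner pairs to the drawing call. The sketch below isolates that coordinate swap on plain tuples and adds an optional score filter. The helper name, the min_score threshold, and the sample detections are hypothetical and chosen only for illustration.

```python
def boxes_for_drawing(rois, scores, min_score=0.9):
    """Convert (y1, x1, y2, x2) boxes to ((x1, y1), (x2, y2)) corner pairs.

    Detections scoring below min_score are dropped; min_score is a
    hypothetical threshold chosen for this example only.
    """
    corners = []
    for (y1, x1, y2, x2), score in zip(rois, scores):
        if score >= min_score:
            corners.append(((x1, y1), (x2, y2)))
    return corners


# hypothetical detections in the layout of the 'rois' and 'scores' entries
rois = [(50, 100, 120, 140), (200, 300, 260, 330)]
scores = [0.98, 0.42]
corners = boxes_for_drawing(rois, scores)
```

Each returned pair can be passed to a rectangle-drawing call in the same way as in the loop above.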

# Conclusion
After running this notebook, there should be a predictions folder full of images, with or without bounding boxes around the safety cones. The images in the predictions folder were compiled into a video using Adobe After Effects.

**Possible next steps to improve the model's performance:**
- Include the test set in the training after making sure that the model yields good mAP results and does not overfit;
- Use more images;
- Use higher-resolution images;
- Remove the text snippets from all of the images;
- Try moving to TensorFlow's Object Detection API and see whether those models yield better results;
- Only train on images that include at least one bounding box;
- Create batches of different images for training, grouped by cone count: only 1 cone per image, 1 or 2 cones, only 2 cones, and so on.
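
One of the steps above, training only on images with at least one bounding box, could be sketched as a filter over the annotation files. This is a minimal sketch that assumes the annotations use the same bndbox elements parsed earlier in the notebook. The helper name and the inline sample annotations are hypothetical.

```python
from xml.etree import ElementTree


def has_bounding_box(xml_text):
    """Return True if the annotation contains at least one <bndbox>."""
    root = ElementTree.fromstring(xml_text.strip())
    return root.find('.//bndbox') is not None


# hypothetical annotations keyed by image id
annotations = {
    'frame_001': '<annotation><object><bndbox><xmin>1</xmin></bndbox></object></annotation>',
    'frame_002': '<annotation><size><width>640</width></size></annotation>',
}
trainable_ids = [image_id for image_id, xml_text in annotations.items()
                 if has_bounding_box(xml_text)]
```

Only the ids in trainable_ids would then be added to the training dataset.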


