# Behavioral Object Detection

This notebook shows how to use [Detectron2](https://github.com/facebookresearch/detectron2) to build a proof-of-concept (POC) object detection model.

In [None]:
import os
import random
import json
import boto3
import sagemaker
import s3fs
import cv2
import matplotlib.pyplot as plt

fs = s3fs.S3FileSystem()

First, we need to access the manifest file generated by SageMaker Ground Truth. If you have it locally, replace the manifest path with a local file path.

In [None]:
# S3 path to manifest file
BUCKET = 'behavior-images'
FOLDER = 'fps2-output/driver-actions'
MANIFEST = 'manifests/output/output.manifest'

# Define object dictionary
objects = { '0': 'phone',
            '1': 'cigarette',
            '2': 'phub',
            '3': 'smoke'
          }

# Define manifest files
manifest = 's3://{}/{}/{}'.format(BUCKET, FOLDER, MANIFEST)

## Validate and confirm the access to the bucket

In [None]:
# Make sure the bucket is in the same region as this notebook.
role = sagemaker.get_execution_role()
region = boto3.session.Session().region_name
s3 = boto3.client('s3')
bucket_region = s3.head_bucket(Bucket=BUCKET)['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']
assert bucket_region == region, "Your S3 bucket {} and this notebook need to be in the same region. (notebook region: {}, bucket region: {})".format(BUCKET, region, bucket_region)

## Process data

We need to convert the manifest to the format that Detectron2 requires.

In [None]:
# replace fs.open(manifest) with open(manifest_local) if you have it locally
with fs.open(manifest) as f:
    manifest_list = [json.loads(line.strip()) for line in f.readlines()]

print("{} items are found in the manifest file.".format(len(manifest_list)))

In [None]:
manifest_list[1]
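
Each line of the manifest is one JSON object. As a rough sketch, the snippet below parses a single hand-written line that mimics the fields this notebook reads later: `source-ref`, plus `image_size` and `annotations` under the `driver-actions` label attribute. The sample values, including the file name, are made up for illustration, and real records typically carry extra metadata keys.

```python
import json

# Hypothetical manifest line; the values are illustrative, not taken from the real dataset.
sample_line = json.dumps({
    "source-ref": "s3://behavior-images/fps2-input/img_1.jpg",
    "driver-actions": {
        "image_size": [{"width": 1280, "height": 720, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 100, "top": 50, "width": 40, "height": 80}
        ],
    },
})

record = json.loads(sample_line)
label = record["driver-actions"]
size = label["image_size"][0]  # image_size is a list; the first entry holds the dimensions

print(record["source-ref"])
print(size["width"], size["height"], len(label["annotations"]))
```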

Download the dataset from S3 with the high-level `aws s3 cp` CLI command. Skip this step if you already have the dataset locally.

In [None]:
!aws s3 cp s3://behavior-images/fps2-input/ data/fs2-input/ --recursive

Register the dataset with detectron2, following the [detectron2 custom dataset tutorial](https://detectron2.readthedocs.io/tutorials/datasets.html).
The dataset is in a custom format, so we write a function that parses it into detectron2's standard format. Each record should contain at least the `file_name`, `height`, `width`, `image_id`, and `annotations` fields, where `annotations` is a list of `{bbox, bbox_mode, category_id}` dicts.

In [None]:
from detectron2.structures import BoxMode

def get_data_dicts(manifest_list, img_dir):
    """Prepare datasets for detectron2 with manifest list and image directory

    Args:
        - manifest_list (list): annotation list
        - img_dir (str): local directory where images are stored
    Returns:
        - dataset_dicts: list(dict)
    """
    dataset_dicts = []
    for idx, v in enumerate(manifest_list):
        record = {}
        file_name = v['source-ref'].rsplit('/', 1)[-1]
        img_size = v['driver-actions']['image_size'][0]  # image_size is a list; its first entry holds width, height, and depth
        annotations = v['driver-actions']['annotations']
        
        record['file_name'] = os.path.join(img_dir, file_name)
        record['image_id'] = idx
        record['height'] = img_size['height']
        record['width'] = img_size['width']
        
        objs = []
        for annot in annotations:
            x = annot['left']
            y = annot['top']
            w = annot['width']
            h = annot['height']
            category_id = annot['class_id']
            obj = {
                'bbox': [x, y, w, h],
                'bbox_mode': BoxMode.XYWH_ABS,
                'category_id': category_id
            }
            objs.append(obj)
        record['annotations'] = objs
        dataset_dicts.append(record)
    return dataset_dicts
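
Before registering, it can help to sanity-check the records that `get_data_dicts` returns. The sketch below is one possible check, not part of detectron2: `check_record` is a hypothetical helper that verifies the required keys are present and that every `XYWH_ABS` box has a positive size and lies inside the image. The two sample records are made up.

```python
REQUIRED_KEYS = {"file_name", "height", "width", "image_id", "annotations"}

def check_record(record):
    """Return a list of human-readable problems found in one dataset dict."""
    problems = []
    missing = REQUIRED_KEYS - set(record)
    if missing:
        problems.append("missing keys: {}".format(sorted(missing)))
        return problems
    for i, ann in enumerate(record["annotations"]):
        x, y, w, h = ann["bbox"]  # XYWH_ABS: top-left corner plus size, in pixels
        if w <= 0 or h <= 0:
            problems.append("annotation {}: non-positive size".format(i))
        if x < 0 or y < 0 or x + w > record["width"] or y + h > record["height"]:
            problems.append("annotation {}: box extends outside the image".format(i))
    return problems

# Hypothetical records for illustration.
good = {"file_name": "a.jpg", "height": 720, "width": 1280, "image_id": 0,
        "annotations": [{"bbox": [100, 50, 40, 80], "category_id": 0}]}
bad = {"file_name": "b.jpg", "height": 720, "width": 1280, "image_id": 1,
       "annotations": [{"bbox": [1270, 50, 40, 80], "category_id": 0}]}

print(check_record(good))
print(check_record(bad))
```

In the notebook, the same helper could be run over every entry of `get_data_dicts(manifest_list, img_input_path)`.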
        

Register the dataset and its metadata. The same [tutorial](https://detectron2.readthedocs.io/tutorials/datasets.html) has more on custom datasets. If you have already registered the dataset and want to redo it, call `DatasetCatalog.remove('driver_action')` first.

In [None]:
# DatasetCatalog.remove('driver_action')

In [None]:
from detectron2.data import MetadataCatalog, DatasetCatalog

# change data/fs2-input to where you store your images
img_input_path = 'data/fs2-input'
DatasetCatalog.register("driver_action", lambda : get_data_dicts(manifest_list, img_input_path))
MetadataCatalog.get("driver_action").set(thing_classes=['phone', 'cigarette', 'phub', 'smoke'])
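
Detectron2 treats `category_id` as an index into `thing_classes`, so every `class_id` in the manifest has to fall in the range `0` to `len(thing_classes) - 1`. A minimal sketch of that check follows; `count_classes` is a hypothetical helper and the two-record manifest is made up for illustration.

```python
from collections import Counter

thing_classes = ['phone', 'cigarette', 'phub', 'smoke']

def count_classes(manifest_list, label_key='driver-actions'):
    """Count annotations per class_id across all manifest records."""
    counts = Counter()
    for item in manifest_list:
        for annot in item[label_key]['annotations']:
            counts[annot['class_id']] += 1
    return counts

# Hypothetical two-record manifest for illustration.
toy_manifest = [
    {'driver-actions': {'annotations': [{'class_id': 0}, {'class_id': 1}]}},
    {'driver-actions': {'annotations': [{'class_id': 0}, {'class_id': 4}]}},
]

counts = count_classes(toy_manifest)
# Any id listed here cannot be mapped to a name in thing_classes.
out_of_range = sorted(c for c in counts if not 0 <= c < len(thing_classes))

print(dict(counts))
print(out_of_range)
```

Running `count_classes(manifest_list)` on the real manifest also gives a quick view of how balanced the classes are.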

To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the training set:



In [None]:
def cv2_imshow(image):
    # set size
    plt.figure(figsize=(10,10))
    plt.axis("off")

    # convert color from CV2 BGR back to RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    plt.imshow(image)
    plt.show()

In [None]:
from detectron2.utils.visualizer import Visualizer

dataset_dicts = get_data_dicts(manifest_list, img_input_path)
for d in random.sample(dataset_dicts, 3):
    print(d['file_name'])
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=MetadataCatalog.get("driver_action"), scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    cv2_imshow(out.get_image()[:, :, ::-1])

## Define a model and TRAIN it

In this step, we fine-tune a COCO-pretrained R50-FPN Faster R-CNN model. First, we need to specify its configuration.

In [None]:
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2 import model_zoo

cfg = get_cfg()
# Start from the base Faster R-CNN R50-FPN configuration in the model zoo
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
# Training set; no validation set is registered, so TEST stays empty
cfg.DATASETS.TRAIN = ("driver_action",)
cfg.DATASETS.TEST = ()
# Number of data loading workers
cfg.DATALOADER.NUM_WORKERS = 4
# Initialize training from the COCO-pretrained model zoo checkpoint
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
# Number of images per batch across all machines
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.0125  # learning rate; tune for your dataset
cfg.SOLVER.MAX_ITER = 500  # number of training iterations
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # RoIs sampled per image for the ROI heads
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4  # phone, cigarette, phub, smoke
cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS = False  # keep images that have no annotations

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
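
Detectron2 measures training length in iterations rather than epochs, and each iteration consumes `IMS_PER_BATCH` images. The small calculation below estimates how many passes over the data the settings above amount to. `approx_epochs` is a hypothetical helper, and the dataset size of 400 images is a made-up figure; in the notebook you would pass `len(manifest_list)` instead.

```python
def approx_epochs(max_iter, ims_per_batch, num_images):
    """Approximate number of passes over the training set."""
    return max_iter * ims_per_batch / num_images

# max_iter and ims_per_batch mirror the config above; 400 is a made-up dataset size.
epochs = approx_epochs(max_iter=500, ims_per_batch=4, num_images=400)
print(epochs)
```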

Then we train the model by creating a trainer and calling its `train()` method.

In [None]:
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

### Check training curves.
The following magic commands may not run correctly in SageMaker; see this [post](https://stackoverflow.com/questions/47818822/can-i-use-tensorboard-with-google-colab) for troubleshooting.

Alternatively, launch TensorBoard in a separate console and access it at `https://<YOUR_URL>.studio.region.sagemaker.aws/jupyter/default/proxy/6006/`.

In [None]:
%load_ext tensorboard
%tensorboard --logdir output 

## Inference & evaluation

First, let's create a predictor using the model we just trained:

In [None]:
from detectron2.engine import DefaultPredictor

# Inference should use the same config parameters as training.
# cfg already contains everything set previously; only a few values change for inference:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)

Then, we randomly select several samples to visualize the prediction results.

In [None]:
from detectron2.utils.visualizer import ColorMode
# reuse the training dataset to visualize predictions
for d in random.sample(dataset_dicts, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1],
                   metadata=MetadataCatalog.get("driver_action"), 
                   scale=0.5, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])
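
Besides drawing them, the predictions can be read out as plain values. The sketch below pairs each box with its score and class name. `predictions_to_records` is a hypothetical helper that works on plain Python lists so that it runs without a trained model; the sample values are made up, and the comment shows where the real values would come from in this notebook.

```python
def predictions_to_records(boxes, scores, class_ids, class_names):
    """Pair each predicted box with its score and readable class name."""
    return [
        {"label": class_names[c], "score": round(float(s), 3), "bbox_xyxy": list(b)}
        for b, s, c in zip(boxes, scores, class_ids)
    ]

class_names = ['phone', 'cigarette', 'phub', 'smoke']

# Hypothetical values; in the notebook they could come from
#   inst = outputs["instances"].to("cpu")
#   inst.pred_boxes.tensor.tolist(), inst.scores.tolist(), inst.pred_classes.tolist()
boxes = [[100.0, 50.0, 140.0, 130.0]]  # absolute (x1, y1, x2, y2) pixel coordinates
scores = [0.91]
class_ids = [0]

records = predictions_to_records(boxes, scores, class_ids, class_names)
print(records)
```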

### Use AP metrics for evaluation

Create a COCO-format JSON file for evaluation. The functions below are taken from detectron2's [coco dataset](https://github.com/facebookresearch/detectron2/blob/2c346e0b6ed95a6966564d588da8746f66adab38/detectron2/data/datasets/coco.py#L400) module.

The `xxx_coco_format.json` file should be generated automatically, but that step fails here. Instead, we create it manually and save it at `./output/driver_action_coco_format.json`, where the evaluator looks for it later.

In [None]:
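import datetime
import logging

import pycocotools.mask as mask_util  # used for RLE mask areas
from detectron2.structures import Boxes, BoxMode, PolygonMasks
from fvcore.common.file_io import PathManager, file_lock

logger = logging.getLogger(__name__)  # used when a cached json file is found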

def convert_to_coco_dict(dataset_name):
    """
    Convert an instance detection/segmentation or keypoint detection dataset
    in detectron2's standard format into COCO json format.
    Generic dataset description can be found here:
    https://detectron2.readthedocs.io/tutorials/datasets.html#register-a-dataset
    COCO data format description can be found here:
    http://cocodataset.org/#format-data
    Args:
        dataset_name (str):
            name of the source dataset
            Must be registered in DatasetCatalog and in detectron2's standard format.
            Must have corresponding metadata "thing_classes"
    Returns:
        coco_dict: serializable dict in COCO json format
    """

    dataset_dicts = DatasetCatalog.get(dataset_name)
    metadata = MetadataCatalog.get(dataset_name)

    # unmap the category mapping ids for COCO
    if hasattr(metadata, "thing_dataset_id_to_contiguous_id"):
        reverse_id_mapping = {v: k for k, v in metadata.thing_dataset_id_to_contiguous_id.items()}
        reverse_id_mapper = lambda contiguous_id: reverse_id_mapping[contiguous_id]  # noqa
    else:
        reverse_id_mapper = lambda contiguous_id: contiguous_id  # noqa

    categories = [
        {"id": reverse_id_mapper(id), "name": name}
        for id, name in enumerate(metadata.thing_classes)
    ]

    print("Converting dataset dicts into COCO format")
    coco_images = []
    coco_annotations = []

    for image_id, image_dict in enumerate(dataset_dicts):
        coco_image = {
            "id": image_dict.get("image_id", image_id),
            "width": image_dict["width"],
            "height": image_dict["height"],
            "file_name": image_dict["file_name"],
        }
        coco_images.append(coco_image)

        anns_per_image = image_dict["annotations"]
        for annotation in anns_per_image:
            # create a new dict with only COCO fields
            coco_annotation = {}

            # COCO requirement: XYWH box format
            bbox = annotation["bbox"]
            bbox_mode = annotation["bbox_mode"]
            bbox = BoxMode.convert(bbox, bbox_mode, BoxMode.XYWH_ABS)

            # COCO requirement: instance area
            if "segmentation" in annotation:
                # Computing areas for instances by counting the pixels
                segmentation = annotation["segmentation"]
                # TODO: check segmentation type: RLE, BinaryMask or Polygon
                if isinstance(segmentation, list):
                    polygons = PolygonMasks([segmentation])
                    area = polygons.area()[0].item()
                elif isinstance(segmentation, dict):  # RLE
                    area = mask_util.area(segmentation)
                else:
                    raise TypeError(f"Unknown segmentation type {type(segmentation)}!")
            else:
                # Computing areas using bounding boxes
                bbox_xy = BoxMode.convert(bbox, BoxMode.XYWH_ABS, BoxMode.XYXY_ABS)
                area = Boxes([bbox_xy]).area()[0].item()

            if "keypoints" in annotation:
                keypoints = annotation["keypoints"]  # list[int]
                for idx, v in enumerate(keypoints):
                    if idx % 3 != 2:
                        # COCO's segmentation coordinates are floating points in [0, H or W],
                        # but keypoint coordinates are integers in [0, H-1 or W-1]
                        # For COCO format consistency we subtract 0.5
                        # https://github.com/facebookresearch/detectron2/pull/175#issuecomment-551202163
                        keypoints[idx] = v - 0.5
                if "num_keypoints" in annotation:
                    num_keypoints = annotation["num_keypoints"]
                else:
                    num_keypoints = sum(kp > 0 for kp in keypoints[2::3])

            # COCO requirement:
            #   linking annotations to images
            #   "id" field must start with 1
            coco_annotation["id"] = len(coco_annotations) + 1
            coco_annotation["image_id"] = coco_image["id"]
            coco_annotation["bbox"] = [round(float(x), 3) for x in bbox]
            coco_annotation["area"] = area
            coco_annotation["iscrowd"] = annotation.get("iscrowd", 0)
            coco_annotation["category_id"] = reverse_id_mapper(annotation["category_id"])

            # Add optional fields
            if "keypoints" in annotation:
                coco_annotation["keypoints"] = keypoints
                coco_annotation["num_keypoints"] = num_keypoints

            if "segmentation" in annotation:
                coco_annotation["segmentation"] = annotation["segmentation"]

            coco_annotations.append(coco_annotation)

    print(
        "Conversion finished, "
        f"num images: {len(coco_images)}, num annotations: {len(coco_annotations)}"
    )

    info = {
        "date_created": str(datetime.datetime.now()),
        "description": "Automatically generated COCO json file for Detectron2.",
    }
    coco_dict = {
        "info": info,
        "images": coco_images,
        "annotations": coco_annotations,
        "categories": categories,
        "licenses": None,
    }
    return coco_dict

def convert_to_coco_json(dataset_name, output_file, allow_cached=True):
    """
    Converts dataset into COCO format and saves it to a json file.
    dataset_name must be registered in DatasetCatalog and in detectron2's standard format.
    Args:
        dataset_name:
            reference from the config file to the catalogs
            must be registered in DatasetCatalog and in detectron2's standard format
        output_file: path of json file that will be saved to
        allow_cached: if json file is already present then skip conversion
    """

    # TODO: The dataset or the conversion script *may* change,
    # a checksum would be useful for validating the cached data

    PathManager.mkdirs(os.path.dirname(output_file))
    with file_lock(output_file):
        if PathManager.exists(output_file) and allow_cached:
            logger.warning(
                f"Using previously cached COCO format annotations at '{output_file}'. "
                "You need to clear the cache file if your dataset has been modified."
            )
        else:
            print(f"Converting annotations of dataset '{dataset_name}' to COCO format ...")
            coco_dict = convert_to_coco_dict(dataset_name)

            print(f"Caching COCO format annotations at '{output_file}' ...")
            with PathManager.open(output_file, "w") as f:
                json.dump(coco_dict, f)


In [None]:
convert_to_coco_json('driver_action', './output/driver_action_coco_format.json', allow_cached=True)

In [None]:
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

evaluator = COCOEvaluator("driver_action", None, False, output_dir="./output/")
val_loader = build_detection_test_loader(cfg, "driver_action")
print(inference_on_dataset(trainer.model, val_loader, evaluator))
# another equivalent way to evaluate the model is to use `trainer.test`
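
The AP numbers reported by the COCO evaluator are built on intersection over union (IoU): a prediction counts as a match when its IoU with a ground-truth box reaches a threshold, with AP50 using 0.5 and the headline AP averaging over thresholds from 0.50 to 0.95. The function below is a minimal pure-Python sketch of IoU for two boxes in `[x, y, w, h]` format; `iou_xywh` is a hypothetical helper written for illustration, not the evaluator's own implementation.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as [x, y, w, h]."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlap, clamped at zero when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

same = iou_xywh([0, 0, 10, 10], [0, 0, 10, 10])      # identical boxes
shifted = iou_xywh([0, 0, 10, 10], [5, 0, 10, 10])   # second box shifted right by half a width
disjoint = iou_xywh([0, 0, 10, 10], [20, 20, 5, 5])  # no overlap
print(same, shifted, disjoint)
```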

## Run inference manually on a batch of frames

Download the frames to run inference on from S3. The frames are extracted from videos; you can extract your own with `ffmpeg -i video.mp4 -vf fps=fps=2/1 img_%d.jpg`. Remove the `-vf fps=fps=2/1` filter to extract all frames.

In [None]:
!aws s3 cp s3://behavior-images/all-frames-input/frames.tar.gz data/
!tar -zxvf data/frames.tar.gz -C data/

Run inference frame by frame.

In [None]:
import glob
from tqdm import tqdm
from pathlib import Path

# make sure this folder exists, or cv2.imwrite won't write anything
Path('output/all_frames').mkdir(parents=True, exist_ok=True)

for img_path in tqdm(glob.glob('./data/frames/*.jpg'), desc="Inferencing"):
    im = cv2.imread(img_path)
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=MetadataCatalog.get("driver_action"), 
                   scale=0.5, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels. This option is only available for segmentation models
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    
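    # get_image() returns RGB, so convert to BGR before saving with cv2.imwrite
    out_bgr = cv2.cvtColor(out.get_image(), cv2.COLOR_RGB2BGR)
    cv2.imwrite(os.path.join("output", "all_frames", os.path.basename(img_path)), out_bgr)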
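
`glob.glob` returns paths in arbitrary order, and a plain string sort puts `img_10.jpg` before `img_2.jpg`. If the frames need to be processed in video order, a numeric sort key helps. The sketch below assumes the files follow the `img_%d.jpg` naming from the ffmpeg command mentioned earlier, with numbering starting at 1, and that they were extracted at a constant 2 frames per second; both helper functions and the sample paths are hypothetical.

```python
import os
import re

def frame_index(path):
    """Extract the integer N from a file name such as img_N.jpg."""
    match = re.search(r'(\d+)\.jpg$', os.path.basename(path))
    if match is None:
        raise ValueError("no frame number in {}".format(path))
    return int(match.group(1))

def frame_time_seconds(path, fps=2.0):
    """Approximate timestamp, assuming numbering starts at 1 and a constant frame rate."""
    return (frame_index(path) - 1) / fps

# Hypothetical paths for illustration.
paths = ['./data/frames/img_10.jpg', './data/frames/img_2.jpg', './data/frames/img_1.jpg']
ordered = sorted(paths, key=frame_index)
times = [frame_time_seconds(p) for p in ordered]

print(ordered)
print(times)
```

In the loop above, `sorted(glob.glob('./data/frames/*.jpg'), key=frame_index)` would be one way to apply it.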

Compress the inference results and move them to S3.

In [None]:
!tar -zcf inferred_frames.tar.gz output/all_frames

In [None]:
!aws s3 mv inferred_frames.tar.gz s3://behavior-images/all-frames-output/