## What is run-length encoding (RLE)?
Run-length encoding (RLE) is a compact way to encode the location of foreground objects in a segmentation mask. Instead of outputting a mask image, you give a list of start pixels, each paired with a run length: the number of consecutive pixels, counting the start pixel itself, that belong to the mask.

https://github.com/cocodataset/cocoapi/issues/184

"In order to reduce the submission file size, our metric uses run-length encoding on the pixel values. Instead of submitting an exhaustive list of indices for your segmentation, you will submit pairs of values that contain a start position and a run length. E.g. '1 3' implies starting at pixel 1 and running a total of 3 pixels (1,2,3).

The competition format requires a space delimited list of pairs. For example, '1 3 10 5' implies pixels 1,2,3,10,11,12,13,14 are to be included in the mask. The pixels are one-indexed and numbered from top to bottom, then left to right: 1 is pixel (1,1), 2 is pixel (2,1), etc.

The metric checks that the pairs are sorted, positive, and the decoded pixel values are not duplicated. It also checks that no two predicted masks for the same image are overlapping.

The file should contain a header and have the following format. Each row in your submission represents a single predicted nucleus segmentation for the given ImageId." 

https://www.kaggle.com/c/data-science-bowl-2018#evaluation
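
The quoted rules can be sketched in a few lines of NumPy. This is a minimal illustration, not the competition's own code: the function names `rle_encode` and `rle_decode` are made up here, and the mask is assumed to be a 2D array containing only 0s and 1s.

```python
import numpy as np

def rle_encode(mask):
    """Encode a binary mask as 'start length' pairs (one-indexed, column-major)."""
    # Flatten top to bottom, then left to right, as the quoted format describes.
    pixels = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    # Pad with zeros so runs touching either end of the array are still detected.
    padded = np.concatenate([[0], pixels, [0]])
    # Positions where the value changes; +1 makes them one-indexed.
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

def rle_decode(rle, shape):
    """Decode 'start length' pairs back into a binary mask of the given (height, width)."""
    height, width = shape
    flat = np.zeros(height * width, dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1 : start - 1 + length] = 1  # starts are one-indexed
    return flat.reshape((height, width), order="F")

# The example from the quote: pixels 1,2,3 and 10..14 of a 4x4 image.
example = rle_decode("1 3 10 5", (4, 4))
print(rle_encode(example))
```

Both functions use `order="F"` because the format numbers pixels down each column first; with NumPy's default row-major order the same string would describe a different mask.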

## What is the COCO format?


    annotation{
        "id" : int,
        "image_id" : int,
        "category_id" : int,
        "segmentation" : RLE or [polygon],
        "area" : float,
        "bbox" : [x,y,width,height],
        "iscrowd" : 0 or 1,
    }

    categories[{
        "id" : int,
        "name" : str,
        "supercategory" : str,
    }]


http://cocodataset.org/#format-data

http://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch/#coco-dataset-format
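
To make the schema concrete, here is a small sketch that fills in one annotation for a rectangular object. The helper `bbox_and_area` is hypothetical, the polygon is written by hand, and taking the area as the mask's pixel count is an assumption made for illustration.

```python
import json
import numpy as np

def bbox_and_area(mask):
    """Return ([x, y, width, height], area) for a binary mask (hypothetical helper)."""
    ys, xs = np.nonzero(mask)
    x_min, y_min = int(xs.min()), int(ys.min())
    width = int(xs.max()) - x_min + 1
    height = int(ys.max()) - y_min + 1
    # Assumption: area is taken as the number of foreground pixels.
    return [x_min, y_min, width, height], float(mask.sum())

# A 6x8 image with a 5-wide, 3-tall rectangle whose top-left corner is at x=2, y=1.
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:4, 2:7] = 1

bbox, area = bbox_and_area(mask)
annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    # Polygon format: a list of polygons, each a flat [x1, y1, x2, y2, ...] list.
    "segmentation": [[2, 1, 7, 1, 7, 4, 2, 4]],
    "area": area,
    "bbox": bbox,
    "iscrowd": 0,
}
print(json.dumps(annotation, indent=2))
```

Note that `bbox` is `[x, y, width, height]` with `x` before `y`, while NumPy indexes the mask as `[row, column]`, so the two orders are swapped.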



## What does COCO iscrowd mean?

"iscrowd": 0 if your segmentation is based on a polygon (object instance)

"iscrowd": 1 if your segmentation is based on uncompressed RLE (crowd)

https://github.com/cocodataset/cocoapi/issues/135
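
For the crowd case, COCO's uncompressed RLE differs from the start/length pairs described earlier: it stores a `counts` list of alternating run lengths (zeros first, then ones, and so on) over the mask read column by column, plus a `size` of `[height, width]`. The sketch below shows that layout; `mask_to_uncompressed_rle` is a name chosen for this example, and the input is assumed to contain only 0s and 1s.

```python
import numpy as np

def mask_to_uncompressed_rle(mask):
    """Build a COCO-style uncompressed RLE dict from a binary mask (illustrative)."""
    mask = np.asarray(mask, dtype=np.uint8)
    height, width = mask.shape
    flat = mask.flatten(order="F")  # column-major, as COCO expects
    counts = []
    current, run = 0, 0  # counts always begin with the number of leading zeros
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = int(value), 1
    counts.append(run)
    return {"counts": counts, "size": [height, width]}

mask = np.array([[0, 1],
                 [1, 1]], dtype=np.uint8)
print(mask_to_uncompressed_rle(mask))  # {'counts': [1, 3], 'size': [2, 2]}
```

If the mask starts with a foreground pixel, the first count is 0, which keeps the zeros-first alternation intact.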



In [3]:
import datetime
import json
import os
import re
import fnmatch
from PIL import Image
import numpy as np
from pycococreatortools import pycococreatortools

In [4]:
ROOT_DIR = 'data/examples/shapes/train'
IMAGE_DIR = os.path.join(ROOT_DIR, "shapes_train2018")
ANNOTATION_DIR = os.path.join(ROOT_DIR, "annotations")

INFO = {
    "description": "Example Dataset",
    "url": "https://github.com/waspinator/pycococreator",
    "version": "0.1.0",
    "year": 2018,
    "contributor": "waspinator",
    "date_created": datetime.datetime.utcnow().isoformat(' ')
}

LICENSES = [
    {
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License",
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/"
    }
]

CATEGORIES = [
    {
        'id': 1,
        'name': 'square',
        'supercategory': 'shape',
    },
    {
        'id': 2,
        'name': 'circle',
        'supercategory': 'shape',
    },
    {
        'id': 3,
        'name': 'triangle',
        'supercategory': 'shape',
    },
]

def filter_for_jpeg(root, files):
    file_types = ['*.jpeg', '*.jpg']
    file_types = r'|'.join([fnmatch.translate(x) for x in file_types])
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if re.match(file_types, f)]
    
    return files

def filter_for_annotations(root, files, image_filename):
    file_types = ['*.png']
    file_types = r'|'.join([fnmatch.translate(x) for x in file_types])
    basename_no_extension = os.path.splitext(os.path.basename(image_filename))[0]
    # Escape the basename so regex metacharacters in a file name are matched literally.
    file_name_prefix = re.escape(basename_no_extension) + '.*'
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if re.match(file_types, f)]
    files = [f for f in files if re.match(file_name_prefix, os.path.splitext(os.path.basename(f))[0])]

    return files
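
The filters above work by turning glob patterns into one combined regular expression. A small standalone sketch of that step, using made-up file names, shows what gets through:

```python
import fnmatch
import re

# Same construction as in filter_for_jpeg: one regex alternative per glob pattern.
pattern = "|".join(fnmatch.translate(p) for p in ["*.jpeg", "*.jpg"])

# Hypothetical file names, chosen only to show the matching behaviour.
names = ["a.jpg", "b.jpeg", "c.png", "d.JPG"]
matched = [n for n in names if re.match(pattern, n)]
print(matched)  # ['a.jpg', 'b.jpeg']
```

The match is case-sensitive, so an upper-case extension such as `.JPG` is skipped unless a pattern for it is added or `re.IGNORECASE` is passed to `re.match`.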



In [5]:
coco_output = {
    "info": INFO,
    "licenses": LICENSES,
    "categories": CATEGORIES,
    "images": [],
    "annotations": []
}

image_id = 1
segmentation_id = 1

# filter for jpeg images
for root, _, files in os.walk(IMAGE_DIR):
    image_files = filter_for_jpeg(root, files)

    # go through each image
    for image_filename in image_files:
        image = Image.open(image_filename)
        image_info = pycococreatortools.create_image_info(
            image_id, os.path.basename(image_filename), image.size)
        coco_output["images"].append(image_info)

        # filter for associated png annotations
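        # (separate names avoid shadowing the outer loop's root and files)
        for annotation_root, _, annotation_names in os.walk(ANNOTATION_DIR):
            annotation_files = filter_for_annotations(
                annotation_root, annotation_names, image_filename)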

            # go through each associated annotation
            for annotation_filename in annotation_files:

                #print(annotation_filename)
                class_id = [x['id'] for x in CATEGORIES if x['name'] in annotation_filename][0]

                category_info = {'id': class_id, 'is_crowd': 'crowd' in image_filename}
                binary_mask = np.asarray(Image.open(annotation_filename)
                    .convert('1')).astype(np.uint8)

                annotation_info = pycococreatortools.create_annotation_info(
                    segmentation_id, image_id, category_info, binary_mask,
                    image.size, tolerance=2)

                if annotation_info is not None:
                    coco_output["annotations"].append(annotation_info)

                segmentation_id = segmentation_id + 1

        image_id = image_id + 1

with open('{}/instances_shape_train2018.json'.format(ROOT_DIR), 'w') as output_json_file:
    json.dump(coco_output, output_json_file)
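
Once the file is written, a quick consistency check can catch broken references before training. The function below is a sketch of such a check, not part of pycococreatortools; `find_coco_problems` is a hypothetical name, and it only looks at the ids, not at the masks themselves.

```python
def find_coco_problems(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    seen_annotation_ids = set()
    for ann in coco["annotations"]:
        if ann["id"] in seen_annotation_ids:
            problems.append(f"duplicate annotation id {ann['id']}")
        seen_annotation_ids.add(ann["id"])
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']} points to missing image {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']} uses unknown category {ann['category_id']}")
    return problems

# Tiny in-memory example with one deliberately broken annotation.
sample = {
    "images": [{"id": 1}],
    "categories": [{"id": 1, "name": "square"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 9, "category_id": 1},
    ],
}
print(find_coco_problems(sample))
```

To run it on the generated file, load the JSON with `json.load` and pass the resulting dict to the function; an empty list means no reference problems were found.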