# Merge multiple annotated images into one

This notebook demonstrates how to stack multiple images into one, where each source image has been annotated with bounding boxes. Both the source annotations and the merged annotation are in the JSON format described in [section *Train with the Image Format* in the SageMaker Object Detection Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html) documentation.

The merged annotation has its bounding-box coordinates adjusted to match each box's new position in the merged image.

To recap, here's the annotation structure:

```json
{
   "file": "haha.jpg",
   "image_size": [...],
   "annotations": [
      {
         "class_id": 0,
         "left": 300,
         "top": 38,
         "width": 100,
         "height": 52
      },
      {...}
   ],
   "categories": [...]
}
```
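
As a quick sanity check on this structure, the following minimal sketch lists the keys that an annotation dict is missing. The helper name `check_annotation` and the sample dict are hypothetical and not part of the original notebook. It only checks the keys shown above, not the full SageMaker schema.

```python
from typing import Any, Dict, List

# Keys every bounding box is expected to carry, per the structure above.
REQUIRED_BBOX_KEYS = ("class_id", "left", "top", "width", "height")


def check_annotation(d: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one annotation dict (empty if none)."""
    problems = []
    for key in ("file", "image_size", "annotations", "categories"):
        if key not in d:
            problems.append(f"missing top-level key: {key}")
    for i, bbox in enumerate(d.get("annotations", [])):
        missing = [k for k in REQUIRED_BBOX_KEYS if k not in bbox]
        if missing:
            problems.append(f"annotations[{i}] missing: {missing}")
    return problems


# Hypothetical sample: two top-level keys and three bbox keys are absent.
sample = {"file": "haha.jpg", "annotations": [{"class_id": 0, "left": 300}]}
for problem in check_annotation(sample):
    print(problem)
```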

In [1]:
from IPython.display import display

import copy
import cv2
from itertools import chain
import json
import matplotlib.pyplot as plt
import numpy as np
import s3fs
from typing import Any, Dict, List

# Sample 6 images to group into one

In [None]:
aws_profile = 'default'   # Use 'default' for default credential

s3_prefix = 's3://bucket/dataset/train_annotation'
group_size = 6

fs = s3fs.S3FileSystem(anon=False, profile_name=aws_profile)
json_fnames: List[str] = fs.ls(s3_prefix)[:group_size]

json_fnames

# Load 6 images + annotations from S3.

In [4]:
def load_img(fname: str) -> np.ndarray:
    # Read bytes from S3 into an ndarray
    b = bytes(fs.cat(f's3://fuxin-marcverd/grouper/train/{fname}'))
    arr = np.frombuffer(b, dtype=np.uint8)
    
    # Decode the JPEG bytes into a cv2 image (and force 3 channels, in BGR order).
    return cv2.imdecode(arr, cv2.IMREAD_COLOR)

annotations: List[Dict[Any,Any]] = [json.loads(fs.cat(fname)) for fname in json_fnames]
imgs: List[np.ndarray] = [load_img(d['file']) for d in annotations]

# Merging Procedure

**NOTE**: the code shown here can group variable-sized images. If all images are guaranteed to be the same size, you can also use `np.vstack()` (or `np.concatenate()` along axis 0), which joins the images row-wise into a single array.
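
For the equal-size case, a minimal sketch of the shortcut might look like the following. It uses dummy arrays in place of decoded images and joins them along axis 0 (rows) with `np.concatenate()`, so that the result is a single taller image. Note that `np.stack()` itself would add a new leading axis instead. The variable names here are illustrative only.

```python
import numpy as np

# Two dummy 136x780, 3-channel images standing in for decoded images.
same_size_imgs = [np.full((136, 780, 3), fill, dtype=np.uint8) for fill in (10, 200)]

# Join along axis 0 (rows) so the images sit on top of each other.
stacked = np.concatenate(same_size_imgs, axis=0)
print(stacked.shape)  # (272, 780, 3)
```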

## Compute target height & width

In [5]:
# Get widths,heights of individual images
heights, widths = zip(*[x.shape[:2] for x in imgs])
print('Image heights:', heights)
print('Image widths:', widths)

# Compute width, height of grouped image
h,w = sum(heights), max(widths)
print('Size of grouped image will be', (h,w))

Image heights: (136, 136, 136, 136, 136, 136)
Image widths: (780, 780, 780, 780, 780, 780)
Size of grouped image will be (816, 780)


## Compute offsets

We also compute the y-offset for each image in the new picture. This offset determines the vertical starting point of each image.

```
+------------------+ => y = 0 
| image-00         |
+------------------+ => y = height of image-00
| image-01         |
+------------------+ => y = sum(heights of previous images)
| ...              |
+------------------+
```

In [6]:
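# Offset of image i = total height of the images above it.
# (Subtracting each image's own height keeps this correct for variable heights.)
y_offsets: np.ndarray = np.cumsum(heights) - np.asarray(heights)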
y_offsets

array([  0, 136, 272, 408, 544, 680])
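
To make the offset rule from the diagram concrete, here is a small pure-Python sketch that also covers images of different heights. The function name `compute_y_offsets` is an assumption made for illustration and does not appear in the notebook.

```python
from typing import List, Sequence


def compute_y_offsets(heights: Sequence[int]) -> List[int]:
    """Offset of image i is the sum of the heights of images 0..i-1."""
    offsets, total = [], 0
    for h in heights:
        offsets.append(total)  # this image starts where the previous ones end
        total += h
    return offsets


print(compute_y_offsets([136, 136, 136]))  # [0, 136, 272]
print(compute_y_offsets([100, 50, 70]))    # [0, 100, 150]
```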

## Merge Images

In [7]:
# Allocate a black RGB image (i.e., all-zero array)
merged_arr = np.zeros((h,w,3), dtype=np.uint8)

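# Paste images one by one, left-aligned.
# Images narrower than the merged width keep black padding on the right.
for img, offset in zip(imgs, y_offsets):
    height, width = img.shape[:2]
    merged_arr[offset:offset+height, :width, :] = img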
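
The merging step can also be wrapped in a helper. The sketch below is one possible version, assuming 3-channel `uint8` images. It left-aligns each image, so `left` coordinates of bounding boxes stay valid and only `top` needs adjusting. The name `stack_vertically` and the dummy inputs are hypothetical.

```python
import numpy as np
from typing import List


def stack_vertically(images: List[np.ndarray]) -> np.ndarray:
    """Stack 3-channel images top to bottom on a black canvas, left-aligned."""
    h = sum(im.shape[0] for im in images)
    w = max(im.shape[1] for im in images)
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    y = 0
    for im in images:
        ih, iw = im.shape[:2]
        canvas[y:y + ih, :iw, :] = im  # narrower images leave black padding
        y += ih
    return canvas


# Two dummy images with different sizes.
a = np.full((2, 4, 3), 1, dtype=np.uint8)
b = np.full((3, 2, 3), 2, dtype=np.uint8)
print(stack_vertically([a, b]).shape)  # (5, 4, 3)
```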

## Recompute Bounding Boxes

Recall this is the annotation structure:

```json
{
   "file": "REPLACE_ME",
   "image_size": [...],
   "annotations": [
      {
         "class_id": 0,
         "left": 300,
         "top": 38,
         "width": 100,
         "height": 52
      },
   ],
   "categories": [...]
}
```

In [8]:
ann2 = []

for d, offset in zip(annotations, y_offsets):
    print('----')
    print(f'{offset}:', d['file'])
    d2 = copy.deepcopy(d)
    for bbox in d2['annotations']:
        print('ori: left, top, width, height:', (bbox['left'], bbox['top'], bbox['width'], bbox['height']))
        bbox['top'] += int(offset)    # int() is needed: otherwise the result is np.int64, which json.dumps cannot serialize by default.
        print('adj: left, top, width, height:', (bbox['left'], bbox['top'], bbox['width'], bbox['height']))
    ann2.append(d2)

----
0: 2055_1_1.jpg
ori: left, top, width, height: (300, 38, 100, 52)
adj: left, top, width, height: (300, 38, 100, 52)
ori: left, top, width, height: (60, 60, 40, 50)
adj: left, top, width, height: (60, 60, 40, 50)
----
136: 2055_1_1_flipped.jpg
ori: left, top, width, height: (300, 38, 100, 52)
adj: left, top, width, height: (300, 174, 100, 52)
ori: left, top, width, height: (60, 60, 40, 50)
adj: left, top, width, height: (60, 196, 40, 50)
----
272: 2055_1_1_norm.jpg
ori: left, top, width, height: (300, 38, 100, 52)
adj: left, top, width, height: (300, 310, 100, 52)
ori: left, top, width, height: (60, 60, 40, 50)
adj: left, top, width, height: (60, 332, 40, 50)
----
408: 2055_1_1_norm_flipped.jpg
ori: left, top, width, height: (300, 38, 100, 52)
adj: left, top, width, height: (300, 446, 100, 52)
ori: left, top, width, height: (60, 60, 40, 50)
adj: left, top, width, height: (60, 468, 40, 50)
----
544: 2055_1_1_synth.jpg
ori: left, top, width, height: (300, 38, 100, 52)
adj: left, top,
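
One way to double-check the adjusted boxes is to verify that each one still falls inside the horizontal band occupied by its source image. The sketch below assumes the `top`/`height` fields shown above. The helper name `bbox_within_band` is hypothetical.

```python
from typing import Any, Dict


def bbox_within_band(bbox: Dict[str, Any], y_offset: int, img_height: int) -> bool:
    """True if the bbox lies vertically inside [y_offset, y_offset + img_height]."""
    top = bbox["top"]
    bottom = top + bbox["height"]
    return y_offset <= top and bottom <= y_offset + img_height


# A box from the second image (offset 136, height 136), after adjustment.
adjusted = {"class_id": 0, "left": 300, "top": 174, "width": 100, "height": 52}
print(bbox_within_band(adjusted, y_offset=136, img_height=136))  # True
```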

## Merge annotations and ensure consistent class IDs

We enforce a new mapping of `class_id => class_name` in the merged image. This guards against unintended cases such as `class 0 => cat` in one annotation but `class 0 => dog` in another.

In [9]:
merged_fname = 'image-000.jpg'

new_cat = [
    {"class_id": 0, "name": "dog"},
    {"class_id": 1, "name": "cat"},
]

merged_ann = {
   "file": merged_fname,
   "image_size": [{
       'width': merged_arr.shape[1],
       'height': merged_arr.shape[0],
       'depth': merged_arr.shape[2]
   }],
   "annotations": list(chain.from_iterable(d['annotations'] for d in ann2)),
   "categories": new_cat    
}

merged_ann

{'file': 'image-000.jpg',
 'image_size': [{'width': 780, 'height': 816, 'depth': 3}],
 'annotations': [{'class_id': 0,
   'left': 300,
   'top': 38,
   'width': 100,
   'height': 52},
  {'class_id': 1, 'left': 60, 'top': 60, 'width': 40, 'height': 50},
  {'class_id': 0, 'left': 300, 'top': 174, 'width': 100, 'height': 52},
  {'class_id': 1, 'left': 60, 'top': 196, 'width': 40, 'height': 50},
  {'class_id': 0, 'left': 300, 'top': 310, 'width': 100, 'height': 52},
  {'class_id': 1, 'left': 60, 'top': 332, 'width': 40, 'height': 50},
  {'class_id': 0, 'left': 300, 'top': 446, 'width': 100, 'height': 52},
  {'class_id': 1, 'left': 60, 'top': 468, 'width': 40, 'height': 50},
  {'class_id': 0, 'left': 300, 'top': 582, 'width': 100, 'height': 52},
  {'class_id': 1, 'left': 60, 'top': 604, 'width': 40, 'height': 50},
  {'class_id': 0, 'left': 300, 'top': 718, 'width': 100, 'height': 52},
  {'class_id': 1, 'left': 60, 'top': 740, 'width': 40, 'height': 50}],
 'categories': [{'class_id': 0, 'nam
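
The cell above assigns the new category list but leaves each bbox's `class_id` untouched, which is fine when all sources already agree. If they do not, the IDs could be remapped by class name before merging. The sketch below shows one way to do this. It assumes each source annotation's `categories` list uses the same `{"class_id": ..., "name": ...}` entries as `new_cat`. The helper `remap_class_ids` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def remap_class_ids(d: Dict[str, Any], name_to_new_id: Dict[str, int]) -> List[Dict[str, Any]]:
    """Return copies of d's bboxes with class_id rewritten via the class name."""
    old_id_to_name = {c["class_id"]: c["name"] for c in d["categories"]}
    return [
        {**bbox, "class_id": name_to_new_id[old_id_to_name[bbox["class_id"]]]}
        for bbox in d["annotations"]
    ]


# Source annotation where class 0 means "cat", unlike the target mapping.
source = {
    "annotations": [{"class_id": 0, "left": 60, "top": 60, "width": 40, "height": 50}],
    "categories": [{"class_id": 0, "name": "cat"}, {"class_id": 1, "name": "dog"}],
}
target = {"dog": 0, "cat": 1}
print(remap_class_ids(source, target)[0]["class_id"])  # 1
```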

## Display merged image with bboxes

In [None]:
def draw_bboxes(img: np.ndarray, d: Dict[str, Any], inplace=False) -> np.ndarray:
    im = img.copy() if not inplace else img
    for bbox in d['annotations']:
        # bbox coordinates
        x_min, y_min = bbox['left'], bbox['top']
        x_max, y_max = x_min + bbox['width'], y_min + bbox['height']

        # color to use (cycles through the three channels according to class_id;
        # cv2 images are BGR, so channel 0 is blue, 1 is green, 2 is red)
        cid = bbox['class_id']
        color = [0,0,0]
        color[cid % 3] = 255
        cv2.rectangle(im, pt1=(x_min, y_min), pt2=(x_max, y_max), color=color, thickness=2)

    return im

img_with_bbox = draw_bboxes(merged_arr, merged_ann)

plt.figure(figsize=(9,9));
plt.title('Enlarged')
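# cv2 images are BGR, while matplotlib expects RGB, so convert before displaying.
plt.imshow(cv2.cvtColor(img_with_bbox, cv2.COLOR_BGR2RGB))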
plt.show();

# Save to S3

In [11]:
print("Write merged image...")
with fs.open(f's3://bucket/dataset/train/{merged_fname}', 'wb') as f:
    content = cv2.imencode('.jpg', merged_arr)[1].tobytes()   # tostring() is deprecated in NumPy
    f.write(content)

print("Write merged annotation...")
with fs.open('s3://bucket/dataset/train_annotation/image-000.json', 'wb') as f:
    content = json.dumps(merged_ann).encode(encoding='utf-8', errors='strict')
    f.write(content)

print("Write merged image with bboxes for debugging...")
with fs.open('s3://bucket/dataset/train_annotated/image-000-bbox.jpg', 'wb') as f:
    content = cv2.imencode('.jpg', img_with_bbox)[1].tobytes()   # tostring() is deprecated in NumPy
    f.write(content)

Write merged image...
Write merged annotation...
Write merged image with bboxes for debugging...
