# HollywoodHeads

In this notebook, we prepare the HollywoodHeads dataset for YOLOv7. Preparation consists of converting the XML annotations into YOLO label files and writing, for each split, a `summary.txt` that lists the images having at least one valid head annotation.

## Links

- https://www.di.ens.fr/willow/research/headdetection/

## First step: Download

From the link above, download the following file into `homemade/`:

- HollywoodHeads.zip

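## Second step: Prepare directories

Unzip the archive, then move every image from `JPEGImages/` into `images/<split>/` according to the image ids listed in `Splits/<split>.txt`.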

```bash
cd homemade/
unzip HollywoodHeads.zip
cd HollywoodHeads
```

```bash
# Move each image into images/<split>/ according to Splits/<split>.txt.
# The `|| [ -n "$line" ]` test keeps the last id even if the file lacks a trailing newline.
for rep in train val test; do
    mkdir -p "images/${rep}"
    while read -r line || [ -n "$line" ]; do
        mv "JPEGImages/${line}.jpeg" "images/${rep}/"
    done < "Splits/${rep}.txt"
done
```
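
If you prefer to stay in Python, the same split step can be sketched with `pathlib` and `shutil`. `split_images` is a hypothetical helper, not part of the dataset tooling; it assumes the layout unpacked above (`JPEGImages/`, `Splits/<split>.txt`, `.jpeg` files) and, unlike the shell loop, collects the ids whose image is missing instead of failing on them.

```python
from pathlib import Path
import shutil


def split_images(root, splits=("train", "val", "test"), ext=".jpeg"):
    """Hypothetical helper: move JPEGImages/<id><ext> into images/<split>/.

    Image ids are read from Splits/<split>.txt (one id per line).
    Returns {split: [ids whose image file was not found]}.
    """
    root = Path(root)
    missing = {}
    for split in splits:
        dest = root / "images" / split
        dest.mkdir(parents=True, exist_ok=True)
        missing[split] = []
        # split() on whitespace also ignores blank lines and a missing final newline
        ids = (root / "Splits" / f"{split}.txt").read_text().split()
        for image_id in ids:
            src = root / "JPEGImages" / f"{image_id}{ext}"
            if src.exists():
                shutil.move(str(src), str(dest / src.name))
            else:
                missing[split].append(image_id)
    return missing
```

For example, `split_images("homemade/HollywoodHeads")` would perform the same moves as the shell loop and return the ids that could not be found.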

## Third step: Prepare labels

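```python
from pathlib import Path

path = Path("homemade/HollywoodHeads")
repositories = ['train', 'val', 'test']  # dataset splits

path_images = path / 'images'              # images/<split>/*.jpeg, created in the second step
path_annotations = path / 'Annotations'    # one XML annotation file per image
path_splits = path / 'Splits'              # <split>.txt: one image id per line
odgt_format = path / "annotation_{}.odgt"  # not used below

# YOLO label files will be written to labels/<split>/
path_labels = path / 'labels'
path_labels.mkdir(exist_ok=True)
```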
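
Labels go into `labels/<split>/` to mirror `images/<split>/`. As we understand it, YOLOv7's data loader finds the label of an image by replacing the `images` directory component of its path with `labels` and the extension with `.txt`. The function below is a hypothetical re-implementation of that convention (not YOLOv7's own code), handy for checking that an image path maps to the label file this notebook writes. The image id in the example is illustrative.

```python
import os


def img2label_path(img_path):
    """Hypothetical helper: map .../images/.../<name>.<ext> to .../labels/.../<name>.txt."""
    sa, sb = f"{os.sep}images{os.sep}", f"{os.sep}labels{os.sep}"
    # Swap the first "/images/" directory component for "/labels/".
    label_path = img_path.replace(sa, sb, 1)
    # Replace the image extension with ".txt".
    return os.path.splitext(label_path)[0] + ".txt"


img = os.path.join("homemade", "HollywoodHeads", "images", "train", "mov_001_000001.jpeg")
print(img2label_path(img))
```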

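```python
import xml.etree.ElementTree as ET


def xml_to_yolo(input_file, output_file):
    """Convert one XML annotation into a YOLO label file.

    Writes one line per head box: "<class> <x_center> <y_center> <width> <height>",
    with coordinates normalized by the image size. Returns True if at least one
    box was written, False otherwise (in which case no file is created).
    """
    root = ET.parse(input_file).getroot()

    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)

    strings = []
    for obj in root.iter('object'):
        # Some annotations contain empty <object/> elements; skip them.
        if len(obj) == 0:
            continue
        name = obj.find('name').text
        assert name == 'head', "unexpected class: {}".format(name)

        # Read the corners by tag name instead of relying on the child order.
        bndbox = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (float(bndbox.find(tag).text)
                                  for tag in ('xmin', 'ymin', 'xmax', 'ymax'))

        x_center = (xmin + xmax) / 2
        x_size = xmax - xmin
        y_center = (ymin + ymax) / 2
        y_size = ymax - ymin

        # Discard degenerate boxes of 3 pixels or less in either dimension.
        if x_size <= 3 or y_size <= 3:
            continue

        # Normalize to [0, 1] as expected by YOLO.
        x_center /= width
        x_size /= width
        y_center /= height
        y_size /= height

        # Single class: "head" -> class index 0.
        strings.append("{} {:.6f} {:.6f} {:.6f} {:.6f}".format(0, x_center, y_center, x_size, y_size))

    if strings:
        with open(output_file, 'w') as f:
            f.write("\n".join(strings) + "\n")
        return True
    return False
```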
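
To make the conversion arithmetic concrete, the sketch below applies the same steps as `xml_to_yolo` to an in-memory XML string describing a made-up 640 x 480 image (not a real HollywoodHeads annotation). `yolo_lines_from_xml` is a hypothetical helper: it returns the label lines instead of writing a file, and it skips `<object>` elements that have no `<bndbox>` rather than testing `len(obj)`. For the first box, corners (100, 50) and (200, 250) give a center of (150, 150) and a size of 100 x 200 pixels, i.e. (150/640, 150/480, 100/640, 200/480) = (0.234375, 0.3125, 0.15625, 0.416667) after normalization; the second box is only 2 pixels wide and is dropped.

```python
import xml.etree.ElementTree as ET


def yolo_lines_from_xml(xml_text, min_size=3.0, class_id=0):
    """Hypothetical helper: YOLO label lines for an XML annotation given as a string."""
    root = ET.fromstring(xml_text)
    width = float(root.find("size/width").text)
    height = float(root.find("size/height").text)

    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:  # e.g. an empty <object/> element
            continue
        xmin, ymin, xmax, ymax = (float(box.find(tag).text)
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        w, h = xmax - xmin, ymax - ymin
        if w <= min_size or h <= min_size:  # same 3-pixel threshold as xml_to_yolo
            continue
        xc, yc = (xmin + xmax) / 2, (ymin + ymax) / 2
        # class, then center and size, all normalized by the image dimensions
        lines.append(f"{class_id} {xc / width:.6f} {yc / height:.6f} "
                     f"{w / width:.6f} {h / height:.6f}")
    return lines


SAMPLE = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>head</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>200</xmax><ymax>250</ymax></bndbox>
  </object>
  <object>
    <name>head</name>
    <bndbox><xmin>10</xmin><ymin>10</ymin><xmax>12</xmax><ymax>40</ymax></bndbox>
  </object>
  <object/>
</annotation>"""

print(yolo_lines_from_xml(SAMPLE))
# ['0 0.234375 0.312500 0.156250 0.416667']
```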

```python
import xml.etree.ElementTree as ET  # used by xml_to_yolo
from tqdm import tqdm

for rep in repositories:
    split_file = path_splits / (rep + ".txt")
    print("Processing {}:".format(split_file))

    # Split files list one image id per line; drop empty lines.
    with open(split_file, 'r') as f:
        image_list = f.read().split('\n')
    image_list = list(filter(len, image_list))

    labels = path_labels / rep
    labels.mkdir(exist_ok=True)

    images = path_images / rep

    valid_images = []    # image paths with at least one usable head box
    invalid_images = []  # annotations that yielded no label file
    for image_id in tqdm(image_list):
        annotation_file = path_annotations / (image_id + ".xml")
        output_file = labels / (image_id + '.txt')
        valid = xml_to_yolo(annotation_file, output_file)
        if valid:
            image_file = images / (image_id + ".jpeg")
            valid_images.append(str(image_file))
        else:
            invalid_images.append(str(annotation_file))

    # summary.txt lists every image of the split that has a label file.
    with open(labels / "summary.txt", 'w') as f:
        f.write("\n".join(valid_images) + "\n")

    print("Invalid images: {}".format("\n".join(invalid_images)))
```

```
Processing homemade/HollywoodHeads/Splits/train.txt:
100%|██████████| 216719/216719 [00:38<00:00, 5660.83it/s]
Invalid images: homemade/HollywoodHeads/Annotations/mov_007_121337.xml
homemade/HollywoodHeads/Annotations/mov_007_121338.xml
homemade/HollywoodHeads/Annotations/mov_007_121339.xml
homemade/HollywoodHeads/Annotations/mov_007_121340.xml
homemade/HollywoodHeads/Annotations/mov_007_121341.xml
homemade/HollywoodHeads/Annotations/mov_007_121342.xml
homemade/HollywoodHeads/Annotations/mov_007_121343.xml
homemade/HollywoodHeads/Annotations/mov_007_121344.xml
homemade/HollywoodHeads/Annotations/mov_007_121345.xml
homemade/HollywoodHeads/Annotations/mov_007_121346.xml
homemade/HollywoodHeads/Annotations/mov_007_121347.xml
homemade/HollywoodHeads/Annotations/mov_007_121348.xml
homemade/HollywoodHeads/Annotations/mov_007_121349.xml
homemade/HollywoodHeads/Annotations/mov_007_121350.xml
homemade/HollywoodHeads/Annotations/mov_007_121351.xml
homemade/HollywoodHeads/Annotations/mov_007_121362.xml
homemade/HollywoodHeads/Annotations/mov_007_121363.xml
homemade/HollywoodHeads/Annotations/mov_007_12136

100%|██████████| 6719/6719 [00:01<00:00, 5832.21it/s]
Invalid images: homemade/HollywoodHeads/Annotations/mov_016_020655.xml
homemade/HollywoodHeads/Annotations/mov_016_040350.xml
homemade/HollywoodHeads/Annotations/mov_016_058149.xml
homemade/HollywoodHeads/Annotations/mov_016_058159.xml
homemade/HollywoodHeads/Annotations/mov_016_058199.xml
homemade/HollywoodHeads/Annotations/mov_016_063144.xml
homemade/HollywoodHeads/Annotations/mov_016_099344.xml
homemade/HollywoodHeads/Annotations/mov_016_143427.xml
homemade/HollywoodHeads/Annotations/mov_016_143457.xml
homemade/HollywoodHeads/Annotations/mov_016_143477.xml
homemade/HollywoodHeads/Annotations/mov_017_017674.xml
homemade/HollywoodHeads/Annotations/mov_017_017684.xml
homemade/HollywoodHeads/Annotations/mov_017_017694.xml
homemade/HollywoodHeads/Annotations/mov_017_021212.xml
homemade/HollywoodHeads/Annotations/mov_017_055926.xml
homemade/HollywoodHeads/Annotations/mov_017_055936.xml
homemade/HollywoodHeads/Annotations/mov_017_106337.xml
homemade/HollywoodHeads/Annotations/mov_017_10634

100%|██████████| 1302/1302 [00:00<00:00, 5867.07it/s]
Invalid images: homemade/HollywoodHeads/Annotations/mov_019_126546.xml
homemade/HollywoodHeads/Annotations/mov_020_189030.xml
homemade/HollywoodHeads/Annotations/mov_021_018056.xml
homemade/HollywoodHeads/Annotations/mov_021_061252.xml
homemade/HollywoodHeads/Annotations/mov_021_164475.xml
```
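
Before training, it may be worth sanity-checking the generated labels. The sketch below is a hypothetical check, not part of YOLOv7: for one `labels/<split>/` directory it verifies that every label line has five fields, a class index below `num_classes`, and normalized values in [0, 1], and that `summary.txt` lists as many images as there are label files (which holds for a fresh run of the cell above, since a label file is written exactly when an image is added to the summary).

```python
from pathlib import Path


def check_labels(labels_dir, num_classes=1):
    """Hypothetical helper: return a list of problems found in one split's labels."""
    labels_dir = Path(labels_dir)
    label_files = sorted(f for f in labels_dir.glob("*.txt") if f.name != "summary.txt")
    problems = []

    for label_file in label_files:
        for lineno, line in enumerate(label_file.read_text().splitlines(), start=1):
            fields = line.split()
            # Expected layout: "<class> <x_center> <y_center> <width> <height>"
            if len(fields) != 5:
                problems.append(f"{label_file.name}:{lineno}: expected 5 fields, got {len(fields)}")
                continue
            cls, *coords = fields
            if not cls.isdigit() or int(cls) >= num_classes:
                problems.append(f"{label_file.name}:{lineno}: bad class {cls}")
            # Normalized coordinates must lie in [0, 1].
            if any(not 0.0 <= float(v) <= 1.0 for v in coords):
                problems.append(f"{label_file.name}:{lineno}: coordinate out of [0, 1]")

    # Each label file should correspond to exactly one entry in summary.txt.
    summary = labels_dir / "summary.txt"
    n_listed = sum(1 for l in summary.read_text().splitlines() if l) if summary.exists() else 0
    if n_listed != len(label_files):
        problems.append(f"summary.txt lists {n_listed} images but found {len(label_files)} label files")
    return problems
```

With the variables from the notebook, a possible usage is `for rep in repositories: print(rep, check_labels(path_labels / rep)[:10])`.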



