### Implementation of the Fully Convolutional One-Stage (FCOS) object detection algorithm, with training and evaluation on the PASCAL VOC 2007 dataset

Make sure the data has been downloaded (http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) and extracted. The code below assumes the extracted dataset lives in a directory named 'VOC2007_trainval'. The 'JPEGImages' subdirectory contains the raw JPEG images. The 'Annotations' subdirectory contains the corresponding XML files with object detection labels and metadata, one per image. The 'ImageSets/Main' subdirectory contains the text files 'train.txt' and 'val.txt', which list the image identifiers for the training and validation splits respectively. (It also contains additional .txt files listing image identifiers per class for each split.)
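Before reading any files, it can help to confirm that the extracted directory has the layout described above. The following is a minimal sketch; the helper name `missing_voc_paths` is hypothetical (it is not part of this notebook), and the default directory name 'VOC2007_trainval' is taken from the cells below.

```python
from pathlib import Path

# Hypothetical helper (not part of the original notebook): returns the expected
# VOC2007 subdirectories and split files that are missing under dataset_dir.
def missing_voc_paths(dataset_dir="VOC2007_trainval"):
    root = Path(dataset_dir)
    expected = [
        root / "JPEGImages",
        root / "Annotations",
        root / "ImageSets" / "Main" / "train.txt",
        root / "ImageSets" / "Main" / "val.txt",
    ]
    return [str(p) for p in expected if not p.exists()]

missing = missing_voc_paths()
if missing:
    print("Missing paths:", missing)
else:
    print("Dataset layout looks complete.")
```

If any path is reported missing, the archive was probably extracted to a different location or under a different directory name.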

In [9]:
import os
import xml.etree.ElementTree as ET
import torch

In [6]:
# first, let's read in the image identifiers for the train and val splits
with open(os.path.join('VOC2007_trainval', 'ImageSets', 'Main', 'train.txt')) as file:
    identifiers_train = [line.strip() for line in file.readlines()]

with open(os.path.join('VOC2007_trainval', 'ImageSets', 'Main', 'val.txt')) as file:
    identifiers_val = [line.strip() for line in file.readlines()]

# now get the jpeg filepaths for the images
image_filepaths_train = [os.path.join('VOC2007_trainval','JPEGImages',x+'.jpg') for x in identifiers_train]    
image_filepaths_val = [os.path.join('VOC2007_trainval','JPEGImages',x+'.jpg') for x in identifiers_val]    

# get the xml filepaths to object detection target labels
target_filepaths_train = [os.path.join('VOC2007_trainval','Annotations',x+'.xml') for x in identifiers_train]    
target_filepaths_val = [os.path.join('VOC2007_trainval','Annotations',x+'.xml') for x in identifiers_val]    

print(f"Num train images: {len(image_filepaths_train)}")
print(f"Num val images: {len(image_filepaths_val)}")

Num train images: 2501
Num val images: 2510


We will set up a PyTorch Dataset object for accessing image-target pairs.

In [7]:
class VOC2007(torch.utils.data.Dataset):
    def __init__(self, dataset_dir='VOC2007_trainval', split='train', image_size=224):
        super().__init__()
        self.image_size = image_size

        # first, read in the image identifiers for the requested split ('train' or 'val')
        with open(os.path.join(dataset_dir, 'ImageSets', 'Main', split+'.txt')) as file:
            identifiers = [line.strip() for line in file.readlines()]
        # now get the jpeg filepaths for the images
        self.image_filepaths = [os.path.join(dataset_dir,'JPEGImages',x+'.jpg') for x in identifiers]    
        # get the xml filepaths to object detection target labels
        self.target_filepaths_train = [os.path.join(dataset_dir,'Annotations',x+'.xml') for x in identifiers]    

    def __len__(self):
        return len(self.image_filepaths)


    # function for parsing an XML file to get the object detection target labels
    def parse_xml(self, filepath):
        # start at the root of the XML tree
        root_node = ET.parse(filepath).getroot()
        annotations = {}
        # get image size in pixels (converted from text to int)
        annotations['size'] = {
            'width': int(root_node.find('size/width').text),
            'height': int(root_node.find('size/height').text)
        }
        # get all the object bounding boxes
        objects = []
        for obj in root_node.findall('object'):
            # for each object, get the class name, the difficulty identifier
            # (0: easy, 1: difficult) and the bounding box coordinates
            # (top-left corner: xmin, ymin; bottom-right corner: xmax, ymax)
            object_dict = {
                'class': obj.find('name').text,
                'difficult': obj.find('difficult').text,
                'bndbox': {
                    'xmin': int(obj.find('bndbox/xmin').text),
                    'ymin': int(obj.find('bndbox/ymin').text),
                    'xmax': int(obj.find('bndbox/xmax').text),
                    'ymax': int(obj.find('bndbox/ymax').text),
                }
            }
            objects.append(object_dict)
        annotations['objects'] = objects
        return annotations    
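A quick way to sanity-check bounding box parsing without touching the dataset is to run the same `find` calls on a small inline annotation. This is a minimal sketch: the XML string contains made-up values in the VOC annotation layout, and `extract_boxes` is a hypothetical standalone helper, not a method of the class above.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the VOC XML layout (illustrative values, not from the dataset).
SAMPLE_XML = """
<annotation>
    <size><width>500</width><height>375</height><depth>3</depth></size>
    <object>
        <name>dog</name>
        <difficult>0</difficult>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    </object>
</annotation>
"""

# Hypothetical helper: returns a list of (class_name, (xmin, ymin, xmax, ymax)) tuples.
def extract_boxes(xml_text):
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        # read each coordinate from its own tag under <bndbox>
        box = tuple(
            int(obj.find(f"bndbox/{tag}").text)
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.find("name").text, box))
    return boxes

boxes = extract_boxes(SAMPLE_XML)
for name, (xmin, ymin, xmax, ymax) in boxes:
    # a well-formed box has its top-left corner above and to the left of its bottom-right corner
    assert xmin < xmax and ymin < ymax
    print(name, (xmin, ymin, xmax, ymax))
```

The same corner-ordering assertion can be applied to the 'bndbox' entries returned by `parse_xml` to catch boxes whose coordinates were read from the wrong tag.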

['VOC2007_trainval/Annotations/000012.xml',
 'VOC2007_trainval/Annotations/000017.xml',
 'VOC2007_trainval/Annotations/000023.xml',
 'VOC2007_trainval/Annotations/000026.xml',
 'VOC2007_trainval/Annotations/000032.xml']