# Marburg Open Forest (MOF) and Białowieża National Park (BNP) test data sets

The **Marburg Open Forest** data set consists of a collection of 2,435 images showing 18 animal species (and 9 classes with higher taxonomic labels).
The **Białowieża National Park** data set consists of a collection of 15,717 images showing 20 animal species (and 16 classes with higher taxonomic labels).


The `img` folder contains the images, grouped in subfolders by recording date and camera trap ID.
The `md` folder contains the metadata for each image, which consists of the bounding box detections obtained using the MegaDetector model (https://github.com/agentmorris/MegaDetector). The metadata is grouped into YAML files for each label at different taxonomic levels.
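
As a rough illustration of the folder layout described above, the following sketch counts the images in each subfolder of `img`. It assumes the data set has already been extracted into the working directory and that images use the `.JPG` extension, as in the completeness checks in this notebook. The function name `images_per_folder` is hypothetical, and nothing is assumed about how the subfolders are named.

```python
from collections import Counter
from pathlib import Path


def images_per_folder(dataset_root, pattern="*.JPG"):
    """Count image files per subfolder below <dataset_root>/img.

    Hypothetical helper: keys are folder paths relative to the img folder.
    """
    img_root = Path(dataset_root) / "img"
    counts = Counter()
    if not img_root.is_dir():
        # Data set not downloaded (or wrong path): nothing to count.
        return counts
    for path in img_root.rglob(pattern):
        if path.is_file():
            counts[path.parent.relative_to(img_root).as_posix()] += 1
    return counts


if __name__ == "__main__":
    # Prints nothing if the MOF folder has not been downloaded yet.
    for folder, n in sorted(images_per_folder("MOF").items()):
        print(f"{folder}: {n}")
```

Summing the values of the returned counter should give the same total as the recursive `glob` used in the completeness checks.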

### Imports

In [1]:
import glob
import yaml

### Download and extract the data sets

In [10]:
dataset_folders = "MOF BNP"

In [None]:
!"./data_download.sh" "$dataset_folders"

### Check for completeness

In [4]:
imgs = glob.glob("MOF/img/**/*.JPG", recursive=True)
assert len(imgs) == 2435, f"{2435 - len(imgs)} missing image files"

In [5]:
mds = glob.glob("MOF/md/**/*.yaml", recursive=True)
assert len(mds) == 27, f"{27 - len(mds)} missing md files"

In [6]:
# Sum the number of image entries over all MOF metadata files.
meta_count = 0
for md in mds:
    with open(md) as f:
        meta = yaml.safe_load(f)
    meta_count += len(meta['images'])
assert meta_count == 2731, f"expected 2731 md entries, found {meta_count}"

In [13]:
imgs = glob.glob("BNP/img/**/*.JPG", recursive=True)
assert len(imgs) == 15717, f"{15717 - len(imgs)} missing image files"

In [14]:
mds = glob.glob("BNP/md/**/*.yaml", recursive=True)
assert len(mds) == 36, f"{36 - len(mds)} missing md files"

In [15]:
# Sum the number of image entries over all BNP metadata files.
meta_count = 0
for md in mds:
    with open(md) as f:
        meta = yaml.safe_load(f)
    meta_count += len(meta['images'])
assert meta_count == 16831, f"expected 16831 md entries, found {meta_count}"
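
The file-count checks above follow the same pattern for both data sets, so they could be collected into one helper. The sketch below is one possible way to do that, not part of the data set tooling: the names `count_difference`, `check_datasets`, and `EXPECTED` are hypothetical, and the expected counts are copied from the assertions in this notebook. Instead of asserting, it returns the difference between expected and found files, so that extra files show up as negative numbers.

```python
import glob
import os


def count_difference(root, subdir, extension, expected):
    """Return expected minus found file count below <root>/<subdir>.

    Positive: files are missing; negative: more files than expected.
    """
    pattern = os.path.join(root, subdir, "**", f"*.{extension}")
    found = len(glob.glob(pattern, recursive=True))
    return expected - found


# Expected counts taken from the assertions in this notebook.
EXPECTED = {
    "MOF": {"img": ("JPG", 2435), "md": ("yaml", 27)},
    "BNP": {"img": ("JPG", 15717), "md": ("yaml", 36)},
}


def check_datasets(expected=EXPECTED, base="."):
    """Collect the count difference for every data set and folder."""
    report = {}
    for name, folders in expected.items():
        for subdir, (ext, n) in folders.items():
            report[(name, subdir)] = count_difference(
                os.path.join(base, name), subdir, ext, n
            )
    return report
```

Calling `check_datasets()` from the directory that contains the `MOF` and `BNP` folders returns a dictionary keyed by `(data set, folder)`; a complete download should give zero for every entry.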