## Functions

- Check that the number of images in the folder matches `json['images']`
- Get the number of images with detections
- Get the number of detections
- Get the images in the folder that are not in `json['images']`
- Check the image file structure: not needed, because creating the FiftyOne dataset will raise an error
- Get the `json['images']` entries that are not in the images folder: not needed, because creating the FiftyOne dataset will raise an error
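
As a cross-check that does not depend on FiftyOne, the counting steps in this list can be sketched on the raw COCO JSON. This is a minimal sketch, assuming the standard COCO layout (`images` entries with an `id`, `annotations` entries with an `image_id`). The helper name `coco_summary` and the example dict are hypothetical, not part of the notebook.

```python
def coco_summary(coco: dict) -> dict:
    """Count images, images with annotations, and annotations in a COCO-style dict."""
    image_ids = {img["id"] for img in coco["images"]}
    # Image ids that are referenced by at least one annotation
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    return {
        "num_images": len(coco["images"]),
        "num_images_with_dets": len(image_ids & annotated_ids),
        "num_detections": len(coco["annotations"]),
    }


# Tiny hypothetical example: three images, two of them annotated
example = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
        {"id": 3, "file_name": "c.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 10, 20]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [5, 5, 10, 20]},
        {"id": 12, "image_id": 3, "category_id": 1, "bbox": [1, 2, 3, 4]},
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
}

summary = coco_summary(example)
print(summary)
```

For a real file, the dict would come from `json.load` on the labels file. The numbers may differ from what FiftyOne reports if the importer filters or transforms annotations.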

## Setup

In [None]:
from pathlib import Path

import fiftyone as fo
from fiftyone import ViewField as F

## Import dataset

This example uses the COCO object detection format, but you can change `dataset_type` to any of the [supported formats](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#supported-import-formats).

See also the documentation for [SampleCollection](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection).

In [70]:
data_path = "/media/data/datasets/PersDet/TODO_CaltechPedestrians_blacked_10x/images"
labels_path = "/media/data/datasets/PersDet/TODO_CaltechPedestrians_blacked_10x/test.json"

dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path=data_path,
    labels_path=labels_path,
)

 100% |███████████████| 2198/2198 [2.6s elapsed, 0s remaining, 836.9 samples/s]      


In [71]:
dataset

Name:        2023.01.17.15.24.34
Media type:  image
Num samples: 2198
Persistent:  False
Tags:        []
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)

In [72]:
dataset.info

{'contributor': '',
 'date_created': '',
 'description': '',
 'url': '',
 'version': '',
 'year': '',
 'licenses': [{'name': '', 'id': 0, 'url': ''}],
 'categories': [{'id': 1, 'name': 'person', 'supercategory': 'person'}]}

In [73]:
# dataset.head()

In [74]:
# # Print a sample ground truth detection
# sample = dataset.first()
# print(sample.ground_truth.detections[0])

In [75]:
IMAGES_SUFFIX = ['.jpg', '.png']

In [76]:
# check number of images in folder is same as json['images']

num_images_json = len(dataset)
print(f'num_images_json: {num_images_json}')

num_images_folder = 0
for path in Path(data_path).rglob("*"):
    if path.is_file() and path.suffix in IMAGES_SUFFIX:
        num_images_folder += 1
print(f'num_images_folder: {num_images_folder}')

num_images_json: 2198
num_images_folder: 24988
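
When the two counts differ, as in the output above, a per-extension breakdown of the folder can help confirm that the suffix list matches what is actually on disk (for example, uppercase `.JPG` files are skipped by a case-sensitive check). A minimal sketch follows. The helper name `count_suffixes` and the temporary folder contents are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_suffixes(folder):
    """Count files under `folder` (recursively) by lowercased suffix."""
    return Counter(
        p.suffix.lower() for p in Path(folder).rglob("*") if p.is_file()
    )


# Demonstrate on a throwaway folder with mixed-case extensions
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "set00").mkdir()
    for name in ["a.jpg", "b.JPG", "set00/c.png", "notes.txt"]:
        (root / name).touch()
    suffix_counts = count_suffixes(root)

print(suffix_counts)
```

In the notebook, the same function could be called on `data_path` to see which extensions the folder contains.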


In [77]:
# get number of images with detections

# match() returns a view of the matching samples, so take its length for the count
images_with_dets = dataset.match(F("ground_truth.detections").length() > 0)
print(f'num_images_with_dets: {len(images_with_dets)}')

num_images_with_dets: 1251


In [78]:
# get number of detections

num_detections = dataset.count("ground_truth.detections")
print(f'num_detections: {num_detections}')

num_detections: 2725


In [69]:
filepaths_json = dataset.values('filepath')
filepaths_json = [Path(path) for path in filepaths_json]
# filepaths_json

In [36]:
# get images in folder that are not in json['images']

data_path_path = Path(data_path)
# Set membership is O(1) on average; a list lookup rescans every JSON path per file
filepaths_json_set = set(filepaths_json)
missing_images_in_json = []
for path in data_path_path.rglob("*"):
    if path.is_file() and path.suffix in IMAGES_SUFFIX:
        if path not in filepaths_json_set:
            p_relative = path.relative_to(data_path_path)
            missing_images_in_json.append(str(p_relative))
missing_images_in_json

KeyboardInterrupt: 
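
The `KeyboardInterrupt` above indicates the folder scan was stopped before it finished. A minimal standalone sketch of the same comparison is shown here, using a set for the lookups so that each file is checked in roughly constant time. The helper name `images_not_listed`, the lowercase suffix check, and the temporary folder are assumptions for illustration. In the notebook, `listed_paths` would be `dataset.values('filepath')`.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def images_not_listed(folder, listed_paths):
    """Return sorted relative paths of images under `folder` absent from `listed_paths`."""
    folder = Path(folder)
    listed = {Path(p) for p in listed_paths}  # set gives O(1) average lookups
    return sorted(
        str(p.relative_to(folder))
        for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES and p not in listed
    )


# Demonstrate on a throwaway folder where only one image is listed
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["a.jpg", "b.jpg", "c.png"]:
        (root / name).touch()
    listed = [str(root / "a.jpg")]
    missing = images_not_listed(root, listed)

print(missing)
```

The comparison relies on both sides using the same absolute path form, which holds when the dataset was created from the same `data_path`.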

## Run App

In [49]:
# Launch the App
# session = fo.launch_app(dataset)

# To launch the App in a dedicated browser tab, run this line and go to 127.0.0.1:5151
session = fo.launch_app(dataset, auto=False)

Session launched. Run `session.show()` to open the App in a cell output.


In [50]:
random_view = dataset.take(100)
session.view = random_view