# FiftyOne Core Concepts

This notebook covers the core FiftyOne concepts that are relevant to V7. For full details, refer to [Voxel51's documentation](https://docs.voxel51.com/).

## Dataset

The `Dataset` class is the most fundamental data structure in FiftyOne. It lets you explore and manipulate all data belonging to a dataset, including loading files and annotations into it. A dataset is an ordered collection of `Sample` objects.

In [3]:
import fiftyone as fo

# Create an empty dataset
dataset = fo.Dataset("my-dataset")

print(dataset)

Name:        my-dataset
Media type:  None
Num samples: 0
Persistent:  False
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField


Datasets are homogeneous: all their samples must be of the same media type. The media type of a dataset is inferred from the file extension of the first sample added to it.

In [4]:
dataset = fo.Dataset()

print(dataset.media_type)
# None

sample = fo.Sample(filepath="/path/to/image.png")
dataset.add_sample(sample)

print(dataset.media_type)
# "image"

None
image
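
To make the inference rule concrete, here is a minimal sketch of extension-based media type detection using only Python's standard `mimetypes` module. The helper names `guess_media_type` and `is_homogeneous` are hypothetical, and this is not FiftyOne's actual implementation, which may handle more types and edge cases.

```python
import mimetypes


def guess_media_type(filepath):
    """Return the top-level MIME type ("image", "video", ...) or None if unknown."""
    mime_type, _ = mimetypes.guess_type(filepath)
    if mime_type is None:
        return None
    return mime_type.split("/")[0]


def is_homogeneous(filepaths):
    """Check whether all filepaths map to the same media type."""
    return len({guess_media_type(fp) for fp in filepaths}) <= 1


print(guess_media_type("/path/to/image.png"))
print(guess_media_type("/path/to/video.mp4"))
print(is_homogeneous(["/path/to/a.jpg", "/path/to/b.png"]))
print(is_homogeneous(["/path/to/a.jpg", "/path/to/b.mp4"]))
```

Running a check like this over a list of files before building a dataset can flag mixed media early.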


## Sample

A `Sample` object stores information related to a given piece of data (image or video). Every sample is initialized with a `filepath` pointing to the relevant data on disk.

In [5]:
# An image sample
sample = fo.Sample(filepath="/path/to/image.png")

# A video sample
another_sample = fo.Sample(filepath="/path/to/video.mp4")

# Adding samples to a dataset
dataset = fo.Dataset()
print(len(dataset))
# 0

dataset.add_samples(
    [
        fo.Sample(filepath="/path/to/image1.jpg"),
        fo.Sample(filepath="/path/to/image2.jpg"),
        fo.Sample(filepath="/path/to/image3.jpg"),
    ]
)

print(len(dataset))
# 3

0
 100% |█████████████████████| 3/3 [11.6ms elapsed, 0s remaining, 257.5 samples/s]    
3


## Field

A `Field` is an attribute of a `Sample` that stores some information. Fields can be created, modified, and deleted on a per-`Sample` basis. All samples have some required fields by default.

In [6]:
sample = fo.Sample(filepath="/path/to/image.png")

print(sample)

<Sample: {
    'id': None,
    'media_type': 'image',
    'filepath': '/path/to/image.png',
    'tags': [],
    'metadata': None,
    'created_at': None,
    'last_modified_at': None,
}>


## Labels

Labels store annotation data about a particular `Sample`. FiftyOne provides many label types, including Classifications, Detections, Keypoints, and Polylines.

In [7]:
sample = fo.Sample(filepath="/path/to/image.png")

# A simple polyline
polyline1 = fo.Polyline(
    points=[[(0.3, 0.3), (0.7, 0.3), (0.7, 0.3)]],
    closed=False,
    filled=False,
)

# A closed, filled polygon with a label
polyline2 = fo.Polyline(
    label="triangle", # `label` is analogous to V7 annotation classes / names
    points=[[(0.1, 0.1), (0.3, 0.1), (0.3, 0.3)]],
    closed=True,
    filled=True,
)

print(polyline1)
print(polyline2)

<Polyline: {
    'id': '67b5e61ee9aaca4b31d78259',
    'attributes': {},
    'tags': [],
    'label': None,
    'points': [[[0.3, 0.3], [0.7, 0.3], [0.7, 0.3]]],
    'confidence': None,
    'index': None,
    'closed': False,
    'filled': False,
}>
<Polyline: {
    'id': '67b5e61ee9aaca4b31d7825a',
    'attributes': {},
    'tags': [],
    'label': 'triangle',
    'points': [[[0.1, 0.1], [0.3, 0.1], [0.3, 0.3]]],
    'confidence': None,
    'index': None,
    'closed': True,
    'filled': True,
}>
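
FiftyOne stores label coordinates, including polyline points, as relative values in `[0, 1] x [0, 1]` rather than pixels. When bringing in annotations expressed in pixel coordinates, a conversion along these lines may be needed. The helper `to_relative_points` is hypothetical and not part of FiftyOne, and the assumption that the source annotations use absolute pixels should be checked against the actual export.

```python
def to_relative_points(points_px, width, height):
    """Convert (x, y) pixel coordinates to relative coordinates in [0, 1]."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return [(x / width, y / height) for x, y in points_px]


# A triangle drawn on a 200 x 100 pixel image
triangle_px = [(50, 25), (150, 25), (100, 75)]
triangle_rel = to_relative_points(triangle_px, width=200, height=100)
print(triangle_rel)
# [(0.25, 0.25), (0.75, 0.25), (0.5, 0.75)]
```

The result can then be passed as a single shape, for example `fo.Polyline(points=[triangle_rel], closed=True)`.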


## Dataset Views

A `DatasetView` describes a filtered, sorted, or sliced subset of a `Dataset`. Views are created by applying operations such as matching, sorting, and slicing to a `Dataset` (or to another view), and they return only the data that satisfies those rules.

In [12]:
import fiftyone.zoo as foz
import fiftyone.brain as fob
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("cifar10", split="test")
cats = dataset.match(F("ground_truth.label") == "cat")

# Compute the uniqueness of samples
fob.compute_uniqueness(cats)

# Create the `similar_cats` DatasetView, sorted by ascending uniqueness (least unique first)
similar_cats = cats.sort_by("uniqueness", reverse=False)

# Display the filtered dataset (DatasetView) in the app
session = fo.launch_app(view=similar_cats)


## Grouped Datasets

A grouped dataset organizes samples into groups, so that multiple "slices" of samples, possibly of different modalities, can be displayed simultaneously. Groups are analogous to multi-slotted items in V7.

Creating a grouped dataset is slightly different from creating a regular dataset. Here's an example of how to create and work with a grouped dataset:

In [None]:
import fiftyone as fo
import fiftyone.utils.random as four
import fiftyone.zoo as foz

groups = ["left", "center", "right"] # Similar to slot names
d = foz.load_zoo_dataset("quickstart")
four.random_split(d, {g: 1 / len(groups) for g in groups})
filepaths = [d.match_tags(g).values("filepath") for g in groups]
filepaths = [dict(zip(groups, fps)) for fps in zip(*filepaths)]

# Display the structure of the grouped data
print(filepaths[:2])

# Create a dataset and declare a group field with `add_group_field(field_name)`.
# The optional `default` parameter specifies the slice that is returned via the
# API or shown in the App's grid view by default. If you don't specify a default,
# one is inferred from the first sample you add to the dataset.
dataset = fo.Dataset("test-grouped-dataset")
dataset.add_group_field("group", default="center")

samples = []
for fps in filepaths:  # Each set of filepaths becomes one group (comparable to a multi-slotted item)
    group = fo.Group()
    for name, filepath in fps.items():  # Create one sample per slice and attach it to the group
        sample = fo.Sample(filepath=filepath, group=group.element(name))
        samples.append(sample)

dataset.add_samples(samples)

print(dataset)

In this example, we create a grouped dataset in which every group has three slices: `left`, `center`, and `right`. Here all slices hold images from the quickstart dataset, but the slices of a group may also hold different media types (for example, images and videos). This structure allows for flexible organization of related samples, similar to multi-slotted items in V7.

Grouped datasets are particularly useful when working with related data of different modalities or when you need to maintain relationships between multiple samples.
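
The regrouping step in the example above, which turns one list of filepaths per slice into one dictionary per group, can be seen in isolation in the sketch below. The filepaths are placeholders rather than real quickstart files.

```python
groups = ["left", "center", "right"]

# One list of filepaths per slice, in the same order as `groups`
per_slice = [
    ["/data/l1.jpg", "/data/l2.jpg"],
    ["/data/c1.jpg", "/data/c2.jpg"],
    ["/data/r1.jpg", "/data/r2.jpg", "/data/r3.jpg"],
]

# zip(*per_slice) walks the lists in parallel and stops at the shortest one,
# so the unmatched "/data/r3.jpg" is dropped
per_group = [dict(zip(groups, fps)) for fps in zip(*per_slice)]

for group in per_group:
    print(group)
```

Because `zip` truncates to the shortest list, an uneven split silently discards the leftover samples.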