# Adding Model Predictions to Datasets

This recipe provides a glimpse into the possibilities for integrating FiftyOne into your machine learning workflows.

It covers the following concepts:

-   Loading your existing dataset in FiftyOne
-   Adding predictions from your model to your FiftyOne dataset
-   Launching the FiftyOne App and visualizing/exploring your data
-   Integrating the App into your data wrangling workflow

## Setup

Install `torch` and `torchvision`, if necessary:

In [1]:
# Modify as necessary (e.g., GPU install). See https://pytorch.org for options
!pip install torch
!pip install torchvision

Download the test split of the CIFAR-10 dataset to `~/fiftyone/cifar10/test`:

In [2]:
# Downloads the test split of CIFAR-10
!fiftyone zoo download cifar10 --splits test

Download a pretrained CIFAR-10 PyTorch model:

In [3]:
# Download the software
!git clone https://github.com/huyvnphan/PyTorch_CIFAR10

# Download the pretrained model (90MB)
!eta gdrive download --public \
    1dGfpeFK_QG0kV-U6QDHMX2EOGXPqaNzu \
    PyTorch_CIFAR10/cifar10_models/state_dicts/resnet50.pt

Cloning into 'PyTorch_CIFAR10'...
remote: Enumerating objects: 9, done.[K
remote: Counting objects: 100% (9/9), done.[K
remote: Compressing objects: 100% (7/7), done.[K
remote: Total 560 (delta 2), reused 4 (delta 2), pack-reused 551[K
Receiving objects: 100% (560/560), 6.55 MiB | 13.01 MiB/s, done.
Resolving deltas: 100% (184/184), done.
Downloading '1dGfpeFK_QG0kV-U6QDHMX2EOGXPqaNzu' to 'PyTorch_CIFAR10/cifar10_models/state_dicts/resnet50.pt'
 100% |████|  719.8Mb/719.8Mb [8.0s elapsed, 0s remaining, 47.7Mb/s]       


## Importing FiftyOne

Let's start by importing the FiftyOne library:

In [4]:
import fiftyone as fo

## Loading an image classification dataset

Suppose you have an image classification dataset on disk in the following
format:

```
<dataset_dir>/
    data/
        <uuid1>.<ext>
        <uuid2>.<ext>
        ...
    labels.json
```

where `labels.json` is a JSON file in the following format:

```
{
    "classes": [
        <labelA>,
        <labelB>,
        ...
    ],
    "labels": {
        <uuid1>: <target1>,
        <uuid2>: <target2>,
        ...
    }
}
```

In your current workflow, you may parse this data into a list of
`(image_path, label)` tuples as follows:

In [4]:
import json
import os

# The location of the dataset on disk that you downloaded above
dataset_dir = os.path.expanduser("~/fiftyone/cifar10/test")

# Maps image UUIDs to image paths
images_dir = os.path.join(dataset_dir, "data")
image_uuids_to_paths = {
    os.path.splitext(n)[0]: os.path.join(images_dir, n)
    for n in os.listdir(images_dir)
}

labels_path = os.path.join(dataset_dir, "labels.json")
with open(labels_path, "rt") as f:
    _labels = json.load(f)

# Get classes
classes = _labels["classes"]

# Maps image UUIDs to int targets
labels = _labels["labels"]

# Make a list of (image_path, label) samples
data = [(image_uuids_to_paths[u], classes[t]) for u, t in labels.items()]

# Print a few data
print(data[:5])

[('/Users/Brian/fiftyone/cifar10/test/data/000001.jpg', 'cat'), ('/Users/Brian/fiftyone/cifar10/test/data/000002.jpg', 'ship'), ('/Users/Brian/fiftyone/cifar10/test/data/000003.jpg', 'ship'), ('/Users/Brian/fiftyone/cifar10/test/data/000004.jpg', 'airplane'), ('/Users/Brian/fiftyone/cifar10/test/data/000005.jpg', 'frog')]


Building a FiftyOne dataset from your samples is simple:

In [6]:
# Load the data into FiftyOne samples
samples = []
for image_path, label in data:
    samples.append(
        fo.Sample(
            filepath=image_path,
            ground_truth=fo.Classification(label=label),
        )
    )

# Add the samples to a dataset
dataset = fo.Dataset("my-dataset")
dataset.add_samples(samples)

# Print some information about the dataset
print(dataset)

 100% |███| 10000/10000 [3.7s elapsed, 0s remaining, 2.7K samples/s]      
Name:           my-dataset
Persistent:     False
Num samples:    10000
Tags:           []
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)


In [7]:
# Print a few samples from the dataset
print(dataset.head())

<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f454930',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000001.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'cat', 'confidence': None, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f454931',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000002.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'ship', 'confidence': None, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f454932',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000003.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'ship', 'confidence': None, 'logits': None}>,
}>


## Working with views into your dataset

FiftyOne provides a powerful notion of _dataset views_ for you to access
subsets of the samples in your dataset.

Here's an example operation:

In [8]:
# Used to write view expressions that involve sample fields
from fiftyone import ViewField as F

# Gets five airplanes from the dataset
view = (
    dataset.match(F("ground_truth.label") == "airplane")
    .limit(5)
)

# Print some information about the view you created
print(view)

Dataset:        my-dataset
Num samples:    5
Tags:           []
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
Pipeline stages:
    1. Match(filter={'$expr': {'$eq': [...]}})
    2. Limit(limit=5)


In [9]:
# Print a few samples from the view
print(view.head())

<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f454933',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000004.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'airplane', 'confidence': None, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f45493a',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000011.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'airplane', 'confidence': None, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f454945',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000022.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'airplane', 'confidence': None, 'logits': None}>,
}>


Iterating over the samples in a view is easy:

In [10]:
for sample in view:
    print(sample.filepath)

/Users/Brian/fiftyone/cifar10/test/data/000004.jpg
/Users/Brian/fiftyone/cifar10/test/data/000011.jpg
/Users/Brian/fiftyone/cifar10/test/data/000022.jpg
/Users/Brian/fiftyone/cifar10/test/data/000028.jpg
/Users/Brian/fiftyone/cifar10/test/data/000045.jpg


## Adding model predictions to your dataset

The following code demonstrates how to add predictions from a model to your
FiftyOne dataset, with minimal changes to your existing ML code:

In [11]:
import sys

import numpy as np
import torch
import torchvision
from torch.utils.data import DataLoader

import fiftyone.utils.torch as fout

sys.path.insert(1, "PyTorch_CIFAR10")
from cifar10_models import *


def make_cifar10_data_loader(image_paths, sample_ids, batch_size):
    mean = [0.4914, 0.4822, 0.4465]
    std = [0.2023, 0.1994, 0.2010]
    transforms = torchvision.transforms.Compose(
        [
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean, std),
        ]
    )
    dataset = fout.TorchImageDataset(
        image_paths, sample_ids=sample_ids, transform=transforms
    )
    return DataLoader(dataset, batch_size=batch_size, num_workers=4)


def predict(model, imgs):
    logits = model(imgs).detach().cpu().numpy()
    predictions = np.argmax(logits, axis=1)
    odds = np.exp(logits)
    confidences = np.max(odds, axis=1) / np.sum(odds, axis=1)
    return predictions, confidences


#
# Load a model
#
# Model performance numbers are available at:
#   https://github.com/huyvnphan/PyTorch_CIFAR10
#

model = resnet50(pretrained=True)
model_name = "resnet50"

#
# Extract a few images to process
#

num_samples = 25
batch_size = 5

view = dataset.take(num_samples)

image_paths, sample_ids = zip(*[(s.filepath, s.id) for s in view])
data_loader = make_cifar10_data_loader(image_paths, sample_ids, batch_size)

#
# Perform prediction and store results in dataset
#

for imgs, sample_ids in data_loader:
    predictions, confidences = predict(model, imgs)

    # Add predictions to your FiftyOne dataset
    for sample_id, prediction, confidence in zip(
        sample_ids, predictions, confidences
    ):
        sample = dataset[sample_id]
        sample[model_name] = fo.Classification(
            label=classes[prediction],
            confidence=confidence,
        )
        sample.save()

#
# Print the last batch of samples for which we added predictions
#

last_batch = dataset.select(sample_ids)
print(last_batch.head(batch_size))

<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f454e41',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/001298.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'bird', 'confidence': None, 'logits': None}>,
    'resnet50': <Classification: {'label': 'bird', 'confidence': 0.52449214, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd82dfebeb022f45546e',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/002879.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'cat', 'confidence': None, 'logits': None}>,
    'resnet50': <Classification: {'label': 'dog', 'confidence': 0.54008263, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd82dfebeb022f4555e1',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/003250.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth

In [12]:
#
# Get all samples for which we added predictions, in reverse order of
# confidence
#
pred_view = (dataset
    .exists(model_name)
    .sort_by("%s.confidence" % model_name, reverse=True)
)

print("Number of samples: %s\n" % len(pred_view))
print(pred_view.head())

Number of samples: 25

<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd82dfebeb022f4555f5',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/003270.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'horse', 'confidence': None, 'logits': None}>,
    'resnet50': <Classification: {'label': 'horse', 'confidence': 0.8297524, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd84dfebeb022f457025',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/009974.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'airplane', 'confidence': None, 'logits': None}>,
    'resnet50': <Classification: {'label': 'airplane', 'confidence': 0.7975642, 'logits': None}>,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd83dfebeb022f4560e9',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/006074.jpg',
    'tags': BaseList([]),
    'm

## Using the FiftyOne App

FiftyOne provides a powerful App that allows you easily visualize,
explore, search, filter, your datasets.

You can explore the App interactively through the GUI, and you can even
interact with it in real-time from your Python interpreter!

In [13]:
# Launch the FiftyOne App
session = fo.launch_app()

# Open your dataset in the App
session.dataset = dataset

App launched


![dataset](images/inference_1.png)

In [14]:
# Show five random samples in the App
session.view = dataset.take(5)

![limit](images/inference_2.png)

In [27]:
# Show the samples for which we added predictions above
session.view = pred_view

![pred-view](images/inference_3.png)

In [16]:
# Show the full dataset again
session.view = None

![selected](images/inference_4.png)

You can select images in the App by clicking on them. Then, you can hop back over to the library and make a view that contains those samples!

In [17]:
# Make a view containing the currently selected samples in the App
selected_view = dataset.select(session.selected)

# Print details about the selected samples
print(selected_view)
print(selected_view.head())

Dataset:        my-dataset
Num samples:    3
Tags:           []
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    resnet50:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
Pipeline stages:
    1. Select(sample_ids=['5f21dd81dfebeb022f454930', '5f21dd81dfebeb022f454932', '5f21dd81dfebeb022f454931'])
<Sample: {
    'dataset_name': 'my-dataset',
    'id': '5f21dd81dfebeb022f454930',
    'filepath': '/Users/Brian/fiftyone/cifar10/test/data/000001.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'ground_truth': <Classification: {'label': 'cat', 'confidence': None, 'logits': None}>,
    'resnet50': None,
}>
<Sample: {
    'dataset_name': 'my-dataset',
    'i