Fields, expressions, filters, matching, oh my! #290

Merged: brimoor merged 62 commits into develop from detection-filter2 on Jul 24, 2020

@brimoor (Contributor) commented Jul 22, 2020

Inspired by @tylerganter's great work in #282 and #284, I'd like to propose this alternative implementation of fields, expressions, filtering, sorting, etc.

The two key differences are:
(1) Rather than encapsulating the operations in classes, we encapsulate the field to which the operation is applied. This lets us use standard operator overloading to get a clean syntax for building list filters.
(2) Filtering is implemented as a standard pipeline stage, so it can be included anywhere within a pipeline.

I like the syntax of (1), but regardless of whether we go with (1) or #282, I would definitely like to adopt (2) because it leverages the full power of pipelines.
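
To make idea (1) concrete, here is a minimal sketch of how wrapping a field (rather than an operation) lets Python operators build a MongoDB-style expression. The class names `Expr` and `Field` are hypothetical stand-ins, not the actual `ViewField` code in this PR, and the sketch ignores how a real list filter would refer to the current list element.

```python
class Expr:
    """Hypothetical wrapper around a MongoDB aggregation expression."""

    def __init__(self, expr):
        self.expr = expr

    def _binary(self, op, other):
        # Unwrap nested expressions so literals and Expr objects mix freely
        rhs = other.expr if isinstance(other, Expr) else other
        return Expr({op: [self.expr, rhs]})

    # Comparisons return new expressions instead of booleans
    def __eq__(self, other):
        return self._binary("$eq", other)

    def __gt__(self, other):
        return self._binary("$gt", other)

    def __ge__(self, other):
        return self._binary("$gte", other)

    # Arithmetic
    def __mul__(self, other):
        return self._binary("$multiply", other)

    # `&` stands in for logical AND because `and` cannot be overloaded
    def __and__(self, other):
        return self._binary("$and", other)

    # Indexing into an array-valued field
    def __getitem__(self, idx):
        return Expr({"$arrayElemAt": [self.expr, idx]})


class Field(Expr):
    """Hypothetical stand-in for ViewField: refers to a field by name."""

    def __init__(self, name):
        super().__init__("$" + name)


F = Field
print((F("age") > 20).expr)  # {'$gt': ['$age', 20]}
```

The parentheses around each comparison in `(F("label") == "friend") & (F("confidence") > 0.5)` matter: `&` binds more tightly than `==` and `>` in Python, so without them the expression would be grouped differently.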

Example use:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()

dataset.add_samples(
    [
        fo.Sample(
            filepath="filepath1.jpg",
            test_dets=fo.Detections(
                detections=[
                    fo.Detection(
                        label="friend",
                        confidence=0.9,
                        bounding_box=[0, 0, 0.5, 0.5],
                    ),
                    fo.Detection(
                        label="friend",
                        confidence=0.3,
                        bounding_box=[0.25, 0, 0.5, 0.1],
                    ),
                    fo.Detection(
                        label="stopper",
                        confidence=0.1,
                        bounding_box=[0, 0, 0.5, 0.5],
                    ),
                    fo.Detection(
                        label="big bro",
                        confidence=0.6,
                        bounding_box=[0, 0, 0.1, 0.5],
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="filepath2.jpg",
            test_dets=fo.Detections(
                detections=[
                    fo.Detection(
                        label="friend",
                        confidence=0.99,
                        bounding_box=[0, 0, 1, 1],
                    ),
                    fo.Detection(
                        label="tricam",
                        confidence=0.2,
                        bounding_box=[0, 0, 0.5, 0.5],
                    ),
                    fo.Detection(
                        label="hex",
                        confidence=0.8,
                        bounding_box=[0.35, 0, 0.2, 0.25],
                    ),
                ]
            ),
        ),
    ]
)

confident_friends_view = (
    dataset.view()
    .exists("test_dets")
    .list_filter(
        "test_dets.detections",
        (F("label") == "friend") & (F("confidence") > 0.5)
    )
)

print("\nConfident friends:")
print(confident_friends_view)
for sample in confident_friends_view:
    print(sample)

big_confident_friends_view = (
    dataset.view()
    .exists("test_dets")
    .list_filter(
        "test_dets.detections",
        (F("label") == "friend") & (F("confidence") > 0.5)
    )
    .list_filter(
        "test_dets.detections",
        F("bounding_box")[2] * F("bounding_box")[3] >= 0.5,  # area of the bbox!
    )
)

print("\nBig confident friends:")
print(big_confident_friends_view)
for sample in big_confident_friends_view:
    print(sample)

Output:

Confident friends:

Dataset:        2020.07.22.13.00.12.838380
Num samples:    2
Tags:           []
Sample fields:
    filepath:  fiftyone.core.fields.StringField
    tags:      fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:  fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    test_dets: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
Pipeline stages:
    1. Exists(field='test_dets')
    2. ListFilter(field='test_dets.detections', filter={'$and': [{'$...ence', 0.5]}]})

<Sample: {
    'dataset_name': '2020.07.22.13.00.12.838380',
    'id': '5f18709c5d154bab96aabeff',
    'filepath': None,
    'tags': BaseList([]),
    'metadata': None,
    'test_dets': <Detections: {
        'detections': BaseList([
            <Detection: {
                'label': 'friend',
                'bounding_box': array([0. , 0. , 0.5, 0.5]),
                'confidence': 0.9,
                'attributes': BaseDict({}),
            }>,
        ]),
    }>,
}>
<Sample: {
    'dataset_name': '2020.07.22.13.00.12.838380',
    'id': '5f18709c5d154bab96aabf00',
    'filepath': None,
    'tags': BaseList([]),
    'metadata': None,
    'test_dets': <Detections: {
        'detections': BaseList([
            <Detection: {
                'label': 'friend',
                'bounding_box': array([0, 0, 1, 1]),
                'confidence': 0.99,
                'attributes': BaseDict({}),
            }>,
        ]),
    }>,
}>


Big confident friends:

Dataset:        2020.07.22.13.00.12.838380
Num samples:    2
Tags:           []
Sample fields:
    filepath:  fiftyone.core.fields.StringField
    tags:      fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:  fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    test_dets: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
Pipeline stages:
    1. Exists(field='test_dets')
    2. ListFilter(field='test_dets.detections', filter={'$and': [{'$...ence', 0.5]}]})
    3. ListFilter(field='test_dets.detections', filter={
    '$gte':... 0.5,
    ],
})

<Sample: {
    'dataset_name': '2020.07.22.13.00.12.838380',
    'id': '5f18709c5d154bab96aabeff',
    'filepath': None,
    'tags': BaseList([]),
    'metadata': None,
    'test_dets': <Detections: {'detections': BaseList([])}>,
}>
<Sample: {
    'dataset_name': '2020.07.22.13.00.12.838380',
    'id': '5f18709c5d154bab96aabf00',
    'filepath': None,
    'tags': BaseList([]),
    'metadata': None,
    'test_dets': <Detections: {
        'detections': BaseList([
            <Detection: {
                'label': 'friend',
                'bounding_box': array([0, 0, 1, 1]),
                'confidence': 0.99,
                'attributes': BaseDict({}),
            }>,
        ]),
    }>,
}>
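
As a sanity check on the output above, the two filters can be replayed in plain Python without FiftyOne. This is only a sketch: it assumes the bounding boxes are `[x, y, width, height]`, which is what the `[2] * [3]` area expression implies, and the helper names are made up for illustration.

```python
# Detections from the example above, as (label, confidence, bounding_box)
samples = {
    "filepath1.jpg": [
        ("friend", 0.9, [0, 0, 0.5, 0.5]),
        ("friend", 0.3, [0.25, 0, 0.5, 0.1]),
        ("stopper", 0.1, [0, 0, 0.5, 0.5]),
        ("big bro", 0.6, [0, 0, 0.1, 0.5]),
    ],
    "filepath2.jpg": [
        ("friend", 0.99, [0, 0, 1, 1]),
        ("tricam", 0.2, [0, 0, 0.5, 0.5]),
        ("hex", 0.8, [0.35, 0, 0.2, 0.25]),
    ],
}


def confident_friend(det):
    """Mirrors (F("label") == "friend") & (F("confidence") > 0.5)."""
    label, confidence, _ = det
    return label == "friend" and confidence > 0.5


def big(det):
    """Mirrors F("bounding_box")[2] * F("bounding_box")[3] >= 0.5."""
    _, _, box = det
    return box[2] * box[3] >= 0.5


results = {}
for path, dets in samples.items():
    after_first = [d for d in dets if confident_friend(d)]
    after_second = [d for d in after_first if big(d)]
    results[path] = (len(after_first), len(after_second))

print(results)  # {'filepath1.jpg': (1, 0), 'filepath2.jpg': (1, 1)}
```

The first sample keeps one confident friend, but its box area is 0.5 * 0.5 = 0.25, so the second filter leaves an empty detections list. The sample itself stays in the view, which matches the `Num samples: 2` shown above.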

AND... ONE LAST THING

The F syntax works with DatasetView.match() too!

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()

dataset.add_samples(
    [
        fo.Sample(filepath="filepath1.jpg", age=10),
        fo.Sample(filepath="filepath2.jpg", age=15),
        fo.Sample(filepath="filepath3.jpg", age=20),
        fo.Sample(filepath="filepath4.jpg", age=25),
        fo.Sample(filepath="filepath5.jpg", age=30),
        fo.Sample(filepath="filepath6.jpg", age=35),
    ]
)

view = dataset.view().match(F("age") > 20)
print(view)
for sample in view:
    print(sample)

Output:

Dataset:        2020.07.22.13.00.12.995539
Num samples:    3
Tags:           []
Sample fields:
    filepath: fiftyone.core.fields.StringField
    tags:     fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    age:      fiftyone.core.fields.IntField
Pipeline stages:
    1. Match(filter={'$gt': ['$age', 20]})

<Sample: {
    'dataset_name': '2020.07.22.13.00.12.995539',
    'id': '5f18709d5d154bab96aabf05',
    'filepath': 'filepath4.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'age': 25,
}>
<Sample: {
    'dataset_name': '2020.07.22.13.00.12.995539',
    'id': '5f18709d5d154bab96aabf06',
    'filepath': 'filepath5.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'age': 30,
}>
<Sample: {
    'dataset_name': '2020.07.22.13.00.12.995539',
    'id': '5f18709d5d154bab96aabf07',
    'filepath': 'filepath6.jpg',
    'tags': BaseList([]),
    'metadata': None,
    'age': 35,
}>
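
For readers wondering how a filter like `{'$gt': ['$age', 20]}` could reach MongoDB: that form is an aggregation expression, and a `$match` stage only accepts aggregation expressions when they are wrapped in `$expr`. The sketch below assumes the stage is built that way (this thread does not show the actual implementation) and uses a toy evaluator, written just for this example, to confirm which ages pass.

```python
def match_stage(expr):
    """Wrap an aggregation expression so that $match can evaluate it."""
    return {"$match": {"$expr": expr}}


def evaluate(expr, doc):
    """Toy evaluator covering only field references, literals and $gt."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc[expr[1:]]  # "$age" -> doc["age"]
    if isinstance(expr, dict):
        (op, args), = expr.items()
        if op == "$gt":
            left, right = (evaluate(arg, doc) for arg in args)
            return left > right
        raise NotImplementedError(op)
    return expr  # plain literal


docs = [{"age": age} for age in (10, 15, 20, 25, 30, 35)]
age_filter = {"$gt": ["$age", 20]}

matched = [doc["age"] for doc in docs if evaluate(age_filter, doc)]
print(match_stage(age_filter))
print(matched)  # [25, 30, 35]
```

The comparison is strict, so the sample with `age=20` is excluded and three samples remain, as in the output above.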

@brimoor brimoor added the feature Work on a feature request label Jul 22, 2020
@brimoor brimoor requested review from tylerganter, ehofesmann and a team July 22, 2020 15:34
@brimoor brimoor self-assigned this Jul 22, 2020
@tylerganter (Contributor)

Wow! Just wow!

First impressions:

  • Making it a standard view stage is the right way to go (simpler than the logic stuff and more consistent with what we already know and love)
  • Overloaded operators are AWESOME
  • I would call it F instead of f 😉 (In all seriousness, when I've seen other libraries do these single-letter functions they are generally capitalized, such as Q for query)
  • Similar to how MatchTags wraps Match, we can have:
    • DetectionsFilter/ClassificationsFilter wrap ListFilter
    • FilterBBoxArea wrap DetectionsFilter
    • I'll be happy to implement this!
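
The wrapping idea in the last bullet could look roughly like the sketch below. Everything here is hypothetical: the class names come from the bullet list, but the constructors, the `.detections` suffix handling, and the way the bounding box is referenced are assumptions made for illustration, not code from this PR.

```python
class ListFilter:
    """Hypothetical base stage: keep list elements matching `filter`."""

    def __init__(self, field, filter):
        self.field = field
        self.filter = filter

    def __repr__(self):
        name = type(self).__name__
        return f"{name}(field={self.field!r}, filter={self.filter!r})"


class DetectionsFilter(ListFilter):
    """Hypothetical wrapper: targets the list inside a Detections field."""

    def __init__(self, field, filter):
        super().__init__(field + ".detections", filter)


class FilterBBoxArea(DetectionsFilter):
    """Hypothetical wrapper: keep detections whose box area >= min_area."""

    def __init__(self, field, min_area):
        # Assumes boxes are [x, y, width, height]; the "$bounding_box"
        # reference style is also an assumption of this sketch
        area = {
            "$multiply": [
                {"$arrayElemAt": ["$bounding_box", 2]},
                {"$arrayElemAt": ["$bounding_box", 3]},
            ]
        }
        super().__init__(field, {"$gte": [area, min_area]})


stage = FilterBBoxArea("test_dets", 0.5)
print(stage.field)  # test_dets.detections
```

Each wrapper only fills in arguments for the stage below it, in the same spirit as MatchTags wrapping Match.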

@ehofesmann (Member)

Yes! I had a similar thought when looking at Tyler's PR, but I couldn't think of a good way of implementing it. I like this! I think having standard Python syntax like == and * is really powerful. Great work!

@brimoor (Contributor, Author) commented Jul 22, 2020

UPDATE

Usage with DatasetView.match() is now working too!

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()

dataset.add_samples(
    [
        fo.Sample(filepath="filepath1.jpg", age=10),
        fo.Sample(filepath="filepath2.jpg", age=15),
        fo.Sample(filepath="filepath3.jpg", age=20),
        fo.Sample(filepath="filepath4.jpg", age=25),
        fo.Sample(filepath="filepath5.jpg", age=30),
        fo.Sample(filepath="filepath6.jpg", age=35),
    ]
)

view = dataset.view().match(F("age") > 20)
print(view)

Output:

Dataset:        2020.07.22.12.54.21
Num samples:    3
Tags:           []
Sample fields:
    filepath: fiftyone.core.fields.StringField
    tags:     fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    age:      fiftyone.core.fields.IntField
Pipeline stages:
    1. Match(filter={'$gt': ['$age', 20]})

@brimoor (Contributor, Author) commented Jul 22, 2020

Note that, unfortunately, in has to be implemented as F("label").is_in([1, 2, 3]) rather than F("label") in [1, 2, 3], because Python evaluates in by calling __contains__ on the RHS operand, not the LHS.
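
The limitation is easy to reproduce in isolation. In this sketch `Container` and `Field` are throwaway classes (not FiftyOne code), and the `$in` dictionary returned by `is_in` is only an assumed shape for the resulting expression.

```python
class Container:
    """Right-hand operand: Python calls its __contains__ for `in`."""

    def __init__(self):
        self.seen = []

    def __contains__(self, item):
        self.seen.append(item)
        # Whatever is returned here gets coerced to a bool by `in`
        return {"$in": ["anything"]}


class Field:
    """Left-hand operand: has no hook that `in` would ever call."""

    def __init__(self, name):
        self.name = name

    def is_in(self, values):
        # An explicit method can return an expression instead of a bool
        return {"$in": ["$" + self.name, list(values)]}


field = Field("label")
container = Container()

result = field in container   # dispatches to container.__contains__(field)
plain = field in [1, 2, 3]    # list.__contains__ runs; Field is never asked

print(result, plain)           # True False
print(field.is_in([1, 2, 3]))  # {'$in': ['$label', [1, 2, 3]]}
```

Even a custom right-hand container cannot help, because `in` always collapses the result to `True` or `False`, so no expression object can come out of it.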

@tylerganter (Contributor) left a comment

I am going to just start reviewing PRs by making new PRs. That's my new thing.

@brimoor brimoor merged commit 7f2bd2c into develop Jul 24, 2020
@brimoor brimoor deleted the detection-filter2 branch July 24, 2020 20:38