
[BUG] Annotation schema attribute definitions not filtering when defined on classes #1748

Closed
geoffrp opened this issue May 13, 2022 · 4 comments · Fixed by #1973
Labels
bug Bug fixes

Comments

@geoffrp
Contributor

geoffrp commented May 13, 2022

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04.3 LTS
  • FiftyOne installed from (pip or source): pip
  • FiftyOne version (run fiftyone --version): v0.15.1
  • Python version: Python 3.9.10

Commands to reproduce

When submitting a schema, attributes are not filtered if they are only attached to classes. For example, if an object1 label had attributes condition1, condition2, and other_field, the following schema would still allow other_field into the CVAT annotations:

        "type": "detections",
        "classes": [
            "object0",
            {
                "classes": ["object1"],
                "attributes": {
                    "condition1": {
                        "type": "checkbox",
                        "default": False,
                    },
                    "condition2": {
                        "type": "checkbox",
                        "default": False,
                    },
                },
            }
        ]
}

This schema, by contrast, would block the other_field attribute from going to annotations:

        "type": "detections",
        "classes": [
            "object0",
            {
                "classes": ["object1"],
                "attributes": {
                    "condition1": {
                        "type": "checkbox",
                        "default": False,
                    },
                    "condition2": {
                        "type": "checkbox",
                        "default": False,
                    },
                },
            }
        ],
        "attributes": {"test": {"type": "checkbox", "default": False}}
}

Describe the problem

Schemas need not define global attributes; attributes can also be defined class by class, so filtering should respect class-level attribute definitions too.

Code to reproduce issue

import fiftyone as fo
import fiftyone.zoo as foz
import tempfile
import numpy as np
from PIL import Image

USERNAME = ...
PASSWORD = ...

# Attributes are only defined by class, `bad` will go through despite not being listed
schema = {
    "test_field": {
        "type": "classification",
        "classes": [
            {
                "classes": ["test"],
                "attributes": {
                    "good": {
                        "type": "checkbox",
                        "default": True,
                    }
                },
            }
        ],
    }
}

# Include an attribute dict to force filtering in _get_attributes
hacky_schema = {
    "test_field": {
        "type": "classification",
        "classes": [
            {
                "classes": ["test"],
                "attributes": {
                    "good": {
                        "type": "checkbox",
                        "default": True,
                    }
                },
            }
        ],
        "attributes": {
            "another_one": {
                "type": "checkbox",
                "default": True,
            }
        },
    }
}

with tempfile.TemporaryDirectory() as tmpdir:
    # Generate images
    for i in range(3):
        imarray = np.random.rand(100, 100, 3) * 255
        im = Image.fromarray(imarray.astype("uint8")).convert("RGBA")
        im.save(f"{tmpdir}/temp{i}.png")
    # Create a dataset
    dataset = fo.Dataset.from_images_dir(tmpdir)

    # Assign some bogus classification tags with attributes
    for sample in dataset:
        sample["test_field"] = fo.Classification(label="test")
        # Assign a desirable and undesirable attribute to the classification
        sample["test_field"]["good"] = True
        sample["test_field"]["bad"] = ":("
        sample.save()
    dataset.save()

    # Annotate and go to CVAT to see result

    dataset.annotate(
        "test_annotation",
        # Swap schemas here
        label_schema=hacky_schema,
        backend="cvat",
        url="https://cvat.org/",
        username=USERNAME,
        password=PASSWORD,
    )

What areas of FiftyOne does this bug affect?

  • [ ] App: FiftyOne application issue
  • [x] Core: Core fiftyone Python library issue
  • [ ] Server: Fiftyone server issue

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another
member of your organization be willing to contribute a fix for this bug to the
FiftyOne codebase?

  • [ ] Yes. I can contribute a fix for this bug independently.
  • [x] Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community.
  • [ ] No. I cannot contribute a bug fix at this time.
geoffrp added the bug label May 13, 2022
@geoffrp
Contributor Author

geoffrp commented May 13, 2022

It looks like the problem lies at:

fiftyone/utils/annotations.py:801

    if "attributes" in label_info:
        attributes = label_info["attributes"]

If "attributes" isn't included in label_info, we iterate over all fields, adding those that aren't None. Maybe the classes entries need to be inspected here as well to pick up class-specific attribute definitions?
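For context, here is a minimal sketch, assuming a hypothetical helper _flatten_class_attributes (this is not the actual FiftyOne implementation), of how attribute names declared per class could be gathered so that filtering kicks in even without a field-level attributes dict:

def _flatten_class_attributes(label_info):
    """Hypothetical helper: collect attribute names declared at the field
    level or inside per-class entries of a label schema."""
    allowed = set(label_info.get("attributes", {}))
    for entry in label_info.get("classes", []):
        # Entries are either plain class names (str) or dicts that scope
        # their own "attributes" to a subset of classes
        if isinstance(entry, dict):
            allowed.update(entry.get("attributes", {}))
    return allowed

# Example: the class-only schema from this issue yields {"good"}, which
# could then be used to filter out unlisted attributes like `bad`
label_info = {
    "type": "classification",
    "classes": [
        {
            "classes": ["test"],
            "attributes": {"good": {"type": "checkbox", "default": True}},
        }
    ],
}
print(_flatten_class_attributes(label_info))  # {'good'}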

@ehofesmann
Member

Thanks for bringing up this issue and the reproduction code @geoffrp!
To recap, there are three different ways that you can define the attributes being annotated, in order of priority:

  1. For a single class in one field in your schema, you can provide explicit attributes that only apply to that one class.
schema = {
    "test_field": {
        "type": "classification",
        "classes": [
            {
                "classes": ["test"],
                "attributes": {
                    "good": {
                        "type": "checkbox",
                        "default": True,
                    }
                },
            }
        ],
    }
}
  2. For all classes of a specific field in your schema, you can provide the attributes directly under the label field to apply to all classes in that field, like your hacky schema:
hacky_schema = {
    "test_field": {
        "type": "classification",
        "classes": [
            {
                "classes": ["test"],
                "attributes": {
                    "good": {
                        "type": "checkbox",
                        "default": True,
                    }
                },
            }
        ],
        "attributes": {
            "another_one": {
                "type": "checkbox",
                "default": True,
            }
        },
    }
}
  3. For all label fields, you can provide a value to the attributes keyword argument when calling dataset.annotate(). For existing fields, this can either be a boolean to load all or no attributes, or a dictionary/list defining the specific attributes to load. This will then be applied to all label fields annotated in that run:
results = dataset.annotate(anno_key, label_schema=label_schema, attributes=True)
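As noted above, attributes can also be a list of attribute names or a dict of attribute specs rather than a boolean; for example (the attribute names here are illustrative):

results = dataset.annotate(
    anno_key,
    label_schema=label_schema,
    # only load these existing attributes, for every label field in the run
    attributes=["condition1", "condition2"],
)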

So the easiest way to do what you want currently is to define the class-specific attributes the way that you have, and then add attributes=False when calling dataset.annotate() to prevent all other attributes from being loaded. This is necessary because attributes=True by default, so all existing attributes will always be loaded regardless of what you define within the label schema.
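Concretely, using the schema and credentials from the reproduction code above, the workaround would look something like this:

dataset.annotate(
    "test_annotation",
    label_schema=schema,   # class-level attributes only
    attributes=False,      # suppress loading of all other existing attributes
    backend="cvat",
    url="https://cvat.org/",
    username=USERNAME,
    password=PASSWORD,
)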

There are a few options going forward:

  1. Set attributes=False by default so that no attributes are loaded unless specified. I vote against this since a common workflow is to just provide a label field directly to dataset.annotate(), and you would expect everything to be loaded in that case.

  2. Update the code so that attributes=True remains the default, but if you provide class- or field-level attributes, then it internally sets attributes=False. There is also a problem with this approach, since a supported workflow should be to allow defining NEW attributes class- or field-wise while still letting attributes=True automatically load the remaining existing attributes:

schema = {
    "test_field": {
        "type": "classification",
        "classes": [
            {
                "classes": ["test"],
                "attributes": {
                    "new_attr": {
                        "type": "checkbox",
                        "default": True,
                    }
                },
            }
        ],
    }
}

results = dataset.annotate(anno_key, label_schema=schema, attributes=True)
# Annotate all existing attrs and `new_attr`
  3. Default to attributes=None, which has the properties of option 2 (it will load all attributes unless any specific class- or field-level attributes are provided) but still allows the option to explicitly set attributes to True/False. The main downside is that it is more complex and likely harder for new users to understand what is happening, but I think this would probably be the best approach (see the sketch below).
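A minimal sketch of the resolution logic option 3 implies, assuming a hypothetical _resolve_attributes helper (not the actual implementation):

def _resolve_attributes(attributes, label_schema):
    """Hypothetical resolution of a proposed `attributes=None` default."""
    if attributes is not None:
        # The caller was explicit; honor True/False/list/dict as today
        return attributes

    # attributes=None: load everything unless the schema already scopes
    # attributes at the field level or the class level
    for label_info in label_schema.values():
        if label_info.get("attributes"):
            return False
        for entry in label_info.get("classes", []):
            if isinstance(entry, dict) and entry.get("attributes"):
                return False

    return True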

I'd be interested in hearing your thoughts @geoffrp and @brimoor

@brimoor
Contributor

brimoor commented May 13, 2022

Since it is possible to get the behavior that @geoffrp wants by simply providing an additional parameter, I see an argument for option 4:

  4. Don't change any code; just update/clarify the docs to explain the implications of top-level attributes=True being the default behavior.

Side comment: +1 on @ehofesmann's comment that attributes should be included by default in simple syntaxes like:

dataset.annotate(anno_key, label_field="existing_field_with_attributes")

This is consistent with other I/O in the codebase, such as how there's an extra_attrs parameter that defaults to True on many exporters to automatically include everything known about an object unless otherwise requested.

@geoffrp
Contributor Author

geoffrp commented May 13, 2022

@brimoor @ehofesmann I think that option 4 could work here, and re-reading the documentation, this line slipped by me:

label_schema (None): a dictionary defining the label schema to use. If this argument is provided, it takes precedence over label_field and label_type

The documentation presents the label schema as a catch-all method of specifying the whole annotation task, but there are other parameters that gate certain behavior. For some users it may be useful to have all attributes show up to contextualize the annotation, so keeping attributes=True as the default behavior makes sense. I think a little clarification in the label schema section would make this more intuitive.

I'd be happy to submit a documentation update that might make this clearer for users in my use case.

kaixi-wang pushed a commit that referenced this issue May 11, 2023
* Added clarification to tie attribute argument to label_schema (#1973)

* documenting explanation of #1748 in all relevant places