Add SODA-A dataset #2575

Merged
merged 18 commits into from
Mar 12, 2025

Conversation

nilsleh
Collaborator

@nilsleh nilsleh commented Feb 10, 2025

This PR adds the SODA-A dataset. The dataset is rehosted on Hugging Face.

Dataset features:

* 2513 images
* 872,069 annotations with oriented bounding boxes
* 9 classes

Dataset format:

* Images are three-channel .jpg files.
* Annotations are in JSON files.

TODOs:

  • Some annotations have more than 8 coordinates (i.e. more than four vertices). For example, in Annotations/train/01874.json:
{
        {
            "poly": [
                1248.874059994201,
                2785.0,
                1267.2565270181412,
                2785.0,
                1290.0001220703125,
                2783.999755859375,
                1289.2877197265625,
                2767.801025390625,
                1248.1971435546875,
                2769.608154296875
            ],
            "area": 659.4863047840048,
            "category_id": 7,
            "image_id": 1874,
            "id": 43
        },
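
One way to see how widespread this is would be to scan the annotation records for `poly` lists longer than 8 values. The following is only a minimal sketch: the helper names `has_extra_vertices` and `find_non_quads` are hypothetical, and it assumes the records have already been loaded into a list of dicts shaped like the excerpt above.

```python
from typing import Any


def has_extra_vertices(annotation: dict[str, Any], expected_coords: int = 8) -> bool:
    """Return True if the flat 'poly' list holds more than four (x, y) pairs."""
    return len(annotation["poly"]) > expected_coords


def find_non_quads(annotations: list[dict[str, Any]]) -> list[int]:
    """Collect the 'id' of every annotation that is not a plain quadrilateral."""
    return [ann["id"] for ann in annotations if has_extra_vertices(ann)]
```

The record shown above has 10 coordinates (five vertices), so it would be flagged by this check.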

Example plot:
[Image: soda]

@github-actions github-actions bot added the documentation, datasets, and testing labels Feb 10, 2025
@nilsleh nilsleh added this to the 0.7.0 milestone Feb 10, 2025
@nilsleh nilsleh marked this pull request as draft February 10, 2025 21:02
@nilsleh
Collaborator Author

nilsleh commented Feb 10, 2025

@shaunyuan22 Thank you for this nice dataset and all the work. We aim to make the dataset more easily usable in torchgeo, and would appreciate it if you have any comments, corrections etc.

@nilsleh
Collaborator Author

nilsleh commented Feb 12, 2025

Open question: how to convert the polygons into a common oriented bounding box schema.

@nilsleh nilsleh marked this pull request as ready for review February 12, 2025 07:43
Collaborator

@adamjstewart adamjstewart left a comment

Asked about oriented bboxes on Slack. We may need to add support to Kornia for this ourselves. Until Kornia supports it natively, I guess it doesn't matter what the format looks like. But let's use the same key names that Kornia uses.

@nilsleh
Collaborator Author

nilsleh commented Mar 10, 2025

Instead of adding OpenCV as an additional dependency, I implemented the bounding box logic with shapely:

[Image: soda]
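
For readers following along, here is a rough sketch of how a polygon with more than four vertices could be reduced to an oriented box using shapely's `minimum_rotated_rectangle`. This is an illustration under assumptions, not necessarily the code merged in this PR: the function name `poly_to_obb` is hypothetical, and the input is assumed to be a flat `[x1, y1, x2, y2, ...]` list with at least three non-collinear points.

```python
from shapely.geometry import MultiPoint


def poly_to_obb(poly: list[float]) -> list[tuple[float, float]]:
    """Reduce a flat [x1, y1, x2, y2, ...] polygon to a 4-corner oriented box."""
    # Pair up the flat coordinate list into (x, y) vertices.
    points = list(zip(poly[0::2], poly[1::2]))
    # Smallest-area rectangle, at any rotation, enclosing all vertices.
    # Assumption: the points are not collinear, otherwise shapely returns
    # a LineString or Point and the .exterior access below fails.
    rect = MultiPoint(points).minimum_rotated_rectangle
    # The exterior ring is closed (first vertex repeated last); drop the repeat.
    return list(rect.exterior.coords)[:-1]
```

Applied to the five-vertex annotation from Annotations/train/01874.json, this would return four corner points that can be stored in the same 8-coordinate layout as the other boxes.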

Collaborator

@adamjstewart adamjstewart left a comment

Most of my remaining comments are optional, pedantic things; this is 99% ready to merge.

array: np.typing.NDArray[np.int_] = np.array(img.convert('RGB'))
tensor: Tensor = torch.from_numpy(array)
# Convert from HxWxC to CxHxW
tensor = tensor.permute((2, 0, 1))

Might be clearer if you use einops.rearrange here.

@adamjstewart adamjstewart merged commit 9dfb126 into microsoft:main Mar 12, 2025
22 checks passed

4 participants