
Datasets: always return sample = dict[str, Tensor] #2567

Draft · wants to merge 3 commits into main

Conversation

@adamjstewart (Collaborator) commented Feb 8, 2025:

This PR introduces a new Sample type alias defined as:

Sample = dict[str, Tensor]

All dataset __getitem__ methods return a Sample, and all transforms operate on a Sample.
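
A rough sketch of the contract (the dataset and transform below are hypothetical examples, not actual TorchGeo classes):

import torch
from torch import Tensor
from torch.utils.data import Dataset

# The alias introduced by this PR.
Sample = dict[str, Tensor]


class RandomDataset(Dataset[Sample]):
    """Hypothetical dataset whose __getitem__ returns a Sample."""

    def __len__(self) -> int:
        return 10

    def __getitem__(self, index: int) -> Sample:
        return {'image': torch.rand(3, 64, 64), 'label': torch.tensor(index % 2)}


def min_max_normalize(sample: Sample) -> Sample:
    """Hypothetical transform that takes a Sample and returns a Sample."""
    image = sample['image']
    sample['image'] = (image - image.min()) / (image.max() - image.min())
    return sample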

This PR supersedes #2249 with a simpler approach (can be expanded later).

Pros

  • Compatibility with PyTorch's default_collate function (see the collation sketch below)
  • Compatibility with Lightning's transfer_batch_to_device
  • Uniform type hints across all TorchGeo subpackages
  • Avoids most uses of dynamic typing (typing.Any)
  • Allows arbitrary key names (unlike #2249, Add Sample and Batch TypedDicts)

Cons

Closes #2249
See #985 for discussion
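
Because every sample is a plain dict[str, Tensor], PyTorch's stock collate function can batch samples without a custom collate_fn. A minimal sketch (shapes are illustrative only):

import torch
from torch.utils.data import default_collate

samples = [
    {'image': torch.rand(3, 64, 64), 'label': torch.tensor(0)},
    {'image': torch.rand(3, 64, 64), 'label': torch.tensor(1)},
]

# default_collate stacks matching keys across samples, producing another
# dict[str, Tensor]: batch['image'] is [2, 3, 64, 64], batch['label'] is [2].
batch = default_collate(samples)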

@adamjstewart added the backwards-incompatible label Feb 8, 2025
@github-actions bot added the datasets, models, testing, trainers, transforms, and datamodules labels Feb 8, 2025
@adamjstewart added this to the 0.7.0 milestone Feb 8, 2025
@github-actions bot removed the models label Feb 9, 2025
@adamjstewart (Collaborator, Author) commented:

@ashnair1 do object detection bounding boxes have to be list[Tensor] or can they be Tensor? I wonder if we can use some kind of packed/padded sequence logic.

@ashnair1 (Collaborator) commented:

Object detection bboxes are usually list[Tensor]. This is because each image can have a varying number of objects, e.g.:

img1 -> [5, 4]
img2 -> [15, 4]
img3 -> [1, 4]

Not sure about packing/padding. I think given a batch, we could theoretically iterate over the box tensors, find the max, and pad the rest accordingly. So the above example would become:

img1 -> [15, 4]  # was 5, padded to 15
img2 -> [15, 4]  # 15 is the max
img3 -> [15, 4]  # was 1, padded to 15

But I don't think it's optimal.
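
A minimal sketch of that pad-to-the-batch-max idea, using pad_sequence (not something this PR implements):

import torch
from torch.nn.utils.rnn import pad_sequence

# Boxes for a batch of three images with 5, 15, and 1 objects respectively.
boxes = [torch.rand(5, 4), torch.rand(15, 4), torch.rand(1, 4)]

# Pad every tensor up to the largest object count in the batch (15 here),
# yielding a single [3, 15, 4] tensor that fits dict[str, Tensor].
padded = pad_sequence(boxes, batch_first=True, padding_value=-1)

Downstream code would still need to know which rows are padding (hence the sentinel value), which is part of why this isn't ideal.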

@adamjstewart (Collaborator, Author) commented:

In order to create mini-batches, do we need to pad anyway? PyTorch has a PackedSequence class but it's still not a subclass of Tensor. We might be back to #2249 if we need to support dicts with multiple value types. But that doesn't support unknown keys...
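
A quick sketch of that point: pack_sequence can bundle the variable-length box tensors, but the result is a PackedSequence, not a Tensor, so it doesn't fit dict[str, Tensor]:

import torch
from torch.nn.utils.rnn import pack_sequence

boxes = [torch.rand(5, 4), torch.rand(15, 4), torch.rand(1, 4)]
packed = pack_sequence(boxes, enforce_sorted=False)

# PackedSequence wraps data/batch_sizes tensors but is itself not a Tensor subclass.
assert not isinstance(packed, torch.Tensor)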

@adamjstewart removed this from the 0.7.0 milestone Mar 13, 2025
Labels
  • backwards-incompatible: Changes that are not backwards compatible
  • datamodules: PyTorch Lightning datamodules
  • datasets: Geospatial or benchmark datasets
  • testing: Continuous integration testing
  • trainers: PyTorch Lightning trainers
  • transforms: Data augmentation transforms