Add PyTorch dataloader #25

jhamman · 2021-08-12T00:24:24Z

This PR is an initial attempt at adding a set of data loaders. It makes some changes to the base BatchGenerator class and adds a new loaders.torch module to test building real-life data loaders.

jhamman · 2021-08-12T00:26:19Z

xbatcher/generators.py

+    def _gen_batches(self) -> dict:
+        # in the future, we will want to do the batch generation lazily
+        # going the eager route for now is allowing me to fill out the loader api
+        # but it is likely to perform poorly.


Flagging this as something to discuss and work out a design for. It feels quite important that we are able to generate arbitrary batches on the fly. The current implementation generates batches eagerly, which will not scale well. However, the pure generator approach doesn't work if you need to access batches randomly (e.g. via `__getitem__`).
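
One possible middle ground, sketched here under assumptions: keep only the batch size in memory and compute each slice on access, so that both iteration and `__getitem__` stay lazy. `LazyBatches` and its parameters are hypothetical names for illustration, not part of xbatcher, and a plain list stands in for an xarray object.

```python
from typing import Iterator, Sequence


class LazyBatches:
    """Hypothetical sketch: index-addressable batches that are sliced on demand."""

    def __init__(self, data: Sequence, batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.data = data
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Number of batches, counting a trailing partial batch (ceiling division).
        return -(-len(self.data) // self.batch_size)

    def __getitem__(self, idx: int) -> Sequence:
        # Support negative indices, then slice out this one batch only.
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("batch index out of range")
        start = idx * self.batch_size
        return self.data[start : start + self.batch_size]

    def __iter__(self) -> Iterator[Sequence]:
        # Sequential iteration reuses the same on-demand slicing.
        for idx in range(len(self)):
            yield self[idx]


batches = LazyBatches(list(range(10)), batch_size=4)
```

Nothing is materialized until a batch is requested, so random access (as a map-style loader needs) and sequential iteration share one code path. Whether this carries over to multi-dimensional batch selection is an open design question.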

jhamman · 2021-10-09T15:48:38Z

xbatcher/loaders/torch.py

+        # TODO: figure out the dataset -> array workflow
+        # currently hardcoding a variable name
+        X_batch = self.X_generator[idx]['x'].torch.to_tensor()
+        y_batch = self.y_generator[idx]['y'].torch.to_tensor()


Flagging that we can't use named tensors here while we wait for pytorch/pytorch#29010.
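
On the TODO in the snippet above, one way to avoid the hardcoded variable names is to pass them in at construction time. The sketch below is an assumption, not this PR's implementation: `PairedBatchDataset`, `x_name`, `y_name`, and `to_tensor` are hypothetical, and plain dicts stand in for xarray batches so the example runs without torch. In practice `to_tensor` would be a conversion such as the `.torch.to_tensor()` accessor used above.

```python
from typing import Any, Callable, Sequence, Tuple


class PairedBatchDataset:
    """Hypothetical map-style dataset pairing feature and target batches."""

    def __init__(
        self,
        X_generator: Sequence,
        y_generator: Sequence,
        x_name: str,
        y_name: str,
        to_tensor: Callable[[Any], Any] = lambda batch: batch,
    ) -> None:
        if len(X_generator) != len(y_generator):
            raise ValueError("X and y generators must yield the same number of batches")
        self.X_generator = X_generator
        self.y_generator = y_generator
        self.x_name = x_name  # variable to pull from each X batch
        self.y_name = y_name  # variable to pull from each y batch
        self.to_tensor = to_tensor

    def __len__(self) -> int:
        return len(self.X_generator)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        # Select the configured variable from each batch, then convert it.
        X_batch = self.to_tensor(self.X_generator[idx][self.x_name])
        y_batch = self.to_tensor(self.y_generator[idx][self.y_name])
        return X_batch, y_batch


# Stand-in batches: each batch is a dict keyed by variable name.
X_gen = [{"x": [1, 2]}, {"x": [3, 4]}]
y_gen = [{"y": [0]}, {"y": [1]}]
dataset = PairedBatchDataset(X_gen, y_gen, "x", "y", to_tensor=tuple)
```

Only `__len__` and `__getitem__` are defined here, which is the protocol a map-style dataset relies on.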

codecov · 2022-02-23T17:03:46Z

Codecov Report

Merging #25 (8bcd870) into main (802bbd5) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main       #25   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            2         3    +1     
  Lines           77       134   +57     
  Branches        18        30   +12     
=========================================
+ Hits            77       134   +57

Impacted Files	Coverage Δ
xbatcher/accessors.py	`100.00% <100.00%> (ø)`
xbatcher/generators.py	`100.00% <100.00%> (ø)`
xbatcher/loaders/torch.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ghiggi · 2022-02-25T15:28:09Z

Hi @jhamman. @djhoese suggested I get in touch with you to share some of my previous work on a PyTorch DataLoader designed specifically for loading spatio-temporal data stored in Zarr.

Our implementation targets the development of autoregressive forecasting models, where we distinguish 3 sources of data:

- dynamic data: vary over time, are the prediction target, and can be fed back as input into the next model iterations
- boundary condition data: are injected into the model at each time step and are either known a priori (e.g. top-of-the-atmosphere solar radiation as a function of time) or can be computed as a function of previous predictions
- static data: are invariant in time
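
To make the three roles concrete, here is a minimal autoregressive rollout sketch. It assumes scalar values and a toy model; `rollout`, `toy_step`, and the argument names are hypothetical and are not taken from the linked proof-of-concept.

```python
from typing import Callable, List, Sequence


def rollout(
    step: Callable[[float, float, float], float],
    dynamic0: float,
    boundary: Sequence[float],
    static: float,
) -> List[float]:
    """Run `step` once per boundary value, feeding each prediction back in."""
    predictions = []
    state = dynamic0  # dynamic data: initial condition, then previous predictions
    for bc in boundary:  # boundary condition data: one known value per time step
        state = step(state, bc, static)  # static data: same value at every step
        predictions.append(state)
    return predictions


def toy_step(dynamic: float, bc: float, static: float) -> float:
    # Stand-in for a trained model: any function of the three inputs would do.
    return 0.5 * dynamic + bc + static


preds = rollout(toy_step, dynamic0=2.0, boundary=[1.0, 0.0, 1.0], static=0.5)
```

A data loader for this setting has to deliver the initial dynamic state, a sequence of boundary values aligned with the forecast steps, and the static fields, which is why the three sources are kept separate.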

Our implementation does not currently sample spatial patches, although we plan to work on that in the coming months and to formalize everything in an xforecasting library.

The proof-of-concept is available here

We also plan to design a sort of xscaler library to preprocess n-D tensors in the style of the scikit-learn scalers ;)

Cheers

[loaders refactor] initial commit

1c5febf

Joseph Hamman added 3 commits August 11, 2021 17:28

add torch to dev environment

94519b3

fix mypy checks

04480ba

add torch accessor

6104bf3

Merge branch 'main' into loader/torch

ed8aa66

Joseph Hamman added 4 commits February 23, 2022 09:05

lint

86c8560

additional test coverage for torch loaders

2bbf2df

update pre-commit

69909b4

update docs

8bcd870

jhamman merged commit 3af1306 into xarray-contrib:main Feb 23, 2022

djhoese mentioned this pull request Feb 24, 2022

Add documentation/examples for new data loaders and help with use case #52

Open

weiji14 mentioned this pull request Apr 27, 2022

Integration with Hugging Face Datasets #60

Open

maxrjones added the feature label Oct 7, 2022

maxrjones changed the title ~~Add pytorch dataloader~~ Add PyTorch dataloader Oct 17, 2022

maxrjones mentioned this pull request Nov 2, 2022

Batch generation with batch_dims in v0.2.0 is about 10-20times slower #121

Closed
