Add PyTorch dataloader #25
Conversation
```python
def _gen_batches(self) -> dict:
    # in the future, we will want to do the batch generation lazily
    # going the eager route for now is allowing me to fill out the loader api
    # but it is likely to perform poorly.
```
Flagging this as something to discuss / work out a design for. It feels quite important that we are able to generate arbitrary batches on the fly. The current implementation eagerly generates batches, which will not scale well. However, a pure generator approach doesn't work if you need to randomly access batches (e.g. via `__getitem__`). One possible middle ground is sketched below.
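A loose sketch of that middle ground, assuming we precompute only the per-batch index slices (which are cheap) and materialize each batch lazily inside `__getitem__`. The `LazyBatchGenerator` name and its signature are hypothetical, not part of this PR:

```python
import xarray as xr


class LazyBatchGenerator:
    """Hypothetical sketch: store only slice objects, build batches on demand."""

    def __init__(self, ds: xr.Dataset, dim: str, batch_size: int):
        self.ds = ds
        self.dim = dim
        # precompute the cheap part (index slices), not the batches themselves
        n = ds.sizes[dim]
        self._slices = [slice(i, i + batch_size) for i in range(0, n, batch_size)]

    def __len__(self) -> int:
        return len(self._slices)

    def __getitem__(self, idx: int) -> xr.Dataset:
        # each batch is materialized only when requested, so random access
        # works without eagerly generating everything up front
        return self.ds.isel({self.dim: self._slices[idx]})
```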
```python
# TODO: figure out the dataset -> array workflow
# currently hardcoding a variable name
X_batch = self.X_generator[idx]['x'].torch.to_tensor()
y_batch = self.y_generator[idx]['y'].torch.to_tensor()
```
Flagging that we can't use named tensors here while we wait for pytorch/pytorch#29010.
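For context, a minimal sketch of the unnamed fallback this implies; `da` is a placeholder DataArray, and the conversion shown is plain `torch.as_tensor`, not any xbatcher accessor:

```python
import numpy as np
import torch
import xarray as xr

da = xr.DataArray(np.zeros((4, 3)), dims=("sample", "feature"))

# ideally we would attach da.dims as tensor names to keep dimension labels,
# but named-tensor support here is blocked on pytorch/pytorch#29010,
# so the dimension names are simply dropped for now
t = torch.as_tensor(da.values)
```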
Codecov Report

```diff
@@           Coverage Diff            @@
##              main       #25   +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            2         3    +1
  Lines           77       134   +57
  Branches        18        30   +12
=========================================
+ Hits            77       134   +57
```

Continue to review the full report at Codecov.
Hi @jhamman. @djhoese told me to get in contact with you to share some of my previous work on a PyTorch DataLoader designed specifically for loading spatio-temporal data stored in Zarr. Our implementation targets the development of autoregressive forecasting models, where we differentiate between three sources of data: …

Our implementation does not currently sample spatial patches, although we plan to work on that in the coming months, and we plan to formalize everything in a … The proof-of-concept is available here. We also plan to design a sort of …

Cheers
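To make the idea above concrete, a rough hypothetical sketch of that kind of loader: the class name, the store/variable/window arguments, and the `time` dimension are all placeholders, not the actual proof-of-concept code.

```python
import torch
import xarray as xr
from torch.utils.data import Dataset


class ZarrForecastDataset(Dataset):
    """Hypothetical: yield (input window, next step) pairs from a Zarr store."""

    def __init__(self, store: str, var: str, window: int):
        self.ds = xr.open_zarr(store)  # lazy; chunks stay on disk until indexed
        self.var = var
        self.window = window

    def __len__(self) -> int:
        return self.ds.sizes["time"] - self.window

    def __getitem__(self, idx: int):
        da = self.ds[self.var]
        x = da.isel(time=slice(idx, idx + self.window))  # input window
        y = da.isel(time=idx + self.window)              # autoregressive target
        return torch.as_tensor(x.values), torch.as_tensor(y.values)
```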
This PR is an initial attempt at adding a set of data loaders. It makes some changes to the base BatchGenerator class and adds a new `loaders.torch` module to test building real-life data loaders.
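A hedged usage sketch of how this might fit together: the `BatchGenerator` calls follow xbatcher's existing API, while the commented-out import path and `MapDataset` class name are guesses at what the new module might expose, inferred from the `X_generator` / `y_generator` attributes in the diff above rather than a confirmed interface.

```python
import numpy as np
import xarray as xr
from xbatcher import BatchGenerator

ds = xr.Dataset(
    {"x": ("sample", np.random.rand(100)), "y": ("sample", np.random.rand(100))}
)

# one generator for the features and one for the targets, matching the
# X_generator / y_generator attributes used in the loader above
X_gen = BatchGenerator(ds[["x"]], input_dims={"sample": 10})
y_gen = BatchGenerator(ds[["y"]], input_dims={"sample": 10})

# hypothetical import path and class name, not a confirmed interface:
# from xbatcher.loaders.torch import MapDataset
# dataset = MapDataset(X_gen, y_gen)
# loader = torch.utils.data.DataLoader(dataset, batch_size=None)
```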