
Why provide a BatchMetaDataLoader if meta-sets have the same API as normal pytorch data-sets? #76

Closed
renesax14 opened this issue Jul 10, 2020 · 1 comment

@renesax14

I was reading the very helpful paper for the library and saw this paragraph, which confused me about the implementation decisions and the intended usage of the library:

2.4 Meta Data-loaders
The objects presented in Sections 2.1 & 2.2 can be iterated over to generate datasets from the meta-training set; these datasets are PyTorch Dataset objects, and as such can be included as part of any standard data pipeline (combined with DataLoader). Nonetheless, most meta-learning algorithms operate better on batches of tasks. Similar to how examples are batched together with DataLoader in PyTorch, Torchmeta exposes a MetaDataLoader that can produce batches of tasks when iterated over.

In particular, it says that the meta-sets (whether they inherit from CombinationMetaDataset or MetaDataset) are just normal PyTorch datasets.
If they have the same API as normal PyTorch datasets, then why not just always pass them directly to the standard PyTorch DataLoader? Why provide this interface at all:

dataloader = torchmeta.utils.data.BatchMetaDataLoader(dataset, batch_size=16)
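
For concreteness, the standard alternative I have in mind would be something like this (a sketch, assuming dataset is one of the meta-sets above):

from torch.utils.data import DataLoader

# the standard PyTorch dataloader, with its default (integer-based) sampler
dataloader = DataLoader(dataset, batch_size=16)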

I think a comment about this somewhere (probably in the paper) would be good.

I've of course read the paper (twice now), and I hope I didn't miss this detail if it was mentioned.

@tristandeleu
Owner

BatchMetaDataLoader is just syntactic sugar for torch.utils.data.DataLoader, with a special collate function and sampler. The reason you'd want to use BatchMetaDataLoader for Torchmeta's datasets over torch.utils.data.DataLoader is that the defaults for torch.utils.data.DataLoader were made specifically for standard supervised learning, not for the episodes necessary in meta-learning: elements of Torchmeta's datasets are indexed with tuples of classes, as opposed to integers for standard PyTorch datasets.
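
To illustrate the indexing difference (a sketch using the omniglot helper from torchmeta.datasets.helpers; the exact structure of a task depends on the dataset transform):

from torchmeta.datasets.helpers import omniglot

# a 5-way, 1-shot meta-training set; each element is a whole task
dataset = omniglot("data", ways=5, shots=1, test_shots=15,
                   meta_train=True, download=True)

# a task is indexed by a tuple of 5 class indices, not a single integer
task = dataset[(0, 1, 2, 3, 4)]

# indexing with an integer, as the default sampler of
# torch.utils.data.DataLoader would do, raises an error:
# dataset[0]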

To summarize, the datasets are indeed normal PyTorch datasets, in the sense that they have the same API, but they use a different indexing scheme, which requires different functions in the DataLoader. This could be made more explicit in the documentation.
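
Putting it together, the intended usage looks something like this (a sketch; the shapes in the comments assume the 5-way, 1-shot Omniglot set above):

from torchmeta.datasets.helpers import omniglot
from torchmeta.utils.data import BatchMetaDataLoader

dataset = omniglot("data", ways=5, shots=1, test_shots=15,
                   meta_train=True, download=True)

# same arguments as torch.utils.data.DataLoader, but with a sampler that
# draws tuples of classes and a collate function that stacks whole tasks
dataloader = BatchMetaDataLoader(dataset, batch_size=16, num_workers=4)

for batch in dataloader:
    train_inputs, train_targets = batch["train"]
    # train_inputs: (16, 5, 1, 28, 28) -- 16 tasks, 5 * 1 support examples each
    test_inputs, test_targets = batch["test"]
    # test_inputs: (16, 75, 1, 28, 28) -- 16 tasks, 5 * 15 query examples each
    break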
