When using webdataset with pytorch-lightning, I discovered that if I pass dataloaders to pytorch-lightning as instances of MultiDataset, training will stall on epoch 0. Once I changed the dataloaders to be instances of torch.utils.data.DataLoader instead, the pytorch-lightning trainer behaved as expected.
Is MultiDataset supposed to completely replace torch.utils.data.DataLoader? If so, is there a way to make it work with pytorch-lightning?
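For reference, here is a minimal sketch of the arrangement that ended up working. It assumes the pipeline-style webdataset API (`wds.WebDataset` with `.decode()` / `.to_tuple()`); the shard pattern, image size, and `LitModel` are placeholders, not the actual setup from this report:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
import webdataset as wds


class LitModel(pl.LightningModule):
    """Toy LightningModule; stands in for the real model."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(3 * 224 * 224, 10)

    def training_step(self, batch, batch_idx):
        images, labels = batch
        return F.cross_entropy(self.net(images.flatten(1)), labels)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# Streaming dataset; the shard pattern is hypothetical, and the pipeline
# assumes 224x224 RGB images stored as .jpg with integer .cls labels.
dataset = (
    wds.WebDataset("shards/train-{000000..000099}.tar")
    .decode("torchrgb")
    .to_tuple("jpg", "cls")
)

# Wrapping the IterableDataset in torch.utils.data.DataLoader (rather than
# wds.MultiDataset) is what made the pytorch-lightning trainer proceed.
loader = torch.utils.data.DataLoader(dataset, batch_size=32, num_workers=4)

pl.Trainer(max_epochs=10).fit(LitModel(), loader)
```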
The DataLoader class is complex and has some problems, in particular when it comes to working with IterableDatasets. MultiDataset is an experimental class showing what DataLoader might be replaced with in the future.
Among other differences, MultiDataset handles splitting of samples among workers differently from DataLoader, and it also handles determining dataset length differently.
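To make the DataLoader side of that concrete: with a plain IterableDataset, every worker process runs the same iterator, so without an explicit split each sample is yielded num_workers times. The standard workaround is `torch.utils.data.get_worker_info()`; a generic sketch (not webdataset-specific):

```python
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class CountingStream(IterableDataset):
    """Toy stream of integers 0..n-1, split manually across workers."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        info = get_worker_info()  # None when num_workers=0
        worker_id = info.id if info else 0
        num_workers = info.num_workers if info else 1
        # Strided split: without this, every worker would yield all n items.
        yield from range(worker_id, self.n, num_workers)


loader = DataLoader(CountingStream(10), num_workers=2, batch_size=None)
print(sorted(int(x) for x in loader))  # 0..9, each exactly once
```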
So, for now, you want to use DataLoader if your training framework requires it, but you will have to deal with the limitations in DataLoader for IterableDataset. On the other hand, MultiDataset is a good choice in containers (since it doesn't use shared memory) or if you want a simpler way of controlling the assignment of shards to processes.
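For the MultiDataset side, usage looked roughly like the sketch below in the webdataset examples of that era. The `workers=` keyword argument is an assumption that may not match your installed version, so check the README before relying on it:

```python
import webdataset as wds

dataset = (
    wds.WebDataset("shards/train-{000000..000099}.tar")  # hypothetical shards
    .decode("torchrgb")
    .to_tuple("jpg", "cls")
)

# MultiDataset spawns its own worker processes and assigns whole shards to
# them, rather than splitting a single sample stream the way DataLoader does.
# The workers= keyword is an assumption; consult your webdataset version.
loader = wds.MultiDataset(dataset, workers=4)

for batch in loader:
    pass  # training loop goes here
```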
Thanks for the explanation. That makes sense. So MultiDataset is not currently intended as a fully compatible drop-in replacement for torch.utils.data.DataLoader; rather, it provides an alternative that works around DataLoader's limitations.
Yes, they have different use cases. There is a strong desire to refactor DataLoader as well, but we have to take this one step at a time.
Another alternative to either DataLoader or MultiDataset that's in development is Tensorcom, which runs data loaders as explicit, separate processes, simplifying debugging and scaling; Tensorcom also supports RDMA and GPUDirect for very large scale, high-performance training.