How to load data in multiple processes? #35
@gfjiangly I have the same problem. I think the current version cannot work with multiple GPUs. The reason is that …
@gfjiangly @JerryLead The way I solved this problem is to shuffle the TFRecord files and distribute them evenly across GPUs before each epoch. The problem is how to handle these two situations:
…
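One way to implement the even per-epoch file distribution described above is a deterministic shuffle keyed on the epoch number, so every rank computes the same order without communication and takes a disjoint slice. A minimal sketch (the helper name and signature are my own, not from any library):

```python
import random


def shard_files_for_epoch(tfrecord_files, rank, world_size, epoch):
    """Shuffle the file list deterministically per epoch, then give
    every rank an evenly sized, disjoint slice (hypothetical helper)."""
    files = sorted(tfrecord_files)
    rng = random.Random(epoch)  # same seed on every rank -> same order
    rng.shuffle(files)
    # Round-robin slicing keeps shard sizes within one file of each other.
    return files[rank::world_size]
```

Because every rank seeds the shuffle with the same epoch number, the assignment needs no coordination; the uneven cases (e.g. fewer files than ranks) still need whatever special handling the two situations above call for.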
Is the …
What can be done to resolve this error?
@linkun-1998 Yes, IterableDataset does not have …
@DelightRun awesome, but how can you define a …
It's pretty straightforward when the index is available. You can just implement a new RandomAccessMultiTFRecordDataset class that inherits from torch.utils.data.Dataset and change the logic. PRs are welcome.
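As an illustration of that suggestion, here is a minimal sketch of a random-access dataset over a single data file and its index file. It assumes the index format used by the tfrecord package (one line per record, byte offset in the first column) and returns raw serialized example bytes; a real implementation would inherit from torch.utils.data.Dataset and parse the payload:

```python
import struct


class RandomAccessTFRecordDataset:  # in practice: torch.utils.data.Dataset
    """Map-style access to one TFRecord file via its index file.
    Assumes each index line starts with the record's byte offset."""

    def __init__(self, data_path: str, index_path: str) -> None:
        self.data_path = data_path
        with open(index_path) as f:
            self.offsets = [int(line.split()[0]) for line in f if line.strip()]

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, idx: int) -> bytes:
        with open(self.data_path, "rb") as f:
            f.seek(self.offsets[idx])
            # TFRecord framing: uint64 length, uint32 length CRC,
            # payload, uint32 payload CRC.
            (length,) = struct.unpack("<Q", f.read(8))
            f.read(4)  # skip length CRC
            payload = f.read(length)
            f.read(4)  # skip payload CRC
        return payload  # serialized tf.train.Example bytes
```

With random access in place, the standard DistributedSampler handles the multi-GPU split, which is what makes the map-style variant attractive here.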
I tried to implement …
I tried to use …
The training implementation is as follows:
What could be the possible error?
@linkun-1998 Due to my company's compliance reasons, I cannot upload the full code. This is the core part:

```python
import typing

import numpy as np
import torch
from tfrecord import iterator_utils, reader  # from the tfrecord package


class MultiTFRecordDataset(torch.utils.data.IterableDataset):
    """Parse multiple (generic) TFRecord datasets into an `IterableDataset`
    object, which contains `np.ndarray`s.

    Params:
    -------
    data_pattern: str
        Input data path pattern.
    index_pattern: str or None
        Input index path pattern.
    splits: dict
        Dictionary of (key, value) pairs, where the key is used to
        construct the data and index path(s) and the value determines
        the contribution of each split to the batch.
    description: list or dict of str, optional, default=None
        List of keys or dict of (key, value) pairs to extract from each
        record. The keys represent the names of the features and the
        values ("byte", "float", or "int") correspond to the data type.
        If dtypes are provided, then they are verified against the
        inferred type for compatibility purposes. If None (default),
        then all features contained in the file are extracted.
    is_sequence: bool, optional, default=False
        TFRecord example type. Uses tf.train.SequenceExample if
        is_sequence=True, else tf.train.Example.
    shuffle_queue_size: int, optional, default=None
        Length of the buffer. Determines how many records are queued to
        sample from.
    transform: callable, default=None
        A function that takes in the input `features`, i.e. the dict
        provided in the description, transforms it and returns a
        desirable output.
    """

    def __init__(self,
                 data_pattern: str,
                 index_pattern: typing.Optional[str],
                 splits: typing.Dict[str, float],
                 description: typing.Union[typing.List[str], typing.Dict[str, str], None] = None,
                 is_sequence: bool = False,
                 shuffle_queue_size: typing.Optional[int] = None,
                 transform: typing.Optional[typing.Callable[[dict], typing.Any]] = None) -> None:
        super().__init__()
        self.data_pattern = data_pattern
        self.index_pattern = index_pattern
        self.splits = splits
        self.description = description
        self.is_sequence = is_sequence
        self.shuffle_queue_size = shuffle_queue_size
        self.transform = transform
        if self.index_pattern is not None:
            # Each index file has one line per record, so the total sample
            # count is the sum of line counts across all splits.
            self.num_samples = sum(
                sum(1 for _ in open(self.index_pattern.format(split)))
                for split in self.splits
            )
        else:
            self.num_samples = None

    def __len__(self):
        if self.num_samples is not None:
            return self.num_samples
        raise NotImplementedError()

    def __iter__(self):
        worker_info = torch.utils.data.get_worker_info()
        if worker_info is not None:
            # Shard records across DataLoader workers and reseed NumPy so
            # that every worker shuffles differently.
            shard = worker_info.id, worker_info.num_workers
            np.random.seed(worker_info.seed % np.iinfo(np.uint32).max)
        else:
            shard = None
        it = reader.multi_tfrecord_loader(
            self.data_pattern, self.index_pattern, self.splits,
            self.description, self.is_sequence, shard)
        if self.shuffle_queue_size:
            it = iterator_utils.shuffle_iterator(it, self.shuffle_queue_size)
        if self.transform:
            it = map(self.transform, it)
        return it
```
@DelightRun You just implemented the …
@DelightRun Moreover, I get the same error after adding a …
@linkun-1998 Were you able to solve it? |
Did anyone solve this issue?