Petastorm sharding and setting batch sizes #785

Data-drone · 2022-12-22T01:26:45Z

With sharding in Petastorm, e.g.:


with peta_conv_train_df.make_torch_dataloader(transform_spec=transform_func,
                                              num_epochs=1,
                                              batch_size=test_batch_size,
                                              cur_shard=curr_shard,
                                              shard_count=num_shards,
                                              reader_pool_type=pool_type) as reader:

Is batch_size the size we want per GPU, or for the whole cluster? I.e., in the above, if I had

test_batch_size = 64, does each shard get 64, or does each shard get 64 / num_shards?
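
To make the two readings concrete, here is a small sketch of the arithmetic. It is plain Python and makes no Petastorm calls. The helper per_shard_batch and the value num_shards = 4 are hypothetical, chosen only for illustration, and the sketch does not say which reading Petastorm actually implements.

```python
def per_shard_batch(batch_size: int, num_shards: int, batch_is_global: bool) -> int:
    """Rows each shard would yield per step under each reading of batch_size."""
    if batch_is_global:
        # Reading B: batch_size is for the whole cluster and is split across shards.
        return batch_size // num_shards
    # Reading A: batch_size is per shard (per GPU).
    return batch_size


test_batch_size = 64
num_shards = 4  # assumed value, for illustration only

for batch_is_global in (False, True):
    local = per_shard_batch(test_batch_size, num_shards, batch_is_global)
    print(f"batch_is_global={batch_is_global}: "
          f"per shard = {local}, across all shards = {local * num_shards}")
```

Under reading A the effective batch across the cluster would be batch_size * num_shards; under reading B it would stay at batch_size.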
