Update iterate.py #762
Conversation
Handle the case where the number of samples is not a multiple of batch_size, so that no samples are wasted.
Please update the Changelog with your changes.
I have tested this PR and it works. However, I would like your opinion before merging @zsdonghao @luomai @lgarithm. Let's consider this simple script:

```python
import numpy as np
import tensorlayer as tl

data = np.random.random((1050, 100))
y = np.random.random((1050,))

i = 0
total_data = 0
for batch in tl.iterate.minibatches(data, y, batch_size=100, shuffle=True):
    print("Batch ID: %d - Batch Size: %d" % (i, batch[0].shape[0]))
    i += 1
    total_data += batch[0].shape[0]
print("Total Data: %d" % total_data)
```

Output with current TensorLayer: 10 batches of 100 samples, so `Total Data: 1000` (the remaining 50 samples are dropped).

Output with this PR: 11 batches, the last of size 50, so `Total Data: 1050` (no samples are lost).
My question: usually people don't really care if they lose a small number of samples (the dataset is very large), and the dataset should be shuffled at the beginning of each epoch. I believe, and might be wrong, that the version currently in TensorLayer is more robust than the version proposed in this PR, and closer to standard practice in Deep Learning. But I genuinely have doubts... I'm actually puzzled by this situation; what do you think is best?
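For context, this is the per-epoch shuffling pattern referred to above. A minimal sketch; `num_epochs` and `train_step` are hypothetical names, not part of TensorLayer:

```python
import numpy as np
import tensorlayer as tl

data = np.random.random((1050, 100))
y = np.random.random((1050,))
num_epochs = 10  # hypothetical

def train_step(x_batch, y_batch):  # hypothetical training function
    pass

for epoch in range(num_epochs):
    # shuffle=True reshuffles at the start of each epoch, so the samples that
    # fall into the dropped remainder differ from epoch to epoch; over many
    # epochs every sample is still seen with high probability.
    for x_batch, y_batch in tl.iterate.minibatches(data, y, batch_size=100, shuffle=True):
        train_step(x_batch, y_batch)
```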
Returning a different batch size may lead to errors when users set a fixed batch size in the input placeholder (many people do that).
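To illustrate the concern, a minimal TensorFlow 1.x sketch (not from this PR): feeding a smaller final batch into a placeholder with a fixed batch dimension fails, while a `None` batch dimension accepts any batch size.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

x_fixed = tf.placeholder(tf.float32, shape=[100, 100])     # fixed batch size
x_dynamic = tf.placeholder(tf.float32, shape=[None, 100])  # dynamic batch size

last_batch = np.random.random((50, 100))  # a final batch of only 50 samples

with tf.Session() as sess:
    sess.run(tf.shape(x_dynamic), feed_dict={x_dynamic: last_batch})  # works
    # Raises ValueError: cannot feed a value of shape (50, 100) for a tensor
    # declared with shape (100, 100).
    sess.run(tf.shape(x_fixed), feed_dict={x_fixed: last_batch})
```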
@aloooha can you apply the changes suggested by @zsdonghao?
Add an argument 'is_dynamic_batch_size' to the iterate API.
@DEKHTIARJonathan I have changed the code and added an optional argument 'is_dynamic_batch_size'.
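A sketch of what the resulting behavior amounts to. The parameter name comes from the PR, but the body below is illustrative, not the merged TensorLayer code:

```python
import numpy as np

def minibatches(inputs, targets, batch_size, shuffle=False, is_dynamic_batch_size=False):
    """Yield (x, y) minibatches over inputs/targets of equal length."""
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    if shuffle:
        np.random.shuffle(indices)
    if is_dynamic_batch_size:
        # Cover the final partial slice: the last batch may be smaller.
        end = len(inputs)
    else:
        # Historical behavior: stop at the last full batch, dropping the rest.
        end = len(inputs) - batch_size + 1
    for start in range(0, end, batch_size):
        excerpt = indices[start:start + batch_size]
        yield inputs[excerpt], targets[excerpt]
```

With the flag off, this reproduces the 10 batches of 100 from the script above; with it on, an eleventh batch of 50 is yielded.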
* Update iterate.py: handling the case where the number of samples is not a multiple of batch_size, avoiding wasting samples
* Update CHANGELOG.md
* Update VHANGELOG>md
* Update CHANGELOG.md
* Update iterate.py: add an argument 'is_dynamic_batch_size' in iterate API.
* Update iterate.py