
Conversation


@ghost ghost commented Jul 31, 2018

Handling the case where the number of samples is not a multiple of batch_size, avoiding wasting samples

Checklist

  • I've tested that my changes are compatible with the latest version of TensorFlow.
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary.

Motivation and Context

Description

Handling the case where the number of samples is not a multiple of batch_size, avoiding wasting samples
@DEKHTIARJonathan
Member

Please update the Changelog with your changes.

@DEKHTIARJonathan
Member

DEKHTIARJonathan commented Jul 31, 2018

I have tested this PR and it works. However, I would like your opinion before merging. @zsdonghao @luomai @lgarithm

Let's consider this simple script:

import numpy as np
import tensorlayer as tl

data = np.random.random((1050, 100))
y = np.random.random((1050, ))

i = 0
total_data = 0

for batch in tl.iterate.minibatches(data, y, batch_size=100, shuffle=True):
    print("Batch ID: %d - Batch Size: %d" % (i, batch[0].shape[0]))
    i += 1

    total_data += batch[0].shape[0]

print("Total Data: %d" % total_data)

Output with current TensorLayer

Batch ID: 0 - Batch Size: 100
Batch ID: 1 - Batch Size: 100
Batch ID: 2 - Batch Size: 100
Batch ID: 3 - Batch Size: 100
Batch ID: 4 - Batch Size: 100
Batch ID: 5 - Batch Size: 100
Batch ID: 6 - Batch Size: 100
Batch ID: 7 - Batch Size: 100
Batch ID: 8 - Batch Size: 100
Batch ID: 9 - Batch Size: 100
Total Data: 1000

Output with this PR work

Batch ID: 0 - Batch Size: 100
Batch ID: 1 - Batch Size: 100
Batch ID: 2 - Batch Size: 100
Batch ID: 3 - Batch Size: 100
Batch ID: 4 - Batch Size: 100
Batch ID: 5 - Batch Size: 100
Batch ID: 6 - Batch Size: 100
Batch ID: 7 - Batch Size: 100
Batch ID: 8 - Batch Size: 100
Batch ID: 9 - Batch Size: 100
Batch ID: 10 - Batch Size: 50
Total Data: 1050
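
For reference, the behaviour this PR produces can be sketched with a minimal iterator (this is an illustrative reconstruction, not the actual TensorLayer source; the real `tl.iterate.minibatches` has a different signature and more checks):

```python
import numpy as np

def minibatches_with_remainder(inputs, targets, batch_size, shuffle=False):
    """Yield (inputs, targets) minibatches, including a final partial batch.

    Sketch only: reconstructs the behaviour described in this PR, where the
    trailing samples are yielded as a smaller batch instead of being dropped.
    """
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    if shuffle:
        np.random.shuffle(indices)
    # range(0, n, batch_size) naturally covers the trailing remainder:
    # the last slice is simply shorter than batch_size.
    for start in range(0, len(inputs), batch_size):
        excerpt = indices[start:start + batch_size]
        yield inputs[excerpt], targets[excerpt]
```

With 1050 samples and `batch_size=100`, this yields ten batches of 100 and one final batch of 50, matching the output above.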

My question

Usually people don't really care about losing a small number of samples (the dataset is very large), and the dataset should be shuffled at the beginning of each epoch.
Is it actually a good thing to enforce a smaller batch at the end (potentially of size 1) when the number of samples is not a multiple of batch_size?

I believe, though I might be wrong, that the version currently in TensorLayer is more robust than the one proposed in this PR, and closer to standard practice in Deep Learning. But I genuinely have doubts ...

I'm actually puzzled by this situation. What do you think is best?

@zsdonghao
Member

Returning a different batch size may lead to errors when users set a fixed batch size in the input placeholder (many people do that),
so I think we should add an argument (e.g. is_dynamic_batch_size) to the iterate API, and set it to False by default.
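
The suggested flag could look roughly like this (a hedged sketch, not the merged code; the actual signature and defaults in `iterate.py` may differ):

```python
import numpy as np

def minibatches(inputs, targets, batch_size, shuffle=False,
                is_dynamic_batch_size=False):
    """Sketch of the iterate API with the proposed flag.

    is_dynamic_batch_size=False (default) keeps the current behaviour:
    the trailing remainder is dropped, so every batch has a fixed size
    and works with fixed-size input placeholders.
    is_dynamic_batch_size=True yields the remainder as a smaller batch.
    """
    assert len(inputs) == len(targets)
    indices = np.arange(len(inputs))
    if shuffle:
        np.random.shuffle(indices)
    if is_dynamic_batch_size:
        end = len(inputs)                                # keep partial batch
    else:
        end = (len(inputs) // batch_size) * batch_size   # drop remainder
    for start in range(0, end, batch_size):
        excerpt = indices[start:start + batch_size]
        yield inputs[excerpt], targets[excerpt]
```

Defaulting to False preserves backward compatibility for users who rely on a constant batch shape.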

@DEKHTIARJonathan
Member

@aloooha can you apply the changes suggested by @zsdonghao?

Jonathan DEKHTIAR and others added 2 commits July 31, 2018 16:45
add an argument 'is_dynamic_batch_size' in iterate API.
@ghost
Author

ghost commented Aug 1, 2018

@DEKHTIARJonathan I have changed the code and added an optional argument 'is_dynamic_batch_size'.

@DEKHTIARJonathan DEKHTIARJonathan merged commit 31e4c8e into tensorlayer:master Aug 1, 2018
luomai pushed a commit that referenced this pull request Nov 21, 2018
* Update iterate.py

Handling the case where the number of samples is not a multiple of batch_size, avoiding wasting samples

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update iterate.py

add an argument 'is_dynamic_batch_size' in iterate API.

* Update iterate.py
3 participants