Dataset: "Shuffle" doesn't work #13446
Comments
Can you describe the problem? From the shuffled output, it does seem to be working as intended. (Perhaps you need to pass a different seed for each epoch? When you call …)
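The "different seed for each epoch" suggestion can be illustrated without TensorFlow. A minimal pure-Python sketch (an analogy for how a seeded shuffle behaves, not the TensorFlow API): a fixed seed reproduces the same order on every pass, while feeding the epoch number as the seed gives a different yet still reproducible order per epoch.

```python
import random

def shuffled(items, seed):
    """Shuffle a copy of `items` deterministically from `seed`."""
    rng = random.Random(seed)
    out = list(items)
    rng.shuffle(out)
    return out

data = list(range(6))

# A fixed seed reproduces the same order on every pass (one "epoch").
assert shuffled(data, 0) == shuffled(data, 0)

# Using the epoch number as the seed varies the order per epoch, while
# each epoch's order remains a reproducible permutation of the data.
epoch_orders = [shuffled(data, epoch) for epoch in range(3)]
assert all(sorted(order) == data for order in epoch_orders)
```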
Thanks a lot for your instant response!
Right, it is trying very hard to be reproducible, but there's no indication in the code of when it should reshuffle. (Consider that if it did reshuffle on each iteration, you wouldn't be able to reproduce the same sequence within the same session.) If you're using an initializable iterator, you can feed a different seed each time you re-initialize it for a new epoch:

```python
import tensorflow as tf  # written against TF 1.3 / tf.contrib.data

def input_pipeline(filenames, batch_size, seed=None):
  # Define a `tf.contrib.data.Dataset` for iterating over one epoch of the data.
  dataset = (tf.contrib.data.TextLineDataset(filenames)
             .map(lambda line: tf.decode_csv(
                 line, record_defaults=[['1'], ['1'], ['1']], field_delim='-'))
             .shuffle(buffer_size=10, seed=seed)  # Equivalent to min_after_dequeue=10.
             .batch(batch_size))
  # Return an *initializable* iterator over the dataset, which will allow us to
  # re-initialize it at the beginning of each epoch.
  return dataset.make_initializable_iterator()

filenames = ['1.txt']
batch_size = 3
num_epochs = 3
seed = tf.placeholder(tf.int64, shape=())
iterator = input_pipeline(filenames, batch_size, seed)

# `a1`, `a2`, and `a3` represent the next element to be retrieved from the iterator.
a1, a2, a3 = iterator.get_next()

with tf.Session() as sess:
  for epoch in range(num_epochs):
    # Reset the iterator at the beginning of the epoch, with a per-epoch seed.
    sess.run(iterator.initializer, feed_dict={seed: epoch})
    print('epoch: %d' % epoch)
    try:
      while True:
        a, b, c = sess.run([a1, a2, a3])
        print(a, b, c)
    except tf.errors.OutOfRangeError:
      # Raised when the end of the epoch is reached (i.e. the iterator
      # has no more elements).
      pass
```
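As an aside, the `buffer_size=10` argument (noted above as equivalent to `min_after_dequeue=10`) means `shuffle` only randomizes within a bounded buffer rather than over the whole file. A pure-Python sketch of that buffered-shuffle idea (an illustration of the concept, not the actual TensorFlow implementation):

```python
import random

def buffered_shuffle(iterable, buffer_size, seed=None):
    """Yield items in a pseudo-random order using a bounded buffer,
    in the spirit of Dataset.shuffle(buffer_size=...)."""
    rng = random.Random(seed)
    buf = []
    for item in iterable:
        if len(buf) < buffer_size:
            buf.append(item)
            continue
        # Emit a random buffered item and replace it with the new one.
        i = rng.randrange(buffer_size)
        buf[i], item = item, buf[i]
        yield item
    # Drain the remaining buffer in random order.
    rng.shuffle(buf)
    yield from buf

out = list(buffered_shuffle(range(20), buffer_size=10, seed=0))
assert sorted(out) == list(range(20))  # a permutation of the input
# Same seed => same order, which is exactly the reproducibility at issue.
assert out == list(buffered_shuffle(range(20), buffer_size=10, seed=0))
```

A small buffer relative to the data size only yields a weak, "local" shuffle, which is why a buffer at least as large as the file is often recommended for thorough shuffling.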
I appreciate the solution you suggested. Do you think there is a more elegant way?
I think the workaround is about as elegant as it will get with the current API. (One could imagine adding something like per-…)
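To make the desired behavior concrete, here is a hedged pure-Python sketch of what a more convenient API might look like: an iterable that reshuffles itself on every pass, deriving a per-epoch seed from a base seed so runs stay reproducible. The class and its names are hypothetical illustrations, not part of the `tf.contrib.data` API.

```python
import random

class ReshufflingData:
    """Hypothetical iterable that reshuffles on each pass, deriving a
    per-epoch seed from a base seed so runs remain reproducible."""

    def __init__(self, items, base_seed=0):
        self.items = list(items)
        self.base_seed = base_seed
        self.epoch = 0

    def __iter__(self):
        # Each pass gets its own RNG seeded by (base_seed + epoch).
        rng = random.Random(self.base_seed + self.epoch)
        self.epoch += 1
        order = list(self.items)
        rng.shuffle(order)
        return iter(order)

ds = ReshufflingData(range(5), base_seed=42)
first, second = list(ds), list(ds)
assert sorted(first) == sorted(second) == list(range(5))
# A fresh dataset with the same base seed replays the same epoch orders.
assert list(ReshufflingData(range(5), base_seed=42)) == first
```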
System information
Build label: 0.5.4
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Aug 25 10:00:00 2017 (1503655200)
Build timestamp: 1503655200
Build timestamp as int: 1503655200
You can collect some of this information using our environment capture script:
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with
`python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"`
== tensorflow import ============================================
tf.VERSION = 1.3.0
tf.GIT_VERSION = b'v1.3.0-rc1-2408-ge9d5ee1'
tf.COMPILER_VERSION = b'v1.3.0-rc1-2408-ge9d5ee1'
Describe the problem
"Shuffle" from Dataset doesn't work.
Source code / logs
The following files can be used to reproduce the problem.
The corresponding file: "1.txt"
The output: