Random seed not set in graph context of Dataset#map
#29101
@eha11 I tried to reproduce the issue on my system and also on Google Colab, but the code executed as expected. Can you try once more and let us know if that is still an issue? Thanks!
@gadagashwini as I wrote the example originally it wouldn't fail, just print out that the seed was None. I've simplified and updated the example to make it assert when the seed is |
A related but slightly different issue was posted in #13932. That particular issue is more about race conditions due to parallelism. The issue in this MR is also reported in another MR's comment: #23789 (comment)
From gathering a few snippets, I have created the following test that shows the issue:

import numpy as np
import tensorflow as tf

def test(threads):
    tf.reset_default_graph()
    np.random.seed(42)
    tf.set_random_seed(42)
    images = np.random.rand(100, 64, 64, 3).astype(np.float32)

    def mapfn(p):
        return tf.image.random_hue(p, 0.04)

    dataset = tf.data.Dataset.from_tensor_slices(images)
    dataset = dataset.map(mapfn, num_parallel_calls=threads)
    dataset = dataset.batch(32)
    x = dataset.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        return sess.run(x)

assert np.allclose(test(1), test(1)), "num_parallel_calls=1 undeterministic"
assert np.allclose(test(15), test(15)), "num_parallel_calls=15 undeterministic"

num_parallel_calls == 1 fails

Above fails with
We can solve this first test with 1 parallel call in two ways:

 def mapfn(p):
-    return tf.image.random_hue(p, 0.04)
+    return tf.image.random_hue(p, 0.04, seed=42)

Or

 def mapfn(p):
+    tf.set_random_seed(42)
     return tf.image.random_hue(p, 0.04)

@mrry, Derek, It's unclear to me why

num_parallel_calls > 1 fails

And of course, after fixing the first test as above, the second one will fail with 15 parallel calls:
But that case is just a race condition and not necessarily a bug. This is what #13932 is all about. |
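To make the race-condition point concrete, here is a minimal plain-Python sketch (not tf.data, and not how TensorFlow implements anything). It assumes each element can carry its own index, and derives an independent generator per element from a hypothetical `GLOBAL_SEED` plus that index, so the result no longer depends on which worker thread handles which element. The names `per_element_map`, `augment`, and `GLOBAL_SEED` are illustrative only.

```python
import random
from concurrent.futures import ThreadPoolExecutor

GLOBAL_SEED = 42  # hypothetical global seed, mirroring the 42 used in the test above


def per_element_map(values, threads, seed=GLOBAL_SEED):
    """Add a small random offset to each value, deterministically.

    Each element gets its own generator seeded from (seed, index), so the
    random draw for an element does not depend on thread scheduling.
    """
    def augment(item):
        index, value = item
        # Assumption for illustration: a simple integer mix of seed and index.
        rng = random.Random(seed * 1_000_003 + index)
        return value + rng.uniform(-0.04, 0.04)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in input order regardless of completion order.
        return list(pool.map(augment, enumerate(values)))


if __name__ == "__main__":
    data = [float(i) for i in range(100)]
    # Same output with 1 worker or 15 workers.
    print(per_element_map(data, 1) == per_element_map(data, 15))
```

By contrast, a single generator shared by all workers hands out its numbers in whatever order the threads happen to ask for them, which is the kind of nondeterminism discussed in #13932.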
You are correct. I am able to reproduce the reported issue with TensorFlow 1.13. Thanks!
@eha11 could you please see if the issue is present in 1.14? RC0 for 1.14 was released three days ago. I was not able to reproduce the issue locally running the bleeding edge of TensorFlow, which hopefully means this issue has indeed been fixed.
Hi There, We are checking to see if you still need help on this, as you are using an older version of TensorFlow which is officially considered end of life. We recommend that you upgrade to the latest 2.x version and let us know if the issue still persists in newer versions. Please open a new issue for any help you need against 2.x, and we will get you the right help. This issue will be closed automatically 7 days from now. If you still need help with this issue, please provide us with more information.
Nice robot. This post has nothing to do with TF1 specifically. |
Yes this was fixed at least by 1.15 so it can be closed. Hi @TimZaman ! |
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template
System information
You can collect some of this information using our environment capture script
You can also obtain the TensorFlow version with:
1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
Describe the current behavior
The random seed set via tf.set_random_seed(seed) is not set in the context in which the functions passed to tf.data.Dataset#map are invoked. Even for the single thread case.

Describe the expected behavior

The random seed set via tf.set_random_seed(seed) should be set in the context in which the functions passed to tf.data.Dataset#map are invoked, at least for the single thread case.

Code to reproduce the issue
Can run here:
Seed in Dataset#Map.ipynb
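As a rough illustration of the current versus expected behavior described above, here is a toy model in plain Python. It is not the notebook's code and not TensorFlow's internals. It assumes, as the issue title suggests, that the function passed to map is built in its own graph context, and every name in it (`ToyGraph`, `set_random_seed`, `current_seed`, `trace_map_fn`) is a hypothetical stand-in.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph that stores a graph-level seed."""

    def __init__(self, seed=None):
        self.seed = seed


_default_graph = ToyGraph()


def set_random_seed(seed):
    """Toy analogue of setting the graph-level seed on the default graph."""
    _default_graph.seed = seed


def current_seed():
    """Return the seed visible in whichever graph is currently the default."""
    return _default_graph.seed


def trace_map_fn(fn, inherit_seed):
    """Call fn while a fresh graph is the default graph.

    inherit_seed=False models the reported behavior (seed is lost);
    inherit_seed=True models the expected behavior (seed is carried over).
    """
    global _default_graph
    outer = _default_graph
    _default_graph = ToyGraph(seed=outer.seed if inherit_seed else None)
    try:
        return fn()
    finally:
        # Always restore the outer graph, even if fn raises.
        _default_graph = outer


if __name__ == "__main__":
    set_random_seed(42)
    print(trace_map_fn(current_seed, inherit_seed=False))  # reported behavior
    print(trace_map_fn(current_seed, inherit_seed=True))   # expected behavior
```

In this model the fix amounts to copying the outer seed into the inner context before the mapped function runs.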
Other info / logs
I originally saw this issue locally but was able to reproduce it on the Jupyter notebook provided by Google. Here is the log of the errors I see when running the above code.