Non-determinism from tf.data.Dataset.map with random ops #13932
Comments
Additionally, I would like to note that for steps 3 and 4, an op-level seed must be set on the random ops used within the map function, regardless of whether or not a graph-level seed is set. This appears to be inconsistent with the documented behavior of `tf.set_random_seed`.
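For context, a minimal sketch of the two seeding levels in the TF1-era API used in this thread:

```python
import tensorflow as tf

tf.set_random_seed(42)  # graph-level seed

# Op-level seed: per the observation above, random ops inside a parallel
# map() need this set explicitly, even when a graph-level seed exists.
delta = tf.random_uniform([], -0.04, 0.04, seed=42)
```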
I'm not familiar with the TensorFlow codebase, but I tried to trace this. It looks like, unless we can pin each input element to a specific thread in the thread pool, we can't guarantee that parallel map functions with random ops are deterministic. However, pinning elements to threads runs counter to the nature of a thread pool.
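To illustrate that point, a toy model in plain Python (not TensorFlow): a pool of workers shares one seeded, stateful RNG, so the sequence of values drawn is fixed by the seed, but which element is paired with which value depends on thread scheduling.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# One shared, stateful RNG, like the random ops inside a parallel map().
rng = random.Random(42)

def augment(element):
    # The draw order across threads is unspecified, so the pairing of
    # elements to random values can change between runs.
    return element, rng.random()

with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(augment, range(8))))
```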
Unfortunately, this is "expected behavior" due to the way `Dataset.map()` schedules parallel invocations: the order in which the parallel calls execute the stateful random ops is not deterministic. In principle, you could slice your map function into a single-threaded stage that does the random number generation and a multi-threaded stage that does the expensive work:

```python
import numpy as np
import tensorflow as tf

def test(threads):
    np.random.seed(42)
    tf.set_random_seed(42)
    images = np.random.rand(100, 64, 64, 3).astype(np.float32)

    def get_data():
        dataset = tf.data.Dataset.from_tensor_slices(images)
        # Perform the random number generation in a single-threaded map().
        dataset = dataset.map(
            lambda image: (image, tf.random_uniform([], -0.04, 0.04, seed=42)),
            num_parallel_calls=1)
        # Perform the compute-intensive hue adjustment in a multi-threaded map().
        dataset = dataset.map(
            lambda image, adjustment: tf.image.adjust_hue(image, adjustment),
            num_parallel_calls=threads)
        dataset = dataset.batch(32)
        x = dataset.make_one_shot_iterator().get_next()
        return x

    # execution 1
    x = get_data()
    with tf.Session() as sess:
        x_batch1 = sess.run(x)

    # clear out everything
    tf.reset_default_graph()

    # execution 2
    x = get_data()
    with tf.Session() as sess:
        x_batch2 = sess.run(x)

    # results should be equivalent
    assert np.allclose(x_batch1, x_batch2)

test(1)   # works with 1 thread!
test(15)  # works with >1 threads!
```

However, this manual approach might not scale to a real program. In our CNN benchmarks, we've been using a sequence number to deterministically map "random" perturbations onto input images. In future we might consider doing this kind of slicing automatically, but that's probably some way off. Hope this helps though!
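For illustration, a minimal sketch of that sequence-number idea; `perturbation_from_index` here is a hypothetical hash-like stand-in for whatever deterministic mapping a real pipeline would use:

```python
import numpy as np
import tensorflow as tf

images = np.random.rand(100, 64, 64, 3).astype(np.float32)

def perturbation_from_index(index):
    # Hypothetical hash-like mapping from the sequence number to a value
    # in [-0.04, 0.04). Any pure function of `index` keeps the pipeline
    # deterministic, no matter how many threads run the map().
    x = tf.sin(tf.to_float(index) * 12.9898) * 43758.5453
    return (x - tf.floor(x)) * 0.08 - 0.04

dataset = tf.data.Dataset.from_tensor_slices(images)
# Pair each image with its sequence number.
dataset = tf.data.Dataset.zip(
    (tf.data.Dataset.range(len(images)), dataset))
# The "random" perturbation is now a pure function of the sequence number,
# so this map() is deterministic for any num_parallel_calls.
dataset = dataset.map(
    lambda index, image: tf.image.adjust_hue(
        image, perturbation_from_index(index)),
    num_parallel_calls=15)
```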
Hi @mrry, I just stumbled on this behaviour. Is there any plan to make `map()` deterministic with random ops? If not, I think it would be nice to have a warning in the docs.
@zaccharieramzi @mrry I tried setting the seeds, but I still see non-deterministic results. By the way, I've already been calling a function for deterministic results, as below:

```python
import os
import tensorflow as tf

def seed_everything(seed_value):
    tf.random.set_seed(seed_value)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
```

And the TF version I'm using is tf-nightly-gpu.
The work-around suggested by @mrry can be extended using the stateless random image ops. For example, an early single-threaded stage in your pipeline can generate one random number per example, and later multi-threaded stages can pass that number to the stateless ops. The advantage of using the relatively newly added stateless random image ops in this way is that you only have to inject one random number per example into the pipeline, and that one random number can be used for all the stateless random image ops (as the op's `seed` argument), because each op's output is a pure function of its inputs.
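A minimal sketch of this approach, assuming TF 2.3+ (which added the `tf.image.stateless_random_*` ops); here the per-example seed is derived from the element index via `Dataset.enumerate()` rather than generated in a single-threaded stage, but either works:

```python
import tensorflow as tf

images = tf.random.uniform([100, 64, 64, 3])

def augment(index, image):
    # Build a shape-[2] stateless seed from the element index. The same
    # seed can drive every stateless random image op for this example.
    seed = tf.stack([index, index + 1])
    image = tf.image.stateless_random_hue(image, 0.04, seed)
    image = tf.image.stateless_random_flip_left_right(image, seed=seed)
    return image

dataset = tf.data.Dataset.from_tensor_slices(images)
dataset = dataset.enumerate()
# Stateless ops are pure functions of (inputs, seed), so this parallel
# map() produces identical results on every run.
dataset = dataset.map(augment,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
```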
System information

TensorFlow installed via `pip3 install tf-nightly` (also happens when built from source).

Describe the problem

The new `tf.data.Dataset` API contains a `map` function with a `num_parallel_calls` parameter, which allows elements to be processed in parallel by multiple threads. Although not explicitly mentioned in the API docs, prior discussions (such as a comment from today) have indicated that the `map` function should be deterministic (w.r.t. the graph seed) even if `num_parallel_calls > 1`. I have observed that if the function being mapped contains only non-random ops, this determinism holds (see step 2 below). However, if the function being mapped contains a random op, the results become non-deterministic for all values of `num_parallel_calls > 1`. This is unexpected, and prevents training experiments from being reproducible unless `num_parallel_calls == 1`. Also, please note that the example below is a minimal example to reproduce the issue; the real scenario involves running data augmentation during training.

Source code / logs

1. `pip3 install tf-nightly`
2. `map` functions with only non-random ops are deterministic for all values of `num_parallel_calls`, which is the expected behavior.
3. `map` functions with random ops are deterministic if `num_parallel_calls == 1`, but are non-deterministic for all values of `num_parallel_calls > 1`, which seems to me to be unexpected behavior (see the sketch after this list).
4. Replacing the `map` line from step 3 with an entirely different random op, such as `dataset = dataset.map(lambda x: x * tf.random_normal([64, 64, 3], seed=42), num_parallel_calls=threads)`, is also non-deterministic for values of `num_parallel_calls > 1`.
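For reference, a minimal sketch of the step-3 reproduction, assuming a random hue perturbation along the lines discussed elsewhere in this thread:

```python
import numpy as np
import tensorflow as tf

def test(threads):
    np.random.seed(42)
    tf.set_random_seed(42)
    images = np.random.rand(100, 64, 64, 3).astype(np.float32)

    def get_data():
        dataset = tf.data.Dataset.from_tensor_slices(images)
        # A stateful random op inside a parallel map(): the assignment of
        # elements to threads decides which random value each element gets.
        dataset = dataset.map(
            lambda image: tf.image.random_hue(image, 0.04, seed=42),
            num_parallel_calls=threads)
        dataset = dataset.batch(32)
        return dataset.make_one_shot_iterator().get_next()

    x = get_data()
    with tf.Session() as sess:
        x_batch1 = sess.run(x)

    tf.reset_default_graph()

    x = get_data()
    with tf.Session() as sess:
        x_batch2 = sess.run(x)

    # Deterministic only when threads == 1.
    assert np.allclose(x_batch1, x_batch2)

test(1)   # passes
test(15)  # typically fails: the two executions produce different batches
```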