[Mirrored Strategy] Your dataset iterator ran out of data; interrupting training. #30636
Comments
As mentioned in #25254. The official example code does not work either. |
Keras should be able to figure out validation_steps. (cc @omalleyt12 ) Could you please share the code so that we have more complete details of this case while we are looking at it? Thanks. |
When I do not pass validation_steps: without MirroredStrategy, it works. |
I think Keras may not handle the case where the number of validation samples is odd. |
Same here. |
I have the same problem. |
Same issue |
@edwardyehuang Can you please test it against the latest TF 2.0 nightly build? Thanks! |
@edwardyehuang Is this still an issue? Can you check with |
@edwardyehuang Is this still an issue? If not, please close the issue. Thanks! |
Is this really fixed? With TF 2.0.0 + keras.fit + a tf.data dataset for validation data + multi-GPU, I'm having the same issue. |
Fix confirmed in the 2.0 stable version. |
What's the fix? I want to use it in 1.14. |
I found that it cannot estimate the amount of data in a TFRecord dataset. |
However, it can still complete the validation; it just prints an ugly warning message. The program does not stop. |
Try: print(tf.version) # just try not to use TF 2.x |
Hi there, we are checking to see if you still need help on this issue, as you are using an older version of TensorFlow (1.x), which is officially considered end of life. We recommend that you upgrade to version 2.4 or later and let us know if the issue still persists in newer versions. This issue will be closed automatically 7 days from now. If you still need help with this issue, please open a new issue against 2.x, and we will get you the right help. |
System information
The error occurs in keras.Model.fit when using MirroredStrategy with a TensorFlow dataset for both training and validation.
A single GPU works fine, with or without MirroredStrategy (when using MirroredStrategy, devices is set to /gpu:0). The problem only occurs when using multiple GPUs.
The error displayed:
[training_arrays.py 325] Your dataset iterator ran out of data; interrupting training. Make sure that your iterator can generate at least "validation_steps * epochs" batches.
Currently the only working solution for me is to manually set "validation_steps" in keras.Model.fit.
Calling repeat and/or take on the TensorFlow dataset does not work. Setting the validation batch size to 2 (1 for each GPU) does not work either.
A similar issue was reported in #25254, but it was closed.
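For anyone trying the workaround above, here is a minimal sketch of how "validation_steps" could be computed by hand. The helper name `compute_validation_steps`, its parameters, and the example count of 1449 are hypothetical, not taken from the original code. It assumes you already know how many validation examples you have, since Keras may not be able to infer that from the dataset. Rounding down by default is a conservative assumption, so the iterator is never asked for more batches than it can produce.

```python
import math


def compute_validation_steps(num_examples, per_replica_batch_size,
                             num_replicas, drop_remainder=True):
    """Return how many validation steps one pass over the data provides.

    Hypothetical helper: the global batch size is assumed to be the
    per-replica batch size multiplied by the number of replicas (GPUs).
    """
    if num_examples < 0 or per_replica_batch_size <= 0 or num_replicas <= 0:
        raise ValueError("counts must be positive (num_examples may be 0)")
    global_batch_size = per_replica_batch_size * num_replicas
    if drop_remainder:
        # Only count full global batches, so no replica is left without data.
        return num_examples // global_batch_size
    # Count the final partial batch as one extra step.
    return math.ceil(num_examples / global_batch_size)


# Example with an odd, made-up number of validation examples on 2 GPUs,
# using a batch size of 1 per GPU as in the report above.
val_steps = compute_validation_steps(1449, per_replica_batch_size=1,
                                     num_replicas=2)
print(val_steps)

# The result would then be passed explicitly, for example:
#   model.fit(train_ds, epochs=EPOCHS,
#             validation_data=val_ds, validation_steps=val_steps)
# where the replica count can be read from strategy.num_replicas_in_sync.
```

With `drop_remainder=True` the last odd example is skipped during validation. Whether a partial final batch is handled correctly under MirroredStrategy may depend on the TensorFlow version, so treat `drop_remainder=False` as something to verify in your own setup.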