Keras & data.Dataset : "Your dataset iterator ran out of data" #25254
I think it would be nice to have tf.keras consider one epoch complete when the dataset runs out of data; that would make steps_per_epoch and validation_steps unneeded.
As a quick workaround, the validation dataset could be trimmed to validation_steps * batch_size using .take() before .repeat().
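A minimal sketch of that trim-then-repeat ordering (the range dataset and the batch/step values here are illustrative stand-ins for real validation data):

```python
import tensorflow as tf

batch_size = 32
validation_steps = 20

# Stand-in for a real validation dataset.
val_ds = tf.data.Dataset.range(10_000)

# Trim to exactly the examples that will be evaluated, THEN repeat, so every
# epoch's evaluation sees the same validation_steps * batch_size examples.
val_ds = (val_ds
          .take(validation_steps * batch_size)
          .repeat()
          .batch(batch_size))
```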
Reassigning this to @omalleyt12, since I think he has been improving the validation path lately, and I believe this feature would need to be implemented at the Keras level (but feel free to reassign, Tom!).
We now allow users to not pass in validation_steps or steps_per_epoch for datasets, as in @cassianocasagrande's suggestion.
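Under the newer behavior described here, a finite (non-repeated) dataset can be passed straight to fit and Keras infers the epoch length itself; a small self-contained sketch (the random data and toy model are purely illustrative):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 2, size=(64,)).astype("float32")

# Finite datasets: no .repeat(), no steps_per_epoch, no validation_steps.
train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)
val_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Keras re-initializes both dataset iterators at each epoch boundary.
history = model.fit(train_ds, validation_data=val_ds, epochs=2, verbose=0)
```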
I still encounter similar problems with TF 2.0 alpha and TF 1.13.

TF 1.13:

```python
gen_train = ...  # generator function
gen_test = ...   # generator function

# creation of datasets
types = (tf.float32, tf.int32)
shapes = ((512, 512, 3), (2,))
ds_train = tf.data.Dataset.from_generator(lambda: gen_train, types, shapes).shuffle(1000).repeat().batch(32)
ds_test = tf.data.Dataset.from_generator(lambda: gen_test, types, shapes).shuffle(100).repeat().batch(32)

# usage in model
model.fit(ds_train, steps_per_epoch=188, validation_data=ds_test, validation_steps=20,
          epochs=10, verbose=True, callbacks=[visualize, tensorboard])
```
TF 2.0:

```python
gen_train = ...  # generator function
gen_test = ...   # generator function

# creation of datasets
types = (tf.float32, tf.int32)
shapes = ((512, 512, 3), (2,))
ds_train = tf.data.Dataset.from_generator(lambda: gen_train, types, shapes).shuffle(1000).batch(32)
ds_test = tf.data.Dataset.from_generator(lambda: gen_test, types, shapes).shuffle(100).batch(32)

# usage in model
model.fit(ds_train, validation_data=ds_test, epochs=10, verbose=True, callbacks=[visualize, tensorboard])
```

In this case the system completes the first epoch and the evaluation. However, at the beginning of the second epoch I get the "ran out of data" error. @omalleyt12 Am I right in the assumption that the change is already included in the tf-2.0-alpha release? (TF 1.13 would raise an error if I did not provide steps_per_epoch and validation_steps.)
I'm guessing the generator runs out of data; I don't think .repeat() works with generators.
Thanks for the note. Creating a fresh generator inside the lambda fixes it:

```python
# creation of datasets
types = (tf.float32, tf.int32)
shapes = ((512, 512, 3), (2,))
ds_train = tf.data.Dataset.from_generator(lambda: fct_to_create_train_gen(), types, shapes).shuffle(1000).batch(32)
ds_test = tf.data.Dataset.from_generator(lambda: fct_to_create_test_gen(), types, shapes).shuffle(100).batch(32)

# usage in model
model.fit(ds_train, validation_data=ds_test, epochs=10, verbose=True, callbacks=[visualize, tensorboard])
```

An alternative option would be to create a generator that loops infinitely over the data, but that would require providing steps_per_epoch.
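The difference can be seen with a toy generator: passing an already-created generator object exhausts it after one pass, while passing a callable that builds a fresh generator lets .repeat() re-iterate (make_gen and the toy values here are illustrative, not from the comment above):

```python
import tensorflow as tf

def make_gen():
    # A fresh generator on every call; real code would yield (image, label).
    for i in range(5):
        yield (float(i), i)

types = (tf.float32, tf.int32)
shapes = ((), ())

# Passing the callable (not a generator object) is what allows restarting,
# so .repeat() can replay the full sequence.
ds = tf.data.Dataset.from_generator(make_gen, types, shapes).repeat(2).batch(5)
```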
I think this bug still exists when used in multi-worker distributed mode, TF version 2.0.0-beta1 (see train.py).
Yes, even running the "Multi-worker distributed training with Keras" code example from the official TensorFlow documentation has this error. How do you get this to work over data loaded in from MNIST, for example? This is the code example I was talking about, identical to the one shown by @ahmedanis03: https://www.tensorflow.org/beta/tutorials/distribute/multi_worker_with_keras
Same issue.
Same for me.
I am having the same issue as @ahmedanis03 and @benhe2011, but even for a one-machine, two-GPU setup. I modified the old code from the multi_gpu_model documentation and used the required MirroredStrategy.
It runs perfectly until the final epoch, where it always ends with the "ran out of data" error.
This is only an issue with MirroredStrategy; when I train on a single GPU, there is no issue.
TEMPORARY SOLUTION: I converted the numpy arrays to a tf.data.Dataset and used .repeat() while providing the proper number of steps per epoch within fit.
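A sketch of that workaround as described (the array shapes and the toy model are placeholders, not taken from the comment):

```python
import numpy as np
import tensorflow as tf

batch_size = 32
x_train = np.random.rand(100, 8).astype("float32")  # placeholder training data
y_train = np.random.randint(0, 2, size=(100,)).astype("float32")

# Convert the numpy arrays to a repeated tf.data.Dataset ...
train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
            .shuffle(100)
            .batch(batch_size)
            .repeat())

# ... and give fit() the matching number of steps per epoch.
steps_per_epoch = len(x_train) // batch_size

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
history = model.fit(train_ds, epochs=2,
                    steps_per_epoch=steps_per_epoch, verbose=0)
```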
Any news on this topic?
I am having a similar issue. @Path-A's solution works like a charm, thanks! I tried it and I confirm it works.
I had a similar issue and setting …
Any updates? I experience the same with MultiWorkerMirroredStrategy. .repeat() plus setting the correct number of samples works as a workaround. Nevertheless, it would be nicer if one could just use the entire validation set without providing the correct number of samples.
I get the same issue when using tensorflow.keras.preprocessing.image.ImageDataGenerator with model.fit() when trying to specify a steps_per_epoch greater than the length of the generator.
Got the same issue on TF 2.1.0.
This still occurs in 2.2 (tf-nightly) if your epochs have varying lengths. I accept this is a rare occurrence, but e.g. for graph neural networks, computation/memory requirements are generally dependent on the number of nodes, so batches can have dynamic sizes to accommodate this, which can lead to slightly varying numbers of batches per epoch. If epochs beyond the first are even one step shorter than the first epoch, this issue still arises.
Solution: put .repeat(epochs) before .batch(batch_size): https://blog.csdn.net/Linli522362242/article/details/108396485
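Why the ordering matters, on toy numbers: batching first leaves a short final batch inside every epoch, while repeating first merges the epoch boundaries so only the very last batch can be short (the sizes below are illustrative):

```python
import tensorflow as tf

n, batch_size, epochs = 10, 4, 3

# .batch() then .repeat(): the short batch of 2 recurs in every epoch.
a = tf.data.Dataset.range(n).batch(batch_size).repeat(epochs)
# .repeat() then .batch(): one continuous stream, short batch only at the end.
b = tf.data.Dataset.range(n).repeat(epochs).batch(batch_size)

sizes_a = [int(batch.shape[0]) for batch in a]
sizes_b = [int(batch.shape[0]) for batch in b]
```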
What's the problem with this code that skips all epochs greater than 1? Here, the model needs its input as a list of 3 tensors, but … Trying to reformat the dataset … would get the error: TypeError: Cannot convert value <class 'list'> to a TensorFlow DType.
I think the issue with not having enough data in the last batch still exists. For me it happens when running model.predict(), as my data is passed in through a generator.

My problem: len(X_test) = 567. When using steps = len(X_test) // batch_size I have 113 steps, so the 0.4 — the last two samples — are ignored. This causes the warning below, which is correct:

113/113 [============================>.] - ETA: 0s WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least …

When increasing the step count to 114 by applying steps = math.ceil(len(X_test) / batch_size):

113/114 [============================>.] - ETA: 0s WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least …

I don't see these problems when running model.fit() and setting steps_per_epoch and validation_steps. I am using a Singularity container: tensorflow_2.3.2-gpu-jupyter.sif.
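One way around the floor/ceil dilemma in that comment is to feed predict() a tf.data pipeline that emits the final partial batch itself, so no steps argument is needed at all (567 samples come from the comment; the feature width and the one-layer model are stand-ins):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(567, 4).astype("float32")  # 567 samples, as above
batch_size = 5

# .batch() without drop_remainder keeps the final partial batch of 2,
# so predict() covers every sample and no steps argument is required.
ds = tf.data.Dataset.from_tensor_slices(x).batch(batch_size)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
preds = model.predict(ds, verbose=0)
```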
I got the same issue training RetinaNet. I resolved it by using repeat and passing step counts in fit:

```python
train_steps_per_epoch = dataset_info.splits["train[:95%]"].num_examples // batch_size
train_steps = 4 * 100000

import random
random.seed(10)
```
I encountered the same issue with gen_lambda = lambda: gen …
The issue was resolved in my case. The problem was in my dataset, which contained some images with no labels: the number of generated samples was smaller than the size of the set, so the generator tried to generate data that doesn't exist!
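A quick sanity check along those lines: count what the generator actually yields before computing steps from the nominal set size (the toy generator mimics 5 files of which only 3 have labels; all names and values here are illustrative):

```python
def make_gen():
    # Pretend 5 files exist on disk but 2 have no labels and are skipped,
    # so fewer samples come out than the raw file count suggests.
    labeled = [(0.1, 0), (0.2, 1), (0.3, 0)]
    for sample in labeled:
        yield sample

nominal_size = 5                          # e.g. number of image files on disk
actual_size = sum(1 for _ in make_gen())  # what the generator really yields

# Compute steps from actual_size, not nominal_size, to avoid running dry.
batch_size = 2
steps = actual_size // batch_size
```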
Providing a case that may help somebody: I created a dataset with … The right way is removing … Chaos…
System information

Describe the current behavior
Keras model.fit() does not reset the validation dataset iterator between epochs. Thus, when specifying validation_steps < validation_dataset_size / batch_size, every evaluation will be performed on a different set of examples.

Describe the expected behavior
I would expect model.fit() to restart from the beginning of the validation dataset after every epoch of training. This way the validation dataset could be used without .repeat() and the evaluation would be performed on the same set of examples.

Code to reproduce the issue
https://colab.research.google.com/drive/1UjKNbX38UC4EG6EPm6xLzQ1AmFV8HWe5

Other info / logs