Tensorflow 2.1 Error “when finalizing GeneratorDataset iterator” - a memory leak? #37515
Comments
@aaudiber could you please take a look? thank you |
Trying to find a workaround, I reduced the epochs to 1 and instead tried a loop, which gives me a slightly different error, but still a memory leak:
Here is the small change I made to the code (see the original above):
|
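The change described above (the snippet itself is not reproduced in this thread) can be sketched roughly like this; `model` and `train_generator` are stand-ins for the objects in the original script, and the function name is hypothetical:

```python
def train_in_loop(model, train_generator, n_epochs):
    """Workaround sketch: instead of model.fit(..., epochs=N),
    call fit() once per epoch inside a plain Python loop."""
    histories = []
    for epoch in range(n_epochs):
        # epochs=1: each call performs exactly one pass over the generator
        histories.append(model.fit(train_generator, epochs=1, verbose=0))
    return histories
```

As reported above, this only sidesteps the multi-epoch call; the leak still shows up, just with a slightly different error.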
@Tuxius Is it possible to reproduce the issue using fake data? If you can provide a minimal, self-contained repro, that will help a lot in finding the root cause. |
I have also the same issue: Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled |
I am seeing this issue as well. Memory increases at the beginning of each epoch and fills up quickly. |
I am having similar issues (Error..finalizing GeneratorDataset iterator..) with the same script I have successfully run before. I recently upgraded to TensorFlow v2.1. I just downgraded to an earlier TensorFlow version and the script works without error; downgraded via "conda install tensorflow=1.15.0", miniconda3 python v3.7, numpy 1.18.1. Note: I use Keras in R to access TensorFlow backend functions. Is it possible that there is some conflict between Keras and TensorFlow? I am putting together a sample script/data to try to reproduce this error for the group here. Hopefully we can identify a solution. I can confirm that I got my script working using the procedure below (from terminal):
My script runs successfully without error. I tested separate versions of TensorFlow and it appears that TensorFlow 2.1 is not compatible with my Keras version, or there is some other conflict that was resolved by running the procedure above. I also verified that installing TensorFlow via conda was sufficient -- I did not have to specify "conda install tensorflow-gpu" to get TensorFlow to use the native GPU on my system. From the terminal, "nvidia-smi" shows that the GPU is being used when running my code, and from within R, "tf$test$is_gpu_available()" returns TRUE. Hopefully this helps people. |
Got the same problem here. It only happens when I specify the number of workers, but removing this argument slows down the process. |
Yes, I can confirm that setting the number of workers to 1, or leaving out the argument completely, solves the problem! It doesn't crash anymore and the memory consumption is stable. It only crashes with workers set to >1. |
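For readers unfamiliar with the argument being discussed: in TF 2.1, `tf.keras` `Model.fit` accepts `workers` and `use_multiprocessing` for generator input. A small sketch contrasting the two configurations reported above (the helper name is my own, not from the thread):

```python
def fit_kwargs(workers=1):
    """Return the generator-related fit() keyword arguments discussed above.

    workers=1 (or omitting the argument entirely) is what commenters report
    as stable; workers > 1 reproduces the crash/leak.
    """
    return {
        "workers": workers,
        # multiprocessing only makes sense with more than one worker
        "use_multiprocessing": workers > 1,
        "max_queue_size": 10,
    }

# e.g. model.fit(train_generator, epochs=10, **fit_kwargs(workers=1))
```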
I updated tf to tf-nightly. The error was gone. |
Also updated to tf-nightly 2.2.0.dev20200319, but still get a crash with workers > 1. Also tried several other recent nightlies, still crashes. It only runs for me with workers = 1 :-( |
@aaudiber: I created a minimal, self-contained repro for you: run.py:
datagenerator.py:
for me this works with workers = 1 but crashes with workers = 6 ... |
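Since run.py and datagenerator.py are not reproduced in this thread, here is a minimal sketch of the generator protocol involved, with fake data and hypothetical names. In the actual repro the class would subclass `tf.keras.utils.Sequence`; only the batch protocol (`__len__` and `__getitem__`) is shown here:

```python
import math
import random

class FakeDataGenerator:
    """Sketch of a batch generator over fake data.

    In a real repro this would subclass tf.keras.utils.Sequence and
    return numpy arrays; plain lists are used here to stay self-contained.
    """

    def __init__(self, n_samples=64, batch_size=8, n_features=4):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self.n_features = n_features

    def __len__(self):
        # number of batches per epoch
        return math.ceil(self.n_samples / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.n_samples)
        x = [[random.random() for _ in range(self.n_features)]
             for _ in range(start, stop)]
        y = [random.randint(0, 1) for _ in range(start, stop)]
        return x, y
```

Per the report above, passing such a generator to `model.fit` runs with workers = 1 but crashes with workers = 6.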
Thank you very much for providing the reproduction and narrowing it down to the use of the |
I used "conda install tensorflow-gpu" to install my tensorflow environment. |
I'm using the tensorflow image tensorflow/tensorflow:2.1.0-gpu-py3 from docker hub: https://hub.docker.com/r/tensorflow/tensorflow/tags/?page=1 I'm also interested in consuming the fix. I'm using tf.keras. |
Fixes: tensorflow#37515 PiperOrigin-RevId: 302568217 Change-Id: I28d0eaf3602fea0461901680df24899f135ce649
@geetachavan1 Thanks! |
This issue happened to me today, and since I also happen to use conda, I thought I would share my setup as well:
conda create -n tf22 python=3.7 cudnn cupti cudatoolkit=10.1.243
pip install tensorflow==2.2.0rc3
conda activate tf22
For tf2, TensorFlow already supports the GPU if it can open all the libraries |
I am with |
We're in 2022 and there is still no clear answer to this problem. I get the same message on all versions of TensorFlow and with all kinds of GPU configurations. Nothing is working. This happens at the end of each training process. |
setting |
Reopening of issue #35100, as more and more people report still having the same problem:
Problem description
I am using TensorFlow 2.1.0 for image classification under CentOS Linux. As my image training data set is growing, I have to start using a Generator, as I do not have enough RAM to hold all pictures. I have coded the Generator based on this tutorial.
It seems to work fine, until my program all of a sudden gets killed without an error message:
Looking at the growing memory consumption with Linux's top, I suspect a memory leak?
What I have tried
The suggestion to switch to the TF nightly build. For me it did not help; downgrading to TF 2.0.1 did not help either
There is a discussion suggesting that it is important that 'steps_per_epoch' and 'batch size' correspond (whatever exactly this means) - I played with it without finding any improvement
Trying to narrow down by looking at the size development of all variables in my Generator
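One common reading of the 'steps_per_epoch'/'batch size' correspondence mentioned above (my interpretation, not confirmed in the linked discussion) is that steps_per_epoch should cover the dataset exactly once per epoch:

```python
import math

def steps_per_epoch(n_samples, batch_size):
    # One full pass over the data per epoch; ceil so the final
    # partial batch is included rather than dropped.
    return math.ceil(n_samples / batch_size)

# e.g. 1000 samples with batch_size=32 -> 32 steps per epoch
```

If steps_per_epoch is larger than this, the generator is asked for batches past the end of the data each epoch, which is one commonly suggested source of iterator trouble.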
Relevant code snippets
being called by
I have tried running the exact same code under Windows 10, which gives me the following error: