
Memory leakage when converting to tensor #31419

Closed
benoitkoenig opened this issue Aug 7, 2019 · 11 comments
Assignees
Labels
comp:core issues related to core part of tensorflow TF 1.14 for issues seen with TF 1.14 type:bug Bug

Comments

@benoitkoenig

import numpy as np
import tensorflow as tf

for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)

TensorFlow version is 1.14.0, NumPy version is 1.17.0, Python version is 3.6.8.
The process is killed at roughly i = 2400 on my machine.
Running `watch -d free -m` shows that free memory decreases over time until it gets close to zero, and then the process crashes.

I did not find a way to free the memory held by the unreferenced tensors.

Best,
Benoît

@gowthamkpr gowthamkpr self-assigned this Aug 7, 2019
@gowthamkpr gowthamkpr added the TF 1.14 for issues seen with TF 1.14 label Aug 7, 2019
@gowthamkpr

Was able to reproduce the error in Google Colab.

@gowthamkpr gowthamkpr added the type:bug Bug label Aug 7, 2019
@gowthamkpr gowthamkpr assigned ymodak and unassigned ymodak and gowthamkpr Aug 7, 2019
@jvishnuvardhan
Contributor

@benoitkoenig You are overloading the graph with many constants. To clear the graph, I added tf.reset_default_graph() at the end of your loop, which lets it run all 5000 iterations without any issue. Please check the full code below. Here is the gist

import numpy as np
import tensorflow as tf


for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
  tf.reset_default_graph()

@benoitkoenig
Author

@jvishnuvardhan thank you for your answer.
That said, I actually use this code inside a generator, which I call via fit_generator.
If I call reset_default_graph, then my model is lost too. To put it simply, consider the following code:

import numpy as np
import tensorflow as tf

outside_tensor = tf.convert_to_tensor(2)
for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
print(outside_tensor)

The problem I am facing is equivalent to keeping outside_tensor in the graph in order to print it at the end. In that case, resetting the whole graph won't do. Is there any way to clear one specific tensor from the graph? Here is my specific code:

def training_generator(graph):
    for (img, true_mask) in dataset:  # dataset: the iterable of (image, mask) training pairs
        with graph.as_default():
            image = tf.convert_to_tensor(img, dtype=tf.float32)
            true_mask = tf.convert_to_tensor(true_mask, dtype=tf.float32)
        yield ([image], [true_mask])

graph = tf.compat.v1.get_default_graph()
unet = get_model()
unet.compile(optimizer=Adam(lr=learning_rate), loss=calculate_loss)
gen = training_generator(graph)
unet.fit_generator(gen, steps_per_epoch=steps_per_epoch, epochs=epochs)

Calling tf.reset_default_graph, or not using graph.as_default inside the generator, both result in an "Invalid input graph." error.
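One way to sidestep the accumulation entirely is to have the generator yield plain NumPy arrays rather than tensors; Keras converts each batch internally, so nothing piles up in the default graph. A minimal sketch of that pattern (the `pairs` data here is a hypothetical stand-in for the dataset in the thread, not code from the issue):

```python
import numpy as np

# Hypothetical stand-in for the (image, mask) training pairs.
pairs = [(np.random.random((8, 8)), np.random.random((8, 8))) for _ in range(3)]

def training_generator():
    # Yield plain NumPy arrays instead of calling tf.convert_to_tensor:
    # Keras converts each batch to a tensor itself, so no permanent
    # constant is added to the graph per batch.
    for img, true_mask in pairs:
        yield (np.asarray(img, dtype=np.float32)[None, ...],
               np.asarray(true_mask, dtype=np.float32)[None, ...])

gen = training_generator()
batch_x, batch_y = next(gen)
```

The `[None, ...]` indexing just adds the batch dimension that fit_generator expects.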

Thanks,
Benoît

@ymodak ymodak assigned jvishnuvardhan and unassigned ymodak Aug 9, 2019
@jvishnuvardhan jvishnuvardhan added the comp:core issues related to core part of tensorflow label Oct 21, 2019
@jvishnuvardhan
Contributor

@benoitkoenig Sorry for the delay in my response. There were a lot of improvements between TF 1.14 and TF 1.15. Can you please check with TF 1.15.0, which was released recently, and let us know how it goes. Thanks!

@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Oct 21, 2019
@benoitkoenig
Author

benoitkoenig commented Nov 7, 2019

Hi!

Sorry for not getting back here; I'm having issues installing TensorFlow 1.15 on my machine. I will get back to you as soon as that is done.

So you know, the test simply consists of running the following code:

import numpy as np
import tensorflow as tf

outside_tensor = tf.convert_to_tensor(2)
for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
print(outside_tensor)

And making sure no memory leak is observed (one does occur with TensorFlow 1.14).

Benoît

@jvishnuvardhan
Contributor

jvishnuvardhan commented Mar 28, 2020

@benoitkoenig Is this still an issue? I suspect there will be no more updates to TF 1.x unless there are security-related issues. Can you please try TF 2.x and let us know how it goes.

I used a recent tf-nightly and I cannot reproduce the issue. Please check it with tf-nightly and let us know whether you still see the problem. Please check the gist here. Thanks!

Please close the issue if this has already been resolved for you. Thanks!

@jvishnuvardhan
Contributor

@benoitkoenig Can you please check my last response? Thanks!

@benoitkoenig
Author

Hi @jvishnuvardhan
I updated pip (v20.0.2), ran pip install tensorflow, which installed version 2.1.0, and ran the following code:

import numpy as np
import tensorflow as tf

print('\n\n\n\n')
print(tf.__version__)
print('\n\n\n\n')

outside_tensor = tf.convert_to_tensor(2)
for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
print(outside_tensor)

And got the error AttributeError: module 'tensorflow' has no attribute 'compat', which is unrelated and has already been pointed out in other threads.

I then uninstalled tensorflow and ran pip install tf-nightly, which installed tf-nightly-2.2.0.dev20200402, and ran the same code. This time it executed, but it ran into the same issue mentioned in this thread: my Python script got killed after ~1700 iterations when free memory hit zero. tf.__version__ in this scenario is 2.2.0-dev20200402.

Regarding your gist, when I execute it, the "RAM" and "Disk" fields seem stable; however, execution stops at the 57th iteration.

Let me know if I can help you any further,
Benoît

@mdanatg

mdanatg commented Apr 8, 2020

I suspect the compat error is coming from somewhere else? The snippet you included should run fine in TF 2.

In TF 1, whenever you call convert_to_tensor, a new constant is created in the graph. These constants are permanent: removing them is not easy, and in general you want to avoid creating too many anyway. In TF 2 this is not a problem, because the execution model has changed radically and is more intuitive. But in TF 1, you should consider using a single tf.random.normal([1024, 1024]), or, if you need to inject external data, tf.placeholder or tf.py_function.
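The growth described above can be pictured with a plain-Python analogy (a toy sketch of the mechanism, not TensorFlow's actual data structures): a graph object that records every constant it is asked to create and never frees any of them, versus a single placeholder node whose data is fed at run time.

```python
# Toy analogy for the TF1 leak: the graph keeps every constant forever.
class ToyGraph:
    def __init__(self):
        self.nodes = []

    def constant(self, value):
        # Like convert_to_tensor in graph mode: the node is permanent.
        self.nodes.append(value)
        return len(self.nodes) - 1

    def placeholder(self):
        # A single reusable node: data is fed at run time, not stored.
        self.nodes.append(None)
        return len(self.nodes) - 1

g = ToyGraph()
for _ in range(1000):
    g.constant([0.0] * 16)      # one new node per iteration -> unbounded growth
print(len(g.nodes))             # 1000 nodes retained

g2 = ToyGraph()
slot = g2.placeholder()         # one node, reused for every batch
for batch in range(1000):
    feed = {slot: [float(batch)] * 16}  # data fed in, not added to the graph
print(len(g2.nodes))            # still 1 node
```

The loop in the original report is the first case: 5000 iterations mean 5000 permanent 1024x1024 constants held by the default graph.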

@rohan100jain
Member

Can confirm that there isn't any memory increase with TF2 (as per Dan's last comment). Closing issue for now. Please let me know if you run into problems.

