
Memory leakage when converting to tensor #31419

Closed
benoitkoenig opened this issue Aug 7, 2019 · 11 comments
Assignees
Labels
comp:core issues related to core part of tensorflow TF 1.14 for issues seen with TF 1.14 type:bug Bug

Comments

@benoitkoenig

import numpy as np
import tensorflow as tf

for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)

TensorFlow version is 1.14.0, NumPy version is 1.17.0, Python version is 3.6.8.
The process is killed at roughly i = 2400 on my machine.
Running `watch -d free -m` shows that free memory decreases over time until it gets close to zero, and then the process crashes.

I did not find a way to free the memory held by the unreferenced tensors.

Best,
Benoît

@gowthamkpr gowthamkpr self-assigned this Aug 7, 2019
@gowthamkpr gowthamkpr added the TF 1.14 for issues seen with TF 1.14 label Aug 7, 2019
@gowthamkpr

Was able to reproduce the error in Google Colab.

@gowthamkpr gowthamkpr added the type:bug Bug label Aug 7, 2019
@gowthamkpr gowthamkpr assigned ymodak and unassigned ymodak and gowthamkpr Aug 7, 2019
@jvishnuvardhan
Contributor

@benoitkoenig You are overloading the graph with many constants. To clear the graph, I added tf.reset_default_graph() at the end of your loop, which lets it run all 5000 iterations without any issue. Please check the full code below. Here is the gist

import numpy as np
import tensorflow as tf


for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
  tf.reset_default_graph()

@benoitkoenig
Author

@jvishnuvardhan thank you for your answer.
That said, I actually use this code inside a generator, which I call via fit_generator.
If I call reset_default_graph, then my model is lost too. To put it simply, consider the following code:

import numpy as np
import tensorflow as tf

outside_tensor = tf.convert_to_tensor(2)
for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
print(outside_tensor)

The problem I am facing is equivalent to keeping outside_tensor in the graph in order to print it at the end. In that case, resetting the whole graph won't do. Is there any way to clear one specific tensor from the graph? Here is my specific code:

def training_generator(graph):
    for (img, true_mask) in dataset:  # dataset: the iterable of (image, mask) training pairs
        with graph.as_default():
            image = tf.convert_to_tensor(img, dtype=tf.float32)
            true_mask = tf.convert_to_tensor(true_mask, dtype=tf.float32)
        yield ([image], [true_mask])

graph = tf.compat.v1.get_default_graph()
unet = get_model()
unet.compile(optimizer=Adam(lr=learning_rate), loss=calculate_loss)
gen = training_generator(graph)
unet.fit_generator(gen, steps_per_epoch=steps_per_epoch, epochs=epochs)

Calling tf.reset_default_graph, or not using graph.as_default inside the generator, both result in an "Invalid input graph." error.
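One way to sidestep the accumulation entirely is to have the generator yield plain NumPy arrays rather than tensors; Keras converts each batch internally, so nothing piles up in the default graph. A minimal sketch of that pattern (the `pairs` data here is a hypothetical stand-in for the dataset in the thread, not code from the issue):

```python
import numpy as np

# Hypothetical stand-in for the (image, mask) training pairs.
pairs = [(np.random.random((8, 8)), np.random.random((8, 8))) for _ in range(3)]

def training_generator():
    # Yield plain NumPy arrays instead of calling tf.convert_to_tensor:
    # Keras converts each batch to a tensor itself, so no permanent
    # constant is added to the graph per batch.
    for img, true_mask in pairs:
        yield (np.asarray(img, dtype=np.float32)[None, ...],
               np.asarray(true_mask, dtype=np.float32)[None, ...])

gen = training_generator()
batch_x, batch_y = next(gen)
```

The `[None, ...]` indexing just adds the batch dimension that fit_generator expects.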

Thanks,
Benoît

@ymodak ymodak assigned jvishnuvardhan and unassigned ymodak Aug 9, 2019
@jvishnuvardhan jvishnuvardhan added the comp:core issues related to core part of tensorflow label Oct 21, 2019
@jvishnuvardhan
Contributor

@benoitkoenig Sorry for the delay in my response. There were a lot of improvements between TF 1.14 and TF 1.15. Can you please check with TF 1.15.0, which was released recently, and let us know how it goes. Thanks!

@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Oct 21, 2019
@benoitkoenig
Author

benoitkoenig commented Nov 7, 2019

Hi!

Sorry for not getting back here; I'm having issues installing TensorFlow 1.15 on my machine. I will get back to you as soon as that is done.

So you know, the test simply consists of running the following code:

import numpy as np
import tensorflow as tf

outside_tensor = tf.convert_to_tensor(2)
for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
print(outside_tensor)

And making sure no memory leak is observed (one does occur with TensorFlow 1.14).

Benoît

@jvishnuvardhan
Contributor

jvishnuvardhan commented Mar 28, 2020

@benoitkoenig Is this still an issue? I suspect there will be no more updates to TF 1.x unless there are security-related issues. Can you please try TF 2.x and let us know how it goes.

I used a recent tf-nightly and I cannot reproduce the issue. Please check it with tf-nightly and let us know whether you still see the problem. Please check the gist here. Thanks!

Please close the issue if this has already been resolved for you. Thanks!

@jvishnuvardhan
Contributor

@benoitkoenig Can you please check my last response? Thanks!

@benoitkoenig
Author

Hi @jvishnuvardhan
I updated pip (v20.0.2), ran pip install tensorflow, which installed version 2.1.0, and ran the following code:

import numpy as np
import tensorflow as tf

print('\n\n\n\n')
print(tf.__version__)
print('\n\n\n\n')

outside_tensor = tf.convert_to_tensor(2)
for i in range(5000):
  print(i)
  array = np.random.random((1024, 1024))
  tf.convert_to_tensor(array, dtype=tf.float32)
print(outside_tensor)

And got the error AttributeError: module 'tensorflow' has no attribute 'compat', which is unrelated and has already been pointed out in other threads.

I then uninstalled tensorflow and ran pip install tf-nightly, which installed tf-nightly-2.2.0.dev20200402, and ran the same code. This time it executed, but it ran into the same issue mentioned in this thread: my Python script got killed after ~1700 iterations when free memory hit zero. tf.__version__ in this scenario is 2.2.0-dev20200402.

Regarding your gist, when I execute it, the "RAM" and "Disk" fields seem stable; however, execution stops at the 57th iteration.

Let me know if I can help you any further,
Benoît

@mdanatg

mdanatg commented Apr 8, 2020

I suspect the compat error is coming from somewhere else? The snippet you included should run fine in TF 2.

In TF 1, whenever you call convert_to_tensor, a new constant is created in the graph. These constants are permanent: removing them is not easy, and in general you want to avoid creating too many anyway. In TF 2 this is not a problem, because the execution model has changed radically and is more intuitive. But in TF 1, you should consider using a single tf.random.normal([1024, 1024]), or, if you need to inject external data, tf.placeholder or tf.py_function.
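The growth described above can be pictured with a plain-Python analogy (a toy sketch of the mechanism, not TensorFlow's actual data structures): a graph object that records every constant it is asked to create and never frees any of them, versus a single placeholder node whose data is fed at run time.

```python
# Toy analogy for the TF1 leak: the graph keeps every constant forever.
class ToyGraph:
    def __init__(self):
        self.nodes = []

    def constant(self, value):
        # Like convert_to_tensor in graph mode: the node is permanent.
        self.nodes.append(value)
        return len(self.nodes) - 1

    def placeholder(self):
        # A single reusable node: data is fed at run time, not stored.
        self.nodes.append(None)
        return len(self.nodes) - 1

g = ToyGraph()
for _ in range(1000):
    g.constant([0.0] * 16)      # one new node per iteration -> unbounded growth
print(len(g.nodes))             # 1000 nodes retained

g2 = ToyGraph()
slot = g2.placeholder()         # one node, reused for every batch
for batch in range(1000):
    feed = {slot: [float(batch)] * 16}  # data fed in, not added to the graph
print(len(g2.nodes))            # still 1 node
```

The loop in the original report is the first case: 5000 iterations mean 5000 permanent 1024x1024 constants held by the default graph.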

@rohan100jain
Member

Can confirm that there isn't any memory increase with TF2 (as per Dan's last comment). Closing issue for now. Please let me know if you run into problems.

