Memory Leak in tf.data.Dataset.from_generator #37653
Was able to replicate the issue with TF 2.1.
@kkimdev could you please take a look? Thank you.
Memory debug Colab link: https://colab.corp.google.com/drive/1TYW_QWcJ6j6FfuepdTu6nfj1TP2Cw5PU#scrollTo=ViLA0wnRmblE Leak per 100 iterations:
Reference graph from _GeneratorState: https://graphviz.corp.google.com/svg?graph_id=2afe255c1644cc79fe98e01ab09c6be8 Seems like the leaking edge is
Probably one of the calls
The function needs to be attached to the graph; otherwise the py_func op would have nothing to call. There should be no reference from the function back to the graph, though; the function should have no knowledge of the graph. That said, it's easy for the function to close over something that points back to it.
btw, FYI: in this case it was the global graph.
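For illustration, here is a minimal sketch (an editorial addition, not from the thread; names are made up) of the reference chain being described: a generator callable registered via `from_generator` that closes over data, which the thread-local default graph's py_func bookkeeping then keeps alive:

```python
import tensorflow as tf

def make_generator():
    big_payload = [0] * 10_000_000  # stand-in for a large captured object
    def gen():
        # Closure: `gen` holds a reference to `big_payload`.
        yield from big_payload[:3]
    return gen

# from_generator registers the callable on the (thread-local) default graph's
# py_func bookkeeping, so the graph -> callable -> big_payload chain keeps
# the payload alive even after the dataset itself is dropped.
dataset = tf.data.Dataset.from_generator(make_generator(), output_types=tf.int32)
del dataset  # the graph-side reference to the callable can remain
```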
Any progress on this issue? It seems like it still occurs in tensorflow-gpu==2.2.0.
Four months have passed without any progress. I can do nothing but rewrite my entire project in PyTorch. I really love Keras and hope that one day the user experience of TensorFlow will line up with Keras.
I have a pretrained model in my product, so I can't switch to PyTorch easily unless I'm confident that converting the weights gives the same performance; there may be precision differences between float and double, and I think Keras uses the double data type. What if we use only a normal Python generator? Shouldn't that solve it? Has anyone tried that instead of using tf.data.Dataset.from_generator?
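For what it's worth, a minimal sketch of that plain-Python-generator idea (the model, shapes, and data here are made up for illustration; Keras's `fit` accepts a Python generator directly, bypassing `from_generator`):

```python
import numpy as np
import tensorflow as tf

def batch_generator(batch_size=32):
    # Plain Python generator yielding (inputs, targets) batches.
    while True:
        x = np.random.rand(batch_size, 10).astype("float32")
        y = np.random.randint(0, 2, size=(batch_size, 1)).astype("float32")
        yield x, y

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(batch_generator(), steps_per_epoch=10, epochs=1)
```

Note that, as mentioned later in the thread, this forgoes tf.data features such as parallel preprocessing, caching, and multi-GPU input pipelines.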
@luvwinnie @hankcs I deeply apologize for the issues. Would you mind trying this workaround?
@kkimdev I have tried adding the _py_funcs_used_in_graph workaround; however, I got this error instead. By the way, I used
Hi @kkimdev, thanks for your workaround. I tried it on 2.1 and 2.2; neither works. @luvwinnie The parallelization and caching can't be easily implemented using NumPy. Besides, I have a whole pile of data pipelines built on top of the tf.data API, and abandoning all of them was a hard decision for me. But I guess that's life... I'm happy with PyTorch so far; its data API is not as good as TF's, but it is stable at least. One user of my project said he ended up using Docker and has to restart the container once TF exceeds the memory quota. I don't think this is the right thing to do for a decent project.
@hankcs yes, I agree with you; I'm in exactly the same situation, with my pile of data pipelines built on top of the tf.data API too. One of my users said he ended up using a normal Python generator instead, but doing so he can't use multiple GPUs, for which TensorFlow needs the tf.data API. @kkimdev I think this is an urgent issue for every heavy user of tf.data. How come this hasn't been solved in four months? I think this is a problem of eager execution; currently the multi-GPU strategy still depends on the old Session mechanism.
Another workaround to try:

```python
import tensorflow as tf

# Cleanup utility class
class TfDataset(object):
    def __init__(self):
        self.py_func_set_to_cleanup = set()

    def from_generator(self, generator, output_types, output_shapes=None, args=None):
        if not hasattr(tf.compat.v1.get_default_graph(), '_py_funcs_used_in_graph'):
            tf.compat.v1.get_default_graph()._py_funcs_used_in_graph = []
        py_func_set_before = set(tf.compat.v1.get_default_graph()._py_funcs_used_in_graph)
        result = tf.data.Dataset.from_generator(generator, output_types, output_shapes, args)
        py_func_set_after = set(tf.compat.v1.get_default_graph()._py_funcs_used_in_graph) - py_func_set_before
        self.py_func_set_to_cleanup |= py_func_set_after
        return result

    def cleanup(self):
        new_py_funcs = set(tf.compat.v1.get_default_graph()._py_funcs_used_in_graph) - self.py_func_set_to_cleanup
        tf.compat.v1.get_default_graph()._py_funcs_used_in_graph = list(new_py_funcs)
        self.py_func_set_to_cleanup = set()
```

Usage example:

```python
# `generator` is your Python generator function.
tf_dataset = TfDataset()
dataset = tf_dataset.from_generator(generator, output_types=tf.int32, output_shapes=[None])
del dataset
tf_dataset.cleanup()  # Call this after you are done using the generator.
```

Sample Colab run: https://colab.research.google.com/gist/kkimdev/0047ce8444a14197c60c19bba0349156/copy-of-untitled463.ipynb
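In short, the wrapper snapshots the graph's _py_funcs_used_in_graph list before and after each from_generator call, so cleanup() removes exactly the py_funcs this wrapper registered while leaving any other functions on the graph untouched.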
@kkimdev Thank you for your workaround! Just another question about this: you did a cleanup right after deleting the dataset, but my training loop holds on to an iterator:

```python
strategy = tf.distribute.MirroredStrategy()
train_iterator = strategy.make_dataset_iterator(train_dataset)

def train_step(inputs):
    images, labels = inputs
    with tf.GradientTape() as tape:
        y_pred = recognizer(images, training=True)
        loss = compute_loss(y_pred, labels)
    losses.update_state(loss)
    gradients = tape.gradient(loss, recognizer.trainable_variables)
    optimizer.apply_gradients(zip(gradients, recognizer.trainable_variables))

@tf.function
def distributed_train(dataset):
    return strategy.experimental_run(train_step, dataset)

for epoch in range(epochs):
    for steps in range(train_steps_per_epoch):
        distributed_train(train_iterator)
```

It uses the train_iterator to run the distributed training step. How am I supposed to reset the dataset?
@luvwinnie Actually, your snippet could be a different issue. This issue is for memory leaks when tf.data.Dataset.from_generator is called repeatedly.
@kkimdev I'm trying to track down the problem, but it seems to be related to multi-GPU settings. I would like to open a separate issue once I can produce reproducible code, since the Colab environment can't use two GPUs.
The issue is unrelated to GPU and is perfectly reproducible with CPU training. I am using flat NumPy arrays as training input, so it is not a dataset/generator issue either. The memory leak rate seems to depend on training set size, though. I am collecting some stats and will update this post in a couple of hours once I have a nice plot of RAM use vs. time.
Found a workaround: del together with gc.collect() works fine for me:
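The snippet itself was not captured above; a minimal sketch of the pattern being described (the generator here is illustrative):

```python
import gc
import tensorflow as tf

def generator():
    yield tf.constant([1, 2, 3])

dataset = tf.data.Dataset.from_generator(generator, output_types=tf.int32,
                                         output_shapes=[None])
for batch in dataset:  # consume the dataset
    pass

del dataset   # drop the last Python reference
gc.collect()  # force a pass over cyclic garbage
```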
@kkimdev I am going to check your workaround right now, since I am experiencing memory leaks with the same pattern. If I understand the snippet provided in Colab correctly, the dataset is cleared at every iteration. Would you do this periodically rather than at every step? I'm asking because if the dataset is cleared at every step there is no gain in terms of caching (say the dataset is cleared every N steps instead).
@jjrugui Yes, that sounds reasonable, but let's first confirm that the memory leak can be resolved by using the wrapper.
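A sketch of that periodic variant (an illustration, not from the thread), reusing the TfDataset wrapper from the workaround above; the interval, `total_steps`, and loop body are placeholders:

```python
CLEANUP_EVERY = 100  # illustrative interval: trade leaked memory against re-caching cost

tf_dataset = TfDataset()
for step in range(total_steps):
    dataset = tf_dataset.from_generator(generator, output_types=tf.int32,
                                        output_shapes=[None])
    # ... consume `dataset` for this step ...
    del dataset
    if (step + 1) % CLEANUP_EVERY == 0:
        tf_dataset.cleanup()  # drop only the py_funcs this wrapper registered
```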
I am also seeing this error occur when using tf-agents. After episode 2:
After episode 3:
I have removed all my training code. The only method being called in this loop is:
However, if I change it to:
the memory leak goes away 100%... Not sure what the tf-agents code is doing with that variable, but it might help someone track down the problem or help someone else using tf-agents.
@hankcs ,
Thank you for the workaround, and I can confirm it works with the latest v2.7: https://colab.research.google.com/drive/1xNMnqzM0Zrfr3I8XwcS0E9kdhhslxuZd?usp=sharing
@hankcs ,
Thank you so much!
Why is this issue closed, @tilakrayal? I'm still experiencing the same issue, with TF 2.11 (v2.11.0-rc2-17-gd5b57ca93e5, 2.11.0 specifically) in my case. Is the proposed solution to use the workaround? To reproduce, I'm using mostly the same code as @hankcs. In my 'real' scenario the leak is on the order of 40 GB, which is a bit problematic to be honest.

```python
import gc

import psutil
import tensorflow as tf


def generator():
    yield tf.random.normal((1000, 1000), dtype=tf.float64)


gc.collect()
process = psutil.Process()

deltas = []
for i in range(1000):
    mem_used_0 = process.memory_info().rss
    # create a dataset from a generator, and clean up immediately
    dataset = tf.data.Dataset.from_generator(
        generator, output_types=tf.float64, output_shapes=[None]
    )
    del dataset
    gc.collect()
    # collect memory usage
    mem_used_1 = process.memory_info().rss
    # store the memory usage delta
    delta = mem_used_1 - mem_used_0
    deltas.append(delta)

# How much memory is leaking?
delta_sum_mb = sum(deltas) / 1024**2
print(f"Sum of memory leaking: {delta_sum_mb:.2f}MB")
```

Output:
If anyone is experiencing this, one workaround is to create the dataset in a new thread each time. This is because the object that's holding on to the data is the thread-local global default graph. As I understand it, this is an object that exists to provide compatibility with TensorFlow version 1. Here's the chain of references that causes the leak:
In my case my generator function is a closure, so the data it captures is kept alive through that chain. It may also be possible to avoid this by using a regular (non-closure) function and passing the data via from_generator's args argument, but I haven't tried this.
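A minimal sketch of that thread-based workaround (function and variable names are mine, not from the comment): the dataset is built inside a short-lived thread, so that thread's default graph, and its reference to the generator, can be collected once the thread exits.

```python
import threading
import tensorflow as tf

def build_dataset(out):
    data = list(range(10))
    # The closure below is registered on THIS thread's default graph.
    out["dataset"] = tf.data.Dataset.from_generator(
        lambda: iter(data), output_types=tf.int32
    )

result = {}
t = threading.Thread(target=build_dataset, args=(result,))
t.start()
t.join()  # thread exits; its thread-local default graph becomes collectible

dataset = result["dataset"]
for x in dataset:
    print(x.numpy())
```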
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.3 LTS
- Mobile device if the issue happens on mobile device:
- TensorFlow installed from (source or binary): binary
- Bazel version (if compiling from source):
- CUDA/cuDNN version: CUDA 10.1, cudnn-10.1
- GPU model and memory: TITAN RTX, 24190 MiB
You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
1. TF 1.0: `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"`
2. TF 2.0: `python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"`
Describe the current behavior
`tf.data.Dataset.from_generator` leaks memory after each call, even if followed by `gc.collect()`.

Describe the expected behavior
Memory should be released when no reference to the dataset exists.
Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
It leaks 3 MB in 1000 calls. In some real projects, it can leak as much as 5 GB and keep increasing.