report_tensor_allocations_upon_oom and embedding_lookup together lead to memory leak #25810

Open
bsnipers opened this issue Feb 17, 2019 · 3 comments
Assignees
Labels
comp:runtime (c++ runtime, performance issues (cpu)), stat:awaiting tensorflower (Status - Awaiting response from tensorflower)

Comments

@bsnipers

Describe the current behavior
The RunOptions flag report_tensor_allocations_upon_oom combined with embedding_lookup leads to a CPU memory leak.
This only happens in TF 1.12; TF 1.10 did not have this problem.

I thought tf.gather has almost the same functionality, so I tested it, and it shows no problem at all.

I'm not sure whether #21348 is related to this problem. My problem only occurs when the RunOptions flag is set.

Code to reproduce the issue

import tensorflow as tf

# A 100x10 "embedding table" and a 20x5 matrix of lookup ids.
embeddings = tf.reshape(tf.range(1000), shape=(100, 10))
ids = tf.reshape(tf.range(100), shape=(20, 5))

# embedding_lookup leaks with the RunOptions flag; tf.gather does not.
embedding_lookup = tf.nn.embedding_lookup(embeddings, ids)
ga = tf.gather(embeddings, ids)

with tf.Session() as sess:
  sess.graph.finalize()
  run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

  for i in range(10000000):
    sess.run(embedding_lookup, options=run_opts)
    # sess.run(ga, options=run_opts)  # swapping in gather shows no leak

The code above leaks CPU memory quite rapidly, at roughly 3 MB/s.
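To make the growth easy to watch, here is a variant of the repro that prints the process RSS as it loops. It assumes psutil is installed, which is not part of the original setup:

import os
import psutil
import tensorflow as tf

embeddings = tf.reshape(tf.range(1000), shape=(100, 10))
ids = tf.reshape(tf.range(100), shape=(20, 5))
embedding_lookup = tf.nn.embedding_lookup(embeddings, ids)

proc = psutil.Process(os.getpid())

with tf.Session() as sess:
  sess.graph.finalize()
  run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

  for i in range(100000):
    sess.run(embedding_lookup, options=run_opts)
    if i % 10000 == 0:
      # RSS should stay roughly flat; a steady climb indicates the leak.
      print(i, proc.memory_info().rss // (1024 * 1024), "MB")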

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v1.12.0-0-ga6d8ffae09 1.12.0
  • Python version: 3.6.7
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: 9.0 / 7.4.2
  • GPU model and memory: GTX 1070 with 8GB memory
@ymodak added the comp:runtime (c++ runtime, performance issues (cpu)) label Feb 25, 2019
@ymodak added the stat:awaiting tensorflower (Status - Awaiting response from tensorflower) label Feb 25, 2019
@tilakrayal
Contributor

@bsnipers,
Since the graph executor is multi-threaded, the order in which the various operations try to allocate memory varies from run to run, which can create different amounts of fragmentation and can result in OOMs non-deterministically.

You can set the number of inter-op threads to 1 to see if that helps; it will reduce performance but increase determinism.
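For example, something along these lines with the TF 1.x API (a sketch, not verified against this particular repro):

import tensorflow as tf

# Force a single inter-op thread so op scheduling (and hence the
# allocation order) is deterministic.
config = tf.ConfigProto(inter_op_parallelism_threads=1)

with tf.Session(config=config) as sess:
  pass  # build and run the graph as in the repro above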

GPU operations allocate memory as they execute (so tf.add(x, y) will allocate memory for its output every time it executes, and that memory will be freed after the last use of the tensor it returns).

@tilakrayal added the stat:awaiting response (Status - Awaiting response from author) label Jul 24, 2023
@github-actions

github-actions bot commented Aug 1, 2023

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions bot added the stale (This label marks the issue/pr stale - to be closed automatically if no activity) label Aug 1, 2023
@tilakrayal self-assigned this Aug 7, 2023
@github-actions bot removed the stale (This label marks the issue/pr stale - to be closed automatically if no activity) and stat:awaiting response (Status - Awaiting response from author) labels Aug 8, 2023
@tilakrayal added the stale (This label marks the issue/pr stale - to be closed automatically if no activity) label Aug 8, 2023
@JyotiPDLr

@tilakrayal, the issue here is a memory leak, not OOM or memory fragmentation.

As you said:

GPU operations allocate memory as they execute (so tf.add(x, y) will allocate memory for its output every time it executes, and that memory will get freed after the last use of the tensor it returned).

If that memory is getting freed, then why is a memory leak happening? That is the question here.

As mentioned in #21348, it might have been resolved in TF2. Maybe, but I'm not sure.
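Here is a rough sketch of the original repro ported to TF 2.x through the compat.v1 API, in case someone wants to check whether the leak still shows up there (untested; the port itself is my assumption):

import tensorflow as tf

# The original repro uses the graph/Session API, so run it via compat.v1.
tf.compat.v1.disable_eager_execution()

embeddings = tf.reshape(tf.range(1000), shape=(100, 10))
ids = tf.reshape(tf.range(100), shape=(20, 5))
embedding_lookup = tf.nn.embedding_lookup(embeddings, ids)

with tf.compat.v1.Session() as sess:
  sess.graph.finalize()
  run_opts = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)

  for i in range(100000):
    sess.run(embedding_lookup, options=run_opts)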

@tilakrayal removed the stale (This label marks the issue/pr stale - to be closed automatically if no activity) label Jan 17, 2024
@tilakrayal removed their assignment Jan 17, 2024