GPU memory not released until Java process terminates #36627
@TheSentry By default TensorFlow allocates GPU memory for the lifetime of the process, not the lifetime of the session object. More details at: https://www.tensorflow.org/programmers_guide/using_gpu#allowing_gpu_memory_growth Thus, if you want the memory to be freed, you'll have to exit the Java process, not just close the session. For more info, you can refer to the following issue.
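For reference, per-session GPU options can be passed to the Java `Session` constructor as a serialized `ConfigProto`. Below is a minimal sketch, assuming the TF 1.x `libtensorflow` bindings plus the `org.tensorflow:proto` artifact for the `ConfigProto`/`GPUOptions` classes; note that `allow_growth` only makes allocation incremental, it does not make the process hand memory back to CUDA afterwards:

```java
import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.framework.ConfigProto;
import org.tensorflow.framework.GPUOptions;

public class AllowGrowthExample {
    public static void main(String[] args) {
        // Allocate GPU memory on demand instead of grabbing most of it
        // up front. This does NOT return memory to CUDA when the session
        // is closed; the process keeps whatever it has allocated.
        ConfigProto config = ConfigProto.newBuilder()
                .setGpuOptions(GPUOptions.newBuilder().setAllowGrowth(true))
                .build();

        try (Graph g = new Graph();
             Session s = new Session(g, config.toByteArray())) {
            // ... build and run the graph as usual ...
        }
    }
}
```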
@gowthamkpr Thank you for your response.
Unfortunately, the text behind that link doesn't state this as clearly as you just did. I had interpreted "Note we do not release memory, since it can lead to memory fragmentation." as session-bound, not process-bound.
This is unfortunate, as our process is a long-running Tomcat web server.
Thank you for the link to that issue. I had hoped that it was an old issue and that things had changed by now, but that issue also points to the conclusion that there is still no way to release GPU memory short of terminating the process. Is there a way to turn this into a feature request, if one doesn't already exist?
This GH issue can serve as the feature request. We don't have anyone working on this in Q1, though.
@TheSentry How about creating a separate sub-process to run TensorFlow instead of using the same process in which Tomcat is running? The TF GPU memory will be released back to CUDA as soon as the sub-process exits.
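A rough sketch of that approach, assuming a hypothetical `com.example.TfWorker` main class that loads the model and performs the inference in a child JVM:

```java
import java.io.IOException;

public class TfSubprocessRunner {
    // Runs the TensorFlow workload in a child JVM so that all GPU memory
    // is returned to CUDA when the child exits. com.example.TfWorker is a
    // hypothetical entry point that loads the model and runs inference.
    public static void runInference(String modelPath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "java",
                "-cp", System.getProperty("java.class.path"),
                "com.example.TfWorker",   // hypothetical worker main class
                modelPath);
        pb.inheritIO();                   // forward the child's stdout/stderr
        Process worker = pb.start();
        int exitCode = worker.waitFor(); // GPU memory is freed once the child exits
        if (exitCode != 0) {
            throw new IllegalStateException("TF worker exited with code " + exitCode);
        }
    }
}
```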
@akshayrana30 This is not feasible for us. It would require too much work to extract the TF part of our code and establish the proper inter-process communication, not to mention the impact on processing speed, which is very important in our system. For now we have mitigated the problem by having no other process on this machine that needs CUDA.
Hi there, we are checking to see if you still need help on this, as you are using an older version of TensorFlow which is officially considered end of life. We recommend that you upgrade to the latest 2.x version and let us know if the issue still persists in newer versions. Please open a new issue for any help you need against 2.x, and we will get you the right help. This issue will be closed automatically 7 days from now. If you still need help with this issue, please provide us with more information.
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.4 LTS, Kernel 4.15.0-76-generic
- TensorFlow installed from (source or binary): Binary
- TensorFlow version (use command below): 1.15.0
- Python version: Python 3.6.9
- CUDA/cuDNN version: 10.0.130
- GPU model and memory: GeForce GTX 1080 Ti, 11177MiB
You can collect some of this information using our environment capture script.
You can also obtain the TensorFlow version with:
1. TF 1.0: `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"`
2. TF 2.0: `python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"`
Describe the current behavior
After closing all Tensors, Graphs and Sessions in our Java program, the Java process still holds the previously allocated GPU memory until the process terminates.
Describe the expected behavior
After closing all Tensors, Graphs and Sessions, the Java process should release all allocated GPU memory.
Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.
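A minimal sketch of the kind of program that exhibits the behavior, assuming the TF 1.15 `libtensorflow` Java bindings (the class name and the trivial `Const` graph are illustrative):

```java
import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class GpuMemoryRepro {
    public static void main(String[] args) throws InterruptedException {
        try (Graph g = new Graph()) {
            // Build a trivial graph: a single float constant.
            try (Tensor<?> t = Tensor.create(42.0f)) {
                g.opBuilder("Const", "c")
                        .setAttr("dtype", t.dataType())
                        .setAttr("value", t)
                        .build();
            }
            // Create a session (GPU memory is allocated here), run it,
            // then close everything again.
            try (Session s = new Session(g);
                 Tensor<?> out = s.runner().fetch("c").run().get(0)) {
                System.out.println("Result: " + out.floatValue());
            }
        }
        // All Tensors, Graphs and Sessions are closed at this point.
        System.out.println("Session closed. Check nvidia-smi now.");
        Thread.sleep(60_000);
    }
}
```

Per the behavior described above, `nvidia-smi` run during the final sleep should still list this JVM with its allocated GPU memory, even though everything has been closed.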
Other info / logs
This is the output of `nvidia-smi` before the session is created ("Create Session") and after the JVM terminates ("Terminate").
This is the output of `nvidia-smi` after the session has been created and after the session has been closed.