Memory leak in Java API when using GPU #11948
Comments
@asimshankar, could you please take a look at this?
Sample log output, FWIW:
I've added valgrind output to the test repository: https://github.com/riklopfer/TensorflowJavaGpuMemoryTest/blob/master/valgrind.out. I'm not very familiar with this tool, but it seems like it would be useful here. The summary makes me think there is definitely a leak somewhere.
@riklopfer: Thanks very much for getting that information across. Unfortunately, not much stood out to me. I did see 32 bytes leaked during graph construction, which I will fix, but that happens once, not in a loop, so it won't explain the increasing usage over time.
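In the Java API, objects such as `Graph`, `Session`, and `Tensor` wrap native allocations that the JVM garbage collector never reclaims, so per-iteration leaks usually come from handles that were never explicitly closed. A minimal sketch of the try-with-resources discipline this implies, using a stand-in `AutoCloseable` (`NativeHandle` is hypothetical, not a TensorFlow class):

```java
// Sketch of the close-everything discipline native-backed Java APIs expect.
// NativeHandle stands in for Graph/Session/Tensor; it is hypothetical.
public class CloseDiscipline {
    static int liveHandles = 0;

    static class NativeHandle implements AutoCloseable {
        NativeHandle() { liveHandles++; }                 // simulates a native allocation
        @Override public void close() { liveHandles--; }  // simulates freeing it
    }

    public static void main(String[] args) {
        // Each iteration allocates and frees; liveHandles should return to 0.
        for (int i = 0; i < 1000; i++) {
            try (NativeHandle graph = new NativeHandle();
                 NativeHandle session = new NativeHandle();
                 NativeHandle tensor = new NativeHandle()) {
                // ... run the graph ...
            } // close() runs here even if the body threw
        }
        System.out.println("live handles after loop: " + liveHandles);
    }
}
```

If the handle count climbs instead of returning to zero, some path is skipping `close()`, and that native memory growth is invisible to the JVM heap statistics.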
@asimshankar thanks for the fixes. Were you able to reproduce the issue of ever-increasing memory consumption? Any idea what the next steps might be?
Updating CUDA and Nvidia drivers seems to have greatly mitigated the problem for me. I added updated valgrind output to the test repo.
Thanks for the update, @riklopfer. When you say "greatly mitigated", are you still seeing a monotonic increase in memory usage over time, or does it stabilize?
Thanks to @riklopfer for reporting in tensorflow#11948. (PiperOrigin-RevId: 167032430)
@asimshankar I no longer see a monotonic increase in memory consumption when running the small test in the linked repo. However, when I run a longer, more complicated graph on the GPU, it is killed by the OOM killer. I wasn't able to get a valgrind dump for that process. When I have time, I will try increasing the complexity of the test graph until it shows the problem again (or not).
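One way to tell whether the growth that attracts the OOM killer is native rather than JVM-heap is to periodically log both the heap usage and the process resident set size: if RSS climbs while the heap stays flat, the leak is in native allocations. A hedged sketch (reads Linux's /proc/self/status where available and reports -1 for RSS elsewhere):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MemoryProbe {
    // Returns the process resident set size in kB from /proc (Linux only),
    // or -1 when /proc/self/status is unavailable.
    static long rssKb() {
        Path status = Paths.get("/proc/self/status");
        try {
            for (String line : Files.readAllLines(status)) {
                if (line.startsWith("VmRSS:")) {
                    return Long.parseLong(line.replaceAll("[^0-9]", ""));
                }
            }
        } catch (IOException e) {
            // fall through: not on Linux, or /proc is unavailable
        }
        return -1;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long heapKb = (rt.totalMemory() - rt.freeMemory()) / 1024;
        // Call this once per loop iteration of the test graph:
        // a flat heap with rising RSS points at a native leak.
        System.out.println("heap kB: " + heapKb + ", rss kB: " + rssKb());
    }
}
```

Logging these two numbers every N iterations of the test graph gives a cheap trend line without attaching valgrind to the process.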
It has been 14 days with no activity and the stat:awaiting response label was assigned. Is this still an issue?
Running with 1.4.0, I still see a slow, monotonic increase in memory consumption. I haven't had a chance to attempt to minimally reproduce the issue.
It has been 14 days with no activity and the stat:awaiting response label was assigned. Is this still an issue?
The original poster has replied to this issue after the stat:awaiting response label was applied. |
Closing since the original issue has been fixed. Please file another ticket with a repro if you can. Thanks! |
System information
Describe the problem
Main memory on the machine is continuously consumed when running on the GPU. Memory consumption hovers around 600M when running on the CPU.
Source code / logs
see: https://github.com/riklopfer/TensorflowJavaGpuMemoryTest