Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
v1.12.1-108954-g88310ddcbdd 2.17.0-dev20240412
Custom code
Yes
OS platform and distribution
Docker: tensorflow/tensorflow:nightly
Mobile device
No response
Python version
3.11.0rc1
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
I discovered what appears to be a memory leak when iterating over a tf.data.Dataset created with from_generator. Process memory usage grows without bound. The effect only appears in certain combinations of TensorFlow and Python, and it may have been introduced with Python 3.11. Here are the combinations I've tested (yes = leak observed):
Python 3.10.10, tensorflow 2.13.0: yes
Python 3.10.10, tensorflow 2.16.1: no
Python 3.10.12, tensorflow v2.15.0-0-g6887368d6d4: no
Python 3.11, tensorflow 2.16.1: yes
Python 3.11.0rc1, tensorflow v1.12.1-109002-g2c2c0a17f05: yes
Maybe related to https://docs.python.org/3/whatsnew/3.11.html#faster-cpython? My first thought was that Python is re-using the memory rather than freeing it, but usage grows far beyond any reasonable working set (I noticed it because it was taking up tens of GB in one case), and a generator shouldn't retain memory like that. It's odd that 2.13.0 shows the problem with Python 3.10.10 too, though.
Standalone code to reproduce the issue
https://colab.research.google.com/drive/1LmdIqWME19GLFG0E7dsCtRtscLLFF89R?usp=sharing
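For anyone who doesn't want to open the notebook: the repro is essentially a tight loop over a from_generator dataset while printing process memory. A sketch along those lines (the generator body, output signature, and psutil-based RSS readout are illustrative guesses, not necessarily the exact notebook code):

import sys

import psutil
import tensorflow as tf


def gen():
    # Endless stream of small elements; the leak shows up regardless of content.
    while True:
        yield [1.0, 2.0, 3.0]


ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(3,), dtype=tf.float32)
)

print('Python:', sys.version)
print('TensorFlow:', (tf.version.GIT_VERSION, tf.version.VERSION))

proc = psutil.Process()
for i, _ in enumerate(ds.take(1_000_000), start=1):
    if i % 10_000 == 0:
        # RSS should plateau once the interpreter warms up, but it keeps growing.
        print('Iterations:', i, 'Memory use:', proc.memory_info().rss)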
Relevant log output
2024-04-14 16:12:10.319053: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Python: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
TensorFlow: ('v1.12.1-109002-g2c2c0a17f05', '2.17.0-dev20240413')
Iterations: 10000 Memory use: 489308160
Iterations: 20000 Memory use: 496451584
Iterations: 30000 Memory use: 503353344
Iterations: 40000 Memory use: 510517248
Iterations: 50000 Memory use: 517464064
Iterations: 60000 Memory use: 524591104
...
Iterations: 980000 Memory use: 1173454848
Iterations: 990000 Memory use: 1180618752
Iterations: 1000000 Memory use: 1187770368
2024-04-14 16:19:00.851851: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
Output from running valgrind:
PYTHONMALLOC=malloc valgrind --track-origins=yes --leak-check=full --show-leak-kinds=definite --trace-children=yes python ./tfdata_test.py
I stopped it at about 80,000 iterations because it's much slower running under valgrind.
==324== 24,434,160 (4,660,032 direct, 19,774,128 indirect) bytes in 72,813 blocks are definitely lost in loss record 188,561 of 188,562
==324==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==324==    by 0x4D4AC1: ??? (in /usr/bin/python3.11)
==324==    by 0x5BA613: ??? (in /usr/bin/python3.11)
==324==    by 0x5BB4C4: ??? (in /usr/bin/python3.11)
==324==    by 0x4FE1B5: _PyEval_EvalFrameDefault (in /usr/bin/python3.11)
==324==    by 0x531822: _PyFunction_Vectorcall (in /usr/bin/python3.11)
==324==    by 0x530FB8: ??? (in /usr/bin/python3.11)
==324==    by 0x5DA693: _PyObject_CallMethod_SizeT (in /usr/bin/python3.11)
==324==    by 0x26C7748A: _descriptor_from_pep3118_format (in /usr/local/lib/python3.11/dist-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==324==    by 0x26C8CBD5: _array_from_buffer_3118.part.0 (in /usr/local/lib/python3.11/dist-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==324==    by 0x26C8DF4E: _array_from_array_like (in /usr/local/lib/python3.11/dist-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==324==    by 0x26C6FCD4: PyArray_DiscoverDTypeAndShape_Recursive (in /usr/local/lib/python3.11/dist-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so)
==324==
==324== 26,302,320 bytes in 73,062 blocks are definitely lost in loss record 188,562 of 188,562
==324==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==324==    by 0x625128: ??? (in /usr/bin/python3.11)
==324==    by 0x6250BA: PyThreadState_New (in /usr/bin/python3.11)
==324==    by 0x64E44F: PyGILState_Ensure (in /usr/bin/python3.11)
==324==    by 0xA597C58: tensorflow::PyFuncOp::Compute(tensorflow::OpKernelContext*) (in /usr/local/lib/python3.11/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
==324==    by 0x92F0F3F: tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) (in /usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2)
==324==    by 0x927A539: tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode const&, long) (in /usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2)
==324==    by 0x9A3D39D: std::_Function_handler<void (), std::_Bind<tensorflow::data::RunnerWithMaxParallelism(std::function<void (std::function<void ()>)>, int)::$_0::operator()(std::function<void (std::function<void ()>)> const&, std::function<void ()>) const::{lambda(std::function<void ()> const&)#1} (std::function<void ()>)> >::_M_invoke(std::_Any_data const&) (in /usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2)
==324==    by 0x9C2299F: Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) (in /usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2)
==324==    by 0x9C223D0: void std::__invoke_impl<void, tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}&>(std::__invoke_other, tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}&) (in /usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2)
==324==    by 0x960E55A: tsl::(anonymous namespace)::PThread::ThreadFn(void*) (in /usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2)
==324==    by 0x4A27AC2: start_thread (pthread_create.c:442)
==324==
==324== LEAK SUMMARY:
==324==    definitely lost: 30,984,576 bytes in 146,328 blocks
==324==    indirectly lost: 19,783,920 bytes in 72,736 blocks
==324==    possibly lost: 136,684,089 bytes in 1,082,632 blocks
==324==    still reachable: 17,341,568 bytes in 182,582 blocks
==324==                       of which reachable via heuristic:
==324==                         stdstring          : 341 bytes in 8 blocks
==324==                         newarray           : 104,387 bytes in 320 blocks
==324==                         multipleinheritance: 7,168 bytes in 24 blocks
==324==    suppressed: 0 bytes in 0 blocks
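Worth noting: the second record bottoms out in tensorflow::PyFuncOp::Compute, and from_generator is implemented via a Python callback op. A control run that avoids the callback entirely should show whether the growth is specific to that path. This is a hypothetical sketch of such a control, not something from the notebook above:

import psutil
import tensorflow as tf

# Control: the same iteration loop over a dataset that never calls back into
# Python. from_generator goes through PyFuncOp (see the valgrind trace above),
# while from_tensor_slices + repeat stays inside the C++ runtime. If RSS stays
# flat here, the leak is specific to the from_generator/py_func path.
ds = tf.data.Dataset.from_tensor_slices([[1.0, 2.0, 3.0]]).repeat().take(1_000_000)

proc = psutil.Process()
for i, _ in enumerate(ds, start=1):
    if i % 100_000 == 0:
        print('Iterations:', i, 'Memory use:', proc.memory_info().rss)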
Hi @cohaegen,
I tried to run your code on Colab using TF v2.15, 2.16.1, and nightly, but I am not facing any issue. Please find the gist here for reference.
Hi Venkat, thanks for checking it out. I've only seen the issue in newer versions of TF with Python 3.11; it seems OK with 3.10. It looks like you ran it with 3.10 (correct me if I'm mistaken). Can you try with 3.11?
Thanks,
Nick