tf.data.Dataset.map() makes unnecessary memory allocations #62788
Comments
Hi @hrsht, thanks for reporting. There does seem to be an issue with the map() function on the dataset object; gist attached for reference. This needs more digging to find the root cause. Thanks!
Hi @hrsht, this is intended behaviour. Consider the following code: when we call map() with a function that references the large tensor, each map() call materializes its own copy of the referenced data. Changing the code as shown below resolves the issue:
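A minimal sketch of the pattern presumably under discussion, assuming the original snippet captured a large NumPy array inside the mapped function (the `get_data` name and the 2 GB size come from this issue; the exact code and the suggested fix are assumptions):

```python
import numpy as np
import tensorflow as tf

# ~2 GB array, matching the size described in the issue.
big = np.zeros((512, 1024, 1024), dtype=np.float32)

def get_data(i):
    # Using `big` as a tensor operand pulls the whole array into the
    # traced graph; a NumPy array is embedded as a graph constant,
    # i.e. copied, for each map() call that traces it.
    return tf.gather(big, i % 512)

ds = tf.data.Dataset.range(10).map(get_data)

# Assumed fix: convert the array to an eager tensor once, so the traced
# graphs capture a reference to a single tensor instead of each
# embedding a fresh constant.
big_t = tf.constant(big)

def get_data_fixed(i):
    return tf.gather(big_t, i % 512)

ds_fixed = tf.data.Dataset.range(10).map(get_data_fixed)
```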
Thanks @SuryanarayanaY for the investigation. I don't think you quite understood the issue here; the multiple chaining of map() calls is only there to exacerbate the problem. I did further investigation on my local machine by printing malloc stats at different steps (I used tcmalloc for this run, as the stats it prints are more concise and better than those of the default malloc). This highlights the issue much better. My code:
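A minimal sketch of such a script, assuming tcmalloc is loaded via LD_PRELOAD and gperftools' C shim `MallocExtension_GetStats` is used for the per-step stats (that helper and the exact shapes are assumptions; only the 2 GB tensor, `get_data`, and the chained map() calls come from the issue):

```python
import ctypes
import tensorflow as tf

def print_malloc_stats(tag):
    # Assumes the process runs under LD_PRELOAD'ed tcmalloc, so the
    # gperftools C shim symbol resolves in the main program's namespace.
    buf = ctypes.create_string_buffer(16384)
    ctypes.CDLL(None).MallocExtension_GetStats(buf, len(buf))
    print(f"=== {tag} ===\n{buf.value.decode()}")

data = tf.zeros((512, 1024, 1024), dtype=tf.float32)  # ~2 GB
print_malloc_stats("after allocating tensor")

def get_data(i):
    # Reference the large tensor from the mapped function.
    return data[0, 0, 0]

ds = tf.data.Dataset.range(10)
for _ in range(3):  # chain map() calls to exacerbate the effect
    ds = ds.map(get_data)
print_malloc_stats("after building dataset")

it = iter(ds)  # the extra copies show up at iterator initialization
print_malloc_stats("after creating iterator")
```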
Here are the logs from running the above code with tcmalloc:
As you can see from the malloc stats, there is an extra allocation of 2GB when the dataset iterator is initialized with iter(). To exacerbate the problem, if you increase the number of chained map() calls, each additional call allocates another copy of the tensor.

This is not a problem in general when referencing small data from the function passed to map(), but with large tensors it quickly leads to OOM. FWIW, I also see a similar copy of the tensor with other dataset APIs as well.

I hope this helps clarify the underlying issue. Thanks!
I have the same problem on 2.12, any solution? 🫠🫠
@wilsingosti, could you please comment on how chaining map functions accumulates memory? It seems each chained call allocates new memory for the input array, and this accumulates with every chain. Is this intended behaviour?
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
tf 2.15.0
Custom code
Yes
OS platform and distribution
Linux
Mobile device
No response
Python version
3.10.13
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
When `tf.data.Dataset.map` is called with a function which references tensor(s) (or a nested object with tensors), it seems to be making a copy of these tensors. If these tensors are large, this causes large memory allocations which can cause the process to OOM.

Please see the code snippet below which reproduces the issue (here is a reference to the colab). I am allocating a tensor which takes 2GB of memory and then referencing it in the `get_data` function. This function is used in `tf.data.Dataset.map` to construct the dataset. I am chaining multiple `map` calls to exacerbate the bug and cause OOM in colab. Each `map` call allocates a new copy of the original tensor referenced by the passed function.

Please note that this is not a memory leak, as these copies are subsequently freed and the memory is released back to the allocator. However, depending on the allocator and its settings, the allocator may hold on to the memory for a long time without releasing it back to the OS, causing memory bloat for the process in the best case and an OOM in the worst case.
It is expected that these tensor copies do not happen, as there is no functional need for them.

It is possible that the root cause of this issue is the same as #61344, in which case feel free to close this issue and track the underlying bug there.
Standalone code to reproduce the issue
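A minimal standalone sketch assembled from the description above (a 2 GB tensor, a `get_data` function, and several chained `map` calls); the shapes and chain depth are assumptions:

```python
import tensorflow as tf

# A tensor occupying roughly 2 GB (512 * 1024 * 1024 float32 values).
data = tf.zeros((512, 1024, 1024), dtype=tf.float32)

def get_data(i):
    # Reference the large tensor from the mapped function.
    return data[0, 0, 0]

ds = tf.data.Dataset.range(10)
# Chain several map() calls; per the report, each one ends up with its
# own copy of `data`, multiplying the 2 GB footprint.
for _ in range(4):
    ds = ds.map(get_data)

# The extra allocations show up once the iterator is created.
for x in ds.take(1):
    print(x)
```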
Relevant log output