Memory Leak Running simple feed_dict graph #9091
Thank you @JonathanRaiman for putting a lot of thought into communicating this issue. You've indicated you suspect there is a memory leak in the C++ code. In that case, @zhifengc might be able to advise you on how to track down the precise bug, or perhaps look into it himself.
@jart Terrific. Thanks!
I have a similar issue and stumbled upon this report. I used the code supplied by @JonathanRaiman to quickly try and test what exactly (which instruction) is causing my issue. After a lot of different tests, evaluating different ops, I got the following code that reliably reproduces this problem:
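The snippet itself isn't preserved in this copy of the thread, but based on the description that follows, it was along these lines — a minimal sketch, with placeholder names like `shape_ph` that are illustrative rather than the commenter's own:

```python
import numpy as np
import tensorflow as tf  # TF 1.x graph mode

# Feed a different shape on every run; the growth reportedly only
# appears when the shape varies between runs and the dtype is tf.int32.
shape_ph = tf.placeholder(tf.int32, shape=[1])
leaky = tf.ones(shape_ph, dtype=tf.int32)  # also tf.zeros, tf.ones_like, tf.zeros_like
ok = tf.cast(tf.ones(shape_ph, dtype=tf.int64), tf.int32)  # reported workaround

with tf.Session() as sess:
    for _ in range(10000):
        size = [np.random.randint(1, 1000)]  # random dims are essential
        sess.run(leaky, feed_dict={shape_ph: size})
```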
The `tf.ones(shape, dtype=tf.int32)` instruction causes the issue. The same goes for `tf.zeros`, `tf.ones_like`, and `tf.zeros_like`. But the interesting part is, this ONLY happens with `dtype=tf.int32`; it doesn't happen for int64, int16, int8, uints, or floats. Another observation is that, while the memory usage reported by python on the first run is roughly the same for all of those data types, the memory usage in the XFCE Task Manager is more than twice as high for the int32 variant as for the other dtypes. So it seems like python is under-reporting memory usage when tf.int32 is used. Examples (first bump is int64 with no growth, second bump is int32 with fast growth): (memory-usage plots were attached here).

Please also note that memory usage increases rather quickly (from 2% to 8% of memory in 10,000 iterations, which takes about 10-15 seconds) and that having multiple `tf.ones` instructions makes it grow even faster, which can have a pretty noticeable effect on larger and more complex models. But this only happens when the input dimensions are random: if the shape supplied to `tf.ones` is the same on every run, memory usage does not increase. So it only affects variable-sized tensors. Also, `tf.cast(tf.ones(shape, dtype=tf.int64), tf.int32)` works fine.

I'm not 100% sure this is the same issue as @JonathanRaiman's, since he's not using int32 as far as I can tell, but his example does have a "None" dimension and the behaviour looks exactly the same. I'm using tensorflow built from master yesterday, though the problem also existed in 1.1, on Arch Linux.
@Panaetius My original memory-leak problem arose in code that had several int32 tensors being fed in with varying sizes (inputs for embedding lookup tables with varying numbers of time steps going into RNNs). This sounds like it might be the same issue.
This is becoming problematic for me as well. I also use non-fixed input dimensions. Related:
@JonathanRaiman I'm facing the same problem and I also suspect that it is due to the copying of numpy arrays in `feed_dict`.
There is indeed different treatment for `int32`. These ops are defined in tensorflow/python/ops/array_ops.py, and they all call the same underlying op. The `REGISTER_KERNEL_BUILDER` macro is defined in tensorflow/core/framework/op_kernel.h.
There may be a relation to issue #13221.
It has been 14 days with no activity and the `awaiting tensorflower` label was assigned. Please update the label and/or status accordingly.
Thanks for pointing out that #13221 is likely a duplicate of this. That issue has gained a lot of attention from the development team, with information on what should happen, and we're hoping someone in the community will volunteer to be the one to solve it. Please follow the other issue from here on out.
Appears to be fixed in tf 1.6.
No, not yet. I still found this problem in tf 1.8.
I second @hbb21st, I'm facing the same (or very similar) issue in tensorflow 1.8, running a simple `sess.run` loop. After some iterations, memory usage has grown far beyond where it started.
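For anyone wanting to confirm this kind of growth, here is a minimal measurement harness — my own sketch, not the commenter's code; it uses only the standard library's `resource` module alongside TF:

```python
import resource

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 128])
y = tf.reduce_sum(tf.square(x))

with tf.Session() as sess:
    for step in range(100000):
        # Variable first dimension, matching the "None" dimension above.
        batch = np.random.rand(np.random.randint(1, 512), 128).astype(np.float32)
        sess.run(y, feed_dict={x: batch})
        if step % 1000 == 0:
            # Peak resident set size so far (kilobytes on Linux);
            # on a leaking build this number keeps climbing.
            print(step, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```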
I'm getting a similar problem. I'm generating batches manually from different .h5 files:

```python
# Python 2 fragment; sess, train_len, test_len, loadData, train_step,
# X_train and labels are defined elsewhere in the program.
from math import ceil
from time import clock

with sess.as_default():
    full_start = clock()
    for i in range(100):  # epochs
        start = clock()
        batch_train_size = 512
        batch_test_size = 200
        start_train_index = 0
        end_train_index = start_train_index + batch_train_size
        start_test_index = 0
        end_test_index = start_test_index + batch_test_size
        for j in range(int(ceil(float(train_len) / batch_train_size))):
            # Wrap the indices around once a full pass over the data is done.
            if start_train_index >= train_len:
                start_train_index = 0
            if start_test_index >= test_len:
                start_test_index = 0
            end_train_index = start_train_index + batch_train_size
            end_test_index = start_test_index + batch_test_size
            if end_train_index >= train_len:
                end_train_index = train_len
            if end_test_index >= test_len:
                end_test_index = test_len
            print 'epoch:', i + 1, '/100 batch_num:', j + 1, '/19'
            x_train, y_train, x_test, y_test = loadData(
                start_train_index, end_train_index,
                start_test_index, end_test_index)
            start_train_index = end_train_index
            start_test_index = end_test_index
            print x_train.shape, x_test.shape
            train_step.run(feed_dict={X_train: x_train, labels: y_train})
```

The `loadData` function returns padded input features of different videos from the .h5 files. After a few batches, the process runs out of memory.
Can someone suggest a way to load batches manually without exhausting memory?
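One common TF 1.x workaround — offered here as a sketch, not necessarily what was used in this thread — is to move batch loading out of `feed_dict` and into a `tf.data` pipeline, so the per-step numpy-to-tensor copy is avoided. `batch_generator` below is a hypothetical stand-in for the `loadData` logic above:

```python
import numpy as np
import tensorflow as tf

def batch_generator():
    # Hypothetical stand-in: yield (features, labels) read from the .h5 files.
    while True:
        yield (np.random.rand(512, 64).astype(np.float32),
               np.random.randint(0, 10, size=512).astype(np.int64))

dataset = tf.data.Dataset.from_generator(
    batch_generator,
    output_types=(tf.float32, tf.int64),
    output_shapes=([None, 64], [None]))
features, labels = dataset.make_one_shot_iterator().get_next()

loss = tf.reduce_mean(tf.square(features))  # placeholder for the real model

with tf.Session() as sess:
    for _ in range(100):
        sess.run(loss)  # no feed_dict involved
```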
@Satcheel were you able to solve this issue? I am getting a similar memory leak when I change my feed_dict to extract specific tensor values in a single session. Any help in this regard would be much appreciated! Many thanks!
Sorry, I am not very active on GitHub. Leaving the answer here for future reference.
In a series of simple tensorflow programs I observe memory leaks (unbounded growth of CPU memory).
In the original program, on a computer with 64GB of RAM, the leak is about 640 megabytes per hour (1% of total memory per hour).
(Plots of the computer's memory over time, at long and short time scales, were attached here.)
Problem description
The original program was more advanced and included RNNs, saving/loading, etc., but I "narrowed it down" to a simple for loop with no gradient descent where memory grows over time without bound.

Tested on Fedora 25 and Mac OSX 10.11.5. The issue occurs when running on a single GPU (Titan X Pascal) or on CPU. Varying the sizes of the variables in the graph only changes the rate of growth; it does not prevent the effect from occurring. The issue occurs on tensorflow 0.12 and on the current tensorflow 1.0.1. No custom code was used; tensorflow was installed in both cases from the pre-compiled pip binary (`pip3 install tensorflow-gpu`). Using CUDA 8.0 and CuDNN v5 (though this should not impact the use-case, since no cudnn kernels are being used). The GPU is a Titan X Pascal with 12GB of VRAM (not a Titan Xp).

To reproduce:
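The actual reproduction script isn't preserved in this copy of the issue, but per the description (fixed batch size, no randomness, just feeding the same array in a loop), it was roughly of this shape — a sketch, not the original code; it assumes `psutil` is installed:

```python
import os

import numpy as np
import psutil
import tensorflow as tf

process = psutil.Process(os.getpid())

x = tf.placeholder(tf.float32, shape=[None, 100])
w = tf.Variable(tf.random_normal([100, 100]))
y = tf.matmul(x, w)

batch = np.ones((32, 100), dtype=np.float32)  # identical batch every step

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000000):
        sess.run(y, feed_dict={x: batch})
        if step % 10000 == 0:
            # Percent of total RAM, matching the output format described below.
            print(step, '%.4f' % process.memory_percent())
```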
The output will be as follows (exact numbers are percentages of the computer's RAM, so they will vary with hardware; the main point is that memory continues to grow even though the program has no variation between graph runs, batches are all the same size, no randomness is left in the program, etc.):
How can I fix this? I currently suspect a CPU memory-pool issue inside tensorflow, since the problem is fairly generic and does not depend (much) on the ops inside the graph. From what I've gathered, the most likely candidate is the `tf.asarray`/copying of numpy arrays in `feed_dict`, leading to memory fragmentation etc. Supposing this were the case, I've heard that `tcmalloc` should alleviate it, but no dice (note: I've also checked that `objgraph` shows no growth in the program over time).
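For reference, the `objgraph` check mentioned above can be reproduced with something like the following — a sketch, assuming `objgraph` is installed. If the leak is on the C++ side, these Python-level counts stay flat while the process RSS climbs:

```python
import objgraph
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 100])
y = tf.reduce_sum(x)

with tf.Session() as sess:
    for step in range(10000):
        sess.run(y, feed_dict={x: [[0.0] * 100]})
        if step % 1000 == 0:
            # Prints the Python object types whose instance counts grew
            # since the previous call; a flat report here plus rising RSS
            # points at a native-side (C++) leak.
            objgraph.show_growth(limit=10)
```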