Calling variable.assign() too many times crashes on memory allocation. #2311

Closed
jhollowayj opened this issue May 10, 2016 · 6 comments

jhollowayj commented May 10, 2016

Background: I'm working on a set of networks that only share some layers, so I have a parameter server that sends new weights for the different clients to use. These clients accept the new weights and biases for the layers they are using and assign the values to the tf.Variables via sess.run(self.w1.assign(new_weights)). However, when I start it up and let it run, it crashes saying:

W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 16B.  See logs for memory state.

(Sometimes it's trying to allocate 16B, other times it's 3.9KiB.)

To give you an idea of the size of the weights, I have three layers of:
Layer 1(W,b): (2, 1000), (1000, )
Layer 2(W,b): (1000, 1000), (1000, )
Layer 3(W,b): (1000, 4), (4, )
I'm running on a Titan X with 12G memory.

With per_process_gpu_memory_fraction = 0.01, the program dies at ~190 assign commands.
With per_process_gpu_memory_fraction = 0.02, the program dies at ~384 assign commands.
With per_process_gpu_memory_fraction = 0.03, the program dies at ~780 assign commands.
With per_process_gpu_memory_fraction = 0.04, the program dies at ~784 assign commands.
With per_process_gpu_memory_fraction = 0.05, the program dies at ~1582 assign commands.
With per_process_gpu_memory_fraction = 0.06, the program dies at ~1586 assign commands.

I've tried setting allow_growth=True and deferred_deletion_bytes=1 in the session's GPUOptions after reading issue #1578, but that didn't get me much further. (I have no idea what deferred_deletion_bytes does...) Looking at the numbers just above (GPU memory fraction vs. assign commands), the relationship is fairly linear, so it seems to me that the assign operation takes some GPU RAM that is never freed. Is there any kind of GC for the GPU memory allocated during the var.assign() op?
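
For reference, here is roughly how I'm constructing the session with those options (a sketch; the fraction value and variable names are just illustrative):

import tensorflow as tf

# Sketch: the GPUOptions settings described above (illustrative fraction value).
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.01,
                            allow_growth=True,
                            deferred_deletion_bytes=1)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))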

It seems that I could delete the session and create a new one, but that sounds expensive, and I'd have to maintain the weights outside the session to be able to restore them correctly. The second idea I had would be to use placeholders and ship the weights in every time with the feed_dict, but again, that seems less than ideal, and I think the optimizer would struggle to know what to optimize if they are just placeholders.

Let me know if you would like any other logs or reports from me. I figure this is the first time someone has tried to use assign operations like this, so I want to be helpful in fixing it if it's a bug.

Thanks

Environment info

Operating System: Ubuntu 16.04
Installed version of CUDA and cuDNN:
/usr/local/cuda/lib/libcudart.so -> libcudart.so.7.0
/usr/local/cuda/lib/libcudart.so.7.0 -> libcudart.so.7.0.28
/usr/local/cuda/lib/libcudart.so.7.0.28
/usr/local/cuda/lib/libcudart_static.a
Built from source. Commit hash: 35cd6a3

jhollowayj commented May 12, 2016

Here's a quick script that should break when you run it (if it helps...). Mine dies on iteration 31.
crash_tf_assign_op.py.txt
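
For reference, the pattern the script exercises is roughly the following (a sketch with illustrative names and shapes, not the attached file itself):

import numpy as np
import tensorflow as tf

# Sketch of the failing pattern (illustrative shapes; not the attached script).
w1 = tf.Variable(tf.zeros([1000, 1000]))
new_value_array = np.random.rand(1000, 1000).astype(np.float32)

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.01)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
sess.run(tf.initialize_all_variables())

for i in range(3000):
    print "Assigning i:{}".format(i)
    # This is the call that eventually dies with the OOM error above.
    sess.run(w1.assign(new_value_array))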

mrry commented May 16, 2016

The assign op itself is not consuming the memory; the problem is that each instance of new_weights is converted to a constant op and added to the graph. Each constant op owns a buffer containing the value it produces, and a constant op placed on the GPU allocates that buffer in GPU memory.

The fix is to rewrite your program somewhat. Instead of doing:

for i in range(3000):
    print "Assigning i:{}".format(i)
    # Each iteration adds a new constant op (holding new_value_array) and a new
    # assign op to the graph, so GPU memory grows on every call.
    sess.run(w1.assign(new_value_array))

... you should declare the assign op and a placeholder before the loop, and feed different values to the placeholder in each iteration:

# Build the placeholder and assign op once, outside the loop.
assign_placeholder = tf.placeholder(tf.float32, shape=[1000, 1000])
assign_op = w1.assign(assign_placeholder)

for i in range(3000):
    print "Assigning i:{}".format(i)
    # Feed a new value each iteration; no new nodes are added to the graph.
    sess.run(assign_op, feed_dict={assign_placeholder: new_value_array})

mrry closed this as completed May 16, 2016

jhollowayj commented

That totally makes sense now. I never would have guessed to do that though. Thanks so much.

mrry commented May 16, 2016

Indeed - it's a difficult error to disallow, because there are many totally valid patterns that involve adding nodes to the graph. One tip is to try calling tf.get_default_graph().finalize() before your training loop, so that an error will be thrown if you accidentally add a node. (However, we can't do that automatically - e.g. on the first run() call - because it would break a huge number of people :( ...)
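
For example, building on the snippet above (a sketch):

assign_placeholder = tf.placeholder(tf.float32, shape=[1000, 1000])
assign_op = w1.assign(assign_placeholder)

# Finalize once the graph is fully built; any later attempt to add a node
# (e.g. an accidental w1.assign(numpy_array) inside the loop) raises an error.
tf.get_default_graph().finalize()

for i in range(3000):
    sess.run(assign_op, feed_dict={assign_placeholder: new_value_array})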

jhollowayj commented

Thanks @mrry. Everything is running great again.

mrry commented May 17, 2016

Glad to hear it!
