"Cannot assign a device to node..." bug in TensorArrayScatter_grad when using pre_scanned tensor in double loop of scan/while/map #5117
Comments
@lukaszkaiser this started from seq2seq, but seems isolated to something else. Maybe you have some insight. |
Maybe it is a problem with dynamic loops and TensorArrays? I see a lot of scan use. I think ebrevdo@ or Yuan might know better. |
@ebrevdo Any clue? |
When you call tf.gradients, there's an option collocate_with_... Do you On Oct 21, 2016 8:24 AM, "yhg0112" notifications@github.com wrote:
|
I've just run with |
We just pushed some better debugging for this to master. It should be On Oct 25, 2016 1:24 PM, "yhg0112" notifications@github.com wrote:
|
Alright. I've just tested with the master branch. Evaluation works fine, but optimization with gradients is still broken with the same error. I guess it's not merged yet. |
Please provide a minimal failing example, full code. Try to reduce it to On Oct 25, 2016 10:46 PM, "yhg0112" notifications@github.com wrote:
|
I'm really sorry about the late response. I have just pulled the master branch and re-installed tensorflow ( Isn't my bug-reproducing code example simple enough? The error seems to happen when I try to put |
I've run the example with the pip-installed version of tensorflow ( And the error didn't happen. I think it is solved. Thank you all. |
Yay! |
Environment info
tensorflow branch : 0.11.0rc0
CUDA version : 7.0
cuDNN version : 6.5.48
OS version : Ubuntu 14.04.5 LTS
GPU : GPU0 Titan X (Maxwell), GPU1 Tesla K20c (not used in this code)
(Also using anaconda2 environment and Jupyter with tf.InteractiveSession())
The bug (or is this intended error?)
I was using tf.scan() and tf.map_fn() to code a seq2seq encoder-decoder structure with an attention mechanism.
When I put a scanned tensor in map_fn() inside another scan(), the graph is built normally and I can even evaluate the value of the output tensor.
However, when I try to optimize, or get the gradient of that tensor, the bug pops up saying
InvalidArgumentError: Cannot assign a device to node 'gradients/scan_1/while/map/TensorArrayPack/TensorArrayScatter_grad/TensorArrayGather/f_acc': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/GPU:0
I tried config.allow_soft_placement = True, but it only changed the error log and didn't fix anything. Oddly, with config.allow_soft_placement = True the error log complains about AttrValue must not be the value of DT_STRING_REF.
My code, in simplified form, is as follows (I wrote a bug-reproducing example at the bottom):
encoder_states = tf.scan(_encoder_step, encoder_inputs, initializer=encoder_initial_states)
decoder_states = tf.scan(_decoder_step, decoder_inputs, initializer=encoder_states[-1])
In def _decoder_step(prev_h, inputs): I used tf.map_fn() to get the aligned context of the encoder states, as in the attention mechanism of https://arxiv.org/abs/1409.0473. It looks like the following:
in _decoder_step(prev_h, inputs):
error message
with config.allow_soft_placement = False:
with config.allow_soft_placement = True:
reproducible example code
The log points to 'scatter()' in 'ops/tensor_array_ops.py' and '_tensor_array_scatter' in 'ops/gen_data_flow_ops.py', which is written in this branch. @ebrevdo, could anybody give me some hints about this?
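For readers following the description above, here is a minimal pure-Python sketch of the pattern: an encoder scan, then a decoder scan whose step maps over the already-scanned encoder states. It is not the original code and does not use TensorFlow; the helpers scan, map_fn, attend, encoder_step and decoder_step are hypothetical stand-ins, the recurrences are toy ones, and the dot-product score is a simplification swapped in for the additive score of the linked paper.

```python
import math


def scan(fn, elems, initializer):
    """Stand-in for tf.scan: thread an accumulator through elems, keep every step."""
    acc, outputs = initializer, []
    for x in elems:
        acc = fn(acc, x)
        outputs.append(acc)
    return outputs


def map_fn(fn, elems):
    """Stand-in for tf.map_fn: apply fn to each element independently."""
    return [fn(e) for e in elems]


def attend(query, states):
    """Softmax-weighted average of states, scored against query by dot product."""
    scores = map_fn(lambda s: sum(q * v for q, v in zip(query, s)), states)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(sc - top) for sc in scores]
    weights = [e / sum(exps) for e in exps]
    context = [sum(w * s[i] for w, s in zip(weights, states))
               for i in range(len(query))]
    return weights, context


def encoder_step(prev_h, x):
    # Toy recurrence (assumption): h_t = tanh(h_{t-1} + x_t)
    return [math.tanh(h + xi) for h, xi in zip(prev_h, x)]


encoder_inputs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
encoder_states = scan(encoder_step, encoder_inputs, [0.0, 0.0])


def decoder_step(prev_h, x):
    # The inner map runs over the pre-scanned encoder states on every outer step.
    _, context = attend(prev_h, encoder_states)
    return [math.tanh(h + c + xi) for h, c, xi in zip(prev_h, context, x)]


decoder_inputs = [[0.2, 0.0], [0.1, 0.1]]
decoder_states = scan(decoder_step, decoder_inputs, encoder_states[-1])
```

The point of the sketch is only the data flow: decoder_step closes over encoder_states, so the inner map sits inside the outer loop, which is the nesting the error message refers to (scan_1/while/map).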
edited :
If I only run the optimizer part without running res, such as,
then it works fine.
But if I run both of them, sess.run(optimizer) still raises InvalidArgumentError. Could it be a problem in my GPU config? When I execute nvidia-smi, it says that GPU0 is the Tesla K20c and GPU1 is the GeForce GTX Titan X, but TensorFlow reports them in the reverse order.
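On the GPU ordering: nvidia-smi lists devices in PCI bus order, while CUDA applications by default enumerate the fastest device first, so the two numberings can legitimately differ. The sketch below shows one way to make them agree by setting CUDA environment variables before TensorFlow is imported. The choice of index "1" assumes the nvidia-smi numbering quoted above (GPU1 is the Titan X), and whether this has any bearing on the InvalidArgumentError is not established here.

```python
import os

# These must be set before TensorFlow (or any other CUDA library) is imported,
# because the CUDA runtime reads them once at initialization.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # enumerate GPUs the way nvidia-smi does
os.environ["CUDA_VISIBLE_DEVICES"] = "1"        # assumption: nvidia-smi index of the Titan X

# After this point, `import tensorflow as tf` would see a single GPU,
# which the process addresses as device 0.
```

With only one device visible, the Tesla K20c is hidden from the process entirely, which also removes any ambiguity about which card a GPU:0 placement refers to.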