Bug - tfdbg + multi-gpu gives ValueError: Duplicate node name: 'n/_0' #7051

Hello TensorFlow team,

I have started using the TensorFlow debugger (tfdbg), but when I try to use it on a multi-GPU model I get `ValueError: Duplicate node name: 'n/_0'`.

Looking more closely, I saw that the issue originates in the run_metadata, whose partition graphs contain many `_Send` and `_HostRecv` ops with names like 'n/_0'. These ops are replicated with identical names across my towers, which is what causes the duplicate-name error.
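For anyone who wants to check for the same symptom, here is a minimal sketch of how the duplicated names can be listed. It assumes each partition graph exposes a `.node` sequence whose items have a `.name`, as `GraphDef`/`NodeDef` do. The `FakeNode`/`FakeGraph` stand-ins and the function name `find_duplicate_node_names` are made up for illustration, so the snippet runs without TensorFlow.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins shaped like NodeDef / GraphDef (only .name and .node are used).
FakeNode = namedtuple("FakeNode", "name")
FakeGraph = namedtuple("FakeGraph", "node")


def find_duplicate_node_names(partition_graphs):
    """Map every node name seen more than once to the partition indices containing it."""
    seen = defaultdict(list)
    for index, graph in enumerate(partition_graphs):
        for node in graph.node:
            seen[node.name].append(index)
    return {name: parts for name, parts in seen.items() if len(parts) > 1}


def fake_graph(*names):
    # Build a stand-in partition graph from a list of node names.
    return FakeGraph(node=[FakeNode(name=n) for n in names])


# Two partitions (one per tower) that both contain 'n/_0' and 'n/_1'.
partitions = [
    fake_graph("TOWER0/Const", "n/_0", "n/_1"),
    fake_graph("TOWER1/Const", "n/_0", "n/_1"),
]
duplicates = find_duplicate_node_names(partitions)
print(duplicates)
```

With real data, the same function could presumably be called on `run_metadata.partition_graphs` directly, since only attribute access is used.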

Looking through the TensorFlow code, I believe I traced where this name is set to [graph_partition.cc:195](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/graph/graph_partition.cc#L195), where the edge's source name is used as the prefix 'n'.  Unfortunately, I have not been able to figure out why the source's name is only 'n', but that seems to be the root of the issue here.

I should add that I never set any tensor name to 'n' anywhere in my own code.  Also, some tests in your codebase rely on names such as 'n/_0', which suggests to me that the name is being set somewhere inside TensorFlow itself.

Any help you can provide would be much appreciated!

### What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

I didn't find any related issues.

### Environment info
Operating System: Ubuntu 14.04.5 LTS (running in a [singularity](http://singularity.lbl.gov) container on a CentOS 6.7 host).

Installed version of CUDA and cuDNN:
I am using CUDA 8.0 with NVIDIA driver 367.48, and cuDNN v5.1.
(please attach the output of `ls -l /path/to/cuda/lib/libcud*`):
```
libOpenCL.so
libOpenCL.so.1
libOpenCL.so.1.0
libOpenCL.so.1.0.0
libcublas.so
libcublas.so.8.0
libcublas.so.8.0.45
libcublas_device.a
libcublas_static.a
libcudadevrt.a
libcudart.so
libcudart.so.8.0
libcudart.so.8.0.44
libcudart_static.a
libcudnn.so
libcudnn.so.5
libcudnn.so.5.1.5
libcudnn_static.a
libcufft.so
libcufft.so.8.0
libcufft.so.8.0.44
libcufft_static.a
libcufftw.so
libcufftw.so.8.0
libcufftw.so.8.0.44
libcufftw_static.a
libcuinj64.so
libcuinj64.so.8.0
libcuinj64.so.8.0.44
libculibos.a
libcurand.so
libcurand.so.8.0
libcurand.so.8.0.44
libcurand_static.a
libcusolver.so
libcusolver.so.8.0
libcusolver.so.8.0.44
libcusolver_static.a
libcusparse.so
libcusparse.so.8.0
libcusparse.so.8.0.44
libcusparse_static.a
libnppc.so
libnppc.so.8.0
libnppc.so.8.0.44
libnppc_static.a
libnppi.so
libnppi.so.8.0
libnppi.so.8.0.44
libnppi_static.a
libnppial.so
libnppial.so.8.0
libnppial.so.8.0.44
libnppicc.so
libnppicc.so.8.0
libnppicc.so.8.0.44
libnppicom.so
libnppicom.so.8.0
libnppicom.so.8.0.44
libnppidei.so
libnppidei.so.8.0
libnppidei.so.8.0.44
libnppif.so
libnppif.so.8.0
libnppif.so.8.0.44
libnppig.so
libnppig.so.8.0
libnppig.so.8.0.44
libnppim.so
libnppim.so.8.0
libnppim.so.8.0.44
libnppist.so
libnppist.so.8.0
libnppist.so.8.0.44
libnppisu.so
libnppisu.so.8.0
libnppisu.so.8.0.44
libnppitc.so
libnppitc.so.8.0
libnppitc.so.8.0.44
libnpps.so
libnpps.so.8.0
libnpps.so.8.0.44
libnpps_static.a
libnvToolsExt.so
libnvToolsExt.so.1
libnvToolsExt.so.1.0.0
libnvblas.so
libnvblas.so.8.0
libnvblas.so.8.0.44
libnvgraph.so
libnvgraph.so.8.0
libnvgraph.so.8.0.44
libnvgraph_static.a
libnvrtc-builtins.so
libnvrtc-builtins.so.8.0
libnvrtc-builtins.so.8.0.44
libnvrtc.so
libnvrtc.so.8.0
libnvrtc.so.8.0.44
stubs
```



1. A link to the pip package you installed:
I installed tensorflow using `pip install tensorflow-gpu==0.12.1`
2. The output from `python -c "import tensorflow; print(tensorflow.__version__)"`.
```
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally 
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally 
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
0.12.1 
```


### What other attempted solutions have you tried?

The single GPU case works fine.

### Logs or other output that would be helpful

Here is the dump of some of the problematic nodes.

```
node {
  name: "n/_0"
  op: "_Send"
  input: "__copy_TOWER0/Const_0"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "client_terminated"
    value {
      b: false
    }
  }
  attr {
    key: "recv_device"
    value {
      s: "/job:localhost/replica:0/task:0/gpu:0"
    }
  }
  attr {
    key: "send_device"
    value {
      s: "/job:localhost/replica:0/task:0/gpu:0"
    }
  }
  attr {
    key: "send_device_incarnation"
    value {
      i: 0
    }
  }
  attr {
    key: "tensor_name"
    value {
      s: "edge_545___copy_TOWER0/Const_0"
    }
  }
}
node {
  name: "n/_1"
  op: "_HostRecv"
  input: "^n/_0"
  attr {
    key: "client_terminated"
    value {
      b: false
    }
  }
  attr {
    key: "recv_device"
    value {
      s: "/job:localhost/replica:0/task:0/gpu:0"
    }
  }
  attr {
    key: "send_device"
    value {
      s: "/job:localhost/replica:0/task:0/gpu:0"
    }
  }
  attr {
    key: "send_device_incarnation"
    value {
      i: 0
    }
  }
  attr {
    key: "tensor_name"
    value {
      s: "edge_545___copy_TOWER0/Const_0"
    }
  }
  attr {
    key: "tensor_type"
    value {
      type: DT_FLOAT
    }
  }
}
node {
  name: "n/_2"
  op: "_Send"
  input: "__copy_TOWER0/Sub_0"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "client_terminated"
    value {
      b: false
    }
  }
  attr {
    key: "recv_device"
    value {
      s: "/job:localhost/replica:0/task:0/gpu:0"
    }
  }
  attr {
    key: "send_device"
    value {
      s: "/job:localhost/replica:0/task:0/gpu:0"
    }
  }
  attr {
    key: "send_device_incarnation"
    value {
      i: 0
    }
  }
  attr {
    key: "tensor_name"
    value {
      s: "edge_551___copy_TOWER0/Sub_0"
    }
  }
}

```
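To compare dumps like the one above across towers, a rough sketch such as the following can pull out each node's name and op. It is a plain regular-expression pass that assumes `op:` follows directly after `name:`, as it does in this dump. A proper protobuf text-format parser would be more robust, and `summarize_nodes` is a name made up for illustration.

```python
import re

# Matches a `name: "..."` line immediately followed by an `op: "..."` line.
NODE_PATTERN = re.compile(r'name: "([^"]*)"\s+op: "([^"]*)"')


def summarize_nodes(dump_text):
    """Return a list of (node name, op type) pairs found in a text-format graph dump."""
    return NODE_PATTERN.findall(dump_text)


# Shortened excerpt of the dump above.
sample = '''node {
  name: "n/_0"
  op: "_Send"
  input: "__copy_TOWER0/Const_0"
}
node {
  name: "n/_1"
  op: "_HostRecv"
  input: "^n/_0"
}
'''
summary = summarize_nodes(sample)
print(summary)
```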

End of the backtrace at the crash point:
```
  /home/raphtown/.local/lib/python2.7/site-packages/tensorflow/python/debug/wrappers/framework.py(419)run()
-> run_end_resp = self.on_run_end(run_end_req)
  /home/raphtown/.local/lib/python2.7/site-packages/tensorflow/python/debug/wrappers/local_cli_wrapper.py(262)on_run_end()
-> self._dump_root, partition_graphs=partition_graphs)
  /home/raphtown/.local/lib/python2.7/site-packages/tensorflow/python/debug/debug_data.py(407)__init__()
-> self._load_partition_graphs(partition_graphs)
> /home/raphtown/.local/lib/python2.7/site-packages/tensorflow/python/debug/debug_data.py(493)_load_partition_graphs()
-> raise ValueError("Duplicate node name: '%s'" % node.name)
```
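As I understand it from the backtrace, the loader builds a single name-to-node table over all partition graphs and raises as soon as a name repeats. The sketch below is my own simplified reconstruction of that behavior, not the actual tfdbg code. `load_partition_graphs`, `StubNode`, and `StubGraph` are hypothetical names, and only the error message is copied from the backtrace.

```python
from collections import namedtuple

# Hypothetical stand-ins shaped like NodeDef / GraphDef.
StubNode = namedtuple("StubNode", "name")
StubGraph = namedtuple("StubGraph", "node")


def load_partition_graphs(partition_graphs):
    """Index all nodes by name in one shared table; fail on the first repeated name."""
    node_by_name = {}
    for graph in partition_graphs:
        for node in graph.node:
            if node.name in node_by_name:
                raise ValueError("Duplicate node name: '%s'" % node.name)
            node_by_name[node.name] = node
    return node_by_name


# One partition: names are unique, so loading succeeds.
single = [StubGraph(node=[StubNode("n/_0"), StubNode("n/_1")])]
loaded_names = sorted(load_partition_graphs(single))
print(loaded_names)

# A second partition reusing 'n/_0' triggers the same error as in the report.
multi = single + [StubGraph(node=[StubNode("n/_0")])]
try:
    load_partition_graphs(multi)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

This matches what I observe: with a single partition the names are unique and loading succeeds, while a second partition with the same generated names fails.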
