CUDA_TOO_MANY_PEERS exception when launching on p2.16xlarge on aws #5362

Closed
alexatknit opened this issue Nov 2, 2016 · 17 comments
Labels: stat:awaiting tensorflower (Status - Awaiting response from tensorflower), type:build/install (Build and install issues)

alexatknit commented Nov 2, 2016

This issue is identical to the issue discussed here.

When launching tensorflow to train on more than 8 GPU devices, tensorflow triggers a CUDA_TOO_MANY_PEERS error in the driver. Each Tesla K80 card shows up as 2 GPU devices internally, so even though the p2.16xlarge technically has only 8 K80 cards, the driver sees 16 GPUs. The CUDA p2p system seems to limit any GPU to peer connections with at most 8 other GPUs, but when the graph is launched, tensorflow (quite understandably) attempts to create a p2p connection between every pair of GPUs in the graph.

Is there some way to disable p2p? Or is there a way, via intelligent graph construction, to limit the number of peer connections any given GPU requires, or even to use a resource pool for GPU p2p connections?

ubuntu@host:~/workspace/nn$ nvidia-smi
Wed Nov  2 20:53:20 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.93.02              Driver Version: 361.93.02                 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:0F.0     Off |                    0 |
| N/A   45C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:00:10.0     Off |                    0 |
| N/A   39C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:00:11.0     Off |                    0 |
| N/A   50C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:00:12.0     Off |                    0 |
| N/A   43C    P8    31W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 0000:00:13.0     Off |                    0 |
| N/A   50C    P8    57W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 0000:00:14.0     Off |                    0 |
| N/A   41C    P8    70W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 0000:00:15.0     Off |                    0 |
| N/A   51C    P0    56W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 0000:00:16.0     Off |                    0 |
| N/A   43C    P0    71W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   8  Tesla K80           Off  | 0000:00:17.0     Off |                    0 |
| N/A   46C    P0    56W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   9  Tesla K80           Off  | 0000:00:18.0     Off |                    0 |
| N/A   39C    P0    71W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  10  Tesla K80           Off  | 0000:00:19.0     Off |                    0 |
| N/A   49C    P0    58W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  11  Tesla K80           Off  | 0000:00:1A.0     Off |                    0 |
| N/A   40C    P0    73W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  12  Tesla K80           Off  | 0000:00:1B.0     Off |                    0 |
| N/A   49C    P0    58W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  13  Tesla K80           Off  | 0000:00:1C.0     Off |                    0 |
| N/A   40C    P0    70W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  14  Tesla K80           Off  | 0000:00:1D.0     Off |                    0 |
| N/A   49C    P0    60W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  15  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   41C    P0    69W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
ubuntu@host:~/workspace/nn$ python3 deploy_local.py 
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:0f.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x3bb3f120
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:10.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x3bf86790
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:11.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x3c3cf0c0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:12.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4a7fe070
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 4 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:13.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4ac4dd20
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 5 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:14.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4b0a16a0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 6 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:15.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4b4f9220
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 7 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:16.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4b9545d0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 8 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:17.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4bdb3920
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 9 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:18.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4c216790
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 10 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:19.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4c67d450
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 11 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1a.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x4cae8580
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 12 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1b.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x28999af0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 13 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1c.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x28e0bf00
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 14 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1d.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x29282290
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 15 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 11.11GiB
E tensorflow/core/common_runtime/direct_session.cc:132] Internal: Internal: failed to enable peer access from 0x1c961680 to 0x18f1ac60: CUDA_ERROR_TOO_MANY_PEERS
Traceback (most recent call last):
  File "deploy_local.py", line 39, in <module>
    n_test_batches=4,
  File "/home/ubuntu/workspace/nn/nn/model/network.py", line 271, in train
    test_steps=test_steps, save_steps=save_steps, load_all=load_all, debug=debug, **kwargs)
  File "/home/ubuntu/workspace/nn/nn/train/train.py", line 86, in train_model
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_options)) as sesh:
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1138, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 502, in __init__
    self._session = tf_session.TF_NewSession(opts, status)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InternalError: Failed to create session.
aselle added the bug and type:build/install (Build and install issues) labels and removed the bug label on Nov 3, 2016
aselle (Contributor) commented Nov 3, 2016

@zhengxq, it seems like the recommended way to go about this is to use distributed TensorFlow and give each process on the same host half of the available GPUs using CUDA_VISIBLE_DEVICES, as discussed in the linked article. Could you confirm?
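
For illustration, here is a minimal sketch of that kind of split, done at the very top of each worker script; the device id lists are just an example for a 16-GPU host and are not taken from this issue:

import os

# Restrict this process to half of the host's GPUs *before* TensorFlow (and
# therefore the CUDA driver) is loaded, so peer access is only attempted
# among at most 8 devices.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'  # second worker: '8,9,10,11,12,13,14,15'

import tensorflow as tf  # must come after the environment variable is set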

aselle added the stat:awaiting tensorflower (Status - Awaiting response from tensorflower) label on Nov 3, 2016
alexatknit (Author) commented:

I can confirm that deploying as a cluster with half of the devices assigned to each worker does not reduce the number of p2p connections attempted when starting a session.

creating ec2 cluster
creating security group
launching instances
waiting for ip addresses to be assigned
cluster config: {
  "ps_hosts": [
    "35.162.204.77"
  ],
  "worker_hosts": [
    "35.162.204.77",
    "35.162.204.77"
  ]
}
setting rules for security group.
waiting for cluster instances to boot up.
starting ps host at: 35.162.204.77
starting worker host at: 35.162.204.77
starting master worker host at: 35.162.204.77
35.162.204.77: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
35.162.204.77: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
35.162.204.77: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
35.162.204.77: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
35.162.204.77: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
35.162.204.77: INFO: starting parameter server 0
35.162.204.77: INFO: starting master worker.
35.162.204.77: INFO: starting training worker 1
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:0f.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd5093eb0
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:0f.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:0f.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7838bed0
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b6e672bb0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:10.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd54e5d30
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:10.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b6eaa38b0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:10.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b78778410
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:11.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd59365b0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:11.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b6eed7d10
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:11.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b78b66470
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:12.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd5d8add0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:12.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b6f310590
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:12.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b78f55830
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 4 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:13.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd61e3400
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 4 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:13.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b65446460
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 4 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:13.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b79346310
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 5 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:14.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd663fad0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 5 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:14.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b65837ee0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 5 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:14.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b79738150
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 6 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:15.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd6aa04e0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 6 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:15.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b65c2b780
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 6 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:15.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b79b2b630
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 7 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:16.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd6f04a60
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 7 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:16.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b6601f5b0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 7 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:16.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b79f1f670
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 8 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:17.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 11.04GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd736cfc0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 8 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:17.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b66414bf0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 8 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:17.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7a314800
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 9 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:18.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 11.04GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd77d9480
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 9 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:18.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b6680b590
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 9 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:18.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7a70acf0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 10 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:19.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 11.04GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7bd7c49a40
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 10 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:19.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b66c03120
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 10 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:19.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7ab02880
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 11 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1a.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 11.04GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f79f80be5b0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 11 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1a.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b66ffced0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 11 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1a.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7aefc630
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 12 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1b.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 11.04GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f79f85369e0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 12 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1b.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b673f71f0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 12 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1b.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7b2f6950
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 13 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1c.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 11.04GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f79f89b2d40
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 13 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1c.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b677f23c0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 13 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1c.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7b6f1b20
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 14 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1d.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f79f8e32fe0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 14 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1d.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.98GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b67bee8f0
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 14 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1d.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.98GiB
35.162.204.77: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f7b7baee050
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 15 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1e.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.99GiB
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 15 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1e.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.98GiB
35.162.204.77: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.162.204.77: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 15 with properties:
35.162.204.77: name: Tesla K80
35.162.204.77: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.162.204.77: pciBusID 0000:00:1e.0
35.162.204.77: Total memory: 11.17GiB
35.162.204.77: Free memory: 10.98GiB
Traceback (most recent call last):
  File "/Users/alexatknit/Workspace/nn/deploy_local.py", line 42, in <module>
    n_test_batches=4,
  File "/Users/alexatknit/Workspace/nn/nn/model/network.py", line 293, in train_cluster
    n_test_batches=n_test_batches, test_steps=test_steps, save_steps=save_steps, load_all=load_all, **kwargs)
  File "/Users/alexatknit/Workspace/nn/nn/train/deploy.py", line 81, in deploy_to_cluster
    **kwargs).result()
  File "/usr/local/lib/python3.5/site-packages/KnitPyUtils-0.0.39-py3.5.egg/knitutils/util/executor.py", line 25, in result
    raise self._result
  File "/usr/local/lib/python3.5/site-packages/KnitPyUtils-0.0.39-py3.5.egg/knitutils/util/executor.py", line 119, in _worker
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/KnitPyUtils-0.0.39-py3.5.egg/knitutils/parallel/cluster.py", line 83, in _submit
    return self.proxy.submit(fn, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/KnitPyUtils-0.0.39-py3.5.egg/knitutils/util/server.py", line 202, in method
security group removed.
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError: 
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/KnitPyUtils-0.0.39-py3.4.egg/knitutils/util/server.py", line 122, in handle
    result = ('#RETURN', func(*args, **kwargs))
  File "/usr/local/lib/python3.4/dist-packages/KnitPyUtils-0.0.39-py3.4.egg/knitutils/util/functions.py", line 48, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/KnitPyUtils-0.0.39-py3.4.egg/knitutils/parallel/cluster.py", line 39, in submit
    return self.future_pool.submit(fn, *args, **kwargs).result()
  File "/usr/local/lib/python3.4/dist-packages/KnitPyUtils-0.0.39-py3.4.egg/knitutils/parallel/future.py", line 75, in result
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError: 
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/KnitPyUtils-0.0.39-py3.4.egg/knitutils/parallel/future.py", line 18, in _deploy_worker
    result = ("#RETURN", fn(*args, **kwargs))
  File "/usr/local/lib/python3.4/dist-packages/KnitNN-0.5.29-py3.4.egg/nn/train/train.py", line 398, in train_master
    server = tf.train.Server(cluster, job_name='worker', task_index=0)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/server_lib.py", line 152, in __init__
    self._server_def.SerializeToString(), status)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InternalError: Internal: failed to enable peer access from 0x7f7b8c286310 to 0x7f7b8e226740: CUDA_ERROR_TOO_MANY_PEERS
scheduling instances for termination: ['35.162.204.77']
waiting for termination to complete.

zheng-xq (Contributor) commented Nov 3, 2016

@alexatknit, could you confirm that you have set CUDA_VISIBLE_DEVICES? TensorFlow should not even see the invisible devices in that case.

Even if it is not set, this is a bit weird. Before trying to enable peer access to each other GPU, TF calls cuDeviceCanAccessPeer, and only if that returns True does TF proceed to call cuCtxEnablePeerAccess. It seems that cuDeviceCanAccessPeer returns True even if the subsequent cuCtxEnablePeerAccess will fail with CUDA_ERROR_TOO_MANY_PEERS.

+@benbarsdell, our friend from NVIDIA. Could you confirm whether cuCtxEnablePeerAccess should fail in this case?

alexatknit (Author) commented:

I have not altered CUDA_VISIBLE_DEVICES; I'll try it out.

benbarsdell (Contributor) commented:

cuDeviceCanAccessPeer returns True even if the next cuCtxEnablePeerAccess will fail with CUDA_ERROR_TOO_MANY_PEERS

This is correct.

zheng-xq (Contributor) commented Nov 3, 2016

@benbarsdell, would it be too big a feature request to have future Cuda drivers return False in that case?

alexatknit (Author) commented:

I've set CUDA_VISIBLE_DEVICES and still have the issue.

zheng-xq (Contributor) commented Nov 3, 2016

@alexatknit, could you share your command line, your CUDA_VISIBLE_DEVICES setting, and the resulting logs? If the log is too large, feel free to upload somewhere such as pastebin.com and share the URL here.

alexatknit (Author) commented Nov 3, 2016

My cluster is launched remotely and I set the environment variable in Python; my workers and parameter server are launched from a Python process listening on a specific port. It looks like the separate processes each have their own environment variables, so unless those need to be set before importing tensorflow in the first place, they should be set based on:

from os import environ  # environ is needed for the assignment below

# Expose only this worker's assigned GPU ids to CUDA.
gpu_devices = [device.split(':')[-1] for device in devices if 'gpu' in device]
if len(gpu_devices) > 0:
  environ['CUDA_VISIBLE_DEVICES'] = ','.join(gpu_devices)
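
(For illustration, with hypothetical device strings such as '/job:worker/task:0/gpu:0' and '/job:worker/task:0/gpu:1', the split on ':' yields '0' and '1', so CUDA_VISIBLE_DEVICES would end up as "0,1"; those strings are just an example, not values from this issue.)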

zheng-xq (Contributor) commented Nov 3, 2016

Depending on where you set it, it might be too late. It needs to be set before the CUDA driver is loaded for the first time.

The safest thing to try is to set it in the script that launches your worker and ps.
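
A minimal sketch of that approach (the worker.py entry point and its flags are hypothetical, not from this project): the important part is that CUDA_VISIBLE_DEVICES is placed in each child's environment before the child process ever loads the CUDA driver.

import os
import subprocess

def launch_worker(gpu_ids, extra_args=()):
    """Start one worker with only the given GPUs visible to CUDA."""
    env = dict(os.environ)
    env['CUDA_VISIBLE_DEVICES'] = ','.join(str(i) for i in gpu_ids)
    # The child inherits this restricted environment, so the driver it loads
    # never sees the other GPUs and never tries to peer with them.
    return subprocess.Popen(['python3', 'worker.py'] + list(extra_args), env=env)

# e.g. two workers on a 16-GPU host, 8 GPUs each (illustrative split only)
w0 = launch_worker(range(0, 8), ['--task_index', '0'])
w1 = launch_worker(range(8, 16), ['--task_index', '1'])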

alexatknit (Author) commented:

There are far too many variables associated with launching a model; I would need to rework my entire system to try that. My listener process seems to import tensorflow during the unpickling phase, which loads the driver, and that state is passed on to the child processes.

benbarsdell (Contributor) commented:

@zheng-xq

would it be too big a feature request to have future Cuda drivers return False in that case?

cuDeviceCanAccessPeer only considers the topology of the system, independent of any other state; I believe that changing this behavior could potentially break existing codes. There are also other reasons why calls to cuCtxEnablePeerAccess may still fail.
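
For anyone who wants to observe that distinction directly, here is a rough, illustrative probe of the driver API from Python via ctypes (it assumes libcuda.so.1 is loadable; it only queries topology-level peer capability and does not enable peer access, so it can report 1 for far more than 8 peers):

import ctypes

cuda = ctypes.CDLL('libcuda.so.1')
assert cuda.cuInit(0) == 0  # 0 == CUDA_SUCCESS

count = ctypes.c_int()
assert cuda.cuDeviceGetCount(ctypes.byref(count)) == 0

def device(ordinal):
    # CUdevice handles come from cuDeviceGet on the ordinal.
    d = ctypes.c_int()
    assert cuda.cuDeviceGet(ctypes.byref(d), ordinal) == 0
    return d

# cuDeviceCanAccessPeer reports whether the topology allows P2P between two
# devices; the per-context peer limit only surfaces later, when
# cuCtxEnablePeerAccess is actually called.
can = ctypes.c_int()
for i in range(count.value):
    for j in range(count.value):
        if i != j:
            cuda.cuDeviceCanAccessPeer(ctypes.byref(can), device(i), device(j))
            print('GPU %d -> GPU %d: can_access_peer=%d' % (i, j, can.value))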

alexatknit (Author) commented:

Alright, I updated the architecture to support setting environment variables before deserialization and it appears that it still fails:

creating ec2 cluster
launching instances
waiting for ip addresses to be assigned
cluster config: {
  "worker_hosts": [
    "35.160.255.186",
    "35.160.255.186"
  ],
  "ps_hosts": [
    "35.160.255.186"
  ]
}
setting rules for security group.
waiting for cluster instances to boot up.
starting ps host at: 35.160.255.186
starting worker host at: 35.160.255.186
starting master worker host at: 35.160.255.186
35.160.255.186: env = {}
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
35.160.255.186: env = {'CUDA_VISIBLE_DEVICES': '0,1,2,3,4,5,6,7'}
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
35.160.255.186: env = {'CUDA_VISIBLE_DEVICES': '8,9,10,11,12,13,14,15'}
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
35.160.255.186: I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
35.160.255.186: INFO: starting parameter server 0
35.160.255.186: INFO: starting master worker.
35.160.255.186: INFO: starting training worker 1
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:17.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e64e3f6b0
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:0f.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:0f.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e60e3c540
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebd0934f0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:18.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e6522b7e0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:10.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebd4e4f80
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 1 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:10.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e612288e0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:0f.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 10.99GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e65619840
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:11.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebd9357e0
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 2 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:11.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e61616d00
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:10.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 10.99GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e65a08fc0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:12.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebdd89f40
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 3 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:12.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e61a060c0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 4 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:11.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 10.99GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e65df9aa0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 4 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:13.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebe1e2600
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 4 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:13.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e61df67e0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 5 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:12.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 10.99GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e661eb520
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 5 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:14.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebe63ec40
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 5 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:14.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e621e8260
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 6 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:13.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 10.99GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e665dedc0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 6 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:15.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebea9f5a0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 6 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:15.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4e625dbb00
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 7 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:14.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 10.99GiB
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 7 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:16.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebef03bb0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 7 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:16.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.05GiB
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 1 2 3 4 5 6 7
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 1:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 2:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 3:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 4:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 5:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 6:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 7:   Y Y Y Y Y Y Y Y
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:17.0)
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:00:18.0)
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:00:0f.0)
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:00:10.0)
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:4) -> (device: 4, name: Tesla K80, pci bus id: 0000:00:11.0)
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:5) -> (device: 5, name: Tesla K80, pci bus id: 0000:00:12.0)
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:6) -> (device: 6, name: Tesla K80, pci bus id: 0000:00:13.0)
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:7) -> (device: 7, name: Tesla K80, pci bus id: 0000:00:14.0)
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 8 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:17.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 565.56MiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebf36c090
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 9 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:18.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 504.81MiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebf7d8550
35.160.255.186: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job ps -> {0 -> 127.0.0.1:40842}
35.160.255.186: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:197] Initialize GrpcChannelCache for job worker -> {0 -> 127.0.0.1:50425, 1 -> localhost:52197}
35.160.255.186: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:206] Started server with target: grpc://localhost:52197
35.160.255.186: INFO: training model res_net_v10_20161103_2_fcn.0
35.160.255.186: INFO: setting up device: '/job:worker/task:1'
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 10 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:19.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ebfc48b00
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 11 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:1a.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ce40bd6e0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 12 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:1b.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ce4535a80
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 13 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:1c.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ce49b1e00
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 14 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:1d.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
35.160.255.186: W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x7f4ce4e320b0
35.160.255.186: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
35.160.255.186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 15 with properties:
35.160.255.186: name: Tesla K80
35.160.255.186: major: 3 minor: 7 memoryClockRate (GHz) 0.8235
35.160.255.186: pciBusID 0000:00:1e.0
35.160.255.186: Total memory: 11.17GiB
35.160.255.186: Free memory: 11.11GiB
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/KnitPyUtils-0.0.40-py3.4.egg/knitutils/parallel/future.py", line 18, in _deploy_worker
    result = ("#RETURN", fn(*args, **kwargs))
  File "/usr/local/lib/python3.4/dist-packages/KnitPyUtils-0.0.40-py3.4.egg/knitutils/parallel/cluster.py", line 23, in _run_worker
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/KnitNN-0.5.30-py3.4.egg/nn/train/train.py", line 406, in train_master
    server = tf.train.Server(cluster, job_name='worker', task_index=0)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/server_lib.py", line 152, in __init__
    self._server_def.SerializeToString(), status)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InternalError: Internal: failed to enable peer access from 0x7f4e79688120 to 0x7f4e7aa4ebb0: CUDA_ERROR_TOO_MANY_PEERS
scheduling instances for termination: ['35.160.255.186']
waiting for termination to complete.

@vrv

vrv commented Nov 3, 2016

If we change it from an error to a warning, would that make people happy? :)

@vrv

vrv commented Nov 3, 2016

(I'm going to send out a change that does this, and then we can refine it if it turns out to be bad.) I think enough people who don't really care about peer access are bitten by this that we should just fix it for the common case.

@alexatknit
Author

alexatknit commented Nov 3, 2016

@vrv does the change make sure to prioritize the p2p connections that are in the graph? I wouldn't want to hit an issue where GPUs 8-15 can't talk to each other even though they're the only ones the graph is deployed to in that particular TF session.

edit: (I guess that can be controlled by CUDA_VISIBLE_DEVICES)

@vrv

vrv commented Nov 3, 2016

Yeah, I would try to use CUDA_VISIBLE_DEVICES if you really don't plan on using the GPUs in your graph. All the change will do is not fail when it currently fails, so things might get a bit slower, but at least it will tell you via the warning logs, instead of just failing the job.
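
As an illustrative sketch only (not from the original thread): the two suggestions can be combined by masking the devices before tensorflow is imported and then placing the graph on the locally renumbered GPUs. The device list and the dummy tower computation below are assumptions for demonstration, not the reporter's actual model.

import os

# Must be set before tensorflow is imported anywhere in this process,
# because the CUDA driver fixes the visible-device list when it loads.
os.environ['CUDA_VISIBLE_DEVICES'] = '8,9,10,11,12,13,14,15'

import tensorflow as tf

# After masking, the eight visible devices are renumbered /gpu:0 .. /gpu:7,
# so placement uses local indices rather than the physical 8..15.
towers = []
for i in range(8):
    with tf.device('/gpu:%d' % i):
        x = tf.random_normal([1024, 1024])
        towers.append(tf.matmul(x, x))

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(towers)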

@vrv vrv closed this as completed in cfccd7c Nov 4, 2016
dfbassi added a commit to dfbassi/tensorflow that referenced this issue Nov 14, 2016
* Expanded docstrings for extract_dask_data and extract_dask_labels.
Change: 138123658

* Update generated Python Op docs.
Change: 138125802

* Optimize image copy loop and correct swapped R and B values.
Change: 138126747

* Add a fused adjust_hue that is 6-7 times faster on CPU. The GPU kernel will be
added in a separate CL. For now, the fused implementation can be chosen with
the environmental variable "TF_ADJUST_HUE_FUSED=1"

Before the CL:
benchmarkAdjustHue_299_299_3_cpu1 step_time: 15951.86 us
benchmarkAdjustHue_299_299_3_cpu_all step_time: 2732.68 us

After the CL:
benchmarkAdjustHue_299_299_3_cpu1 step_time: 2450.50 us
benchmarkAdjustHue_299_299_3_cpu_all step_time: 399.18 us
Change: 138128641

* Replace usages initialize_all_variables -> global_variables_initializer
Change: 138128703

* Update ops-related pbtxt files.
Change: 138130752

* Update generated Python Op docs.
Change: 138131227

* Temporarily disable ASAN testing for scatter_nd.
Change: 138131457

* Update Android inference interface to take variable number of dimensions.
Change: 138133395

* Splitting Estimator interface into two:
 - Estimator that supports input_fn only (x, y, batch_size are deprecated)
 - SKCompat wrapper that takes Estimator and has x,y,batch_size arguments.

See updated tests for example usage.
Change: 138134485

* Internal cleanup.
Change: 138135876

* Update generated Python Op docs.
Change: 138136264

* Update how we force the pip package to be "Root-Is-Purelib: false", as wheel version 0.25.0
or later no longer has the distribution.is_pure attribute.

This change will cause the pip package tags to change based on the system it is built on. Update pip.sh to rename mac pip files(we already rename for linux) so pip names still matches our upload to gs script.
Change: 138136631

* Adds feature column functions to exported docs. Fixes tensorflow#3622
Change: 138137511

* check_ops BUGFIX:  Call convert_to_tensor on args before doing anything else!
Change: 138138065

* Refining the condition for parallel IsDirectory(). Avoiding "defined()" for TARGET_OS_IPHONE since it is often defined as zero.  Removing "defined(__ANDROID__)" since the problem with the android emulator cannot be reproduced now.
Change: 138138237

* Switch callers of tf.pack and tf.unpack to call tf.stack and tf.unstack instead.
Change: 138139542

* Record which op was used for training automatically in the optimizer.
Change: 138139550

* Removed newline char in logs.
Change: 138140095

* GPUDevice: if enabling peer access fails, log a warning but do not
return an error.

On some systems and GPUs, the driver may report being able to enable
peer access between two devices, but trying to do so still fails.

The system can still run, though possibly slower than if peer access
were enabled.  Since we cannot disambiguate between supported
and unsupported cases in which this happens, we demote this
to a warning, with the exception being that if *no* device
could enable peering even though it should be possible, we
still return an error.

Fixes tensorflow#5362, hopefully.
Change: 138141024

* Anonymize feature column names.
Change: 138142843

* Merge changes from github.
Change: 138143557

* Update ops-related pbtxt files.
Change: 138144503

* Update generated Python Op docs.
Change: 138144913

* tf.summary ops remove leading slashes from their names

This ensures easier conversion to the new ops, as some clients construct tag names with leading forward slashes.
Change: 138148288

* Minor tweak to matmul shape mismatch error message
Change: 138149179

* address review feedback

* TensorBoard compresses histograms only when the data will be stored in the reservoir.

This is a performance optimization, since histogram compression is relatively expensive.
Change: 138149635

* Loosen equality test.
Change: 138155531

* Disable ASAN for scatter_nd test (python).
Change: 138156808

* Speed up TensorForest by limiting the leaves that FinishedNodes looks at.
Change: 138182239

* Change for internal compatibility.

* Fix performance issues with the Embedding Projector plugin.

- Setup data handlers in the class c-tor instead of on every GET request.
- Use only 1 instance of the plugin, instead of creating a new instance on every GET Request.
- Memoize readers and other data, and share them across the multiple threads of the TB server. Previously # of checkpoint readers = # of GET requests, oops.
- Checkpoints can be big (order of 1G). Reading it takes a while. Only start reading checkpoints if someone explicitly opens the embeddings tab. Previously the checkpoint was read on TB startup, which slows down TensorBoard.
Change: 138187941

* tfdbg: Fix a bug related to regex search and scrolling when there is line wrapping

The issue: prior to this CL, if there is line wrapping and you do regex search and scrolling, the output pad will scroll according to line indices in the original (unwrapped) text lines instead of the wrapped text lines, which misses the actual matches. This CL fixes this issue.
Change: 138191265

* Add readme for SavedModel in C++.
Change: 138194799

* Partially revert OSS merge e0ed7a8.
Change: 138200850

* Revert to using log(1+x) instead of log1p(x).
Change: 138201323

* Updated deprecation module to print location of call to deprecated function/arguments to make it easy to find. Updated deprecated_args to print which args exist if wrong argument was passed.
Change: 138201843

* Docstring example and formatting updates
Change: 138202348

* Streamline and canonicalize docstring for weighted_sum_from_feature_columns.
Change: 138202979

* Expose tf.graph_util explicitly and seal its interface.
Change: 138204056

* Use aligned vector stores for random ops in CUDA.

Store the output generated by a single call to PhiloxRandom in a single vector, with partial specializations for float4, int4, double2, and long2. The unspecialized SampleCopier template does not use a vector, and loops over the individual outputs.

This speeds up RandomUniform (probably the most IO-bound random op) by about 33%. The other random ops (which are probably compute-bound) are slightly improved.

Benchmark                   Time(ns)    CPU(ns) Iterations
----------------------------------------------------------
BM_gpu_RandomUniform/1M       226000     376443       1846  2.594G items/s
BM_gpu_RandomUniform/2M       407823     640758       1000  3.048G items/s
BM_gpu_RandomUniform/8M      1445044    1801758        401  4.336G items/s
BM_gpu_RandomNormal/1M        455218     725107        917  1.347G items/s
BM_gpu_RandomNormal/2M        816297    1131202        610  1.727G items/s
BM_gpu_RandomNormal/8M       3015506    3377695        213  2.313G items/s
BM_gpu_TruncatedNormal/1M    1015598    1332691        515  750.361M items/s
BM_gpu_TruncatedNormal/2M    1907191    2260752        312  884.661M items/s
BM_gpu_TruncatedNormal/8M    7292608    7789287        100  1.003G items/s
Change: 138205581

* Disable writing the checkpoint state proto in SavedModel.
Change: 138206504

* Small fix to C++ gradient checker.

Some ops forward the input tensor's buffer to the output. This was causing
an incorrect centered difference calculation.

Also added RefIdentity Op and converted IdentityGrad Test to use the gradient checker.
Change: 138207419

* Autogenerated Change: Change TensorBoard TAG to 37
Change: 138207958

* Fix class docstring.
Change: 138209941

* Add Substr Op (tensorflow#4338)

* Add initial code for substr op

* Fix naming of substr

Conflicts:
	tensorflow/core/kernels/BUILD

* Add complete substr op, tests, and documentation

* Update date 2015->2016

* Remove unnecessary header includes

* Add dtype specific tests

* Add element-wise version of Op, comments for bcast

* Removed cheeky line break in string_ops.py

* Change commented `int` vars to `size_t`

* Add case statements for static template dimensions

* Clean substr_op, update shape function, more tests

* Adds info about pos/len needing the same shape

* Remove #include of bcast statement

* Add working broadcast code with tests

* Clean line widths and comments

* Add broadcasting note to string_ops.cc

* Add print_function __future__ import to tests

* Add comment in to shape function in string_ops.cc

* Add SubtleMustCopy wrapper for scalar pos/len

* Add SubtleMustCopy to element-wise path

* Add SubtleMustCopy and FastBoundsCheck, tests

* Add examples to documentation in string_ops.cc

* Avoid repeat accessor calls to input tensor

* Reorder variable declarations in loop

* Reorder python imports to be alphabetical

* Replace usages all_variables -> global_variables, GraphKeys.VARIABLES -> GraphKeys.GLOBAL_VARIABLES
Change: 138212111

* Remove a premature checkpoint existence test in Saver.restore().

The correct way to perform such a check is inside the Op kernel.
Change: 138212174

* Upgraded to the latest version of Eigen that adds support for AVX512
instructions
Change: 138212280

* In word2vec tutorial, fix word choice to "Section" rather than "Chapter" when referring to the paper.
Change: 138213623

* Fix class docstring example to be dataset-independent.
Change: 138215557

* Internal change.
Change: 138215608

* Update generated Python Op docs.
Change: 138215748

* Change FileExists to return tensorflow::Status.

Also done separately by @llhe at github.com/tensorflow/pull/5370. We needed to do this change internally to fix all callers.

Motivation: The existing FileExists interface doesn't allow callers to distinguish between file not found vs. filesystem errors.

Semantics changes:
- gfile.Exists in Python now throws an exception for filesystem errors. It continues to return true/false if it can accurately determine whether a file exists.
- RecursivelyCreateDir now returns errors for filesystem errors when calling FileExists.
Change: 138224013

* Add environment variable USE_DEFAULT_PYTHON_LIB_PATH to python_config.sh to configure usage of default python library path without user interaction. (tensorflow#5397)

* Fixed error with NaN elements (tensorflow#5024)

* C++ QueueRunner: Bug fixes.

Three bug fixes:
(1) There was a thread-unsafe access to runs_ which could result in the
    queue close operation being invoked multiple times.

(2) The Run() loop would not exit when there were multiple threads
    and the queue was closed (i.e., the enqueue failed with a
    queue_closed_exception_types_ error).

    Without this fix, the changed QueueRunnerTest.QueueCloseCode
    test would fail with a timeout since qr->Join() would be blocked
    on the never-exiting Run() call

(3) Errors in invoking the close operation were being ignored.
    Without this fix, the added QueueRunnerTest.QueueCloseFails
    test would fail as Join() would return OK instead of NOT_FOUND

Two other minor changes:
- Slight simplification to QueueRunner::Run() so that
  runs_ is manipulated only once and the body of the loop
  is clearer
- Avoid starting an extra thread which will not be used
  when there is no Coordinator. (Though in practice I
  suppose we always intend to have a coordinator).
Change: 138228243

* Add ops definitions for graph transfer to SOC
Change: 138233065

* Correcting documenting in slim.learning.
Change: 138237011

* Add input and output parameters of nodes to transfer graph to SOC
Change: 138238179

* Make overlay cover the full screen and provide built-in requestRender call in CameraActivity.
Change: 138238290

* Anonymize feature column names.
Change: 138240000

* Ignore tools/git/gen and fix a bash error in python_config.sh (tensorflow#5405)

* Ignore tools/git/gen

* Avoid bash error in python_config.sh

Without this change, I get

    Please specify the location of python. [Default is /usr/bin/python]:
    Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
    No Google Cloud Platform support will be enabled for TensorFlow
    Do you wish to build TensorFlow with Hadoop File System support? [y/N]
    No Hadoop File System support will be enabled for TensorFlow
    ./util/python/python_config.sh: line 124: [: : integer expression expected
    Found possible Python library paths:
      /usr/local/lib/python2.7/dist-packages
      /usr/lib/python2.7/dist-packages
      /usr/local/buildtools/current/sitecustomize
    Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

The problem is that -eq is valid only for integers on both sides.

* Fix dependency in SavedModel py.
Change: 138246915

* Update index.md (tensorflow#5402)

added link for OVL

* Fix a bug that in cost model manager, graph ptr is used as key of map but memory allocator returns same addresses for graph objects when the graph mgr is being used repeatedly.
Change: 138251588

* Use portpicker library when getting a port in localhost_cluster_performance_test and
sync_replicas_optimizer_test.
Change: 138253057

* reduce_logsumexp fix for reduction_indices.  Fixes issue tensorflow#5291 (tensorflow#5292)

* Fix issue 5291

* Update tensorflow#5291

Update from comments

* Add unit test

* remove extra space

* Update math_ops.py

Revert back to initial approach

* wrap to 80 cols by restructuring

* Fix windows build. (tensorflow#5411)

* Fix windows build.

Add resource_variable_ops to cmake

* Update the document of windows cmake build.

* fixed the equation of the sequence increase

* Make ptb_word_lm tutorial test more robust against data file download failures

Aims to address recently flakiness in the tutorial test. See example failure log at:
http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=mac-slave/281/console
Change: 138300959

* Seal errors interface.
Change: 138332031

* Update generated Python Op docs.
Change: 138332522

* Switched to the latest version of farmhash
Change: 138337381

* Update README.md. Wrong graph name. (tensorflow#5429)

* added full relative path to compile_ios_protobuf (tensorflow#5424)

The commands before and after this one have the full path. That was confusing for me, and is a mistake if you ask me.

* Fix selection of the very first point in the dataset.

0 (index of the first point) evaluates to false in JS, which resulted in the false conclusion that there was no selection
Change: 138387835

* as well -> as well as (tensorflow#5442)

as well -> as well as

* Pylint `disable` fix (tensorflow#5418)

* Started synthetic

* Added circle synthetic dataset. partially resolves tensorflow#5314

* Implemented CLs to resolve tensorflow#5314

* Adding py_test srcs to BUILD

* Added  to fix tensorflow#5309

* Removed accidental contamination from

* Moved the 'enable=wildcard-import' closer to the 'disable'

* Added more of the 'enable=wildcard-import' and removed unused ones

* Ignore tools/git/gen (tensorflow#5457)

* Prevented the graph visualizer from making bridge paths to nodes with an in or out degree (depending on whether the bridge path is inbound or outbound) above 4. This prevents too many bridge paths from emanating out of a subhierarchy (metagraph) and crowding things up.
Change: 138403730

* Better error message for "No gradients provided for any variable" case.

Address a few pylint warnings.
Change: 138407267

* Update generated Python Op docs.
Change: 138409090

* Fix windows cmake build by replacing enum class by enum
Change: 138409704

* Reload the checkpoint reader when the latest checkpoint has changed.

The embedding projector plugin caches the checkpoint reader, which is pointing to the latest checkpoint when TensorBoard was started. But after some time, the saver will remove that old checkpoint file (keeps only N latest checkpoints), while the reader is still pointing to it.

Also add more tests regarding the V1 and V2 checkpoint versions.
Change: 138416977

* [Windows/CMake] Remove dependency on zlib.dll. (tensorflow#5456)

Instead, use the static zlibstatic.lib, as originally intended.

Fixes tensorflow#5275.

* Added option to sort the tooltip items from closest to farthest from the mouse cursor.
Change: 138418340

* Fix Conv3d with unequal strides on CPU (order of dimensions got mixed up
between TF and Eigen). Add tests covering the unequal stride case.
Change: 138421428

* Remove now-unused TensorFlowImageListener class.
Change: 138425944

* Remove 'projectedPoint' from DataSet. ProjectorScatterPlotAdapter creates a Float32Array of data point positions, and passes it into onPointPositionsChanged. Overhaul the visualizer interface, change 'onRecreateScene' to 'setScene', change 'removeAll' to 'dispose', change 'onUpdate' to 'onPointPositionsChanged', delete 'onDataSetChanged' entirely.

Remove 'sceneIs3D' and 'backgroundColor' from onRecreateScene, they never really belonged there. The scene is /always/ 3D, the visualizers are really interested in the camera projection method, so now they query that. backgroundColor moved into the RenderContext, where it probably always belonged.

Sprite visualizer now observes new point arrays and creates the WebGL representation only if one doesn't already exist, or its buffers are too small to hold the current point array.

ScatterPlot directly uses THREE.AxesHelper, delete scatterPlotVisualizerAxes. (we might add it back later when scatter plot becomes more of a general-purpose reusable component, but it will need lots of work anyway).

Add 'util.vector3FromPackedArray', which loads array[i], array[i+1], array[i+2] into a THREE.Vector3 and replaces 'util.getProjectedPointFromIndex'.

Projector no longer automatically renders the scatter plot when the positions / attributes change.

Break the trace visualizer's dependency on the selection context, it was causing problems when dataset filtering was being canceled. Add various Float32Array buffers for opacity and line width, generate them in the scatter plot adapter, and pass them into the render context with the rest of the 'attribute' data.

When switching between sprites + labels mode, always add the trace visualizer.
Change: 138426641

* Add ci scripts for cmake build and test on windows.

* fix windows gpu build (tensorflow#5421)

* msvc doesn't support unroll; for now let the compiler take care of it

* fix windows gpu build
ambiguous / operator (doesn't know if it should use float, double ...)

* Set TF_NEED_OPENCL=0 by default on Windows

* Wait for jvm to shutdown completely before remove bazel install directory

* Fix a typo in export meta graph sample code. (tensorflow#5472)

* Internal change.
Change: 138451572

* Changes to scatter_nd ops

* Rewrite CPU impl to be single-threaded and use vectorization; avoids race conditions.  Removes use of the generator.
* Remove scatter_nd_mul and scatter_nd_div to reduce binary size until
  we figure out a better way to reduce the templating pain
* Modify scatter_nd to add for repeated indices as opposed to update
  (this is the appropriate gradient for gather_nd, for example)
* Clean up docstrings.
Change: 138452341

* C API: Rename TF_Session to TF_DeprecatedSession.

This is one step towards having a stable C API by the time we get to TensorFlow
1.0.  A follow-up step will involve renaming TF_SessionWithGraph to TF_Session.

We want to encourage all client languages to use TF_SessionRun,
TF_GraphImportGraphDef etc. (instead of TF_Run and TF_ExtendGraph), hence
the choice of names. Ideally, all client languages will use these
functions, but it is likely that Python will continue to use
TF_DeprecatedSession for a while.
Change: 138454433

* Update ops-related pbtxt files.
Change: 138455935

* Update generated Python Op docs.
Change: 138456486

* C++ gradient checker support multiple inputs and outputs.

Also added a few more array ops.
Change: 138456760

* Proofread and organization edits for the tfdbg tutorial.
Change: 138466309

* Make Coordinator::RegisterRunner and Coordinator::Join thread-safe.
Change: 138467240

* Move the orthographic camera farther away from the origin, it was lying on the
edge of the unit cube and points on the cube were getting clipped.
Change: 138470802

* Updated benchmark program to handle multiple inputs and outputs
Change: 138471362

* Switch Pack and Unpack to use gradient checker.

This uncovered a silly bug during idx calculation in the gradient checker which is fixed in this CL.
Change: 138471470

* tfdbg core: let RunStateArgs hold a debugger state object, the destructor of which can do cleanups such as closing gRPC streams
Change: 138472618

* Improve performance of tf.subscribe on large graphs by caching control outputs.
Change: 138473617

* Auto-fetch Inception model assets for Android demo, so that manual download/extract step is unnecessary.
Change: 138475677

* Refactor replica_device_setter to allow custom placement strategies

Also introduces a scaffold in tf.contrib.training for load-balanced placement strategies
Change: 138476980

* Update generated Python Op docs.
Change: 138478269

* Add ability to debias 0-initialized EMAs in `assign_moving_average`.
Change: 138481437

* Update generated Python Op docs.
Change: 138483426

* Fix MetricSpec errors for non-functions as metrics.

This fixes a failure where a non-function is used as a metric, fails, but raises an exception during error logging, hiding the real exception.
Change: 138488393

* Fix typo in comment.
Change: 138518927

* Add fake quant ops to mobile build targets.
Change: 138526951

* Dropped support for gcudacc in stream_executor.
Change: 138529294

* Add --quantized_fallback_min and --quantized_fallback_max for use
when experimenting with quantized graphs.
Change: 138529416

* Add precision and recall to Tensorforest metrics.
Change: 138531687

* Clarify string splitting for unicode sequences.
Change: 138532593

* Explicitly set `zero_debias` in moving averages to the default. This CL is a noop.
Change: 138532885

* Update ops-related pbtxt files.
Change: 138534971

* Update generated Python Op docs.
Change: 138536102

* Make contrib.distributions.kl lookup properly handle subclasses.

If no direct KL registration is found between classes A and B, their parents
are searched for a registered KL function. The KL method whose registration
has the shortest sum MRO distance to the child classes is used.
Change: 138539857

* Let the DNNRegressor constructor accept an optional label_dimension argument.
Change: 138540603

* Update generated Python Op docs.
Change: 138541463

* Reformat markdown.
Change: 138541907

* Switch callers of tf.pack and tf.unpack to call tf.stack and tf.unstack
instead.
Change: 138542316

* Remove unsupported footnote
Change: 138546162

* Support printing Variables with the contrib print_op.
Change: 138546512

* Remove uses and deprecate listdiff (use setdiff1d)
Change: 138548514

* Fix forward reference.
Change: 138549657

* Disable frustum culling for all primitives.
Change: 138549699

* Serve tensors from server as bytes to avoid the ~300MB browser string limit.

Also store data points as Float32Array[] instead of number[][], i.e. each data point is Float32Array. This reduces memory usage by 2x.
Change: 138550170

* Minor docs improvement.
Change: 138550740

* Give a friendly error message if parallel_iterations is set to be less than 1.
Change: 138551385

* Optimize construction of tril ids. The complexity remains O(n^2) but the constant is now at least half as large.
Change: 138551758

* Update generated Python Op docs.
Change: 138553113

* Changes `zip(*gradients)[0]` to `list(zip(*gradients))[0]` for Python 3 compatibility.
Change: 138556978

* Add docs for the estimators with examples.
Change: 138557107

* Adds feature column ops functions to exported docs.
Change: 138557455

* Update generated Python Op docs.
Change: 138558155

* Seal queue_runner's interface.
Change: 138568627

* Render scatter plot later, when the container has height > 0.
Change: 138568629

* Autogenerated Change: Change TensorBoard TAG to 39
Change: 138568722

* Minor formatting fixes.
Change: 138569121

* Minor fixes to tfdbg codelab to improve clarity.
Change: 138571194

* Handle partitioning of incomplete graphs gracefully.

The added test would lead to a SIGSEGV without the accompanying
changes in graph_partition.cc.

In practice, this code path can be triggered by invoking:
  Session::Extend(graph_def);
followed by:
  Session::Run()
when graph_def incompletely specifies a graph
(such as the one in the test, where not all inputs of the
target node have been specified).
Change: 138573807

* Use the native Eigen round() method instead of std::function<float(float)>(round) which doesn't get vectorized and doesn't compile with gcc 6.2 (see tensorflow#5419)
Change: 138575304

* Add documentation for TensorForestEstimator.
Change: 138582357

* Cleanup: Consolidate specification of what constitutes a crossable column into
a single function.
Change: 138582455

* Rename tf.Tensor to tf.Output

- Swap the alias direction and fix tests broken by the swap
- Export Output in tensorflow.__all__
Change: 138583261

* Update generated Python Op docs.
Change: 138586427

* Add a WriteTextProto method that is analogous to WriteBinaryProto but for textual protobufs.
Change: 138586949

* Fix race conditions in tests due to using same folders for saver.
Change: 138587097

* Refactor fold_constants to make reuse easier.
Change: 138588267

* tfdbg CLI: enable highlighting of tensor elements by range in print_tensor

Command example:
$ blaze build -c opt third_party/tensorflow/python/debug:debug_mnist && \
     blaze-bin/third_party/tensorflow/python/debug/debug_mnist --debug
tfdbg> run
tfdbg> pt hidden/Relu:0[0,:] -r [0.25, 0.75]
tfdbg> pt hidden/Relu:0[0,:] -r [0.75, inf]
tfdbg> pt hidden/Relu:0[0,:] -r "[[0, 0.25], [0.75, inf]]"
Change: 138590602

* Optimize slice for 1D tensors.

Avoid copy for aligned slices of 1D tensors. The previous alignment check
returned false for 1D aligned slices, resulting in an unnecessary copy.
Change: 138637436

* Fixes bug in _ResizeNearestNeighborGrad

Prevents `ValueError: Cannot convert a partially known TensorShape to a Tensor...` if any value of `image.get_shape()[1:3]` is None.

* C API: Rename TF_SessionWithGraph to TF_Session.

This makes the preferred API for graph construction and
execution have the simpler, more sensible names (TF_Session,
TF_NewSession etc.) and the older one which we hope to
eventually remove have the more awkward ones (TF_DeprecatedSession,
TF_NewDeprecatedSession etc.)
Change: 138641615

* Add support for sparse access to resource variables via embedding_lookup.
Change: 138642972

* C API: Slight re-organization of code that deletes input tensors.

This change itself is a no-op. However, my plan is to change the
API contract with the TF_Session*Run functions in a follow up
change so that they do NOT take ownership of the input tensor values.

This just makes it slightly easier to do so.
Change: 138646376

* bayesflow: replace SampleAndReshapeValue with SampleValue()
Change: 138649779

* Update generated Python Op docs.
Change: 138651316

* A few cleanups, now that V2 format has been switched on:

* Switch BaseSaverBuilder's default write_version to V2 as well.  This
  shouldn't affect users since most use Saver directly.
* Doc cleanups.
Change: 138653674

* Update generated Python Op docs.
Change: 138657291

* Fix minor documentation in iOS build instructions. Thanks @JosiahKane.

Fixes tensorflow#5324
Change: 138659892

* More descriptive errors when passing something of the wrong
type to an op function.
Change: 138661931

* Control the number of rows (tensor size) we send to the client via a query parameter.

Let the client decide the size of tensors it can handle, instead of the server having that control.
Change: 138665737

* gradient for ExtractImagePatches when batch size is None

* Fix call_cpp_shape_fn's option for comparing to python (needs to convert result
from proto TensorShape before comparison).
Change: 138668146

* Adds detail and example to LogisticRegressor docstring
Change: 138669271

* s/sample_n/sample/ in most unittests.
Change: 138670422

* Update bower dependencies to match internal versions.
Change: 138673511

* Merge changes from github.
Change: 138675832

* Enable reuse of a DirectSession after a run times out.

Fixes tensorflow#5115.
Change: 138676008

* Update ops-related pbtxt files.
Change: 138676315

* Add a unit-test to check that multiple signatures objects in ContainerDef is
not a valid configuration.
Change: 138676614

* De-flake tests.
Change: 138678513

* Make notification messages use <paper-toast> to be consistent with material design.

Also if the original tensor is bigger than what the browser can handle, notify the user that the tensor was trimmed.
Change: 138679248

* Update generated Python Op docs.
Change: 138679407

* Enable check of solution in underdetermined case for matrix_solve_ls. We have switched to using CompleteOrthogonalDecomposition in Eigen, which computes the minimum-norm solution.
Change: 138679456

* Added missing license information
Change: 138679970

* Fix byte_size_load_fn for scalar variables
Change: 138680427

* Create tf.Summary.FileWriter, the replacement for tf.train.SummaryWriter

In a future CL, all example usage will be migrated, and tf.train.SummaryWriter will be deprecated.
Change: 138683254
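
A hedged usage sketch (the entry names it tf.Summary.FileWriter; the sketch uses tf.summary.FileWriter, the spelling it later surfaced under, and /tmp/logs is illustrative):

  import tensorflow as tf

  writer = tf.summary.FileWriter("/tmp/logs", graph=tf.get_default_graph())
  summary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=0.5)])
  writer.add_summary(summary, global_step=1)
  writer.close()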

* Explicitly set `zero_debias` in moving averages to the default. This CL is a no-op.
Change: 138684361

* Update generated Python Op docs.
Change: 138685441

* Deprecate the old tf.X_summary ops.

There is now a deprecation warning for each one, saying they will be removed by 11/30/2016, with instructions on migration.
Change: 138687264
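
The migration is essentially a rename; a minimal before/after sketch, assuming the tf.summary.* replacements that the deprecation warnings point to:

  import tensorflow as tf

  loss = tf.constant(0.5)

  # Deprecated (removal announced for 11/30/2016):
  old = tf.scalar_summary("loss", loss)

  # Replacement in the tf.summary module:
  new = tf.summary.scalar("loss", loss)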

* Added a new rule to handle the OpenCL backend: we comment it out in
google3 and enable it on GitHub. This is because we haven't imported the backend into google3 just yet.
Change: 138689620

* Remove stray text from index.md (tensorflow#5506)

* Update generated Python Op docs.
Change: 138691541

* More efficient implementation of tf.einsum().

The current implementation generates intermediate tensors whose size grows
exponentially as a function of the sum of the ranks of the input tensors.  The
new implementation reduces to batch matrix multiplication, which limits the
size of intermediate tensors to the size of the intermediate products.  This
also allows the function to benefit from GPU acceleration.

Benchmarking:

The following einsum() multiplies two 1000 x 1000 matrices:

  m = tf.random_normal((1000, 1000))
  x = tf.einsum('ij,jk->ik', m, m)

Timing results on a Z440 workstation:

  Before: 7s, 4GB of ram
  After: 0.04s, 80MB of ram
Change: 138693729

* Update generated Python Op docs.
Change: 138697361

* Expose new division operators into the Python API

i.e. truncatediv, floor_div (to be renamed to floordiv once the existing floordiv is
deleted), truncatemod, floormod, and realdiv.
Change: 138697422
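
A small sketch of the different rounding semantics these expose (values are illustrative):

  import tensorflow as tf

  a = tf.constant(-7)
  b = tf.constant(2)

  floor_q = tf.floor_div(a, b)    # -4: quotient rounded toward negative infinity
  trunc_q = tf.truncatediv(a, b)  # -3: quotient rounded toward zero
  floor_r = tf.floormod(a, b)     #  1: remainder takes the sign of the divisor
  trunc_r = tf.truncatemod(a, b)  # -1: remainder takes the sign of the dividend
  real_q = tf.realdiv(tf.constant(-7.0), tf.constant(2.0))  # -3.5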

* List all the files in third_party/sycl in a new "all_files" filegroup, which is then added to the list of dependencies in the main "all_opensource_files" filegroup
Change: 138698669

* Fixed formatting and lint issues introduced with the last pull from OSS (cl/138675832)
Change: 138699007

* Alias summary-related protos as tf.summary.Event, tf.summary.Summary, etc

This is part of the general Summary-related API reorganization leading up to the TensorFlow 1.0 release.
We're moving all Summary-related functionality to the tf.summary submodule.

After this change, we will have TensorFlow use the new endpoint, and deprecate tf.Summary, tf.Event, etc.

Full list of aliases created:
tf.summary.Summary
tf.summary.SummaryDescription
tf.summary.SessionLog
tf.summary.TaggedRunMetadata
tf.summary.Event

I updated the usage in a few tests, just to verify the change works.
Change: 138701324
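
A tiny sketch using one of the new aliases (the field names come from the underlying Event proto; the values are illustrative):

  import tensorflow as tf

  # tf.summary.Event is now an alias for the Event proto.
  event = tf.summary.Event(wall_time=1478000000.0, step=1)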

* Fix metagemm crash, make sure scratch memory is aligned to 32 bytes.
Change: 138701878

* Update generated Python Op docs.
Change: 138702302

* Also allow the number of channels to be unknown in the gradient computation of ExtractImagePatches

* Internal changes.
Change: 138710941

* Restoring the camera from a bookmark requires restoring the orbit controls as
well; otherwise the orbit controls will try to 'look at' the target of (0,0,0)
and introduce an undesired rotation. This is extremely non-obvious in the 2D
(orthographic) projection case. The fix is to always initialize position0, zoom0,
and target0 in the orbit controls when creating a camera.

Extract makeOrbitControls out from duplicated code in makeCamera2D and
makeCamera3D.
Change: 138710960

* Improve tfprof doc. Support binary GraphDef.
Change: 138711410

* Add an error message if the sprite image exceeds 8192px, which is empirically the max size
for the sprite image texture.
Change: 138711577

* C++ gradients: Add a few more c++ gradient functions.

MatrixBandPart, GatherNd, CheckNumerics, Reshape, ExpandDims, Squeeze, Transpose.
Change: 138712881

* Make the cancel op for a shared queue a no-op so that we can recover from PS crashes/restarts without crashing all workers.
Change: 138722939

* Update generated Python Op docs.
Change: 138723228

* Fix incorrect gradient w.r.t. A for matrix_solve_ls in the underdetermined case.
Add missing name to gradient test that caused most tests for matrix_solve_ls_grad to be skipped.
Set proper initial values in linalg_grad_test and tighten test tolerances for float64.
Change: 138725515
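
To illustrate the underdetermined case this touches (shapes are illustrative; fast=False is my assumption for the code path that uses the decomposition mentioned earlier):

  import tensorflow as tf

  # Underdetermined system: more unknowns (4) than equations (2), so
  # matrix_solve_ls returns the minimum-norm solution.
  A = tf.constant([[1., 2., 3., 4.],
                   [5., 6., 7., 8.]])
  b = tf.constant([[1.], [2.]])

  x = tf.matrix_solve_ls(A, b, fast=False)
  # The gradient w.r.t. A in this case is what the fix corrects.
  grad_A = tf.gradients(tf.reduce_sum(x), A)[0]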

* Refactors dynamic_rnn_estimator to use a model_fn and Estimator, rather than inheriting from BaseEstimator.

Also adds the capability to predict probabilities.
Change: 138761836

* Make sure to uninstall the old pip package and install the newly built
one when running Windows tests on TF CI.

* Stop remembering when the readahead buffer reached EOF.

This change reverts a previously added optimization for small files and thus unblocks the scenario used by TensorBoard, where the same RandomAccessFile object is re-read
to fetch new content appended to the file.
Change: 138775790

* Change the API for the ones initializer to be consistent with the other callable initializers.
BREAKING CHANGE!!
Change: 138778571
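
A hedged before/after sketch of the breaking change, as I read it (the initializer becomes callable like the other initializers):

  import tensorflow as tf

  # Before (my reading): the initializer was passed directly.
  # w = tf.get_variable("w", shape=[3], initializer=tf.ones_initializer)

  # After: it is constructed like tf.constant_initializer() and friends.
  w = tf.get_variable("w", shape=[3], initializer=tf.ones_initializer())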

* Deprecate tf.learn.Monitors.
Change: 138781079

* Make Python _TileShape op handle case where input's rank is not known.
Change: 138783199

* Update generated Python Op docs.
Change: 138783558

* Support ZerosLike for int64 on GPU
Change: 138787117

* Allow strings to be read in tensor_slice_reader
Change: 138793614

* Fix bug introduced in manual merge.
Change: 138793949

* [Windows/CMake] Enable avgpooling_op.cc.

Fixes tensorflow#5517.

* Fix inconsequential compiler warnings (tensorflow#5535)

* Use the same integer types in 'for' statements and other comparisons

* Initialize unused values on failure to avoid compiler warnings

* Add dummy returns on failure to avoid compiler warnings about missing return values

* Removed line-continuation tokens from commented-out code

* Avoid unused variable warning for 'parsed_colon'

* Add support for fetch and feed session.run conversion functions (tensorflow#5094)

* Get rid of an ambiguous operator and enable average pooling for GPU builds on Windows

* Fix CMake config if used as a subproject

With this fix we don't assume that the tensorflow project is the build
root. CMAKE_CURRENT_SOURCE_DIR/CMAKE_CURRENT_BINARY_DIR is used instead of
just CMAKE_SOURCE_DIR/CMAKE_BINARY_DIR, so the project now builds
successfully when it is included inside another, larger project.

* Minor corrections to "Adding a New Op" docs (tensorflow#5579)