
Does anyone run this successfully on a single GTX 1080 GPU? I tried it and ran out of memory. #23

Closed
jiafeixiaoye opened this issue May 18, 2018 · 6 comments

jiafeixiaoye commented May 18, 2018

I added
tfconfig.gpu_options.per_process_gpu_memory_fraction = 0.05
to let it run, but I got error output like the following:
...
2018-05-18 19:14:25.380430: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 58.69MiB. Current allocation summary follows.
2018-05-18 19:14:25.380546: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (256): Total Chunks: 38, Chunks in use: 37. 9.5KiB allocated for chunks. 9.2KiB in use in bin. 7.6KiB client-requested in use in bin.
...
4] 1 Chunks of size 91656192 totalling 87.41MiB
2018-05-18 19:14:25.404137: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 374.93MiB
2018-05-18 19:14:25.404163: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit: 425407283
InUse: 393138944
MaxInUse: 393138944
NumAllocs: 1096
MaxAllocSize: 91656192

2018-05-18 19:14:25.404278: W tensorflow/core/common_runtime/bfc_allocator.cc:279] **********************************************************_____******************xxxxxxx
2018-05-18 19:14:25.404328: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_ops.cc:672 : Resource exhausted: OOM when allocating tensor with shape[1,64,400,601] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "test.py", line 244, in <module>
    eval_all(args)
  File "test.py", line 137, in eval_all
    result_dict = inference(func, inputs, data_dict)
  File "test.py", line 69, in inference
    _, scores, pred_boxes, rois = val_func(feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,64,400,601] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: resnet_v1_101/conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resnet_v1_101/conv1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, resnet_v1_101/conv1/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: resnet_v1_101_5/concat_3/_1133 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2610_resnet_v1_101_5/concat_3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
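
For context, the `per_process_gpu_memory_fraction` option mentioned above belongs to a `tf.ConfigProto` that is passed when the session is created. A minimal TF 1.x sketch; the session code around it is illustrative, not this repo's actual test.py:

```python
import tensorflow as tf

# Cap how much of the GPU's memory this process may claim.
# 0.05 of an 8 GB GTX 1080 is only about 400 MB, which is far too
# little for a ResNet-101 based model, hence the OOM above.
tfconfig = tf.ConfigProto()
tfconfig.gpu_options.per_process_gpu_memory_fraction = 0.05

with tf.Session(config=tfconfig) as sess:
    # build the graph and run inference here
    pass
```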

@nguyeho7

Well, if you try to fit the entire network into 0.05 * 8 GB (5% of the card's memory), it can't work. Why not 0.5, i.e. half of the GPU memory? I have run it successfully on a 1080 Ti.
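
A minimal sketch of that suggestion, assuming the same TF 1.x `ConfigProto` setup; the `allow_growth` alternative is a standard TensorFlow option, not something this repo requires:

```python
import tensorflow as tf

tfconfig = tf.ConfigProto()
# Allow this process to use half of the GPU memory (~4 GB on a GTX 1080).
tfconfig.gpu_options.per_process_gpu_memory_fraction = 0.5
# Alternatively, let TensorFlow grow its allocation on demand:
# tfconfig.gpu_options.allow_growth = True

sess = tf.Session(config=tfconfig)
```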

@jiafeixiaoye (Author)

@nguyeho7 thanks for your suggestion. I changed it to 0.5 and it runs normally.

wm10240 commented Jun 21, 2018

Hi @nguyeho7 @jiafeixiaoye, I also get the same error. I added the code like @jiafeixiaoye did, but it doesn't work (my setup is four NVIDIA 1080 Ti cards). The error is below; if you have any suggestions I would be very grateful:

2018-06-21 08:53:49.853249: I tensorflow/core/common_runtime/bfc_allocator.cc:686] Stats:
Limit:                  5856854016
InUse:                  5832717824
MaxInUse:               5845060608
NumAllocs:                    2163
MaxAllocSize:           1121255424

2018-06-21 08:53:49.853344: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****************************************************************************************************
2018-06-21 08:53:49.853378: W tensorflow/core/framework/op_kernel.cc:1198] Resource exhausted: OOM when allocating tensor with shape[2,50,50,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
    return fn(*args)
  File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
    status, run_metadata)
  File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,100,100,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv2/Relu, resnet_v1_101/block2/unit_4/bottleneck_v1/conv3/weights/read/_1533)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Node: tower_4/gradients/tower_4/resnet_v1_101_3/block3/unit_11/bottleneck_v1/conv2/Conv2D_grad/tuple/control_dependency_1/_6953 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_33582_tower_4/gradients/tower_4/resnet_v1_101_3/block3/unit_11/bottleneck_v1/conv2/Conv2D_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 265, in <module>
    train(args)
  File "train.py", line 213, in train
    sess_ret = sess.run(sess2run, feed_dict=feed_dict)
  File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,100,100,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv2/Relu, resnet_v1_101/block2/unit_4/bottleneck_v1/conv3/weights/read/_1533)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
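
As the "Hint" lines in these logs suggest, TensorFlow can list the live tensor allocations when an OOM occurs if a `RunOptions` proto is passed to `sess.run`. A minimal sketch based on the call in the traceback above (`sess2run` and `feed_dict` are the script's own variables):

```python
import tensorflow as tf

# Ask TensorFlow to dump the current tensor allocations in the
# error message whenever an OOM is raised during this run call.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

sess_ret = sess.run(sess2run, feed_dict=feed_dict, options=run_options)
```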

@jiafeixiaoye (Author)

Hi @wm10240,
did you change the compute capability in all of the make.sh files in lib_kernel? The GTX 1080 is sm_61; the default value in make.sh is sm_52.

@karansomaiah

Hey @jiafeixiaoye I tried the changes suggested and I still see these errors. Any other suggestions for me?
Also, @wm10240 were you able to resolve the issue?

@karansomaiah

Update:
I tried it on the GTX 1080 Ti. I didn't have to change sm_52 to sm_61.
I was running the training script as-is, not having realized that the 0-7 argument means using 8 GPUs. I changed it to the number of GPUs I actually have, and now it works fine.
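
A related, generic way to keep a multi-GPU training script on only the GPUs you actually have is to restrict which devices the process can see at all. This is a plain CUDA/TensorFlow sketch, not this repo's own command-line mechanism, and the variable must be set before any GPU context is created:

```python
import os

# Expose only GPUs 0 and 1 to this process (instead of all eight).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# Import TensorFlow only after the variable is set, so the hidden
# GPUs are never initialized.
import tensorflow as tf
```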
