InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log. #28

ziyeshanwai opened this issue Dec 29, 2017 · 0 comments

Hello, training ran successfully for the first 100 iterations, and then the following error suddenly appeared. Does anyone have an idea what is causing this?

iter: 0 / 200000, total loss: 18.0627, rpn_loss_cls: 0.9430, rpn_loss_box: 10.3524, loss_cls: 4.4159, loss_box: 2.3514, lr: 0.000500
speed: 3.120s / iter
2017-12-29 13:22:04.715605: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2001 get requests, put_count=1760 evicted_count=1000 eviction_rate=0.568182 and unsatisfied allocation rate=0.670165
2017-12-29 13:22:04.715654: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
image: 025553069_K1210503_T001_1_10.jpg iter: 20 / 200000, total loss: 3.2476, rpn_loss_cls: 0.2447, rpn_loss_box: 2.8628, loss_cls: 0.0771, loss_box: 0.0630, lr: 0.000500
speed: 1.036s / iter
image: 030446539_K1221297_419_1_05.jpg iter: 40 / 200000, total loss: 4.8578, rpn_loss_cls: 0.1838, rpn_loss_box: 3.7386, loss_cls: 0.4705, loss_box: 0.4649, lr: 0.000500
speed: 1.033s / iter
2017-12-29 13:22:46.655261: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3000 get requests, put_count=3099 evicted_count=1000 eviction_rate=0.322685 and unsatisfied allocation rate=0.308
2017-12-29 13:22:46.655287: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
image: 030454261_K1221455_T001_5_04.jpg iter: 60 / 200000, total loss: 2.1451, rpn_loss_cls: 0.1146, rpn_loss_box: 1.7099, loss_cls: 0.2562, loss_box: 0.0643, lr: 0.000500
speed: 0.708s / iter
image: 025913309_K1214499_161_1_07.jpg iter: 80 / 200000, total loss: 2.7814, rpn_loss_cls: 0.2832, rpn_loss_box: 1.0201, loss_cls: 0.6792, loss_box: 0.7989, lr: 0.000500
speed: 1.077s / iter
image: 030141742_K1217637_285_1_28.jpg iter: 100 / 200000, total loss: 2.7696, rpn_loss_cls: 0.2172, rpn_loss_box: 0.8301, loss_cls: 0.7361, loss_box: 0.9862, lr: 0.000500
speed: 0.945s / iter
Traceback (most recent call last):
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
    ret = func(*args)
  File "faster_rcnn/../lib/rpn_msr/anchor_target_layer.py", line 151, in anchor_target_layer
    argmax_overlaps = overlaps.argmax(axis=1) # (A)
ValueError: attempt to get argmax of an empty sequence
2017-12-29 13:23:49.293478: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
2017-12-29 13:23:49.330989: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
Traceback (most recent call last):
  File "faster_rcnn/train_net.py", line 106, in <module>
    restore=bool(int(args.restore)))
  File "faster_rcnn/../lib/fast_rcnn/train.py", line 407, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "faster_rcnn/../lib/fast_rcnn/train.py", line 261, in train_model
    cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'RPN/rpn-data/PyFunc', defined at:
  File "faster_rcnn/train_net.py", line 98, in <module>
    network = get_network(args.network_name)
  File "faster_rcnn/../lib/networks/factory.py", line 22, in get_network
    return FPN_train()
  File "faster_rcnn/../lib/networks/FPN_train.py", line 25, in __init__
    self.setup()
  File "faster_rcnn/../lib/networks/FPN_train.py", line 418, in setup
    .anchor_target_layer(_feat_stride[2:], anchor_size[2:], name = 'rpn-data'))
  File "faster_rcnn/../lib/networks/network.py", line 34, in layer_decorated
    layer_output = op(self, layer_input, *args, **kwargs)
  File "faster_rcnn/../lib/networks/network.py", line 380, in anchor_target_layer
    [tf.float32,tf.float32,tf.float32,tf.float32])
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
    name=name)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/spaci/anaconda3/envs/FPN_py2.7_tf1.1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.
[[Node: RPN/rpn-data/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](RPN/rpn_cls_score/BiasAdd/_587, RPN/rpn_cls_score_1/BiasAdd/_589, RPN/rpn_cls_score_2/BiasAdd/_591, RPN/rpn_cls_score_3/BiasAdd/_593, RPN/rpn_cls_score_4/BiasAdd/_595, _recv_gt_boxes_0, _recv_gt_ishard_0, _recv_dontcare_areas_0, _recv_im_info_0, RPN/rpn-data/PyFunc/input_9, RPN/rpn-data/PyFunc/input_10)]]
[[Node: gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10/_663 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_379_gradients/RPN/rpn_cls_score_reshape_reshape_concat_grad/Gather_10", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Command exited with non-zero status 1
167.60user 9.99system 2:55.72elapsed 101%CPU (0avgtext+0avgdata 2398032maxresident)k
0inputs+3640outputs (0major+1128621minor)pagefaults 0swaps
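
For what it's worth, the underlying ValueError is raised by NumPy when overlaps is empty along the ground-truth axis, i.e. when the image reaching anchor_target_layer ends up with no ground-truth boxes (for example after filtering). Below is a minimal sketch that reproduces the error and a hypothetical guard; the shape convention and the skip-on-empty handling are assumptions, not the repository's actual code:

```python
import numpy as np

# overlaps is typically (num_anchors, num_gt_boxes); with zero ground-truth
# boxes the second axis is empty and argmax(axis=1) raises the error above.
overlaps = np.zeros((4, 0), dtype=np.float32)

try:
    overlaps.argmax(axis=1)
except ValueError as e:
    print(e)  # attempt to get argmax of an empty sequence

# Hypothetical guard (an assumption, not the repository's fix): handle the
# no-ground-truth case explicitly before taking the argmax.
if overlaps.shape[1] == 0:
    print("no ground-truth boxes for this image")
else:
    argmax_overlaps = overlaps.argmax(axis=1)
```

If that is the cause here, checking the annotation of the image being processed when the crash occurs may confirm it.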
