train.py error: InvalidArgumentError (see above for traceback): assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] #2737

Closed
sxr3455 opened this issue Nov 8, 2017 · 19 comments

sxr3455 commented Nov 8, 2017

Hello, I am trying to perform a training job with the help of the model (specified below) on my own dataset which contains only 2 classes.

System Info and things completed so far:

  1. Ubuntu 16.04 (using Python 2.7)
  2. models repo fetched from: https://github.com/tensorflow/models/tree/0375c800c767db2ef070cee1529d8a50f42d1042
  3. tf-nightly: tf_nightly-1.5.0.dev20171107-cp27-cp27mu-manylinux1_x86_64.whl (md5)
  4. I generated the TFRecords using the instructions provided in this repo: https://github.com/balancap/SSD-Tensorflow/tree/master/ since create_pet_tf_record.py throws attribute errors.
  5. Followed the "running locally" instructions and arranged the label and TFRecord files accordingly.
  6. Model used: faster_rcnn_resnet101_coco
  7. All my images are high resolution (1080x1920). I created the bounding-box annotations with the labelImg tool, available here: https://github.com/tzutalin/labelImg, and each image has more than 2 instances of the classes. For example, my XML files have two or more <object> elements.
  8. I am currently attempting the training job on CPU only, since I do not have a GPU. I will switch to a cloud machine if using the CPU is a concern.

Error Description: (sorry, it's a long post)

When I run "python train.py --logtostderr --train_dir=$TRAIN_DIR --pipeline_config_path=$PATH_TO_CONFIG", it gives the following errors:

INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From build/bdist.linux-x86_64/egg/object_detection/trainer.py:210: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1670: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2017-11-08 13:31:27.026402: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
INFO:tensorflow:Restoring parameters from /home/srinath/models/research/object_detection/models/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path object_detection/train/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

Caused by op u'Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert', defined at:
File "object_detection/train.py", line 165, in <module>
tf.app.run()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 228, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "build/bdist.linux-x86_64/egg/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 167, in _create_losses
losses_dict = detection_model.loss(prediction_dict)
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1305, in loss
groundtruth_masks_list,
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1463, in _loss_box_classifier
groundtruth_boxlists, groundtruth_classes_with_background_list)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 444, in batch_assign_targets
anchors, gt_boxes, gt_class_targets)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 149, in assign
message='Groundtruth boxes and labels have incompatible shapes!')
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 324, in assert_equal
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 112, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 128, in Assert
condition, data, summarize, name="Assert")
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 47, in _assert
name=name)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3073, in create_op
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1524, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]
Traceback (most recent call last):
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 295, in stop_on_exception
yield
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 492, in run
self.run_loop()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 1022, in run_loop
self._sv.global_step])
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
InvalidArgumentError: assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

Caused by op u'Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert', defined at:
File "object_detection/train.py", line 165, in <module>
tf.app.run()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 228, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "build/bdist.linux-x86_64/egg/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 167, in _create_losses
losses_dict = detection_model.loss(prediction_dict)
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1305, in loss
groundtruth_masks_list,
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1463, in _loss_box_classifier
groundtruth_boxlists, groundtruth_classes_with_background_list)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 444, in batch_assign_targets
anchors, gt_boxes, gt_class_targets)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 149, in assign
message='Groundtruth boxes and labels have incompatible shapes!')
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 324, in assert_equal
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 112, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 128, in Assert
condition, data, summarize, name="Assert")
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 47, in _assert
name=name)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3073, in create_op
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1524, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

Traceback (most recent call last):
File "object_detection/train.py", line 165, in <module>
tf.app.run()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 332, in train
saver=saver)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 782, in train
ignore_live_threads=ignore_live_threads)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 992, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 820, in stop
ignore_live_threads=ignore_live_threads)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 387, in join
six.reraise(*self._exc_info_to_raise)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 295, in stop_on_exception
yield
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 492, in run
self.run_loop()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 1022, in run_loop
self._sv.global_step])
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

Caused by op u'Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert', defined at:
File "object_detection/train.py", line 165, in <module>
tf.app.run()
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 228, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "build/bdist.linux-x86_64/egg/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 167, in _create_losses
losses_dict = detection_model.loss(prediction_dict)
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1305, in loss
groundtruth_masks_list,
File "build/bdist.linux-x86_64/egg/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1463, in _loss_box_classifier
groundtruth_boxlists, groundtruth_classes_with_background_list)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 444, in batch_assign_targets
anchors, gt_boxes, gt_class_targets)
File "build/bdist.linux-x86_64/egg/object_detection/core/target_assigner.py", line 149, in assign
message='Groundtruth boxes and labels have incompatible shapes!')
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 324, in assert_equal
return control_flow_ops.Assert(condition, data, summarize=summarize)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 112, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 128, in Assert
condition, data, summarize, name="Assert")
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 47, in _assert
name=name)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3073, in create_op
op_def=op_def)
File "/home/srinath/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1524, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [Groundtruth boxes and labels have incompatible shapes!] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/strided_slice_1:0) = ] [0] [y (Loss/BoxClassifierLoss/strided_slice_2:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_1/All, Loss/RPNLoss/assert_equal_1/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_2, Loss/BoxClassifierLoss/strided_slice_1, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_4, Loss/RPNLoss/strided_slice)]]

I would be glad if someone could provide a workaround for the above issue. I am ready to provide further details about the errors above. Thanks in advance.

angerson commented Nov 8, 2017

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

angerson closed this as completed Nov 8, 2017

@akhiljain100

I am having the same problem and couldn't find the question on Stack Overflow. Did you find a solution for it?

sxr3455 commented Dec 1, 2017

Yes. It worked when I tried with a Python 3 installation of TensorFlow 1.2.

@akhiljain100

No, this doesn't fix my problem. I think it's related to the shapes of the bounding boxes and the bounding box labels. Did you change your training TFRecord format?

sxr3455 commented Dec 1, 2017

No, it worked with the same old TFRecord files.

@soumenms2015

I am having the same problem. I don't think it is a problem with the TensorFlow version.

nhorro commented Dec 28, 2017

EDIT:

In my case, this was causing the problem:

I had multiple objects/bounding boxes in the same example, but only one entry in each of the class lists (when there should be one entry per box).

'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
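The "one entry per box" rule can be checked before the example is written. Below is a minimal sketch, assuming the per-box values are plain Python lists; the helper name check_box_label_counts and its argument names are hypothetical and not part of the Object Detection API.

```python
def check_box_label_counts(xmins, xmaxs, ymins, ymaxs, classes_text, classes):
    """Raise ValueError unless every per-box list has the same length.

    Hypothetical helper: the argument names mirror the variables passed to
    dataset_util when building the tf.train.Example, but this function is
    not part of object_detection.
    """
    lengths = {
        'xmins': len(xmins),
        'xmaxs': len(xmaxs),
        'ymins': len(ymins),
        'ymaxs': len(ymaxs),
        'classes_text': len(classes_text),
        'classes': len(classes),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError('Per-box lists differ in length: %r' % (lengths,))
    return lengths['classes']


# Two boxes with two labels pass the check and return the box count.
num_boxes = check_box_label_counts(
    [0.1, 0.5], [0.4, 0.9], [0.2, 0.3], [0.6, 0.8],
    [b'cat', b'dog'], [1, 2])
print(num_boxes)
```

Calling it right before the tf.train.Example is built means a mismatch fails at record-creation time, with the offending lengths in the message, rather than later inside the training graph.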

@abhiishekpal

Check that the number of classes and the number of labels are equal in your generated TFRecord file.
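One way to run such a check on an already generated file is sketched below. It assumes the feature keys used elsewhere in this thread ('image/object/bbox/xmin' and 'image/object/class/label') and a TF 1.x-style record reader; the function names are hypothetical. A key that is missing from a record shows up as a count of 0.

```python
def count_boxes_and_labels(example):
    """Return (num_boxes, num_labels) for one parsed tf.train.Example."""
    feature = example.features.feature
    num_boxes = len(feature['image/object/bbox/xmin'].float_list.value)
    num_labels = len(feature['image/object/class/label'].int64_list.value)
    return num_boxes, num_labels


def find_mismatched_records(record_path):
    """Return [(index, num_boxes, num_labels)] for records whose counts differ."""
    # Imported lazily so count_boxes_and_labels stays usable without TensorFlow.
    import tensorflow as tf

    mismatches = []
    # Assumption: TF 1.x reader. Under TF 2.x the equivalent function is
    # tf.compat.v1.io.tf_record_iterator.
    for index, raw in enumerate(tf.python_io.tf_record_iterator(record_path)):
        example = tf.train.Example()
        example.ParseFromString(raw)
        num_boxes, num_labels = count_boxes_and_labels(example)
        if num_boxes != num_labels:
            mismatches.append((index, num_boxes, num_labels))
    return mismatches


# Usage (the path is a placeholder):
#   for index, boxes, labels in find_mismatched_records('train.record'):
#       print('record %d: %d boxes, %d labels' % (index, boxes, labels))
```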

artapova commented Jan 15, 2018

Editing the structure of the tf.train.Example in the code that creates the TFRecords worked for me.

@soumenms2015

@verabeldev: What modification did you make? Could you please elaborate in detail?

offbye commented Mar 8, 2018

I hit this error too. Any ideas on how to solve it?

@amanmeetgarg

I had the same error as outlined above. My customization for a personal dataset was writing the TFRecord file incorrectly.

It turns out there are multiple versions of dataset_util.py files in the tensorflow model repository.

I followed the explanation here for the object detection task and the major changes are as below.


import tensorflow as tf

from object_detection.utils import dataset_util


# The wrapper function and its argument names are illustrative; the feature
# keys and the dataset_util calls are the part that matters.
def create_tf_example(height, width, filename, encoded_image_data,
                      image_format, xmins, xmaxs, ymins, ymaxs,
                      classes_text, classes):
    # Every per-box list (bbox coordinates, class text, class label) needs
    # exactly one entry per bounding box.
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


With the change above, I recreated the TFRecord files and was able to train the model successfully.

I hope this helps

Best
Aman
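For labelImg-style XML annotations like those described in the original post, the per-box lists that feed these features could be collected roughly as follows. This is only a sketch: voc_xml_to_lists, LABEL_MAP, and the sample annotation are hypothetical, and it assumes the Pascal VOC layout that labelImg writes (size/width, size/height, and one <object> with a <bndbox> per box). Coordinates are divided by the image size because the bbox features hold normalized values.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-class label map; ids start at 1.
LABEL_MAP = {'cat': 1, 'dog': 2}

# Made-up annotation in the Pascal VOC layout, with two <object> elements.
SAMPLE_XML = """<annotation>
  <size><width>1920</width><height>1080</height></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>192</xmin><ymin>108</ymin><xmax>960</xmax><ymax>540</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox><xmin>960</xmin><ymin>540</ymin><xmax>1920</xmax><ymax>1080</ymax></bndbox>
  </object>
</annotation>"""


def voc_xml_to_lists(xml_text, label_map):
    """Collect one normalized box and one label per <object> element."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = float(size.find('width').text)
    height = float(size.find('height').text)

    lists = {'xmins': [], 'xmaxs': [], 'ymins': [], 'ymaxs': [],
             'classes_text': [], 'classes': []}
    for obj in root.findall('object'):
        name = obj.find('name').text
        box = obj.find('bndbox')
        # Append to every list inside the same loop iteration so the
        # lists cannot drift apart in length.
        lists['xmins'].append(float(box.find('xmin').text) / width)
        lists['xmaxs'].append(float(box.find('xmax').text) / width)
        lists['ymins'].append(float(box.find('ymin').text) / height)
        lists['ymaxs'].append(float(box.find('ymax').text) / height)
        lists['classes_text'].append(name.encode('utf8'))
        lists['classes'].append(label_map[name])
    return lists


lists = voc_xml_to_lists(SAMPLE_XML, LABEL_MAP)
print(lists['classes'])
```

Filling all six lists in a single loop over the <object> elements is the design choice that keeps the box and label counts equal.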

kirk86 commented Mar 27, 2018

@nhorro thanks for the info. I get the exact same error. In my case, each image has multiple bounding boxes, but they all belong to the same class, since my dataset has only one class. When I check the classes list during creation of the TFRecord files, I see the following structure:

image1: 10 masks, i.e. 10 bounding boxes, classes: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

My classes list contains an entry for each bounding box, and I still face the same problem. Should the entries have different values? That is, should we assign a different class to each bounding box even though our dataset has only one true class? The documentation here regarding google/tensorflow/models leaves a lot to be desired.

Abduoit commented May 28, 2018

@nhorro
I tried commenting out the following two lines in create_pet_tf_record.py, but I still have the same issue. Would you please clarify your solution?

'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),

@amanmeetgarg
I tried modifying create_pet_tf_record.py and then regenerating the TFRecord files, but it failed; I still get the same issue during training.
Could you please tell me exactly how to modify it?

Abduoit commented May 29, 2018

I had this problem and solved it as follows:

The TFRecord files should be named pet_train/val.record. I achieved this by changing faces_only from True to False.

Check the line here:
https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_pet_tf_record.py#L49

Then I regenerated the TFRecord files with this command:

python object_detection/dataset_tools/create_pet_tf_record.py \
    --label_map_path=object_detection/data/two_label_map.pbtxt \
    --data_dir=`pwd` \
    --output_dir=`pwd` \
    --include_masks=True

This gave me two TFRecord files named pet_train/val.record, which I then used to train mask_rcnn_inception_v2_coco.

Hope this helps

@guanghuixu

Just check your data!
I trained Mask RCNN on my own dataset.
I could train Faster RCNN, but Mask RCNN failed on the same dataset.
I fixed the problem after finding that the mask ground truth for one image could not be mapped to that image.
I dropped the image and everything is OK now.
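The comment does not spell out what the mismatch was. One plausible reading is a mask whose size differs from its image, or a mask count that differs from the box count. A minimal sketch of such a data check, with a hypothetical function name and (height, width) tuples standing in for real image and mask arrays:

```python
def find_mask_problems(image_size, mask_sizes, num_boxes):
    """Return a list of problem descriptions; an empty list means consistent.

    Hypothetical helper. image_size and each entry of mask_sizes are
    (height, width) tuples; num_boxes is the number of bounding boxes
    annotated for the image.
    """
    problems = []
    if len(mask_sizes) != num_boxes:
        problems.append(
            '%d masks for %d boxes' % (len(mask_sizes), num_boxes))
    for index, size in enumerate(mask_sizes):
        if tuple(size) != tuple(image_size):
            problems.append(
                'mask %d is %s but the image is %s'
                % (index, tuple(size), tuple(image_size)))
    return problems


# One mask has the wrong size and one box has no mask, so two problems print.
for problem in find_mask_problems((1080, 1920), [(1080, 1920), (540, 960)], 3):
    print(problem)
```

Images that produce a non-empty list could then be fixed or dropped before the TFRecord files are regenerated.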

kulsemig commented Sep 3, 2018

Hi @guanghuixu, could you please describe your actions in detail?

datianshi21 commented Nov 21, 2018

InvalidArgumentError (see above for traceback): assertion failed: [predictions must be in [0, 1]] [Condition x <= y did not hold element-wise:x,
I have no idea what causes this.

@zarabozdag

@nhorro thanks for the info. I get the exact same error. In my case, each image has multiple bounding boxes, but they all belong to the same class, since my dataset has only one class. When I check the classes list during creation of the TFRecord files, I see the following structure:

image1: 10 masks, i.e. 10 bounding boxes, classes: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

My classes list contains an entry for each bounding box, and I still face the same problem. Should the entries have different values? That is, should we assign a different class to each bounding box even though our dataset has only one true class? The documentation here regarding google/tensorflow/models leaves a lot to be desired.

@kirk86 my dataset is the same as yours and I am getting this error. Did you solve it? If yes, how?
