Error in run eval.py WARNING:root:The following classes have no ground truth examples: 0 #1696

Closed
YanLiang0813 opened this issue Jun 20, 2017 · 36 comments

Comments

@YanLiang0813

YanLiang0813 commented Jun 20, 2017

I'm running the TensorFlow Object Detection API locally, following https://github.com/tensorflow/models/blob/9c17823e147ff2893427b47cb57d171da9350d20/object_detection/g3doc/running_locally.md. Everything goes well when I run

$ python object_detection/train.py -logtostderr --pipeline_config_path=object_detection/mymodels/model/faster_rcnn_resnet101_voc07.config --train_dir=object_detection/mymodels/model/train/

and it trains correctly. But when I try to evaluate and run

python object_detection/eval.py --logtostderr --pipeline_config_path=object_detection/mymodels/model/faster_rcnn_resnet101_voc07.config --checkpoint_dir=object_detection/mymodels/model/train/ --eval_dir=object_detection/mymodels/model/eval/

it shows:
WARNING:root:The following classes have no ground truth examples: 0
/home/yanliang/.conda/envs/tensorflow/models/object_detection/utils/metrics.py:144: RuntimeWarning: invalid value encountered in true_divide
  num_images_correctly_detected_per_class / num_gt_imgs_per_class)
^CTraceback (most recent call last):
  File "object_detection/eval.py", line 162, in <module>
    tf.app.run()
  File "/home/yanliang/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/eval.py", line 158, in main
    FLAGS.checkpoint_dir, FLAGS.eval_dir)
  File "/home/yanliang/.conda/envs/tensorflow/models/object_detection/evaluator.py", line 211, in evaluate
    save_graph_dir=(eval_dir if eval_config.save_graph else ''))
  File "/home/yanliang/.conda/envs/tensorflow/models/object_detection/eval_util.py", line 524, in repeated_checkpoint_run
    time.sleep(time_to_next_eval)
KeyboardInterrupt

The dataset I use is pascal_voc_2012, and I followed the tutorial as well:
+data
  -pascal_label_map.pbtxt
  -pascal_train.record
  -pascal_voc.record
+models
  +model
    -faster_rcnn_resnet101_voc07.config
    +train
    +eval
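
A quick way to confirm that the layout listed above is really on disk before launching train.py or eval.py. This is only a minimal sketch: the root directory argument and the helper name are illustrative assumptions, and the relative paths are copied from the listing.

```python
from pathlib import Path

# Relative paths copied from the layout in the report.
EXPECTED_FILES = [
    'data/pascal_label_map.pbtxt',
    'data/pascal_train.record',
    'data/pascal_voc.record',
    'models/model/faster_rcnn_resnet101_voc07.config',
]
EXPECTED_DIRS = ['models/model/train', 'models/model/eval']


def missing_paths(root):
    """Return the expected files and directories that do not exist under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing
```

An empty list from `missing_paths('object_detection/mymodels')` (or wherever the data actually lives) means every expected path was found.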

Can anybody give me some suggestions? Thanks!

@YanLiang0813 YanLiang0813 changed the title from "WARNING:root:The following classes have no ground truth examples: 0" to "Error in run eval.py WARNING:root:The following classes have no ground truth examples: 0" on Jun 20, 2017
@ahmetkucuk

I have the same issue.

@YanLiang0813
Author

@ahmetkucuk Did your training work well? This is part of my training log:
INFO:tensorflow:Restoring parameters from /home/yanliang/.conda/envs/tensorflow/models/object_detection/mymodels/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path object_detection/mymodels/model/train/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 1: loss = 4.3562 (6.369 sec/step)
2017-06-20 10:50:49.153778: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2383 get requests, put_count=1971 evicted_count=1000 eviction_rate=0.507357 and unsatisfied allocation rate=0.634494
2017-06-20 10:50:49.153983: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:global step 2: loss = 4.5299 (1.051 sec/step)
INFO:tensorflow:global step 3: loss = 4.3959 (0.363 sec/step)
INFO:tensorflow:global step 4: loss = 5.5421 (0.799 sec/step)
INFO:tensorflow:global step 5: loss = 3.9413 (1.042 sec/step)
INFO:tensorflow:global step 6: loss = 3.6625 (0.354 sec/step)
INFO:tensorflow:global step 7: loss = 3.6821 (0.364 sec/step)
INFO:tensorflow:global step 8: loss = 3.4374 (0.355 sec/step)
INFO:tensorflow:global step 9: loss = 3.3901 (0.359 sec/step)
INFO:tensorflow:global step 10: loss = 3.1503 (1.024 sec/step)
INFO:tensorflow:global step 11: loss = 3.2978 (0.360 sec/step)
INFO:tensorflow:global step 12: loss = 2.8448 (1.055 sec/step)
INFO:tensorflow:global step 13: loss = 3.2599 (0.470 sec/step)
INFO:tensorflow:global step 14: loss = 2.5151 (0.359 sec/step)
INFO:tensorflow:global step 15: loss = 2.2614 (0.358 sec/step)
INFO:tensorflow:global step 16: loss = 2.2486 (0.355 sec/step)
INFO:tensorflow:global step 17: loss = 2.2398 (0.810 sec/step)
2017-06-20 10:50:58.253875: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2110 get requests, put_count=2065 evicted_count=1000 eviction_rate=0.484262 and unsatisfied allocation rate=0.506161
2017-06-20 10:50:58.253938: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
INFO:tensorflow:global step 18: loss = 2.1277 (0.360 sec/step)
INFO:tensorflow:global step 19: loss = 2.9921 (0.349 sec/step)
INFO:tensorflow:global step 20: loss = 2.0339 (0.353 sec/step)
INFO:tensorflow:global step 21: loss = 2.6191 (0.347 sec/step)
INFO:tensorflow:global step 22: loss = 3.0585 (0.359 sec/step)
INFO:tensorflow:global step 23: loss = 1.1144 (0.976 sec/step)
INFO:tensorflow:global step 24: loss = 1.7001 (0.382 sec/step)
INFO:tensorflow:global step 25: loss = 1.3169 (0.347 sec/step)
INFO:tensorflow:global step 26: loss = 1.2461 (0.368 sec/step)
INFO:tensorflow:global step 27: loss = 1.9536 (0.370 sec/step)
INFO:tensorflow:global step 28: loss = 1.7631 (0.376 sec/step)
INFO:tensorflow:global step 29: loss = 2.2164 (1.042 sec/step)
INFO:tensorflow:global step 30: loss = 0.9388 (0.353 sec/step)
INFO:tensorflow:global step 31: loss = 2.1595 (0.362 sec/step)
INFO:tensorflow:global step 32: loss = 1.9991 (0.352 sec/step)
INFO:tensorflow:global step 33: loss = 2.1409 (0.365 sec/step)
INFO:tensorflow:global step 34: loss = 3.0498 (0.361 sec/step)
INFO:tensorflow:global step 35: loss = 1.7767 (0.355 sec/step)
INFO:tensorflow:global step 36: loss = 1.3106 (0.354 sec/step)
INFO:tensorflow:global step 37: loss = 1.3067 (0.357 sec/step)
INFO:tensorflow:global step 38: loss = 4.0444 (0.785 sec/step)
INFO:tensorflow:global step 39: loss = 1.9622 (1.082 sec/step)
INFO:tensorflow:global step 40: loss = 2.8836 (1.094 sec/step)
INFO:tensorflow:global step 41: loss = 2.6982 (0.382 sec/step)
INFO:tensorflow:global step 42: loss = 1.6046 (0.359 sec/step)
INFO:tensorflow:global step 43: loss = 1.1759 (1.070 sec/step)
INFO:tensorflow:global step 44: loss = 0.9371 (0.377 sec/step)
INFO:tensorflow:global step 45: loss = 1.4666 (0.377 sec/step)
INFO:tensorflow:global step 46: loss = 2.4793 (1.080 sec/step)
INFO:tensorflow:global step 47: loss = 2.8852 (0.379 sec/step)
INFO:tensorflow:global step 48: loss = 1.8985 (0.380 sec/step)
INFO:tensorflow:global step 49: loss = 1.8162 (0.638 sec/step)
INFO:tensorflow:global step 50: loss = 0.9691 (0.357 sec/step)
INFO:tensorflow:global step 51: loss = 1.2954 (0.437 sec/step)
INFO:tensorflow:global step 52: loss = 2.8442 (0.644 sec/step)

@ahmetkucuk

@YanLiang0813 Yes, the total loss decreases gradually in my case as well.

@jaydee713

Having the same issue as well!

@YanLiang0813
Author

@sguada I really need your help. Could you give me some suggestions on how to solve this problem? Thanks!!!

@derekjchow
Contributor

@YanLiang0813 You can ignore the error. The class at index 0 is 'none_of_the_above' for both PASCAL and pet datasets and is a placeholder index. The TFRecords will contain no instances of this placeholder class.
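
To illustrate the point above, here is a minimal sketch, in plain Python rather than the NumPy code in metrics.py, of why a placeholder class with zero ground-truth boxes produces both messages. The counts and helper names are made up for illustration and are not taken from the library.

```python
import math


def classes_without_ground_truth(num_gt_per_class):
    """Return the class indices whose ground-truth count is zero."""
    return [idx for idx, count in enumerate(num_gt_per_class) if count == 0]


def per_class_ratio(num_correct_per_class, num_gt_per_class):
    """Divide per class; a class with no ground truth (0/0) yields NaN."""
    return [
        correct / gt if gt > 0 else math.nan
        for correct, gt in zip(num_correct_per_class, num_gt_per_class)
    ]


def mean_ignoring_nan(values):
    """Average only the classes that actually have a defined ratio."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


# Hypothetical counts: index 0 is the placeholder, 1..3 are real classes.
num_gt = [0, 12, 7, 5]
num_correct = [0, 9, 7, 1]

print(classes_without_ground_truth(num_gt))
ratios = per_class_ratio(num_correct, num_gt)
print(ratios, mean_ignoring_nan(ratios))
```

The placeholder shows up as the only class without ground truth, and its ratio is NaN, which is the situation NumPy reports as "invalid value encountered in true_divide". The real classes are unaffected.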

@YanLiang0813
Author

YanLiang0813 commented Jun 21, 2017

@derekjchow How do I ignore the error? I commented out these lines in object_detection_evaluation.py:

if (self.num_gt_instances_per_class == 0).any():
  logging.warn(
      'The following classes have no ground truth examples: %s',
      np.squeeze(np.argwhere(self.num_gt_instances_per_class == 0)))

but it doesn't help; this message still appears:

/home/yanliang/.conda/envs/tensorflow/models/object_detection/utils/metrics.py:144: RuntimeWarning: invalid value encountered in true_divide
num_images_correctly_detected_per_class / num_gt_imgs_per_class)

Could you give me some suggestions on how I can ignore the error? Has anyone solved it?
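
If the goal is only to stop the two messages from being printed, one possible approach is sketched below using the Python standard library. It rests on two assumptions: the `WARNING:root:` line is emitted through the root logger of the `logging` module (as the quoted `logging.warn` call suggests), and the `true_divide` message is an ordinary `RuntimeWarning`. The filter class and function names are hypothetical, and nothing here changes the evaluation results.

```python
import logging
import warnings


class DropNoGroundTruthWarning(logging.Filter):
    """Hypothetical filter that hides only the 'no ground truth' message."""

    def filter(self, record):
        # Returning False drops the record; everything else passes through.
        return 'have no ground truth examples' not in record.getMessage()


def silence_eval_warnings():
    # Hide the WARNING:root: line logged on the root logger.
    logging.getLogger().addFilter(DropNoGroundTruthWarning())
    # Hide the RuntimeWarning whose text starts with this phrase.
    warnings.filterwarnings(
        'ignore',
        message='invalid value encountered',
        category=RuntimeWarning)
```

Calling `silence_eval_warnings()` once, before evaluation starts, would be enough in this sketch. Commenting out the `logging.warn` call alone cannot remove the second message, because that one is raised by the division in metrics.py.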

@YanLiang0813
Author

@jaydee713 Did you solve this problem?

@jaydee713

@YanLiang0813 I didn't. I decided to just ignore it since it is only a warning :P It doesn't seem to have caused me any problems yet...

@YanLiang0813
Author

@jaydee713 Yes, I understand now. We just ignore it and run train.py and eval.py concurrently, so we can see the precision on TensorBoard.

@KleinYuan

@YanLiang0813 But after this warning, eval.py seems to hang. Or does it just take a long time?

@YanLiang0813
Author

YanLiang0813 commented Jun 25, 2017 via email

@ali01

ali01 commented Jun 30, 2017

Looks like this is resolved. This is just a warning that is safe to ignore. Closing this issue.

@ali01 ali01 closed this as completed Jun 30, 2017
@alexiskattan

I'm getting this same error. I think it crashes it.

@alexiskattan

alexiskattan commented Jun 30, 2017

@ali01 The eval directory is being populated with new tfrecords up until this warning/error comes up. Maybe reopen the issue?

@KleinYuan

@alexalemi It's a warning; just wait for it to complete. It takes a while. I don't think this will crash the app.
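
The traceback in the original report ends inside `repeated_checkpoint_run` at `time.sleep(time_to_next_eval)`, which suggests the process was waiting between evaluations rather than crashing. The following is a simplified sketch of that kind of polling loop, not the actual eval_util.py code. The function name, parameters and default interval are assumptions made for illustration.

```python
import time


def repeated_checkpoint_run_sketch(get_latest_checkpoint, evaluate,
                                   eval_interval_secs=300, max_loops=None,
                                   sleep=time.sleep, clock=time.time):
    """Evaluate each new checkpoint, sleeping between polls."""
    last_evaluated = None
    evaluated = []
    loops = 0
    while max_loops is None or loops < max_loops:
        start = clock()
        checkpoint = get_latest_checkpoint()
        # Only evaluate when a checkpoint exists and has not been seen yet.
        if checkpoint is not None and checkpoint != last_evaluated:
            evaluate(checkpoint)
            last_evaluated = checkpoint
            evaluated.append(checkpoint)
        loops += 1
        # Wait out the rest of the interval. With no new checkpoint,
        # this is where the terminal looks frozen.
        time_to_next_eval = start + eval_interval_secs - clock()
        if time_to_next_eval > 0:
            sleep(time_to_next_eval)
    return evaluated
```

With `max_loops=None` the loop never returns on its own, so once training has stopped writing checkpoints, a quiet terminal is the expected behavior of a loop like this one.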

@SriramGS

I am encountering the same issue, but mine does not wait; it exits after printing a traceback. How did you ignore the error (what changes, if any)?

@YanLiang0813
Author

YanLiang0813 commented Jul 20, 2017 via email

@SriramGS

Oh, my run completes training successfully, but when I run eval.py, I get the warning and the program quits by itself instead of continuing. Any idea why?

@slandersson

Can I label objects with the placeholder class 0, and treat these images as true negatives to improve my model?

@szymonk92

@SriramGS Did you solve the problem? I have the same issue.

@YanLiang0813
Author

YanLiang0813 commented Aug 4, 2017 via email

@szymonk92

I made another try with just a few iterations. It took a minute, then I left my computer for 30 minutes and nothing happened. I will try again. Thanks!

@SriramGS

SriramGS commented Aug 4, 2017

@szymonk92 I was not able to solve it. I am still looking for a solution. Let me know if you find anything.

@DanMossa

I have also received this error. I'm waiting to see if it continues after the message.

@Abduoit

Abduoit commented Aug 22, 2017

Some people in this thread solved the issue by running train.py and eval.py at the same time. I also tried this suggestion, but it fails because there is not enough memory, even though I have 8 GB of GPU memory.

@szymonk92

I built TensorFlow from source and I still have the same problem, on both computers. I can see the evaluation results (images) after a few seconds, but the terminal stays frozen for an hour.

Any ideas? Can I force close the terminal?

I would like to run training and evaluation at the same time; however, my computer (12 GB GPU) doesn't have enough memory to run them simultaneously using Faster R-CNN with Inception v2.

@Abduoit

Abduoit commented Sep 1, 2017

@szymonk92

You need to split your GPU into two parts: 50% for training and 50% for evaluation.

And don't worry about this warning; see this discussion.

Add these lines to the train.py file, as the first two lines in main:

def main(_):
  # Limit this process to 50% of GPU memory, leaving the rest for eval.py.
  gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
  sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
  assert FLAGS.train_dir, '`train_dir` is missing.'

@szymonk92

@Abduoit Thanks for the tip. I tried with 6 GB and it seems that I don't have enough memory. I will try again on Monday with 12 GB.

@Abduoit

Abduoit commented Sep 2, 2017

@szymonk92

Even if you tried with 6 GB, it should allocate 50% of the GPU to train.py, and the other 50% will be left for eval.py.

Please make sure that you added the following lines correctly in train.py. The two lines should come right after def main(_):

def main(_):
  gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)  
  sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

@sidthekid402

I have 2 classes in my label_map.pbtxt, yet I get the warning:

The following classes have no ground truth examples: [ 0 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255]

Also, the precision when I evaluate is always 0 (Precision/mAP@0.5IOU: 0.000000) after 500k training steps. I haven't found a solution so far, so any help would be appreciated. Thanks.
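
The thread never establishes the cause of this report. As a hedged observation only: the index list is exactly what the check quoted earlier in the thread would print if the evaluator were tracking 256 class slots while only classes 1 and 2 had ground truth. That would point at the class count the evaluator was configured with (for example `num_classes` in the pipeline config, which is an assumption not confirmed here) rather than at the two-entry label map. The helper below is hypothetical and only reproduces the arithmetic.

```python
def classes_reported_missing(num_class_slots, ids_with_ground_truth):
    """List every class index in range that has no ground-truth example."""
    present = set(ids_with_ground_truth)
    return [i for i in range(num_class_slots) if i not in present]


reported = classes_reported_missing(256, [1, 2])
print(reported[:4], '...', reported[-1])  # prints: [0, 3, 4, 5] ... 255
```

With three slots (the placeholder plus two real classes), the same function returns only `[0]`, which is the harmless form of the warning discussed above.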

@ghost

ghost commented Nov 9, 2017

@Abduoit My train.py takes 50%, but eval.py takes almost 100% of my GPU memory and runs out of memory. It is possible to limit the memory allocation for train.py, but how do I do it for eval.py? Thanks.
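
Nobody in the thread answers this question. One commonly used alternative, offered here only as a sketch and not as something the participants tried, is to hide the GPU from the evaluation process with the `CUDA_VISIBLE_DEVICES` environment variable so that eval.py runs on the CPU while train.py keeps the GPU. The helper name is hypothetical, and the command-line flags are copied from the original report.

```python
import os


def build_cpu_eval_invocation(pipeline_config_path, checkpoint_dir, eval_dir,
                              base_env=None):
    """Return (argv, env) for running eval.py with all GPUs hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty CUDA_VISIBLE_DEVICES hides every CUDA device from the child
    # process, so it cannot allocate any GPU memory.
    env['CUDA_VISIBLE_DEVICES'] = ''
    argv = [
        'python', 'object_detection/eval.py',
        '--logtostderr',
        '--pipeline_config_path=' + pipeline_config_path,
        '--checkpoint_dir=' + checkpoint_dir,
        '--eval_dir=' + eval_dir,
    ]
    return argv, env


argv, env = build_cpu_eval_invocation(
    'object_detection/mymodels/model/faster_rcnn_resnet101_voc07.config',
    'object_detection/mymodels/model/train/',
    'object_detection/mymodels/model/eval/')
# To launch it: subprocess.run(argv, env=env)
```

Evaluation on the CPU will be slower, so this trades speed for not competing with training over GPU memory.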

@PythonImageDeveloper

PythonImageDeveloper commented Feb 23, 2018

@YanLiang0813, what's your GPU? I can't fine-tune faster_rcn_res101_coco for PASCAL 2007 with a 1080.

@psdas

psdas commented Jun 14, 2018

I used transfer learning to detect objects in my own dataset using the ssd_mobilenet_v1_coco_11_06_2017 model.
I trained my model on Google Cloud as a training job launched from the Cloud Shell. Training was successful, and I exported the model to my local machine. I then ran the evaluation with eval.py on my local machine, but the eval.py command got stuck after this:
[image]
I have only 3 classes.
Here's my object-detection.pbtxt file:

 {
  id: 1
  name: 'tree'

  id: 2
  name: 'water body'

  id: 3
  name: 'building'
}

Please help.

@psdas

psdas commented Jun 15, 2018

Hey, I was able to resolve the error and successfully run my model by changing my label .pbtxt file (object-detection.pbtxt in my case).
Earlier my file was:

{
  id: 1
  name: 'tree'

  id: 2
  name: 'water body'

  id: 3
  name: 'building'
}

I changed that to:

item {
  id: 1
  name: 'tree'
}

item {
  id: 2
  name: 'water body'
}

item {
  id: 3
  name: 'building'
}
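
The difference between the two files can be checked mechanically. The sketch below uses a deliberately narrow regular expression, not a real protobuf text-format parser: it assumes each entry is written as `item { id: N name: '...' }` in that order with single quotes, as in the files above. The function name is hypothetical.

```python
import re

# Matches one well-formed entry of the form: item { id: N name: '...' }
ITEM_PATTERN = re.compile(
    r"item\s*\{\s*id:\s*(\d+)\s*name:\s*'([^']*)'\s*\}")


def parse_label_map_items(text):
    """Return {id: name} for every well-formed item block found in text."""
    return {int(i): name for i, name in ITEM_PATTERN.findall(text)}


BROKEN = """
{
  id: 1
  name: 'tree'

  id: 2
  name: 'water body'

  id: 3
  name: 'building'
}
"""

FIXED = """
item {
  id: 1
  name: 'tree'
}
item {
  id: 2
  name: 'water body'
}
item {
  id: 3
  name: 'building'
}
"""

print(parse_label_map_items(BROKEN))
print(parse_label_map_items(FIXED))
```

The first file yields no entries at all because nothing is wrapped in an `item` block, while the corrected file yields all three classes.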

@mrainezty

I have the same issue; you need to check your .txt file.
