
Tensorboard does not display more than 100 bounding boxes #30464

Closed
wkoziej opened this issue Jul 7, 2019 · 9 comments
Assignees: jvishnuvardhan
Labels: comp:model, comp:tensorboard, stat:awaiting tensorflower, TF 1.12, type:support

Comments

@wkoziej

wkoziej commented Jul 7, 2019

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No. (I've prepared pipeline.config, classes.pbtxt, and one tfrecord.)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.01 (Linux smok 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux)
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below):
    v1.12.0-0-ga6d8ffae09 1.12.0
    (I've also tried b'v1.12.0-6120-gdaab2673f2' 1.13.0-dev20190116 and results are the same)
  • Python version: 3.5.2
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 10.0
  • GPU model and memory: GTX 1080Ti, RTX 2080

Describe the current behavior
I run the training:

python object_detection/model_main.py --pipeline_config_path=/mnt/data/environments/bag/100bboxes/pipeline.config --model_dir=/mnt/data/environments/bag/100bboxes/exp

This is a one-step training run whose only goal is to visualize the bounding boxes on the TensorBoard images tab.
On the "IMAGES" tab in TensorBoard I don't see all of the bounding boxes that are in the tfrecord file.
The options provided in pipeline.config

num_visualizations: 200
max_num_boxes_to_visualize: 200

have no effect on this.
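For context, these options sit in the eval_config block of the Object Detection API pipeline.config. A minimal sketch of that block (the other fields and values here are only placeholders, not my actual config, and field availability can depend on the object_detection version):

eval_config {
  num_examples: 2
  num_visualizations: 200
  max_num_boxes_to_visualize: 200
  visualize_groundtruth_boxes: true
  metrics_set: "coco_detection_metrics"
}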

Describe the expected behavior
I see all bounding boxes (ground truth) on the TensorBoard > Images tab (pictures on the right).

Code to reproduce the issue
I've created a repo where you can find:

  • pipeline.config - one-step training, the same tfrecord for training and evaluation
  • classes.pbtxt - two sample classes
  • t.tfrecord - containing two white images: one with 100 bounding boxes and one with 200 bounding boxes (a sketch of how such a file can be generated follows this list)
  • notebook-images-view.png - view from the notebook where I tried to visualise the bboxes
  • tensorboard-images-view.png - the TensorBoard images view
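For reference, a minimal sketch of how such a tfrecord can be generated (illustrative only, not the exact script from the repo; the make_example() helper and the grid layout of the boxes are my own placeholders):

import io

import numpy as np
import tensorflow as tf
from PIL import Image


def make_example(num_boxes, size=640):
    # Encode one plain white image as PNG bytes.
    image = Image.fromarray(np.full((size, size, 3), 255, dtype=np.uint8))
    buf = io.BytesIO()
    image.save(buf, format='PNG')

    # Lay num_boxes ground-truth boxes out on a grid, normalized to [0, 1].
    cols = int(np.ceil(np.sqrt(num_boxes)))
    xmins, xmaxs, ymins, ymaxs = [], [], [], []
    for i in range(num_boxes):
        row, col = divmod(i, cols)
        xmins.append(col / cols)
        xmaxs.append(col / cols + 0.8 / cols)
        ymins.append(row / cols)
        ymaxs.append(row / cols + 0.8 / cols)

    # Feature keys follow the TF Object Detection API tfrecord format.
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[buf.getvalue()])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'png'])),
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[size])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[size])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'class_a'] * num_boxes)),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1] * num_boxes)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Write one example with 100 boxes and one with 200 boxes (TF 1.x writer API).
with tf.python_io.TFRecordWriter('t.tfrecord') as writer:
    for n in (100, 200):
        writer.write(make_example(n).SerializeToString())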
@ravikyram ravikyram self-assigned this Jul 8, 2019
@ravikyram ravikyram added comp:tensorboard Tensorboard related issues TF 1.12 Issues related to TF 1.12 type:support Support issues labels Jul 8, 2019
@ravikyram
Contributor

For faster resolution, please post the issue on the TF-tensorboard repository. Thanks!

@stephanwlee
Contributor

@ravikyram I believe this is an issue with tensorflow/models/research/object_detection. I tried to debug the issue but I lack a lot of context here. Could you please reassign it to one of the owners listed here? Thanks!

@ravikyram ravikyram added the comp:model Model related issues label Jul 9, 2019
@ravikyram ravikyram assigned jvishnuvardhan and unassigned ravikyram Jul 9, 2019
@jvishnuvardhan jvishnuvardhan added stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels Jul 9, 2019
@wkoziej
Author

wkoziej commented Jul 10, 2019

Today I also ran some simple experiments.

  1. In the first experiment I prepared 100 images, each with fewer than 100 objects, and labeled them automatically. This was the training set. Similarly, I prepared a tfrecord with 30 images for evaluation.
  2. In the second experiment I prepared 100 images, each with more than 150 objects, and labeled them automatically. This was the training set. Similarly, I prepared a tfrecord with 30 images for evaluation.

I trained Faster R-CNN (ResNet-101) for 10000 steps on each set.
In the first image here you can see a comparison of mAP for the first (red) and second (rose) experiments. I'm almost sure there is also a problem with training/evaluation.

In the second image here you can see sample images from the TensorBoard images tab for the set with more than 150 objects, and in the third here for the set with fewer than 100.

Would it help if I uploaded these tfrecords and pipelines somewhere (about 50 MB)?

@wkoziej
Author

wkoziej commented Jul 31, 2019

Hi,

I've put a config and tfrecords for this issue on my drive. You can download them, change the paths in pipeline.config, and run the training (run.sh - set CUDA_VISIBLE_DEVICES and the paths).

Can I help in some way to resolve this issue?

@wkoziej
Author

wkoziej commented Jul 31, 2019

I've tried the tensorpack library and successfully got results for an image with more than 100 bboxes.

@k-lyda

k-lyda commented Aug 8, 2019

I've struggled with the same issue and done some investigation around this topic. It looks like the issue is only in the evaluation step of the training process.

Experiment 1 - training on pictures with less than 100 objects

I ran an experiment where the training set contained only images with fewer than 100 objects. The training went fine - mAP around 0.82.

Then, when exporting the model for inference, I changed the value of the max_detections_per_class parameter to 300 in the second_stage_post_processing { batch_non_max_suppression { } } section of pipeline.config (a sketch of that block follows below).

I checked the predictions for a photo containing over 100 objects. The number of predictions was correct - it marked all objects.
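For clarity, this is roughly the block I mean in a Faster R-CNN pipeline.config (the threshold values shown here are illustrative defaults, not my exact config):

second_stage_post_processing {
  batch_non_max_suppression {
    score_threshold: 0.0
    iou_threshold: 0.6
    max_detections_per_class: 300
    max_total_detections: 300
  }
  score_converter: SOFTMAX
}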

Experiment 2 - training on pictures with more than 100 objects

I then did the same experiment, but with a training set where some of the images had over 100 objects, and with max_detections_per_class set to 300 from the beginning. The final mAP was around 0.3 - much lower than in the first experiment. However, the training loss curves looked similar. To me this means the training process is fine and only the evaluation step has issues.

Metrics

Legend:
Orange - evaluation, images with less than 100 objects
Red - evaluation, images with more than 100 objects

Gray - training, images with less than 100 objects
Blue - training, images with more than 100 objects

Evaluation mAP
Screenshot 2019-08-08 at 08 48 08

Evaluation loss
Screenshot 2019-08-08 at 08 48 33

Training loss
Screenshot 2019-08-08 at 08 48 28

Evaluator Issue

I've checked the evaluator source code. The COCO evaluator has a fixed limit of 100 maximum detections (link to source code). This makes mAP and the other metric calculations incorrect, because the evaluation dataset contains examples whose ground-truth photos have over 100 objects, while only 100 detections are kept during evaluation.

Any ideas on how to tackle the maxDets limit in the COCO evaluator?

I've already tried setting the maxDets parameter to [1, 10, 300] for box_evaluator (after this line), but this caused mAP to be calculated as -1.000, so something was not working correctly. A sketch of the kind of override I mean is below.
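To make that concrete, this is roughly the kind of override I mean, shown standalone with pycocotools rather than inside the object_detection evaluator (groundtruth.json and detections.json are hypothetical files; this is only a sketch, not a confirmed fix for the -1.000 result):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('groundtruth.json')            # hypothetical COCO-format ground truth
coco_dt = coco_gt.loadRes('detections.json')  # hypothetical detection results

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.params.maxDets = [1, 10, 300]  # raise the default 100-detection cap
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()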

@lejekjr

lejekjr commented Oct 1, 2019

tensorflow/models#5465

@tensorflowbutler
Member

Hi There,

We are checking to see if you still need help on this, as you are using an older version of TensorFlow which is officially considered end of life. We recommend that you upgrade to the latest 2.x version and let us know if the issue still persists in newer versions. Please open a new issue for any help you need against 2.x, and we will get you the right help.

This issue will be closed automatically 7 days from now. If you still need help with this issue, please provide us with more information.

