
API ObjectDetection size of input images issues #1876

Closed
chenyuZha opened this issue Jul 6, 2017 · 43 comments
chenyuZha commented Jul 6, 2017

GPU : GeForce GTX 1080 Ti/PCIe/SSE2 (11 GB)
Tensorflow version: 1.1.0
Python version: 2.7.12
Model checkpoint : ssd_mobilenet_v1_coco_11_06_2017

Context: After reading the Object Detection tutorial, I converted my own image dataset to TFRecord files by creating an *.xml annotation file for each image. Since the images in my dataset are very large (about 4000 pixels wide and 2000 pixels high each), the resulting train.tfRecord is about 55 GB.

Issue: When I began training with train.py using the lightest model, ssd_mobilenet_v1_coco_11_06_2017, it crashed with an OOM error after 4-5 steps.
It seems the OOM error happened when allocating a tensor with shape [1,2969,3546,3].
As the capacity of my GPU is 11 GB, I don't understand why this causes a problem.

@chenyuZha chenyuZha changed the title from "API ObjectDetection maximum size of input images issues" to "API ObjectDetection size of input images issues" Jul 6, 2017
jch1 (Member) commented Jul 6, 2017

Hi @chenyuZha - These resolutions are problematic because we keep an entire queue of images in memory, not just the batch that you are currently training on. See e.g. the queue_capacity and min_after_dequeue parameters in https://github.com/tensorflow/models/blob/master/object_detection/protos/input_reader.proto

Though note that we typically resize in SSD immediately to 300x300. So given this, it makes sense to just resize your input images to be smaller.
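Building on that advice, here is a minimal sketch of pre-shrinking images on disk before generating the TFRecords (the paths and the max side of 1024 are illustrative assumptions; requires Pillow). Remember that the pixel coordinates in the matching *.xml annotations must be scaled by the same factor:

    import os
    from PIL import Image

    MAX_SIDE = 1024  # generous headroom above SSD's 300x300 input

    def shrink_image(src_path, dst_path, max_side=MAX_SIDE):
        """Downscale so the longest side is at most max_side; never enlarge."""
        img = Image.open(src_path)
        scale = max_side / float(max(img.size))
        if scale < 1.0:
            new_size = (int(img.width * scale), int(img.height * scale))
            img = img.resize(new_size, Image.BILINEAR)
        img.save(dst_path)

    # Shrink every JPEG before running the TFRecord creation script.
    for name in os.listdir("images_raw"):
        if name.lower().endswith(".jpg"):
            shrink_image(os.path.join("images_raw", name),
                         os.path.join("images_small", name))

This keeps both the 55 GB record file and the in-memory input queue far smaller, while losing nothing that survives the 300x300 resize anyway.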

@chenyuZha (Author)

@jch1 Thanks for your response!

@yeephycho

Hi @jch1.
I'm training SSD on my own dataset. The results are good, but not as good as I expected, so I'm trying to fine-tune the model to improve performance.
I noticed that SSD resizes the image to 300 by 300 using the API tf.image.resize_images(),
which, according to the official documentation,

Resized images will be distorted if their original aspect ratio is not the same as size.

I ran some experiments, and this is true.

So I changed the resize API to resize_image_with_crop_or_pad().
The result is that the loss gets very high and is very hard to converge.

My question is: did you take resize distortion into consideration during the development of the Object Detection API?
If you did, will a mismatch in the aspect ratio of the original images affect the final result?

With thanks and regards!

@chenyuZha chenyuZha reopened this Sep 18, 2017
@aselle aselle added stat:awaiting model gardener Waiting on input from TensorFlow model gardener type:bug Bug in the code labels Sep 20, 2017
Luonic commented Oct 30, 2017

@yeephycho you should first resize the image while keeping the original aspect ratio (e.g. to 300px on the larger side), and then pad it to a square with resize_image_with_crop_or_pad(), as in the sketch below. You can find code for an aspect-ratio-preserving resize built from TF ops in the Magenta repository, where neural style transfer is done.
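A minimal sketch of that two-step approach with TF 1.x ops (the function name is illustrative; assumes a single HxWx3 image tensor). Note that if you do this outside the built-in preprocessing, ground-truth boxes must be adjusted by the same scale and padding offset:

    import tensorflow as tf  # TF 1.x API, matching this thread

    def resize_keep_aspect_then_pad(image, target_size=300):
        """Scale so the longer side equals target_size, then zero-pad to a square."""
        shape = tf.shape(image)
        height = tf.to_float(shape[0])
        width = tf.to_float(shape[1])
        scale = tf.to_float(target_size) / tf.maximum(height, width)
        new_height = tf.to_int32(tf.round(height * scale))
        new_width = tf.to_int32(tf.round(width * scale))
        resized = tf.image.resize_images(image, [new_height, new_width])
        # Pad the shorter side symmetrically up to target_size x target_size.
        return tf.image.resize_image_with_crop_or_pad(
            resized, target_size, target_size)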

@scotthong

Hi @yeephycho,
I am also trying to train an object detector using my own images, which have various sizes and aspect ratios. I tried the following two image_resizer configurations in the pipeline.config with a pre-trained model from the model zoo, and it seems that the model is not converging, with the TotalLoss fluctuating around 4.0. Which image_resizer configuration did you use to make the model converge faster? Or did you pre-scale the dataset and bounding boxes, as suggested by @Luonic, to train your object detector?

Thanks!

    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 300
        max_dimension: 300
      }
    }

@yeephycho

Hi @scotthong,
I really cannot remember how I solved this problem, but I can share some experience that may not be correct but worked for me:

  1. I used fixed_shape_resizer.
  2. A total loss around 4.0 is OK; try to evaluate your model and see how it works on your validation set.
  3. If the image is too small after resizing, it is hard to detect small objects.

With regards!
Yeephycho

Tsuihao commented Jan 20, 2018

Hi all,

Maybe someone can help me with this issue:

#3196

Regards,
Hao

syndec commented Feb 1, 2018

Hi @yeephycho @scotthong,
I am encountering the same problem: the TotalLoss fluctuates around a high value (4.0). Is there anything I can do to avoid missing small objects when using high-resolution training images?

Regards,
Lee

@scotthong

Hi @syndec
Try to study the network and especially the file "core/preprocessor.py". You should be able to find clues and determine if it is possible to resolve the problem you are facing.

@willSapgreen

Hello @yeephycho and @syndec,
Is there a definition of "small objects"?
For example, if the ratio of an object's size to the whole image is 0.2, do we consider it a "small object"?
I am facing the same issue: the total loss fluctuates around 4.0.
Also, the localization_loss/classification_loss is affected by the number of ground-truth boxes in an image; for example, the loss increases when the number of ground-truth boxes increases.
Because those ground-truth boxes are small, I wonder if I should exclude them from my training set.
I use the pre-trained SSD-InceptionV2-COCO model to train on my dataset for vehicle detection.
The model trained on my dataset performs worse than the pre-trained SSD-InceptionV2-COCO.

Thank you for your precious time.

funkysandman commented May 3, 2018

I am facing similar issues but only when I use the ssd_random_crop data augmentation option in the pipeline config. My total loss converges much lower when I don't use the crop preprocessing. Also, I'm finding that the position of the object in the image has a huge impact on score - the detector behaves like it has blind spots. My training images are 1920x1200 and I'm using the fixed_shape_resizer to 300x300. My training runs have about 100 images and it converges to a total loss of about 0.26 after 10K steps. When I add the ssd_random_crop the total loss is around 2.5 even after 20K steps. I've tried variations with random_crop_image as well with no luck.
Edit: the same training set with ssd_random_crop produced a lower mAP, even with twice as many steps. This doesn't seem right - is there a bug in the cropping algorithm?

@willSapgreen

Hello @funkysandman,
Thank you for sharing the "ssd_random_crop" idea;
unfortunately, it does not resolve my issue.
My total loss still remains around 4.

Would you mind sharing what counts as "good data" for SSD-InceptionV2 training, in your opinion?

  1. Does "resolution" matter? (My images from Udacity are somewhat blurred.)
  2. Will the number of objects per image affect the box predictor?

Because SSD-InceptionV2 has limitations in detecting small objects,
I figured that in a 300x300 image the target object should be "big" enough for the model to learn its features.
So I prepared the image data from Udacity as follows:

  1. 300x300 (cropped the car object from the Udacity image dataset).
  2. The object to be classified occupies at least 10% of the image (to make sure the car is big enough).
  3. Only one object per image.

The total loss is close to 1.0 after 200k iterations.

But after training with this kind of data,
the trained model cannot detect any object in my evaluation set.

I am working on the root cause,
so I wonder what good data for SSD-InceptionV2 training should look like.

P.S.
My TensorFlow is 1.8.0 and the TensorFlow Object Detection repo was pulled on 2018-05-03.

Thank you.

@funkysandman

Make sure your pipeline.config matches the model you are using perfectly; I've accidentally run training with the wrong config file. Maybe share your pipeline.config file? Also, the detection program must be tailored to the specific model, and the images must be normalized properly for the detector, or you may get weird results.

@willSapgreen

Hello @funkysandman,
I made a mistake: in the label map, the item's "name" is capitalized, but in my tfrecord-conversion script I used the lower-case version.
Now I am working on the poor performance of SSD-InceptionV2.

@cmbowyer13

Question: say all the train and test images from the raw data are of size (512, 7000). Can anyone who works on the Object Detection API tell me if it is OK to leave the image_resizer set to 300x300, or should I change it to the following to match the dimensions of my images? I'm not sure what the purpose of that piece of code is. I am currently training a model with the setting as:

    image_resizer {
      fixed_shape_resizer {
        height: 512
        width: 7000
      }
    }

@willSapgreen

Hello @cmbowyer13,

If you use SSD from the model zoo, you need to either resize your images to 300x300 before feeding them to the model or let the model do it for you.

Good luck!

@cmbowyer13

I don't know what you're saying, @willSapgreen. Why can't I do as I am doing, or why can't the model accept arbitrarily sized images?

funkysandman commented May 11, 2018

You can provide any size image for detection. I would not resize it so small that you cannot see the features you're after (this depends on what your objects are). During training the images are resized to 300x300. If the images you are using to train your model lose their features at that resolution, you can increase it to something like 600x600 in the pipeline.config file. This has an effect on training time as more memory is needed. Resizing images at detection time can help speed things up.

cmbowyer13 commented May 11, 2018

But what does resizing to 300x300 in the config file do to all my images during training, which are of size (512x7000)? You're saying it's fine to leave it as 300x300 and then be able to detect any size image I want, as long as it contains the same class of objects? I will also need to run detection on matrices of raw data in real time. Are there any guidelines for doing real-time object detection with our trained object detection model? Any good links or references are appreciated. Also, do you know the best way to convert the model to a C++ format once it has been created and performs as I like?

@funkysandman

@cmbowyer13 I would guess you can leave it at 512x7000 for training, as long as you have the memory + gpu to handle it. As long as the model loss converges it should be ok. When it comes to using the trained model, you can give it pictures in their original size I believe. You can resize them if it is too slow.
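On the real-time / C++ question above: the usual route is to export a frozen inference graph with the API's export_inference_graph.py and feed it images at their original size. A minimal Python sketch, with illustrative paths; the tensor names are the standard ones in graphs exported by the Object Detection API:

    import numpy as np
    import tensorflow as tf
    from PIL import Image

    # Load a frozen graph produced by export_inference_graph.py.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("exported_model/frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")

    with tf.Session(graph=graph) as sess:
        # The exported graph takes uint8 images of any size; the resizer
        # configured in pipeline.config runs inside the graph itself.
        image = np.expand_dims(np.array(Image.open("test.jpg")), axis=0)
        boxes, scores, classes = sess.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": image})

Since the frozen .pb file is a plain GraphDef, the same file can be loaded from TensorFlow's C++ API, so no separate conversion step is needed.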

skhater commented May 17, 2018

Hi. I have a huge set of training images that range in size from 300x300 to 1184x1410 (different sizes). I'm training a Faster R-CNN model with the pipeline configuration file resizing images to 600x1024.

My question is: is it better to resize all the images to 600x1024 before training, using the preprocessing script found in

def resize_image(image,
or is it OK to train with different sizes? And is it too much to resize the images up to 600x1024 when I have small images (300x300)?

Also, will this resizing affect the bounding boxes in the annotations, or will they be resized accordingly?

Thanks

@funkysandman

Are you sure it resizes to 600x1024? I thought it was resizing so that the longest side is no smaller than 600 and no bigger than 1024. In the pipeline.config it just says min_dimension=600, max_dimension=1024, without being specific to width or height. So your 300x300 would be resized to 600x600 and 1184x1410 would be resized to 864x1024. I could be wrong.

skhater commented May 17, 2018

@funkysandman Thanks for your reply, and yes, you are right: it's about the min and max dimension.
My question is: does this resizing adjust the bounding boxes (annotations) in the training set accordingly, or do I have to preprocess them before training?

@funkysandman

I would leave the images as they are; training should respect the bounding boxes that you specify in the XML. The bounding boxes are converted to percentages, I think, so they're not affected by image resizing.
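That is indeed how the TFRecord creation scripts store boxes: coordinates are normalized to [0, 1] relative to the original image, so any in-graph resize moves the boxes along with the pixels. A tiny sketch of the normalization step (the values are illustrative; the convention follows the API's create_pascal_tf_record.py):

    # Pixel-space box from a labelImg xml annotation (illustrative values).
    xmin_px, ymin_px, xmax_px, ymax_px = 120, 80, 560, 430
    width, height = 1280, 720  # original image size

    # Stored normalized to [0, 1], so neither fixed_shape_resizer nor
    # keep_aspect_ratio_resizer invalidates the boxes.
    xmin = xmin_px / float(width)
    xmax = xmax_px / float(width)
    ymin = ymin_px / float(height)
    ymax = ymax_px / float(height)
    # These floats go into the image/object/bbox/* features of the tf.train.Example.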

skhater commented May 17, 2018

Thanks @funkysandman

Djacob502 commented May 19, 2018

I'm also questioning whether it is better to resize prior to creating the bounding boxes, as opposed to using larger images (sometimes much larger), creating the bounding boxes, and then letting training resize the photos to 300x300. In my use case, the photos are also resized in the same manner to 300x300 prior to being sent through the model for object detection in production.
Has anyone done any testing to determine which is better? Also, if anyone knows of a white paper that discusses this issue, please post the link.

@funkysandman

I've tested this, and it seems that whatever size I train it on is the size it works best on... when I cropped my large pics down to 300px, the model was lousy at detecting in the larger images. I ended up including both original and resized images to ensure better detection. I'm still experimenting. I've also found that reducing the batch size can impact the overall training accuracy. I've tried batch sizes of 5, 10, 12, 20, 24... 5 seems to work pretty well for my data.

rajatsc commented May 28, 2018

I am failing to wrap my head around why resizing would be an issue to begin with, unless the object is too small to detect. In all other cases, won't the distortion due to resizing be taken care of, because we would be resizing the test image too?

@Djacob502

The original pics I'm using are approximately 4,000 x 3,000. Drawing bounding boxes on such large images and then training works well, but the training is very slow and requires a lot of memory. If I resize the same images to 300x300 and then draw the bounding boxes - like funkysandman - I found that the detection was terrible. I ended up cropping the images to approximately 800x800 (some smaller and some larger) around my objects and then drew the bounding boxes. The objects I am looking for fill most if not all of the 800x800 cropped pictures. This works well. However, it seems like I need more negative space. I was thinking of including pics with just negative space (no known objects) to help the training.

@CeeBeeEh

"However it seems like I need more negative space. I was thinking of including pics with just negative space (no known objects) just to help the training."

This is something that I was thinking about as well; I would like to know the answer to this.

@bharat77s

I am using labelImg to annotate traffic-light images for training on TensorFlow with the ssd_inception_coco model. How can I add the color of the traffic lights while creating the xml file, as this option is not available in labelImg? Kindly help.

@MittalNeha

@bharat77s when you create a rectangle (shortcut: w), a list of labels comes up, and you can just add the name of your label there. So basically, the color of the traffic light can be a label.
Alternatively, you can add your label names in data/predefined_classes.txt so that your labels will appear in the list of labels in labelImg.
I hope that answers your question.
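For example, a predefined_classes.txt for the traffic-light case could contain one label per line (the label names here are just an illustration, matching the suggestion later in this thread):

    red
    yellow
    green
    off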

bharat77s commented Sep 28, 2018 via email

@giridhar13

I have a query regarding the image size used for training. I am training SSD_inception_v2 on a dataset that contains both the original images (1280x720) and cropped images (640x640). My models do not seem to converge and the losses fluctuate. Should I train the network only on the crops?

hoss24 commented Nov 30, 2018

@funkysandman @skhater It is my understanding from reading the docs and forums that it resizes the image so the smaller side equals min_dimension and then makes sure the other dimension is no more than max_dimension. Ex: keeping aspect ratio, 300x300 would be resized to 600x600 and 1184x1410 would be resized to 600x714.
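A small sketch of that rule as I understand it (mirroring the logic in core/preprocessor.py; verify the exact rounding against the source):

    def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
        """Approximate target size chosen by keep_aspect_ratio_resizer."""
        # First try: scale so the smaller side equals min_dimension.
        scale = float(min_dimension) / min(height, width)
        # If that pushes the larger side past max_dimension, clamp to it instead.
        if round(max(height, width) * scale) > max_dimension:
            scale = float(max_dimension) / max(height, width)
        return int(round(height * scale)), int(round(width * scale))

    print(keep_aspect_ratio_size(300, 300))    # -> (600, 600)
    print(keep_aspect_ratio_size(1410, 1184))  # -> (715, 600), i.e. ~600x714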

hoss24 commented Nov 30, 2018

@giridhar13 I would suggest having your training images as close as possible to the dimensions of the images you plan to run through the model. Ex: if you are going to test/utilize cropped images, train on cropped images.

hoss24 commented Nov 30, 2018

@bharat77s Yes, I would use 4 different labels: red, yellow, green, off. Make sure you edit the label map and config file to reflect the changes. If this presents issues, you could train a model to pull out the stop lights and then retrain on the cropped images to detect color.

@austinmw

Does anyone know how fixed_shape_resizer interacts with ssd_random_crop? Does it take a random crop of size defined in fixed_shape_resizer?

thusinh1969 commented May 21, 2019

These pre-trained SSD512 models would help you start with larger images, though.
https://github.com/lambdal/lambda-deep-learning-demo
OR (this is what I use...)
https://github.com/balancap/SSD-Tensorflow

@intelltech

From all this, I conclude that the original images to be labeled must have proportions close to 1:1 (for the ssd_mobilenet model), with resolutions of, for example, 900x1000 or 1200x1300, etc.
Is that right?

@tensorflowbutler (Member)

Hi There,
We are checking to see if you still need help on this, as it seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.

Kmarconi commented Jul 7, 2020

Hi everyone, I was just wondering whether the image_resizer param from the config file affects only the training phase or also the inference phase. By that I mean: will my model, once trained, resize each image to 300x300 before its first convolution layers (for example, if I use SSD), or not?

Thanks!

@johannesk94

Hi everyone, I am facing a problem which could have a similar reason: tensorflow/tensorflow#45148
