
API ObjectDetection size of input images issues #1876

Closed
chenyuZha opened this issue Jul 6, 2017 · 43 comments
chenyuZha commented Jul 6, 2017

GPU : GeForce GTX 1080 Ti/PCIe/SSE2 (11 GB)
Tensorflow version: 1.1.0
Python version: 2.7.12
Model checkpoint : ssd_mobilenet_v1_coco_11_06_2017

Context: After reading the Object Detection tutorial, I converted my own image dataset to TFRecord files by creating an *.xml annotation file for each image. Since the images in my dataset are very large (about 4000 pixels wide and 2000 pixels high each), the resulting train.tfRecord is about 55 GB.

Issue: When I began training with train.py using the lightest model, ssd_mobilenet_v1_coco_11_06_2017, it crashed with an OOM error after 4-5 steps.
It seems the OOM error happened when allocating a tensor with shape [1,2969,3546,3].
As the capacity of my GPU is 11 GB, I don't understand why this causes a problem.

@chenyuZha chenyuZha changed the title from "API ObjectDetection maximum size of input images issues" to "API ObjectDetection size of input images issues" Jul 6, 2017
jch1 (Member) commented Jul 6, 2017

Hi @chenyuZha - These resolutions are problematic because we keep an entire queue of images in memory, not just the batch that you are currently training on. See e.g. the queue_capacity and min_after_dequeue parameters in https://github.com/tensorflow/models/blob/master/object_detection/protos/input_reader.proto

Though note that we typically resize in SSD immediately to 300x300. So given this, it makes sense to just resize your input images to be smaller.
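Building on that advice, here is a minimal sketch of pre-shrinking images on disk before generating the TFRecords (the paths and the max side of 1024 are illustrative assumptions; requires Pillow). Remember that the pixel coordinates in the matching *.xml annotations must be scaled by the same factor:

    import os
    from PIL import Image

    MAX_SIDE = 1024  # generous headroom above SSD's 300x300 input

    def shrink_image(src_path, dst_path, max_side=MAX_SIDE):
        """Downscale so the longest side is at most max_side; never enlarge."""
        img = Image.open(src_path)
        scale = max_side / float(max(img.size))
        if scale < 1.0:
            new_size = (int(img.width * scale), int(img.height * scale))
            img = img.resize(new_size, Image.BILINEAR)
        img.save(dst_path)

    # Shrink every JPEG before running the TFRecord creation script.
    for name in os.listdir("images_raw"):
        if name.lower().endswith(".jpg"):
            shrink_image(os.path.join("images_raw", name),
                         os.path.join("images_small", name))

This keeps both the 55 GB record file and the in-memory input queue far smaller, while losing nothing that survives the 300x300 resize anyway.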

@chenyuZha (Author)

@jch1 Thanks for your response!

@yeephycho

Hi @jch1.
I'm training SSD on my own dataset. The results are good, but not as good as I expected, so I'm trying to fine-tune the model to improve performance.
I noticed that SSD resizes the image to 300 by 300 using the API tf.image.resize_images(),
which, according to the official documentation,

Resized images will be distorted if their original aspect ratio is not the same as size.

I ran some experiments, and this is true.

So I changed the resize API to resize_image_with_crop_or_pad().
The result is that the loss gets very high and is very hard to converge.

My question is: did you take resize distortion into consideration during the development of the Object Detection API?
If you did, will a mismatch in the aspect ratio of the original images affect the final result?

With thanks and regards!

@chenyuZha chenyuZha reopened this Sep 18, 2017
@aselle aselle added stat:awaiting model gardener Waiting on input from TensorFlow model gardener type:bug Bug in the code labels Sep 20, 2017
Luonic commented Oct 30, 2017

@yeephycho you should first resize the image while keeping the original aspect ratio (e.g. to 300px on the larger side), and then pad it to a square with resize_image_with_crop_or_pad(), as in the sketch below. You can find code for an aspect-ratio-preserving resize built from TF ops in the Magenta repository, where neural style transfer is done.
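A minimal sketch of that two-step approach with TF 1.x ops (the function name is illustrative; assumes a single HxWx3 image tensor). Note that if you do this outside the built-in preprocessing, ground-truth boxes must be adjusted by the same scale and padding offset:

    import tensorflow as tf  # TF 1.x API, matching this thread

    def resize_keep_aspect_then_pad(image, target_size=300):
        """Scale so the longer side equals target_size, then zero-pad to a square."""
        shape = tf.shape(image)
        height = tf.to_float(shape[0])
        width = tf.to_float(shape[1])
        scale = tf.to_float(target_size) / tf.maximum(height, width)
        new_height = tf.to_int32(tf.round(height * scale))
        new_width = tf.to_int32(tf.round(width * scale))
        resized = tf.image.resize_images(image, [new_height, new_width])
        # Pad the shorter side symmetrically up to target_size x target_size.
        return tf.image.resize_image_with_crop_or_pad(
            resized, target_size, target_size)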

@scotthong

Hi @yeephycho,
I am also trying to train an object detector using my own images, which have various sizes and aspect ratios. I tried the following two image_resizer configurations in the pipeline.config with a pre-trained model from the model zoo, and it seems that the model is not converging, with the TotalLoss fluctuating around 4.0. Which image_resizer configuration did you use to make the model converge faster? Or did you pre-scale the dataset and bounding boxes, as suggested by @Luonic, to train your object detector?

Thanks!

    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 300
        max_dimension: 300
      }
    }

@yeephycho

Hi @scotthong,
I really cannot remember how I solved this problem, but I can share some experience that may not be correct but worked for me:

  1. I used fixed_shape_resizer.
  2. A total loss around 4.0 is OK; try to evaluate your model and see how it works on your validation set.
  3. If the image is too small after resizing, it is hard to detect small objects.

With regards!
Yeephycho

Tsuihao commented Jan 20, 2018

Hi all,

Maybe someone can help me with this issue:

#3196

Regards,
Hao

syndec commented Feb 1, 2018

Hi @yeephycho @scotthong,
I am encountering the same problem: the TotalLoss fluctuates around a high value (4.0). Is there anything I can do to avoid missing small objects when using high-resolution training images?

Regards,
Lee

@scotthong

Hi @syndec
Try to study the network and especially the file "core/preprocessor.py". You should be able to find clues and determine if it is possible to resolve the problem you are facing.

@willSapgreen

Hello @yeephycho and @syndec,
Is there a definition of "small objects"?
For example, if the ratio of an object's size to the whole image is 0.2, do we consider it a "small object"?
I am facing the same issue: the total loss fluctuates around 4.0.
Also, the localization_loss/classification_loss is affected by the number of ground-truth boxes in an image; for example, the loss increases when the number of ground-truth boxes increases.
Because those ground-truth boxes are small, I wonder if I should exclude them from my training set.
I use the pre-trained SSD-InceptionV2-COCO model to train on my dataset for vehicle detection.
The model trained on my dataset performs worse than the pre-trained SSD-InceptionV2-COCO.

Thank you for your precious time.

funkysandman commented May 3, 2018

I am facing similar issues but only when I use the ssd_random_crop data augmentation option in the pipeline config. My total loss converges much lower when I don't use the crop preprocessing. Also, I'm finding that the position of the object in the image has a huge impact on score - the detector behaves like it has blind spots. My training images are 1920x1200 and I'm using the fixed_shape_resizer to 300x300. My training runs have about 100 images and it converges to a total loss of about 0.26 after 10K steps. When I add the ssd_random_crop the total loss is around 2.5 even after 20K steps. I've tried variations with random_crop_image as well with no luck.
Edit: the same training set with ssd_random_crop produced a lower mAP, even with twice as many steps. This doesn't seem right - is there a bug in the cropping algorithm?

@willSapgreen

Hello @funkysandman,
Thank you for sharing the "ssd_random_crop" idea;
unfortunately, it does not resolve my issue.
My total loss still remains around 4.

Would you mind sharing what counts as "good data" for SSD-InceptionV2 training, in your opinion?

  1. Does "resolution" matter? (My images from Udacity are somewhat blurred.)
  2. Will the number of objects per image affect the box predictor?

Because SSD-InceptionV2 has limitations in detecting small objects,
I figured that in a 300x300 image the target object should be "big" enough for the model to learn its features.
So I prepared the image data from Udacity as follows:

  1. 300x300 (cropped the car object from the Udacity image dataset).
  2. The object to be classified occupies at least 10% of the image (to make sure the car is big enough).
  3. Only one object per image.

The total loss is close to 1.0 after 200k iterations.

But after training with this kind of data,
the trained model cannot detect any object in my evaluation set.

I am working on the root cause,
so I wonder what good data for SSD-InceptionV2 training should look like.

P.S.
My TensorFlow is 1.8.0 and the TensorFlow Object Detection repo was pulled on 2018-05-03.

Thank you.

@funkysandman

Make sure your pipeline.config matches the model you are using perfectly; I've accidentally run training with the wrong config file. Maybe share your pipeline.config file? Also, the detection program must be tailored to the specific model, and the images must be normalized properly for the detector, or you may get weird results.

@willSapgreen

Hello @funkysandman,
I made a mistake: in the label map, the item's "name" is capitalized, but in my tfrecord-conversion script I used the lower-case version.
Now I am working on the poor performance of SSD-InceptionV2.

@cmbowyer13

Question: say all the train and test images from the raw data are of size (512, 7000). Can anyone who works on the Object Detection API tell me if it is OK to leave the image_resizer set to 300x300, or should I change it to the following to match the dimensions of my images? I'm not sure what the purpose of that piece of code is. I am currently training a model with the setting as:

    image_resizer {
      fixed_shape_resizer {
        height: 512
        width: 7000
      }
    }

@willSapgreen

Hello @cmbowyer13,

If you use SSD from the model zoo, you need to either resize your images to 300x300 before feeding them to the model or let the model do it for you.

Good luck!

@cmbowyer13

I don't know what you're saying, @willSapgreen. Why can't I do as I am doing, or why can't the model accept arbitrarily sized images?

funkysandman commented May 11, 2018

You can provide any size image for detection. I would not resize it so small that you cannot see the features you're after (this depends on what your objects are). During training the images are resized to 300x300. If the images you are using to train your model lose their features at that resolution, you can increase it to something like 600x600 in the pipeline.config file. This has an effect on training time as more memory is needed. Resizing images at detection time can help speed things up.

cmbowyer13 commented May 11, 2018

But what does resizing to 300x300 in the config file do to all my images during training, which are of size (512x7000)? You're saying it's fine to leave it as 300x300 and then be able to detect any size image I want, as long as it contains the same class of objects? I will also need to run detection on matrices of raw data in real time. Are there any guidelines for doing real-time object detection with our trained object detection model? Any good links or references are appreciated. Also, do you know the best way to convert the model to a C++ format once it has been created and performs as I like?

@funkysandman

@cmbowyer13 I would guess you can leave it at 512x7000 for training, as long as you have the memory + gpu to handle it. As long as the model loss converges it should be ok. When it comes to using the trained model, you can give it pictures in their original size I believe. You can resize them if it is too slow.
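On the real-time / C++ question above: the usual route is to export a frozen inference graph with the API's export_inference_graph.py and feed it images at their original size. A minimal Python sketch, with illustrative paths; the tensor names are the standard ones in graphs exported by the Object Detection API:

    import numpy as np
    import tensorflow as tf
    from PIL import Image

    # Load a frozen graph produced by export_inference_graph.py.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("exported_model/frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")

    with tf.Session(graph=graph) as sess:
        # The exported graph takes uint8 images of any size; the resizer
        # configured in pipeline.config runs inside the graph itself.
        image = np.expand_dims(np.array(Image.open("test.jpg")), axis=0)
        boxes, scores, classes = sess.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": image})

Since the frozen .pb file is a plain GraphDef, the same file can be loaded from TensorFlow's C++ API, so no separate conversion step is needed.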

skhater commented May 17, 2018

Hi. I have a huge set of training images that range in size from 300x300 to 1184x1410 (different sizes). I'm training a Faster R-CNN model with the pipeline configuration file resizing images to 600x1024.

My question is: is it better to resize all the images to 600x1024 before training, using the preprocessing script found in

def resize_image(image,
or is it OK to train with different sizes? And is it too much to resize the images up to 600x1024 when I have small images (300x300)?

Also, will this resizing affect the bounding boxes in the annotations, or will they be resized accordingly?

Thanks

@funkysandman

Are you sure it resizes to 600x1024? I thought it was resizing so that the longest side is no smaller than 600 and no bigger than 1024. In the pipeline.config it just says min_dimension=600, max_dimension=1024, without being specific to width or height. So your 300x300 would be resized to 600x600 and 1184x1410 would be resized to 864x1024. I could be wrong.

skhater commented May 17, 2018

@funkysandman Thanks for your reply, and yes, you are right: it's about the min and max dimension.
My question is: does this resizing adjust the bounding boxes (annotations) in the training set accordingly, or do I have to preprocess them before training?

@funkysandman

I would leave the images as they are; training should respect the bounding boxes that you specify in the XML. The bounding boxes are converted to percentages, I think, so they're not affected by image resizing.
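That is indeed how the TFRecord creation scripts store boxes: coordinates are normalized to [0, 1] relative to the original image, so any in-graph resize moves the boxes along with the pixels. A tiny sketch of the normalization step (the values are illustrative; the convention follows the API's create_pascal_tf_record.py):

    # Pixel-space box from a labelImg xml annotation (illustrative values).
    xmin_px, ymin_px, xmax_px, ymax_px = 120, 80, 560, 430
    width, height = 1280, 720  # original image size

    # Stored normalized to [0, 1], so neither fixed_shape_resizer nor
    # keep_aspect_ratio_resizer invalidates the boxes.
    xmin = xmin_px / float(width)
    xmax = xmax_px / float(width)
    ymin = ymin_px / float(height)
    ymax = ymax_px / float(height)
    # These floats go into the image/object/bbox/* features of the tf.train.Example.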

skhater commented May 17, 2018

Thanks @funkysandman

Djacob502 commented May 19, 2018

I'm also questioning whether it is better to resize prior to creating the bounding boxes, as opposed to using larger images (sometimes much larger), creating the bounding boxes, and then letting training resize the photos to 300x300. In my use case, the photos are also resized in the same manner to 300x300 prior to being sent through the model for object detection in production.
Has anyone done any testing to determine which is better? Also, if anyone knows of a white paper that discusses this issue, please post the link.

@funkysandman

I've tested this, and it seems that whatever size I train it on is the size it works best on... when I cropped my large pics down to 300px, the model was lousy at detecting in the larger images. I ended up including both original and resized images to ensure better detection. I'm still experimenting. I've also found that reducing the batch size can impact the overall training accuracy. I've tried batch sizes of 5, 10, 12, 20, 24... 5 seems to work pretty well for my data.

rajatsc commented May 28, 2018

I am failing to wrap my head around why resizing would be an issue to begin with, unless the object is too small to detect. In all other cases, won't the distortion due to resizing be taken care of, because we would be resizing the test image too?

@Djacob502

The original pics I'm using are approximately 4,000 x 3,000. Drawing bounding boxes on such large images and then training works well, but the training is very slow and requires a lot of memory. If I resize the same images to 300x300 and then draw the bounding boxes - like funkysandman - I found that the detection was terrible. I ended up cropping the images to approximately 800x800 (some smaller and some larger) around my objects and then drew the bounding boxes. The objects I am looking for fill most if not all of the 800x800 cropped pictures. This works well. However, it seems like I need more negative space. I was thinking of including pics with just negative space (no known objects) to help the training.

@CeeBeeEh

"However it seems like I need more negative space. I was thinking of including pics with just negative space (no known objects) just to help the training."

This is something that I was thinking about as well; I would like to know the answer to this.

@bharat77s

I am using labelImg to annotate traffic-light images for training on TensorFlow with the ssd_inception_coco model. How can I add the color of the traffic lights while creating the xml file, as this option is not available in labelImg? Kindly help.

@MittalNeha

@bharat77s when you create a rectangle (shortcut: w), a list of labels comes up, and you can just add the name of your label there. So basically, the color of the traffic light can be a label.
Alternatively, you can add your label names in data/predefined_classes.txt so that your labels will appear in the list of labels in labelImg.
I hope that answers your question.
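For example, a predefined_classes.txt for the traffic-light case could contain one label per line (the label names here are just an illustration, matching the suggestion later in this thread):

    red
    yellow
    green
    off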

bharat77s commented Sep 28, 2018 via email

@giridhar13

I have a query regarding the image size used for training. I am training SSD_inception_v2 on a dataset that contains both the original images (1280x720) and cropped images (640x640). My models do not seem to converge and the losses fluctuate. Should I train the network only on the crops?

hoss24 commented Nov 30, 2018

@funkysandman @skhater It is my understanding from reading the docs and forums that it resizes the image so the smaller side equals min_dimension and then makes sure the other dimension is no more than max_dimension. Ex: keeping aspect ratio, 300x300 would be resized to 600x600 and 1184x1410 would be resized to 600x714.
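A small sketch of that rule as I understand it (mirroring the logic in core/preprocessor.py; verify the exact rounding against the source):

    def keep_aspect_ratio_size(height, width, min_dimension=600, max_dimension=1024):
        """Approximate target size chosen by keep_aspect_ratio_resizer."""
        # First try: scale so the smaller side equals min_dimension.
        scale = float(min_dimension) / min(height, width)
        # If that pushes the larger side past max_dimension, clamp to it instead.
        if round(max(height, width) * scale) > max_dimension:
            scale = float(max_dimension) / max(height, width)
        return int(round(height * scale)), int(round(width * scale))

    print(keep_aspect_ratio_size(300, 300))    # -> (600, 600)
    print(keep_aspect_ratio_size(1410, 1184))  # -> (715, 600), i.e. ~600x714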

hoss24 commented Nov 30, 2018

@giridhar13 I would suggest having your training images as close as possible to the dimensions of the images you plan to run through the model. Ex: if you are going to test/utilize cropped images, train on cropped images.

hoss24 commented Nov 30, 2018

@bharat77s Yes, I would use 4 different labels: red, yellow, green, off. Make sure you edit the label map and config file to reflect the changes. If this presents issues, you could train a model to pull out the stop lights and then retrain on the cropped images to detect color.

@austinmw

Does anyone know how fixed_shape_resizer interacts with ssd_random_crop? Does it take a random crop of size defined in fixed_shape_resizer?

thusinh1969 commented May 21, 2019

These pre-trained SSD512 models would help you start with larger images, though.
https://github.com/lambdal/lambda-deep-learning-demo
OR (this is what I use...)
https://github.com/balancap/SSD-Tensorflow

@intelltech

From all this, I conclude that the original images to be labeled must have proportions close to 1:1 (for the ssd_mobilenet model), with resolutions of, for example, 900x1000 or 1200x1300, etc.
Is that right?

@tensorflowbutler (Member)

Hi There,
We are checking to see if you still need help on this, as it seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.

Kmarconi commented Jul 7, 2020

Hi everyone, I was just wondering whether the image_resizer param from the config file affects only the training phase or also the inference phase. By that I mean: will my model, once trained, resize each image to 300x300 before its first convolution layers (for example, if I use SSD), or not?

Thanks!

@johannesk94

Hi everyone, I am facing a problem which could have a similar reason: tensorflow/tensorflow#45148
