
Performance better on smaller image sizes (compared to training size) on some images #9294

Closed
agentmorris opened this issue Sep 5, 2022 · 18 comments
Labels
question Further information is requested

Comments

@agentmorris

Search before asking

Question

I have a YOLOv5x6 model trained to detect animals, trained at the default YOLOv5x6 size of 1280. YOLOv5 has been amazing for this task, and on nearly every dataset we've tried, YOLOv5 approaches human recall.

However, we've come across one slightly pathological dataset that by every visual metric should be completely in-domain, and in fact images from the same dataset are included in the model's training data. But some images in this dataset that are visually unremarkable are giving wild results, and I was hoping that the symptoms on this dataset might remind someone of a familiar issue and point to a way to at least detect this pathology, if not fix it.

As per below, I can't precisely replicate the issue on the base YOLOv5x6 weights, but I may be able to replicate a proxy for this issue.

Our weights are publicly available (link), although I don't expect anyone to debug our trained weights; I'm only hoping the symptoms are familiar to either the YOLOv5 developers or other users.

All images I'm using for debugging are public, and links are included.

Specifically, on an image like this one, we essentially never see the trained model miss. This is the result from the YOLOv5x6 base model on this image, for example:

S6_B06_B06_R1_S6_B06_R1_IMAG0109

Ignore the classification, of course, but the boxes are clearly correct. Our trained model also does fine on this image.

But on maybe 1 out of 20 images in this dataset, like this image, our trained model puts boxes in inexplicable places; I've never seen anything like this in literally millions of images that I've reviewed from other datasets:

SER_S11_C01_C01_R1_SER_S11_C01_R1_IMAG1384

It's not that false detections never happen; they do... but not in a random place in the image like that, and not accompanied by false negatives like that. Though it's hard to verify this precisely, the boxes appear to have the right aspect ratio for a nearby object, just mysteriously shifted; this is generally the case for all of these pathological images.

But... if we run this image at a size of 640, rather than 1280, everything works great:

SER_S11_C01_C01_R1_SER_S11_C01_R1_IMAG1384

This is true of every one of these pathological images we've come across. Reminder that the training size is in fact 1280.

Here's another example (original image); again, it's hard not to notice that the aspect ratio of the box matches that of a nearby object:

SER_S11_C01_C01_R1_SER_S11_C01_R1_IMAG1330

And again, running this at a size of 640 fixes things:

SER_S11_C01_C01_R1_SER_S11_C01_R1_IMAG1330

I mentioned that I can't precisely replicate this issue on the base YOLOv5x6 weights, but I can say that other measures of performance on the pathological images are definitely worse at size=1280 than at size=640 with the base weights. E.g. here's another pathological image (original image) run through the base weights at 1280:

S9_L08_L08_R2_S9_L08_R2_IMAG1372

...and at 640:

S9_L08_L08_R2_S9_L08_R2_IMAG1372

It's hard to say whether the missed animals and lower confidences at 1280 in this case are actually related to the "spurious boxes" issue we see on the trained model, or whether this is just a coincidence, but it does seem to be the case that performance with the base weights is better at 640 on all of the images that create spurious boxes with our trained model.

All of these images are natively >2000 pixels on the long side, so both the 1280 and 640 cases involve resizing.

My first thought was some kind of image corruption, so I re-saved to various formats, and took a literal screenshot of the image to rule out anything related to the image container. Moving the same pixels to a different container had no effect.

My second thought was some kind of issue in the resizing code, so I visually inspected pixels after resizing, and everything looks fine. I also swapped out the resizing code for every OSS Python image resizing library I could find; no effect.

My third thought was some kind of high-frequency content that is aliasing at the high resolution to create "ghost objects" that are not visible to humans, although this would only explain the spurious boxes, not the failure to detect the actual object. But this should also go away with a Gaussian blur with radius "a few pixels", and this didn't do the trick either.
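(For reference, a minimal sketch of the container-stripping and blur sanity checks described above; this is not the exact code used, just Pillow with placeholder paths:)

from PIL import Image, ImageFilter

img = Image.open('suspect_image.jpg')  # placeholder path

# Rule out the container/metadata: copy only the pixel data into a fresh image and re-save
clean = Image.new(img.mode, img.size)
clean.putdata(list(img.getdata()))
clean.save('suspect_pixels_only.png')

# Rule out high-frequency "ghost" content: blur with a radius of a few pixels and re-save
img.filter(ImageFilter.GaussianBlur(radius=3)).save('suspect_blurred.png')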

I'm stumped!

The good news is that, in the grand scheme of things, this impacts a vanishingly small number of images, and in the one dataset where it does occur, it affects enough images to be easily detectable. I can't quite pin it down to a particular camera, but all signs point to something pathological about a subset of these cameras. In any case, I'd like to be able to better characterize where this occurs.

Has anyone seen any cases like this where on certain images either (a) detections come up in spurious places on an otherwise-well-performing model, or (b) a model performs better at half the training resolution than at the training resolution?

Thanks!

Additional

No response

@agentmorris agentmorris added the question Further information is requested label Sep 5, 2022
@github-actions
Contributor

github-actions bot commented Sep 5, 2022

👋 Hello @agentmorris, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

glenn-jocher commented Sep 6, 2022

@agentmorris since the loss function assigns IoU to objectness, confidence will take time to increase on detections, so I would simply train longer and/or gather more data in instances when this occurs.

As the confidence threshold approaches zero, more FPs will appear; therefore, if you prioritize Precision over Recall as your comments suggest, simply increasing your confidence threshold will eliminate these FPs (since they have low confidence compared to the TPs), just as in any other dataset/model.
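(For reference, a minimal sketch of raising the threshold when loading the model via torch.hub; the weights and image paths are placeholders and 0.5 is an arbitrary example value:)

import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='path/to/best.pt')  # placeholder weights
model.conf = 0.5  # raise the confidence threshold from the 0.25 default to suppress low-confidence FPs
results = model('path/to/image.jpg', size=1280)  # placeholder image, inference at the training size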

@glenn-jocher
Member

glenn-jocher commented Sep 6, 2022

@agentmorris also of course make sure you've reviewed Tips for Best Training Results:

Tutorials

Good luck 🍀 and let us know if you have any other questions!

@agentmorris
Author

Thanks for the quick reply... I'm almost 100% sure there's something else at work here beyond just typical accuracy issues; we have gobs of training data from similar images, we have no issues with accuracy or confidence on millions of similar images, and it's only this particular subset of images (which are well-represented in training) that yields this very odd pathology.

But if the specific symptoms (related to shifted boxes on the training size that disappear at a smaller image size) aren't ringing any bells for you, I may just have to live with being stumped. :) I'll close this issue, but if anyone else has experienced anything like this, let me know! Thanks.

@glenn-jocher
Member

@agentmorris the only thing I can think of is that if you are only interested in large objects or if your dataset only contains large objects, then small-object output layers like P3 may cause issues if their anchors have been evolved with AutoAnchor, which does not factor in a layer's visual field size when computing anchors.

But since you do have small objects in your images, I assume your dataset has very small objects too, with anchors evolved to match them, and everything should be fine.

@agentmorris
Author

Good idea, but you're right; we have a pretty wide distribution of object sizes, ranging from just a few pixels to literally the whole image. And the objects in these examples are right in the middle of the size distribution.

But object size as a variable of interest is a good note to take here... I'm going to try to collect as many of these pathological cases as I can to see if anything reliably separates them from visually similar non-pathological cases, and based on your suggestion, I'll track object size as part of this analysis. Thanks!

@yeldarby

yeldarby commented Sep 7, 2022

This looks suspiciously like an EXIF orientation issue where the data being passed into the model is transposed 90 degrees from what you’re visually seeing on the screen (and often from your labels if the annotation software you used was unaware of this issue).

I’ve most commonly seen it happen on landscape-mode photos from phones, but in theory any JPEG could exhibit this behavior.

Perhaps the resize step is also re-encoding the pixels into a logical x/y coordinate system and that’s why it’s not seen on other sizes?
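(A quick way to check for this, as a minimal sketch with Pillow; the path is a placeholder and 274 is the standard EXIF 'Orientation' tag ID:)

from PIL import Image, ImageOps

img = Image.open('path/to/image.jpg')  # placeholder path
print(img.getexif().get(274))          # anything other than 1 or None means the pixels are stored rotated/flipped

# Bake the orientation into the pixels and drop the tag before further processing
img = ImageOps.exif_transpose(img)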

@agentmorris
Author

Thanks for the suggestion, @yeldarby . An EXIF metadata issue (either rotation, scaling, or just corruption) was my first intuition also, but if I save to another format and/or literally screenshot the pixels, removing any possibility of EXIF issues, I get the same results. I'm almost positive now that it's a very esoteric (and still-unidentified) issue with the pixels themselves, i.e. something unique about the camera(s). Still a mystery!

@ameicler

ameicler commented Sep 8, 2022

Something silly, but did you make sure you passed the argument --imgsz (to 1280 or 640) when running the inference (and during training also)?

@agentmorris
Author

@ameicler Yes, the model is trained at 1280, and the above scenarios represent two --imgsz arguments when performing inference: 640 (working) and 1280 (not working). That's the strange part!

@valentinitnelav

valentinitnelav commented Sep 28, 2022

@agentmorris, this issue seems closed, but did you solve the problem? I just saw your post in the "AI for Conservation" Slack channel. I also just encountered EXIF orientation issues when using the VGG VIA annotator to create the bounding boxes. We have phone cameras taking images of pollinators, and depending on how the phone was set, the width can be bigger than the height of the captured images, or vice versa.

I am curious to know how you solved the issue. One approach seems to be to delete the EXIF metadata from the images before annotating, but this didn't do the trick for the VGG VIA annotator. I read the image width and height with PIL or CV2, but they can sometimes report flipped values, which made me even more confused.

Is YOLO reading image width and height using CV2 or PIL? I see them both used in the pipeline, and it is too complex for me to follow all the code (I am rather new to Python). For example, in YOLOv5 I saw the function exif_size defined in utils/dataloaders.py.
There I saw that the rotation of an image is read with rotation = dict(img._getexif().items())[orientation], where orientation gets its value from:

from PIL import ExifTags, Image, ImageOps
...
# Get orientation exif tag
for orientation in ExifTags.TAGS.keys():
    if ExifTags.TAGS[orientation] == 'Orientation':
        break

Facebook's detectron2 deals with reading the orientation like this: orientation = exif.get(_EXIF_ORIENT), where _EXIF_ORIENT = 274 is the EXIF 'Orientation' tag. See detectron2's detection_utils.py. This also appears to be based on PIL (from PIL import Image).

I am now confused about how to tackle this, so I am curious if you found a solution.
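(In case it helps, a minimal sketch of an orientation-aware size helper in the spirit of YOLOv5's exif_size, using only Pillow; 274 is the standard EXIF 'Orientation' tag ID:)

from PIL import Image

def oriented_size(path):
    # Return (width, height) as the image is displayed, swapping the stored
    # dimensions when the EXIF Orientation tag implies a 90/270-degree rotation
    img = Image.open(path)
    w, h = img.size
    orientation = img.getexif().get(274)
    if orientation in (5, 6, 7, 8):  # these orientation values involve a 90/270-degree rotation
        w, h = h, w
    return w, h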

@agentmorris
Author

@valentinitnelav No, I haven't solved the problem. I'm about 99.999% sure it's not an EXIF orientation problem. Here's the evidence against this being an EXIF orientation problem:

  • Saving to other formats, or literally screenshotting the image and saving to a new image, does not fix the problem.
  • The main symptom here isn't just "weird boxes", it's "weird boxes at the model's native resolution (1280x1280) that completely resolve when running the model at 640x640", which doesn't represent a change in the image itself, the image loader, the EXIF data, etc.

What I'm doing now is running the entire affected dataset (Snapshot Serengeti) through the model at both resolutions. This will take a few more days; then I should be able to use some simple heuristics to figure out which images are susceptible to this issue, and to go back and ask whether it was, for example, limited to specific camera serial numbers. Will let this thread know if I find anything!
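(For anyone attempting something similar, a minimal sketch of that kind of two-resolution comparison via the torch.hub interface; the weights path and image list are placeholders:)

import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='path/to/best.pt')  # placeholder weights

def boxes(img_path, size):
    # Run inference at the requested size; returns an (N, 6) tensor: x1, y1, x2, y2, conf, class
    return model(img_path, size=size).xyxy[0]

for img_path in ['img1.jpg', 'img2.jpg']:  # placeholder image list
    n640, n1280 = len(boxes(img_path, 640)), len(boxes(img_path, 1280))
    if n640 != n1280:
        print(f'{img_path}: {n1280} detections at 1280 vs {n640} at 640 -- flag for review')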

@agentmorris
Author

Question for @glenn-jocher: when I specify --imgsz 1280 (or --imgsz 1280 1280) to detect.py, is it the expected behavior that the per-image printouts show that every image is processed at 1280x1280?

For the images discussed on this thread, I generally see inference happening at 1280x960:

(3, 960, 1280)  960x1280 2 animals, 59.2ms

I expected that the specified image size would be the image size at which the model is run (the --help text just calls it "inference size"), and in fact - in the pathological first image I show above - if I go into augmentations.py and change:

im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)

...to:

im = cv2.resize(im, (1280,1280), interpolation=cv2.INTER_LINEAR)

I get the correct results (boxes on the animals, instead of floating in space).

That of course may just be a coincidence, and I don't really think there's a bug here, but I think there may be a fundamental misunderstanding on my part (maybe for other users as well) about how --imgsz works, which could lead to different expectations at training and inference time.

With all the other parameters at default values, is there a description of the difference between the "inference size" as described in the --help text and the actual inference size I see in the detect.py printouts?

And do you think it means anything that the model produces correct output when I hard-code the size that gets passed to cv2.resize()?

Thanks.

@agentmorris agentmorris reopened this Oct 10, 2022
@glenn-jocher
Member

@agentmorris for reproducible bugs please submit a bug report with code to reproduce.

--imgsz specifies the long side for dynamic models like PyTorch (the short side is solved for automatically), or the square shape for fixed-size models like CoreML, TensorRT, etc.
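(That matches the 960x1280 shape in the printouts above: for example, a hypothetical 2048x1536 original scaled so its long side is 1280 comes out at 1280x960, and 960 is already a multiple of the model stride. A minimal sketch of the shape logic, simplified from YOLOv5's letterbox and assuming a stride of 32:)

import math

def inference_shape(h, w, imgsz=1280, stride=32, auto=True):
    # Scale so the long side equals imgsz, preserving aspect ratio
    r = imgsz / max(h, w)
    new_h, new_w = round(h * r), round(w * r)
    if auto:
        # Rectangular inference: pad each side only up to the next multiple of the model stride
        return math.ceil(new_h / stride) * stride, math.ceil(new_w / stride) * stride
    # auto=False: pad all the way out to a square imgsz x imgsz canvas
    return imgsz, imgsz

print(inference_shape(1536, 2048))              # -> (960, 1280), matching the log line above
print(inference_shape(1536, 2048, auto=False))  # -> (1280, 1280), i.e. forced square inference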

@agentmorris
Author

Thanks... everything is working as expected then, I think it must just be an odd coincidence that my (incorrect) mental model assumed that --imgsz 1280 1280 resulted in inference on a square image, and hard-coding this assumption results in correct inference in this case. This is an entirely plausible coincidence.

I also went back through the history of this function to see whether, at the time we trained this model (in mid-2021), --imgsz behaved differently than it does now (i.e. yielding square inference), which could also explain this discrepancy, but I couldn't find any evidence of that. If it rings any bells, let me know; otherwise, I think it's just a coincidence that forcing square inference happens to fix some of the images that show pathological behavior at the training size.

I will go back and see whether hard-coding this in fact fixes inference for all of the pathological images (this was only a small test), which would likely still be a coincidence, but may be useful for anyone else encountering similar edge cases.

Closing again, thanks.

@glenn-jocher
Member

@agentmorris if you really want to force square pytorch inference you should just be able to set auto=False in the detect.py dataloaders.
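(For reference, a minimal sketch of that change; in recent detect.py versions the image dataloader is constructed roughly like the line below, though exact argument names may differ by release:)

dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=False)  # auto=False pads to a square imgsz x imgsz input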

@agentmorris
Author

Following up on this... sometimes the simplest explanation is the best, and it turns out that in this case the issue was good old-fashioned label noise. Bad boxes snuck into the training data for a very small number of backgrounds; specifically, boxes were not scaled properly on some images, giving a "correct" box placement that looks a lot like the ones in these images (the right number of boxes in the right general vicinity of the objects). So when other images from those same backgrounds were run through the trained model, the model appears to have (very reasonably) figured out that "oh, this is the background where nothing makes sense and boxes aren't supposed to be on animals, so I'll just memorize what to do here".

The fact that it works at a reduced size was mostly a red herring; my best explanation is that the very specific features the model was able to memorize in this rare case look different enough after resizing that it no longer goes into "special-case mode".

tl;dr: YOLOv5 did exactly what it should have done given problematic training data.

@glenn-jocher
Member

@agentmorris, thank you for the update and for sharing your findings. Label noise can indeed lead to unexpected model behavior, and it's great to hear that you were able to identify the root cause. Your experience serves as a valuable reminder of the importance of clean and accurate training data. If you have any further questions or issues, feel free to reach out. Happy training! 🚀
