
The autoanchor result is not same as anchors in model #7058

Closed
1 of 2 tasks
PonyPC opened this issue Mar 20, 2022 · 17 comments · Fixed by #7060
Labels
bug Something isn't working

Comments

PonyPC commented Mar 20, 2022

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

The AutoAnchor result is not the same as the anchors in the model.
The console log is:

AutoAnchor: thr=0.25: 1.0000 best possible recall, 8.71 anchors past thr
AutoAnchor: n=9, img_size=640, metric_all=0.571/0.910-mean/best, past_thr=0.583-mean: 25,97, 95,33, 35,95, 45,90, 86,52, 58,82, 73,68, 52,135, 106,81
AutoAnchor: Reversing anchor order
AutoAnchor: Done (optional: update model *.yaml to use these anchors in the future)

But I get this from the model:

tensor([[[ 18.29688,  17.03125],
         [ 12.88281,  33.62500],
         [ 26.46875,  20.29688]],

        [[ 44.90625,  90.37500],
         [ 85.81250,  51.71875],
         [ 58.37500,  81.87500]],

        [[100.62500, 388.50000],
         [381.25000, 132.25000],
         [139.75000, 379.50000]]], dtype=torch.float16)

Note: only the P4 layer matches; P3 and P5 do not.
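
For reference, the tensor above can be read back from a training checkpoint along these lines (a sketch assuming a standard YOLOv5 Detect() head; the checkpoint path is just an example, and it must be run from inside the yolov5/ directory so the pickled model class can be imported):

import torch

ckpt = torch.load('runs/train/exp/weights/last.pt', map_location='cpu')  # example path
detect = ckpt['model'].model[-1]                      # Detect() head
print(detect.anchors * detect.stride.view(-1, 1, 1))  # grid-space anchors scaled back to pixel space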

Environment

  • YOLO: YOLOv5 2022-3-15 torch 1.8.1+cu111 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11264MiB)
  • OS: Win10
  • Python: 3.8.5

Minimal Reproducible Example

python train.py --device 0 --img 640 --data mydata.yaml --cfg yolov5n.yaml --hyp hyp.scratch-low.yaml --weights '' --batch-size 64 --workers 8 --epochs 300
Everything is left at its default value except the class count, which is 1.

yolov5n.yaml
# Parameters
nc: 1
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.25  # layer channel multiple
anchors: 3

You can validate this quickly by setting epochs to 1.

Additional

I know about AutoAnchor; what I mean is that the AutoAnchor result is not the same as the model's anchors.
As in the attached picture, the two circled sets of anchors should be the same from P3 to P5 during training, but in my case they are not.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
PonyPC added the bug (Something isn't working) label on Mar 20, 2022
PonyPC (Author) commented Mar 20, 2022

#6966 @glenn-jocher

glenn-jocher (Member) commented

@PonyPC your console output shows AutoAnchor: Reversing anchor order after your anchors are displayed. Therefore your correct anchors are attached to your model. There is no bug.

glenn-jocher (Member) commented Mar 20, 2022

@PonyPC also, your example is not reproducible since no one but you has access to your dataset. If you want our team to investigate, the issue must be reproducible by us.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

  • Minimal – Use as little code as possible to produce the problem
  • Complete – Provide all parts someone else needs to reproduce the problem
  • Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

  • Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
  • Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

PonyPC (Author) commented Mar 20, 2022

Am I to understand that the anchors in the console output are not reliable?
What does Reversing anchor order mean?

glenn-jocher (Member) commented Mar 20, 2022

@PonyPC anchor order is reversed when it is detected that your model outputs are not ordered in the same direction as the anchor sizes. Some model output heads (FPN) are ordered large to small, whereas others (e.g. PANet) are ordered small to large. Anchors are always evolved sorted small to large, and they are then reversed if a mismatch with the model is detected.
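
For intuition, a standalone sketch of that direction test with example values (one anchor per output layer for brevity; this is illustrative only, not the library helper, which is quoted below):

import torch

evolved = torch.tensor([[10., 13.], [62., 45.], [373., 326.]])  # evolved anchors always come out small -> large
for name, strides in (('PANet-style head (strides 8,16,32)', torch.tensor([8., 16., 32.])),
                      ('FPN-style head (strides 32,16,8)', torch.tensor([32., 16., 8.]))):
    areas = evolved.prod(-1)  # anchor areas per layer
    flip_needed = (areas[-1] - areas[0]).sign() != (strides[-1] - strides[0]).sign()
    print(name, '-> reverse anchors:', bool(flip_needed))
# PANet-style head (strides 8,16,32) -> reverse anchors: False
# FPN-style head (strides 32,16,8) -> reverse anchors: True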

Either way, if you want us to investigate, you must supply a fully reproducible example on a common dataset like COCO128, VOC, GlobalWheat, etc. Your example is not reproducible by anyone but yourself, since no one knows what mydata.yaml is, so there is no action for us to take here.

glenn-jocher (Member) commented

@PonyPC

def check_anchor_order(m):
    # Check anchor order against stride order for YOLOv5 Detect() module m, and correct if necessary
    a = m.anchors.prod(-1).view(-1)  # anchor area
    da = a[-1] - a[0]  # delta a
    ds = m.stride[-1] - m.stride[0]  # delta s
    if da.sign() != ds.sign():  # same order
        LOGGER.info(f'{PREFIX}Reversing anchor order')
        m.anchors[:] = m.anchors.flip(0)

glenn-jocher (Member) commented

@PonyPC good news 😃! Your original issue may now be fixed ✅ in PR #7060.

I investigated AutoAnchor behavior when starting with --weights and when starting from scratch with --weights '' --cfg yolov5s.yaml, and observed that in the second case check_anchor_order() was running on grid-space anchors (pixel-space divided by stride) rather than pixel-space anchors. This is a silent-error bug (my fault) that may have caused some trainings from scratch to accidentally reverse their anchor order, resulting in lower recall and lower mAP. This should all be resolved now in #7060.
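
Using the anchor values from the console log above to make the failure concrete: in pixel space the anchor areas run in the same direction as the strides, but once each layer is divided by its stride the comparison can flip, so a check done on grid-space anchors wrongly triggers Reversing anchor order. A standalone sketch of this effect (illustrative only, not the actual #7060 diff):

import torch

strides = torch.tensor([8., 16., 32.])                                  # P3, P4, P5
pixel_anchors = torch.tensor([[[25., 97.], [95., 33.], [35., 95.]],     # P3 (from the log above)
                              [[45., 90.], [86., 52.], [58., 82.]],     # P4
                              [[73., 68.], [52., 135.], [106., 81.]]])  # P5
grid_anchors = pixel_anchors / strides.view(-1, 1, 1)                   # what the from-scratch path was checking

ds = strides[-1] - strides[0]  # stride always increases P3 -> P5
for name, anchors in (('pixel-space', pixel_anchors), ('grid-space', grid_anchors)):
    a = anchors.prod(-1).view(-1)  # anchor areas, as in check_anchor_order()
    da = a[-1] - a[0]
    print(name, 'would reverse:', bool(da.sign() != ds.sign()))
# pixel-space would reverse: False  (correct)
# grid-space would reverse: True   (the silent error described above)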

To receive this update:

  • Git – git pull from within your yolov5/ directory, or git clone https://github.com/ultralytics/yolov5 again
  • PyTorch Hub – Force-reload with model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • Notebooks – View the updated notebooks on Colab or Kaggle
  • Docker – sudo docker pull ultralytics/yolov5:latest to update your image

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

PonyPC (Author) commented Mar 20, 2022

This is my dataset:
dataset_2.zip
To run:
python train.py --device 0 --img 640 --data ../mydata.yaml --cfg ../myyolov5n.yaml --hyp ../hyp.scratch-low.yaml --weights '' --batch-size 64 --workers 8 --epochs 1

PonyPC (Author) commented Mar 20, 2022

Glad to see the solution

glenn-jocher (Member) commented

@PonyPC please git pull to update your code and see if #7060 solves the problem.

PonyPC (Author) commented Mar 20, 2022

Hi, thank you @glenn-jocher .
I trained again and the anchors still do not match, unfortunately.

PonyPC (Author) commented Mar 20, 2022

Only the P4 layer is the same. Please consider using my dataset to reproduce. I trimmed every picture, so don't worry about the training speed.

glenn-jocher (Member) commented

@PonyPC got it. I downloaded your zip, will take a look.

PonyPC (Author) commented Mar 20, 2022

@glenn-jocher put yolov5-master in the unzipped folder, and replace '\' with '/' in check.txt, test.txt and train.txt if you are running under Linux. Thanks for spending hours on this on a Sunday.

glenn-jocher (Member) commented

@PonyPC I was able to reproduce the issue on your dataset and implement a fix in #7067. Please git pull and see if this resolves your error.

PonyPC (Author) commented Mar 20, 2022

@glenn-jocher It is resolved now, you did a great job. Thanks again, have a nice day.

glenn-jocher (Member) commented

@PonyPC awesome, glad everything is fixed!! 😃
