
Differences between YOLOv5 models #7152

Closed
Averen19 opened this issue Mar 26, 2022 · 10 comments
Labels
question (Further information is requested), Stale

Comments

@Averen19

Search before asking

Question

What is the difference between YOLOv5s, YOLOv5m, and YOLOv5l? I know that the mAP, the number of layers, and the depth_multiple and width_multiple in the yolov5*.yaml files differ between them, but is there any documentation that states what the differences in layers are?
Do the width and depth multiples affect the train.py file?
I'm writing a research paper on the YOLOv5 models and would appreciate any kind of help.

Additional

No response

Averen19 added the question label Mar 26, 2022
@yonghi

yonghi commented Mar 26, 2022

You can see the effect of depth_multiple and width_multiple in this function:

def parse_model(d, ch): # model_dict, input_channels(3)
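Roughly speaking, depth_multiple scales how many times each repeated block (e.g. C3) is stacked, and width_multiple scales every layer's output channel count (rounded to a multiple of 8). A minimal self-contained sketch of that logic, paraphrasing models/yolo.py (the exact code may differ between versions):

```python
import math

def make_divisible(x, divisor=8):
    # Round a channel count up to the nearest multiple of `divisor`,
    # mirroring the helper YOLOv5 uses when applying width_multiple.
    return math.ceil(x / divisor) * divisor

def scale_layer(n_repeats, out_channels, depth_multiple, width_multiple):
    # Depth gain: scale the number of stacked blocks (never below 1).
    n = max(round(n_repeats * depth_multiple), 1) if n_repeats > 1 else n_repeats
    # Width gain: scale the output channels, keeping them divisible by 8.
    c2 = make_divisible(out_channels * width_multiple, 8)
    return n, c2

# Example: YOLOv5s uses depth_multiple=0.33 and width_multiple=0.50,
# so a "9x C3, 512-channel" stage in the base config becomes 3x C3 with 256 channels.
print(scale_layer(9, 512, 0.33, 0.50))  # -> (3, 256)
```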

@glenn-jocher
Member

@Averen19 yes the YOLOv5 models are all compound-scaled variants of the same architecture. I did this following the EfficientDet compound scaling model, minus the image scaling.
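For reference, the multipliers in the released model yamls (as of the current v6.x configs; check your local files to confirm) are:

| Model | depth_multiple | width_multiple |
|---|---|---|
| YOLOv5n | 0.33 | 0.25 |
| YOLOv5s | 0.33 | 0.50 |
| YOLOv5m | 0.67 | 0.75 |
| YOLOv5l | 1.00 | 1.00 |
| YOLOv5x | 1.33 | 1.25 |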

@bryanbocao

bryanbocao commented Mar 28, 2022

Dear @glenn-jocher,

I am running similar experiments that also need to vary the model size. I see that the yolov5* models (e.g. yolov5n.yaml, yolov5s.yaml, etc.) differ only in their depth_multiple and width_multiple scaling factors, but follow the same architecture with 3 detection heads. So my questions are: (1) is the 3-head architecture the smallest, or "atomic", block that we can use? (2) If not, can we use an even smaller model, such as one with only a single head? Thanks!

@glenn-jocher
Member

@bryanbo-cao yes of course. You can modify each model infinitely by removing/adding heads, layers, modules etc. That's the main idea behind the yaml files, to make them easy to modify and view.

@bryanbocao

bryanbocao commented Mar 29, 2022

> @bryanbo-cao yes of course. You can modify each model infinitely by removing/adding heads, layers, modules etc. That's the main idea behind the yaml files, to make them easy to modify and view.

Yup. Suppose I am designing my own architecture in a model/custom_yolov5.yaml file. I guess my previous question came from the fact that directly deleting some layers caused problems. For example, when I tried to simplify the architecture by deleting some of the P3 layers directly in the backbone, I got the following errors and was trying to get some help:

Traceback (most recent call last):
  File "train.py", line 643, in <module>
    main(opt)
  File "train.py", line 539, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 124, in train
    model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
  File "/home/<user>/yolov5/models/yolo.py", line 103, in __init__
    self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist
  File "/home/<user>/yolov5/models/yolo.py", line 291, in parse_model
    args.append([ch[x] for x in f])
  File "/home/<user>/yolov5/models/yolo.py", line 291, in <listcomp>
    args.append([ch[x] for x in f])
IndexError: list index out of range

or

File "/home/<user>/yolov5/models/common.py", line 275, in forward
    return torch.cat(x, self.d)
RuntimeError: Sizes of tensors must match except in dimension 1. Got 32 and 64 in dimension 2 (The offending index is 1)

But later, after some investigation of the code and architecture, I realized that the from entry (the first field in each [from, number, module, args] row of custom_yolov5.yaml, shown in the second column of the printed summary) refers to the layer number displayed in the leftmost column of the model architecture printed in the command line, and it has to be checked carefully. The main reasons are: (1) adding/deleting layers changes the indices of all the layers that follow; (2) there are skip connections between backbone and head layers at the same scale, specifically in the Concat layers. You just need to make sure each index refers to the correct layer and that the dimensions match. PS: it might be clearer to call the field **layer_number**?

                 from  n    params  module                                  arguments
  0                -1  1      1760  models.common.Conv                      [3, 16, 6, 2, 2]
  1                -1  1      4672  models.common.Conv                      [16, 32, 3, 2]
  2                -1  1      4800  models.common.C3                        [32, 32, 1]
  3                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  4                -1  2     29184  models.common.C3                        [64, 64, 2]
  5                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  6                -1  3    156928  models.common.C3                        [128, 128, 3]
  7                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  8                -1  1    296448  models.common.C3                        [256, 256, 1]
  9                -1  1    164608  models.common.SPPF                      [256, 256, 5]
 10                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 14                -1  1      8320  models.common.Conv                      [128, 64, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     22912  models.common.C3                        [128, 64, 1, False]
 18                -1  1     36992  models.common.Conv                      [64, 64, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1     74496  models.common.C3                        [128, 128, 1, False]
 21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 24      [17, 20, 23]  1      9471  models.yolo.Detect                      [2, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [64, 128, 256]]
Model Summary: 270 layers, 1766623 parameters, 1766623 gradients

Anyway, it's fixed now. I like the yaml style; it is very flexible for changing the model architecture!
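A rough sanity check along these lines can catch stale `from` indices before launching train.py. This is a hypothetical helper, not part of the YOLOv5 repo, and it only flags absolute indices that point at or past their own layer:

```python
import yaml  # PyYAML

def check_from_indices(cfg_path):
    """Flag 'from' indices in a model yaml that reference a layer at or after themselves."""
    with open(cfg_path) as f:
        cfg = yaml.safe_load(f)
    layers = cfg['backbone'] + cfg['head']  # parse_model concatenates them in this order
    for i, (frm, n, module, args) in enumerate(layers):
        sources = frm if isinstance(frm, list) else [frm]
        for s in sources:
            # Negative indices are relative (-1 = previous layer); non-negative ones are
            # absolute layer numbers, which shift whenever layers are added or deleted.
            if s >= 0 and s >= i:
                print(f"layer {i} ({module}): 'from' index {s} points at or past itself")

check_from_indices('model/custom_yolov5.yaml')  # adjust to your config path
```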

@glenn-jocher
Member

@bryanbo-cao yes that's right! You can delete some layers, but be careful: any later layer that uses a skip connection from earlier in the model must then also be updated to the new index of the layer it is coming from.

@github-actions
Contributor

github-actions bot commented Apr 29, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@Robotatron

@glenn-jocher
The "X" model uses depth_multiple of 1.33.
Is going higher not recommended? Say depth_multiple: 2?
There is probably a reason YOLO5 configs end with "X" and 1.33 and dont go higher, maybe it does not improve performance that much?

@bryanbocao

@glenn-jocher The "X" model uses depth_multiple of 1.33. Is going higher not recommended? Say depth_multiple: 2? There is probably a reason YOLO5 configs end with "X" and 1.33 and dont go higher, maybe it does not improve performance that much?

@Robotatron To me it's just a scaling factor for network depth (# of layers). You can try whatever you want; YOLOv5 scales from the base network, depending on your customized settings, e.g. use case, hardware constraints of cloud/edge GPUs, GPU memory, inference time, etc. In general, the gain in detection performance (mAP) diminishes as the network goes deeper.
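As a rough illustration (using the baseline C3 repeat counts from the l-size backbone and the same rounding rule parse_model applies), a depth_multiple of 2 simply doubles how many times each C3 block is stacked:

```python
base_repeats = [3, 6, 9, 3]  # C3 repeats in the backbone at depth_multiple = 1.0 (YOLOv5l)
for gd in (0.33, 1.0, 1.33, 2.0):
    scaled = [max(round(n * gd), 1) for n in base_repeats]
    print(f'depth_multiple={gd}: {scaled}')
# depth_multiple=0.33: [1, 2, 3, 1]    (n/s)
# depth_multiple=1.0:  [3, 6, 9, 3]    (l)
# depth_multiple=1.33: [4, 8, 12, 4]   (x)
# depth_multiple=2.0:  [6, 12, 18, 6]  (hypothetical, larger than any released config)
```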

@Robotatron

@bryanbocao
Thanks. Yes, that was my understanding as well. I guess I just have to try different values for depth and width and see whether the training time vs. mAP trade-off brings any benefit when going bigger than the eXtra-large config.
