How to increase number of anchors? #1057


Closed
universewill opened this issue Sep 28, 2020 · 22 comments
Labels
question Further information is requested

Comments

@universewill

I trained YOLOv5 on a custom dataset. However, the results turn out badly on long, thin targets like 213x5 or 6x194.
I want to try increasing the number of anchors to improve the detection results.
How do I increase the number of anchors?

@universewill universewill added the question Further information is requested label Sep 28, 2020
@Aktcob

Aktcob commented Sep 30, 2020

Simply add anchors here: https://github.com/ultralytics/yolov5/blob/master/models/yolov5s.yaml#L8

Please try it before opening an issue.
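
As a rough illustration (not an official recipe), you can extend the anchors: list in the model yaml so every output layer gets an extra (width, height) pair; each layer must keep the same number of pairs. The sketch below assumes a local clone with models/yolov5s.yaml present and PyYAML installed, and the added anchor values are placeholders, not tuned values.

# Hedged sketch: add a 4th anchor pair to each output layer of yolov5s.yaml.
import yaml

with open('models/yolov5s.yaml') as f:
    cfg = yaml.safe_load(f)

print(cfg['anchors'])  # default: 3 (w, h) pairs per output layer
cfg['anchors'] = [
    [10, 13, 16, 30, 33, 23, 50, 10],        # P3/8  (placeholder 4th pair: 50x10)
    [30, 61, 62, 45, 59, 119, 100, 20],      # P4/16 (placeholder 4th pair: 100x20)
    [116, 90, 156, 198, 373, 326, 400, 30],  # P5/32 (placeholder 4th pair: 400x30)
]

with open('models/yolov5s_custom_anchors.yaml', 'w') as f:
    yaml.safe_dump(cfg, f, sort_keys=False)

You would then train with --cfg models/yolov5s_custom_anchors.yaml (the file name is just an example).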

@glenn-jocher
Member

glenn-jocher commented Oct 4, 2020

@universewill yes, just do what @Aktcob recommends. The only constraint is that each output layer must have the same number of anchors. You can also set the field to a single integer to use autoanchor, i.e. in your yaml you can just write:

anchors: 10

to force 10 auto-computed anchors per output layer (30 total).
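
If you prefer to compute anchors yourself rather than letting training do it, yolov5 ships a k-means utility in utils/autoanchor.py. The call below is a hedged sketch run from inside a yolov5 clone; the exact kmean_anchors() location and signature may differ between versions, so check the file first.

# Hedged sketch: compute 30 anchors (10 per output layer) from a dataset yaml.
from utils.autoanchor import kmean_anchors

anchors = kmean_anchors(dataset='data/coco128.yaml', n=30, img_size=640, thr=4.0, gen=1000)
print(anchors)  # (30, 2) array of (width, height) pairs in pixels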

@glenn-jocher
Member

One note is that extreme aspect ratios like your 200x5 example will probably not work very well, as the convolution kernels are for the most part squares rather than rectangles.

@HaolyShiit

One note is that extreme aspect ratios like your 200x5 example will probably not work very well, as the convolution kernels are for the most part squares rather than rectangles.

After training, where are the auto-computed anchors saved?

@glenn-jocher
Member

Anchors are stored as Detect() layer parameters.

m = model.model[-1]  # Detect()
m.anchors  # in stride units
m.anchor_grid  # in pixel units

print(m.anchor_grid.view(-1,2))
tensor([[ 10.,  13.],
        [ 16.,  30.],
        [ 33.,  23.],
        [ 30.,  61.],
        [ 62.,  45.],
        [ 59., 119.],
        [116.,  90.],
        [156., 198.],
        [373., 326.]])

@xinxin342

@glenn-jocher Could you tell me more about it? I set anchors: 5 in hyp.scratch.yaml, but it didn't create any new anchors.
Where should I add the code you wrote? Thank you!

@glenn-jocher
Member

@xinxin342

# anchors: 0 # anchors per output grid (0 to ignore)

@xinxin342

@glenn-jocher I mean, where should I add this code: m = model.model[-1]; print(m.anchor_grid.view(-1,2))

@glenn-jocher
Member

@xinxin342 you can use this code anywhere you want, only you can answer that question.

import torch

model = torch.load('yolov5s.pt')['model']  # load the checkpoint and extract the model

m = model.model[-1]  # Detect()
m.anchors  # in stride units
m.anchor_grid  # in pixel units

print(m.anchor_grid.view(-1,2))
tensor([[ 10.,  13.],
        [ 16.,  30.],
        [ 33.,  23.],
        [ 30.,  61.],
        [ 62.,  45.],
        [ 59., 119.],
        [116.,  90.],
        [156., 198.],
        [373., 326.]])
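
For reference, the two buffers should be related by the layer stride: m.anchors is in grid (stride) units and m.anchor_grid holds the same anchors in pixels. A hedged sketch reusing m from above (shapes and buffer names can vary between yolov5 versions):

for i, s in enumerate(m.stride):     # typically tensor([ 8., 16., 32.])
    print(int(s), m.anchors[i] * s)  # per-layer anchors converted from grid units to pixels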

@HaolyShiit

Anchors are stored as Detect() layer parameters.


Thank you!

@alicera

alicera commented Oct 20, 2020

Hi,
Does detect.py use the anchors?
If it does, where can I find them?
I can't find the code; I only know about yolov5s.yaml.

@glenn-jocher
Member

@alicera this is answered above: #1057 (comment)

@Edwardmark

@glenn-jocher what does the anchor pixel value mean? Does it mean the width and height of the anchor?

@glenn-jocher
Member

Yes

@aidevmin

aidevmin commented Nov 6, 2023

Anchors are stored as Detect() layer parameters.


@glenn-jocher
We define anchor boxes in this file.

anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

I know that at the neck we have 3 feature maps:

80x80x256   # P3/8
40x40x512   # P4/16
20x20x1024  # P5/32

How do I map the anchor boxes to the neck feature maps? I mean, where are the anchor boxes located on each feature map? Thanks.

@glenn-jocher
Member

@aidevmin the mapping of anchor boxes to the feature map is determined by the anchor stride and anchor scale. In YOLOv5, the anchor stride is defined in the architecture file (yolov5l.yaml in this case) and specifies the stride or downsampling factor of each feature map. The anchor scale is defined in the same file and specifies the width and height of each anchor box in pixel units.

To calculate the position of anchor boxes on the feature map, you can divide the spatial dimensions of the feature map by the anchor stride. This will give you the grid size of the anchor boxes on the feature map. The x and y coordinates of an anchor box within this grid can then be multiplied by the anchor stride to obtain the pixel coordinates on the original image.

For example, for the P3/8 feature map (80x80x256) with an anchor stride of 8, the anchor grid size would be 10x10. Each anchor box in this feature map would span a 10x10 grid cell. To find the pixel coordinates of an anchor box within this grid, you can multiply the x and y coordinates by the anchor stride (8).

I hope this clarifies the mapping of anchor boxes to the feature map. Let me know if you have any further questions.

@aidevmin

aidevmin commented Nov 6, 2023

@glenn-jocher
Thanks for the quick response. I have read your comment, but I don't fully understand it yet. I need more time to investigate and will follow up later.
I have one more question.
We have anchor boxes

anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

We have 3 neck feature maps with input size = 640

80x80x256   # P3/8
40x40x512   # P4/16
20x20x1024  # P5/32

How can we specify the ground-truth masks of objects corresponding to the 3 feature maps above?
Let me give an example to clarify my question:

  • I have 1 image with size w = 400, h = 500. There are 2 objects in the image:
    • Object 1: x_left = 200, y_left = 300, w = 20, h = 30
    • Object 2: x_left = 220, y_left = 270, w = 34, h = 65

How can I specify the locations of object 1 and object 2 on the 3 feature maps above based on the anchor boxes (i.e. map each object's location from the original image to its location on the neck feature maps)? I am also confused about whether one feature map is responsible for one object, or whether all 3 feature maps can be responsible for the same object. The reason I want ground-truth masks on the neck feature maps is to apply them for KD (knowledge distillation).

Sorry for my English.

@glenn-jocher
Member

@aidevmin the anchor boxes in YOLOv5 are used to detect objects at different scales and aspect ratios. The anchor boxes are assigned to the grid cells of the corresponding feature maps based on their spatial location and size. Each feature map is responsible for detecting objects of certain scales.

To specify the ground-truth masks of objects on the feature maps, you need to map the location and size of each object from the original image to the corresponding grid cells on the feature maps.

In your example, you have two objects in an image with specific locations and sizes. To map these objects to the feature maps, you can follow these steps:

  1. Compute the center coordinates (cx, cy) of each object by adding half of its width and height to its top-left coordinates.
  2. Calculate the relative coordinates (rx, ry) of the object centers within the grid cell of each feature map. Divide the absolute coordinates (cx, cy) by the width and height of the respective feature map cells (e.g., 80x80 for P3/8).
  3. Assign each object to the grid cell on the feature map that contains its center coordinates, using the relative coordinates obtained in the previous step.
  4. Calculate the relative width and height (rw, rh) of each object by dividing its width and height by the total width and height of the feature map cells.

By performing these steps, you can assign the ground-truth masks of the objects to the corresponding grid cells on the feature maps. Each feature map will be responsible for detecting objects of certain scales and aspect ratios.

Please note that in YOLO models, each grid cell can be responsible for detecting multiple objects. Therefore, the three feature maps (P3/8, P4/16, P5/32) can collectively be responsible for detecting all the objects present in the image.

I hope this clarifies the process of specifying ground-truth masks on the feature maps. If you have further questions or need more clarification, please let me know.

@aidevmin

aidevmin commented Nov 6, 2023

@glenn-jocher
Thanks.
Let's assume I have an image with size (640, 640). There is one object in the image: x_topleft = 160, y_topleft = 200, w = 24, h = 40. Following your instructions:

1. Compute the center coordinates (cx, cy) of each object by adding half of its width and height to its top-left coordinates.
   I get (cx, cy) = (172, 220).
2. Calculate the relative coordinates (rx, ry) of the object centers within the grid cell of each feature map. Divide the absolute coordinates (cx, cy) by the width and height of the respective feature map cells (e.g., 80x80 for P3/8). You mean rx = cx/80 = 2.15; ry = cy/80 = 2.75.
3. Assign each object to the grid cell on the feature map that contains its center coordinates, using the relative coordinates obtained in the previous step. With rx = 2.15 and ry = 2.75, the center of the object would be at grid cell (2, 2).
4. Calculate the relative width and height (rw, rh) of each object by dividing its width and height by the total width and height of the feature map cells: rw = 24/80 = 0.3, rh = 40/80 = 0.5.

I think your calculation is not correct. This is my reasoning:

1. Compute the center coordinates (cx, cy) of each object by adding half of its width and height to its top-left coordinates.
   I get (cx, cy) = (172, 220).
2. Calculate the relative coordinates (rx, ry) of the object centers within the grid cell of each feature map. Divide the absolute coordinates (cx, cy) by the anchor stride (e.g., for P3/8 the feature map is 80x80 and the stride is 8): rx = cx/8 = 172/8 = 21.5; ry = cy/8 = 220/8 = 27.5.
3. Assign each object to the grid cell on the feature map that contains its center coordinates, using the relative coordinates obtained in the previous step. With rx = 21.5 and ry = 27.5, the center of the object will be at grid cell (21, 27).
4. Calculate the relative width and height (rw, rh) of each object by dividing its width and height by the anchor stride: rw = 24/8 = 3, rh = 40/8 = 5.

So the mask of the object on the 80x80 feature map has its center at (21, 27) with width = 3 and height = 5. What do you think about my reasoning, @glenn-jocher?

I have one more question. The anchor boxes defined in the config under anchors: are for an input size of 640. Is that right?

@glenn-jocher
Member

@aidevmin hi,

Regarding the calculation of the object's location on the feature map, your understanding is correct. The relative coordinates should be computed by dividing the object's center coordinates by the anchor stride, not by the size of the feature map cells. So, your calculations of rx=21.5 and ry=27.5 as the object's center location on the feature map are accurate.

For the relative width and height, dividing by the anchor stride (8, in this case) is also correct. So, rw=3 and rh=5 would represent the object's dimensions on the feature map.

Regarding your second question, the defined anchor boxes in the YAML configuration file you mentioned (yolov5m.yaml) are designed for an input image size of 640x640. These anchor boxes are specific to the YOLOv5m model and are chosen based on the target object scales and aspect ratios. Keep in mind that these anchor boxes work best for images of size 640x640. If you use a different input image size, the anchor boxes may not be optimal. Adjusting the anchor boxes according to your specific input size may be necessary for better performance.

I hope this clarifies your doubts. If you have any further questions or need additional information, feel free to ask.
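
To make the agreed arithmetic concrete, here is a tiny plain-Python sketch (no yolov5 code; the helper name is just for illustration) using the example object above and the P3/8 stride of 8:

def map_box_to_feature_map(x_topleft, y_topleft, w, h, stride):
    # Map a box from image pixels to grid-cell coordinates on a feature map.
    cx, cy = x_topleft + w / 2, y_topleft + h / 2  # box center in pixels
    rx, ry = cx / stride, cy / stride              # center in grid units
    cell = (int(rx), int(ry))                      # grid cell containing the center
    rw, rh = w / stride, h / stride                # box size in grid units
    return cell, (rw, rh)

# Example from the discussion: P3/8 feature map (80x80), stride = 8
print(map_box_to_feature_map(160, 200, 24, 40, stride=8))
# -> ((21, 27), (3.0, 5.0))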

@aidevmin

aidevmin commented Nov 8, 2023

@glenn-jocher
Thank you so much.

@glenn-jocher
Member

@aidevmin thank you for reaching out. We appreciate your interest in YOLOv5. The anchor boxes specified in the configuration file (yolov5m.yaml) are indeed optimized for an input size of 640x640. These anchor boxes are carefully chosen to best capture the scales and aspect ratios of target objects for this specific input size. However, the performance of the anchor boxes may vary for different input image sizes. It is recommended to adjust the anchor boxes according to your specific input size for optimal results.

If you have any further questions or need additional assistance, feel free to ask. We are here to help.
