
How to fuse the results of different scales? #37

Closed
liqikai9 opened this issue Jan 26, 2022 · 10 comments

Comments

@liqikai9

Thanks for your great work. I wonder: during inference, how do you combine the results of the 4 different output grids? Is there some special fusion step? Looking forward to your reply.

@wmcnally
Owner

wmcnally commented Jan 26, 2022

Thanks! There’s no grid fusion. Just like in YOLO, all the grids are passed to the NMS function. Each grid point represents a unique prediction. Larger objects are usually detected in the smaller grids.
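To make this concrete, here is a minimal sketch of pooling detections from multiple scales into a single greedy IoU-based NMS pass. This is illustrative only, not KAPAO's actual code; the function names and the toy detections are assumptions.

```python
# Illustrative only: concatenate detections from all output grids,
# then run one greedy IoU-based NMS pass over the pooled list.

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(dets, iou_thres=0.45):
    """dets: list of (box, score). NMS over all scales at once."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    keep = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thres for k in keep):
            keep.append((box, score))
    return keep

# Predictions from different grids are simply pooled together:
dets_p3 = [((10, 10, 50, 50), 0.9)]        # from the stride-8 grid
dets_p5 = [((12, 12, 52, 52), 0.6)]        # overlapping box from stride 32
kept = nms(dets_p3 + dets_p5)              # the duplicate is suppressed
```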

@liqikai9
Author

I see. And did you do an ablation study on these scales? How does the number of scales affect the results?

@wmcnally
Owner

Personally, I did not. But I asked the same question in the ultralytics/yolov5 repo and was informed that an evolutionary algorithm was used to find the optimal configuration. However, the optimization was performed for object detection, not human pose estimation.

@liqikai9
Author

liqikai9 commented Jan 26, 2022

> Larger objects are usually detected in the smaller grids.

I also wonder how the conclusion stated in your paper is derived: "The receptive field of an output grid increases with s, so smaller output grids are better suited for detecting larger objects."

liqikai9 reopened this Jan 26, 2022
@wmcnally
Owner

wmcnally commented Jan 26, 2022

> I also wonder how the conclusion stated in your paper is derived: "The receptive field of an output grid increases with s, so smaller output grids are better suited for detecting larger objects."

If you’re asking why the receptive field increases: that’s how CNNs work - the deeper you go, the larger the effective receptive field. If you’re asking why larger receptive fields are better for large objects: it’s because larger receptive fields can “see” a larger portion of the input image. The anchor boxes were also defined such that larger anchors are used with smaller grids. Again, none of this has been optimized for human pose estimation specifically, just object detection.
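To make the grid/stride relationship concrete, here is the simple arithmetic for a 640x640 input (the input size is an assumption for illustration; the cell counts follow directly from the strides):

```python
# For a 640x640 input, the output grid at stride s has (640 // s)**2 cells.
# Larger stride -> fewer, coarser cells, each covering more of the image.
img = 640
grid_cells = {s: (img // s) ** 2 for s in (8, 16, 32, 64)}
for s, n in grid_cells.items():
    side = img // s
    print(f"stride {s:>2}: {side}x{side} grid = {n} cells")
```

So the stride-8 grid makes 6400 predictions per anchor while the stride-64 grid makes only 100, and each of those 100 cells corresponds to a much larger patch of the input image.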

@liqikai9
Author

I see. I didn't know much detail about YOLOv5 before, nor that the anchor boxes were defined such that larger anchors are used with smaller grids.

Could you explain the meaning of the following lines? What exactly does [19,27, 44,40, 38,94] represent?

anchors:
- [19,27, 44,40, 38,94] # P3/8
- [96,68, 86,152, 180,137] # P4/16
- [140,301, 303,264, 238,542] # P5/32
- [436,615, 739,380, 925,792] # P6/64

Again, thanks for your quick reply!

@wmcnally
Owner

Those are the widths and heights (in pixels) of the three anchor boxes used with the largest output grid (1/8th the size of the original image).
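As a concrete reading of the YAML above: each pair is a (width, height) in input-image pixels, with three anchors per output grid. YOLOv5-style code divides the anchors by the stride so they are expressed in grid-cell units; this sketch assumes that convention and is illustrative, not KAPAO's actual code:

```python
# Anchor (w, h) pairs in input-image pixels, keyed by the grid's stride.
anchors = {
    8:  [(19, 27), (44, 40), (38, 94)],        # P3/8
    16: [(96, 68), (86, 152), (180, 137)],     # P4/16
    32: [(140, 301), (303, 264), (238, 542)],  # P5/32
    64: [(436, 615), (739, 380), (925, 792)],  # P6/64
}

# YOLOv5-style: divide by the stride to get anchors in grid-cell units.
anchors_grid = {s: [(w / s, h / s) for w, h in a] for s, a in anchors.items()}
```

Note how the pixel sizes grow with the stride: the smallest anchors are assigned to the stride-8 grid and the largest to the stride-64 grid.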

@liqikai9
Author

> Those are the widths and heights (in pixels) of the three anchor boxes used with the largest output grid (1/8th the size of the original image).

I see. And I wonder where the center point of each anchor box is. Is it the center of the grid cell by default?

@wmcnally
Owner

No, each grid point predicts an offset for the center of a box! I suggest going through Section 3 of the paper in more detail to get a better understanding!

@liqikai9
Author

> No, each grid point predicts an offset for the center of a box! I suggest going through Section 3 of the paper in more detail to get a better understanding!

I will go through this part in detail. Thanks.
