Some confusion in the paper #30

HongqingThomas · 2023-09-19T22:13:55Z

Hi,

I have a question about the paper.

In Section IV.C, when you discuss Table I, you say:

Next, we compare the results of GIGA-Aff with GIGA. In the pile scenario, the gain from geometry supervision is relatively small (around 2% grasp success rate). However, in the packed scenario, GIGA outperforms GIGA-Aff by a large margin of around 5%. We believe this is due to the different characteristics of these two scenarios. From Figure 3, we can see that in the packed scene, some tall objects standing in the workspace would occlude the objects behind them and the occluded objects are partially visible. We hypothesize that in this case, the geometrically-aware feature representation learned via geometry supervision facilitates the model to predict grasps on partially visible objects. Such occlusion is, however, less frequent in pile scenarios.

To summarize, you think the pile scenario has less occlusion than the packed scenario.

However, in the same section, when you discuss Fig. 4, you say:

The last two rows show the affordance landscape and top grasps for two pile scenes. We see that baselines without the multi-task training of 3D reconstruction tend to generate failed or no grasp, whereas GIGA produces more diverse and accurate grasps due to the learned geometrically-aware representations.

It feels like you are saying that the packed scenario lacks this property because the pile scenario needs geometrically-aware features more, which would mean it has more occlusion.

I'm very confused: why do you reach opposite conclusions here? Also, if GIGA yields better grasp predictions in the pile scenario, as you say in the analysis of Fig. 4, why do the quantitative results in Table I not show a significant improvement in GSR and DR from GIGA-Aff to GIGA in the pile scenario?

Thanks for your time and contribution again!

Steve-Tod · 2023-09-19T23:26:58Z

In short, more graspable regions are occluded in the packed scenario than in the piled scenario, where occluded regions are usually not graspable anyway. So the gap between GIGA and GIGA-Aff is larger in the packed scenario.

Even in the piled scenario, GIGA is still better, although by a smaller margin than in the packed scenario. We assume this is also due to the geometric understanding gained from multi-task training.

These two points do not contradict each other: when we discussed the piled scenario, we didn't say that the packed scenario lacks this property. On the contrary, as you mentioned, we did say that the geometrically-aware feature representation learned via geometry supervision helps the model predict grasps on partially visible objects in the packed scenario.

HongqingThomas · 2023-09-19T23:29:17Z

Thanks for your explanation, it's very useful!

Steve-Tod closed this as completed Feb 28, 2024
