Hi,
I have a question about the paper.
In Section IV.C, when discussing Table I, you said:
Next, we compare the results of GIGA-Aff with GIGA. In the pile scenario, the gain from geometry supervision is relatively small (around 2% grasp success rate). However, in the packed scenario, GIGA outperforms GIGA-Aff by a large margin of around 5%. We believe this is due to the different characteristics of these two scenarios. From Figure 3, we can see that in the packed scene, some tall objects standing in the workspace would occlude the objects behind them and the occluded objects are partially visible. We hypothesize that in this case, the geometrically-aware feature representation learned via geometry supervision facilitates the model to predict grasps on partially visible objects. Such occlusion is, however, less frequent in pile scenarios.
To summarize, you argue that pile scenarios have less occlusion than packed scenarios.
However, in the same section, when discussing Fig. 4, you said:
The last two rows show the affordance landscape and top grasps for two pile scenes. We see that baselines without the multi-task training of 3D reconstruction tend to generate failed or no grasp, whereas GIGA produces more diverse and accurate grasps due to the learned geometrically-aware representations.
It sounds like you are saying that the packed scenario does not show this property because the pile scenario needs geometrically-aware features more, which would mean it has more occlusion.
I'm confused: why do these two passages reach opposite conclusions? Also, if GIGA gives better grasp predictions in the pile scenario, as you said in the analysis of Fig. 4, why do the quantitative results in Table I not show a significant improvement in GSR and DR from GIGA-Aff to GIGA in the pile scenario?
Thanks for your time and contribution again!
In short, more graspable regions are occluded in the packed scenario than in the pile scenario, where occluded regions are usually not graspable anyway. So the gap between GIGA and GIGA-Aff is larger in the packed scenario.
Even in the pile scenario, GIGA is still better, although by a smaller margin than in the packed scenario. We assume this is also due to the geometric understanding gained from multi-task training.
These two points do not contradict each other: when we discussed the pile scenario, we did not say that the packed scenario lacks this property. On the contrary, as you mentioned, we did say that the geometrically-aware feature representation learned via geometry supervision helps the model predict grasps on partially visible objects in the packed scenario.