
Some confusion in the paper #30

Closed
HongqingThomas opened this issue Sep 19, 2023 · 2 comments

HongqingThomas commented Sep 19, 2023

Hi,

I have a question about the paper.

In Section IV.C, when you discuss Table I, you say:

Next, we compare the results of GIGA-Aff with GIGA. In the pile scenario, the gain from geometry supervision is relatively small (around 2% grasp success rate). However, in the packed scenario, GIGA outperforms GIGA-Aff by a large margin of around 5%. We believe this is due to the different characteristics of these two scenarios. From Figure 3, we can see that in the packed scene, some tall objects standing in the workspace would occlude the objects behind them and the occluded objects are partially visible. We hypothesize that in this case, the geometrically-aware feature representation learned via geometry supervision facilitates the model to predict grasps on partially visible objects. Such occlusion is, however, less frequent in pile scenarios.

To summarize, you think pile scenarios have less occlusion than packed scenarios.

However, in the same section, when you discuss Fig. 4, you say:

The last two rows show the affordance landscape and top grasps for two pile scenes. We see that baselines without the multi-task training of 3D reconstruction tend to generate failed or no grasp, whereas GIGA produces more diverse and accurate grasps due to the learned geometrically-aware representations.

It feels like you are saying that the reason this property does not appear in the packed scenario is that the pile scenario needs the geometrically-aware features more, which would mean the pile scenario has more occlusion.

I'm very confused: why do these two passages seem to reach opposite conclusions? Also, if GIGA helps produce better grasp predictions in the pile scenario, as you say in the analysis of Fig. 4, why do the quantitative results in Table 1 not show a significant improvement in GSR and DR from GIGA-Aff to GIGA in the pile scenario?

Thanks again for your time and contribution!

@Steve-Tod (Collaborator) commented:

In short, in the packed scenario, more graspable regions are occluded than in the piled scenario, where occluded regions are usually also not graspable. So the gap between GIGA and GIGA-Aff is larger in the packed scenario.

Even in the piled scenario, GIGA is still better, although not by as much as in the packed scenario. We assume this is also due to the geometric understanding gained from multi-task training.

These two points do not contradict each other, because when we discussed the piled scenario, we didn't say this property is absent in the packed scenario. On the contrary, as you mentioned, we did say that the geometrically-aware feature representation learned via geometry supervision facilitates the model's grasp prediction on partially visible objects in the packed scenario.

@HongqingThomas (Author) commented:

Thanks for your explanation, it's very helpful!
