
Questions about VIMA Data Loading, RCNN Models, and Real-World Applications #26

Closed
oxFFFF-Q opened this issue Aug 18, 2023 · 2 comments

@oxFFFF-Q

Hi,
Firstly, I would like to express my gratitude for open-sourcing the VIMA work. I am very intrigued by it. However, I encountered several issues during my implementation:

  1. Training with PyTorch:
    While attempting to implement the training part of VIMA in PyTorch, I ran into issues with the trajectory.pkl file. Specifically, some values within ['obj_id_to_info'] are either None or functools.partial objects. These types cause errors when fed into PyTorch's DataLoader. May I know how you handled such data?
    I noticed in other issues that you use PyTorch Lightning. If I were to use plain PyTorch for training, would it be feasible to apply the same method used for inference to compute predicted actions, followed by loss computation and backpropagation?

  2. RCNN:
    Can the RCNN model you used only recognize objects that appear in the simulation or the dataset? For instance, could VIMA pick up and place an apple? If not, would I need to retrain the model or switch to a different object recognition model?

  3. VIMA in the Real World:
    Have you tried deploying VIMA on a physical robotic arm? I'm venturing into this, and any advice or insights you could offer would be greatly appreciated.
    Thank you for your time; I look forward to your response.

Best regards,
Qiao

@yunfanjiang
Member

Hi there, thank you for your interest in our work. To answer your questions:

> These types cause errors when fed into PyTorch's DataLoader. May I know how you handled such data?

obj_id_to_info is meant to provide a comprehensive log of all objects, so it also captures properties such as the functions used to initialize textures. However, during our data processing, only the keys of the obj_id_to_info dict were used.
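A minimal sketch of that workaround, with illustrative entries standing in for the actual trajectory.pkl contents (the field names here mirror the question, not necessarily the exact on-disk schema):

```python
import functools

# Illustrative stand-in for what trajectory.pkl's ['obj_id_to_info'] may contain:
# values can be None or functools.partial objects, which the default
# DataLoader collate function cannot batch.
obj_id_to_info = {
    3: {"name": "block", "texture_fn": functools.partial(print, "red")},
    7: None,
}

# Only the object IDs (the dict keys) are needed downstream, so drop the
# un-collatable values before handing anything to the DataLoader:
obj_ids = sorted(obj_id_to_info.keys())
```

An alternative is a custom `collate_fn` that skips these fields, but discarding the values up front keeps the dataset pipeline simple.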

> If I were to use plain PyTorch for training, would it be feasible to apply the same method used for inference to compute predicted actions, followed by loss computation and backpropagation?

Yes, it's feasible. PyTorch Lightning merely provides convenient wrappers over the pipeline you described.
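In plain PyTorch, the pipeline described above reduces to the usual training step. The sketch below uses a placeholder linear model and random tensors in place of the actual VIMA policy and dataset, and assumes a cross-entropy loss over discretized action bins:

```python
import torch
from torch import nn

# Placeholder for the VIMA policy network and its discretized action head.
policy = nn.Linear(8, 4)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

obs = torch.randn(16, 8)                      # placeholder observation features
target_actions = torch.randint(0, 4, (16,))   # placeholder action-bin labels

# One plain-PyTorch training step, i.e. what Lightning's training_step wraps:
optimizer.zero_grad()
logits = policy(obs)      # same forward pass used at inference time
loss = loss_fn(logits, target_actions)
loss.backward()
optimizer.step()
```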

> Mask-RCNN

Since we used a domain-fine-tuned model, it may be biased toward objects that appeared in our dataset. You can either fine-tune it or use a more advanced object detector such as SAM.

> VIMA in the Real World

We conducted some follow-up efforts to deploy VIMA on a real UR5e arm with a Robotiq parallel gripper, using SAM to capture in-the-wild objects. It turns out this works to a certain extent, although some tricks/heuristics are needed, such as filtering out irrelevant objects by their sizes and rough locations. When necessary, we also retrained with data collected in a digital twin of the real workspace.

Let me know if you have further questions.

@oxFFFF-Q
Author

Thank you for your response. I have a few more questions:

Regarding the prompt: for instance, given the example 'This is a dax {dragged_obj_1}. This is a zup {base_obj}. Put a dax into a zup.', I'm unclear on how the object names are associated with their respective segmentation masks (segm) or center positions. Could you explain this further?
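For context, one common way to implement such interleaving is to split the prompt text around its placeholders and pair each placeholder with that object's observation. The field names below (`segm_id`, `center`) are illustrative, not necessarily VIMA's actual schema:

```python
import re

prompt = ("This is a dax {dragged_obj_1}. This is a zup {base_obj}. "
          "Put a dax into a zup.")

# Each placeholder maps to per-object info (here: a hypothetical segmentation
# id and normalized center position).
prompt_assets = {
    "dragged_obj_1": {"segm_id": 3, "center": (0.21, 0.47)},
    "base_obj": {"segm_id": 7, "center": (0.63, 0.52)},
}

# Split on the placeholders (keeping them via the capture group), producing an
# interleaved sequence of text tokens and object tokens:
tokens = []
for piece in re.split(r"(\{[a-z0-9_]+\})", prompt):
    if piece.startswith("{") and piece.endswith("}"):
        tokens.append(("obj", prompt_assets[piece[1:-1]]))
    elif piece.strip():
        tokens.append(("text", piece.strip()))
```

The resulting sequence alternates text spans and object entries, which is the shape a multimodal prompt encoder typically consumes.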

Regarding Mask-RCNN: in another issue, #13, you provided the model's checkpoints and a link to Detectron2. Could you elaborate on how to load the RCNN model from those checkpoints? Once the pretrained model is loaded, can it directly generate segmentations from an input RGB image? Are there any specific parameters I should be aware of or set when doing so?

Looking forward to your reply.
