Questions about VIMA Data Loading, RCNN Models, and Real-World Applications #26
Hi there, thank you for your interest in our work. To answer your questions:
1. Yes, it's feasible. PyTorch Lightning merely provides convenient wrappers around the pipeline you described.
2. Since we used a domain fine-tuned model, it may be biased towards objects that appeared in our dataset. You can either fine-tune it or switch to a more general object detector such as SAM.
3. We made some follow-up efforts to deploy on a real UR5e arm with a Robotiq parallel gripper, using SAM to segment in-the-wild objects. It turns out this works to a certain extent, although some tricks/heuristics are needed, such as filtering out irrelevant objects by their sizes and rough locations. When necessary, we also retrained with data collected in a digital twin of the real workspace. Let me know if you have further questions.
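The size/location filtering heuristic mentioned above could be sketched as follows. This is an illustrative example only, not the authors' actual code; the `Detection` structure, thresholds, and workspace bounds are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    mask_area: int   # number of segmented pixels in the proposal
    center: tuple    # (x, y) center of the mask in image coordinates

def filter_detections(detections, min_area=200, max_area=20_000,
                      workspace=((100, 540), (80, 400))):
    """Keep proposals whose size is plausible and whose center lies in the workspace."""
    (x_lo, x_hi), (y_lo, y_hi) = workspace
    kept = []
    for det in detections:
        if not (min_area <= det.mask_area <= max_area):
            continue  # too small (noise) or too large (table / background)
        x, y = det.center
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            kept.append(det)   # inside the rough workspace region
    return kept

dets = [Detection(5000, (300, 200)),   # plausible object on the table
        Detection(50, (310, 210)),     # speck of segmentation noise
        Detection(5000, (20, 20))]     # large blob outside the workspace
print(len(filter_detections(dets)))   # → 1
```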
Thank you for your response. I have a few more questions:
Regarding the prompt: given the example 'This is a dax {dragged_obj_1}. This is a zup {base_obj}. Put a dax into a zup.', I'm unclear on how the object names are associated with their respective segmentation masks or center positions. Could you explain this further?
Regarding Mask R-CNN: in issue #13 you provided the model's checkpoints and a link to Detectron2. Could you elaborate on how to load the R-CNN model from those checkpoints? Once the pretrained model is loaded, can it directly generate segmentations from an input RGB image, and are there any specific parameters I should be aware of or set when doing so?
Looking forward to your reply.
Hi,
First of all, I would like to express my gratitude for open-sourcing the VIMA work; I find it very intriguing. However, I ran into several issues during my implementation:
Training with PyTorch:
While attempting to implement the training part of VIMA in plain PyTorch, I ran into issues with the trajectory.pkl file. Specifically, some values inside ['obj_id_to_info'] are either None or functools.partial objects, which cause errors when the data is fed into PyTorch's DataLoader. How did you handle such data?
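For context, one common workaround for this kind of problem is to strip out the values that PyTorch's default collate cannot batch before building the dataset. The sketch below is an assumption about the data layout (field names are illustrative), not the repository's actual preprocessing:

```python
import functools

def sanitize_obj_info(obj_id_to_info):
    """Drop dict entries whose values default_collate cannot handle
    (None and callables such as functools.partial)."""
    clean = {}
    for obj_id, info in obj_id_to_info.items():
        clean[obj_id] = {k: v for k, v in info.items()
                         if v is not None and not callable(v)}
    return clean

# Hypothetical entry mimicking the problematic fields
raw = {3: {"name": "block",
           "size_fn": functools.partial(max, 1),  # callable: un-batchable
           "color": None}}                        # None: un-batchable
print(sanitize_obj_info(raw))   # → {3: {'name': 'block'}}
```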
I noticed in other issues that you use PyTorch Lightning. If I used plain PyTorch for training instead (running the same forward pass as at inference to compute predicted actions, then computing the loss and backpropagating), would this approach be feasible?
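The training loop described above, i.e. what PyTorch Lightning automates, can be sketched in plain PyTorch like this. The model, loss, and tensor shapes are toy placeholders standing in for the actual VIMA policy:

```python
import torch
import torch.nn as nn

policy = nn.Linear(8, 2)   # toy stand-in for the VIMA policy network
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()     # placeholder; VIMA's actual loss may differ

obs = torch.randn(16, 8)             # fake batch of observations
target_actions = torch.randn(16, 2)  # fake ground-truth actions

for step in range(10):
    pred_actions = policy(obs)       # same forward pass used at inference
    loss = loss_fn(pred_actions, target_actions)
    optimizer.zero_grad()
    loss.backward()                  # backpropagate
    optimizer.step()                 # update parameters
print(torch.isfinite(loss).item())   # → True
```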
RCNN:
Can the Mask R-CNN model you used only recognize objects that appear in the simulation or the dataset? For instance, could VIMA pick up and place an apple? If not, would I need to retrain the model or switch to a different object recognition model?
VIMA in the Real World:
Have you tried deploying VIMA on a physical robot arm? I'm venturing into this myself, and any advice or insights you could offer would be greatly appreciated.
Thank you for your time; I look forward to your response.
Best regards,
Qiao