Questions about VIMA Data Loading, RCNN Models, and Real-World Applications #26
Hi there, thank you for your interest in our work. To answer your questions:
1. Yes, it's feasible. PyTorch Lightning merely provides convenient wrappers around the pipeline you described.
2. Since we used a domain fine-tuned model, it may be biased towards objects that appeared in our dataset. You can either fine-tune it or switch to a more general object detector such as SAM.
3. We made some follow-up efforts to deploy on a real UR5e arm with a Robotiq parallel gripper, using SAM to segment in-the-wild objects. It turns out this works to a certain extent, although some tricks/heuristics are needed, such as filtering out irrelevant objects by their sizes and rough locations. When necessary, we also retrained with data collected in a digital twin of the real workspace. Let me know if you have further questions.
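The size/location filtering heuristic mentioned above could be sketched as follows. This is an illustrative example only, not the authors' actual code; the `Detection` structure, thresholds, and workspace bounds are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    mask_area: int   # number of segmented pixels in the proposal
    center: tuple    # (x, y) center of the mask in image coordinates

def filter_detections(detections, min_area=200, max_area=20_000,
                      workspace=((100, 540), (80, 400))):
    """Keep proposals whose size is plausible and whose center lies in the workspace."""
    (x_lo, x_hi), (y_lo, y_hi) = workspace
    kept = []
    for det in detections:
        if not (min_area <= det.mask_area <= max_area):
            continue  # too small (noise) or too large (table / background)
        x, y = det.center
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            kept.append(det)   # inside the rough workspace region
    return kept

dets = [Detection(5000, (300, 200)),   # plausible object on the table
        Detection(50, (310, 210)),     # speck of segmentation noise
        Detection(5000, (20, 20))]     # large blob outside the workspace
print(len(filter_detections(dets)))   # → 1
```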
Thank you for your response. I have a few more questions:
Regarding the prompt: given the example 'This is a dax {dragged_obj_1}. This is a zup {base_obj}. Put a dax into a zup.', I'm unclear on how the object names are associated with their respective segmentation masks or center positions. Could you explain this further?
Regarding Mask R-CNN: in issue #13 you provided the model's checkpoints and a link to Detectron2. Could you elaborate on how to load the R-CNN model from those checkpoints? Once the pretrained model is loaded, can it directly generate segmentations from an input RGB image, and are there any specific parameters I should be aware of or set when doing so?
Looking forward to your reply.
Hi,
First of all, I would like to express my gratitude for open-sourcing the VIMA work; I find it very intriguing. However, I ran into several issues during my implementation:
Training with PyTorch:
While attempting to implement the training part of VIMA in plain PyTorch, I ran into issues with the trajectory.pkl file. Specifically, some values inside ['obj_id_to_info'] are either None or functools.partial objects, which cause errors when the data is fed into PyTorch's DataLoader. How did you handle such data?
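For context, one common workaround for this kind of problem is to strip out the values that PyTorch's default collate cannot batch before building the dataset. The sketch below is an assumption about the data layout (field names are illustrative), not the repository's actual preprocessing:

```python
import functools

def sanitize_obj_info(obj_id_to_info):
    """Drop dict entries whose values default_collate cannot handle
    (None and callables such as functools.partial)."""
    clean = {}
    for obj_id, info in obj_id_to_info.items():
        clean[obj_id] = {k: v for k, v in info.items()
                         if v is not None and not callable(v)}
    return clean

# Hypothetical entry mimicking the problematic fields
raw = {3: {"name": "block",
           "size_fn": functools.partial(max, 1),  # callable: un-batchable
           "color": None}}                        # None: un-batchable
print(sanitize_obj_info(raw))   # → {3: {'name': 'block'}}
```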
I noticed in other issues that you use PyTorch Lightning. If I used plain PyTorch for training instead (running the same forward pass as at inference to compute predicted actions, then computing the loss and backpropagating), would this approach be feasible?
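The training loop described above, i.e. what PyTorch Lightning automates, can be sketched in plain PyTorch like this. The model, loss, and tensor shapes are toy placeholders standing in for the actual VIMA policy:

```python
import torch
import torch.nn as nn

policy = nn.Linear(8, 2)   # toy stand-in for the VIMA policy network
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()     # placeholder; VIMA's actual loss may differ

obs = torch.randn(16, 8)             # fake batch of observations
target_actions = torch.randn(16, 2)  # fake ground-truth actions

for step in range(10):
    pred_actions = policy(obs)       # same forward pass used at inference
    loss = loss_fn(pred_actions, target_actions)
    optimizer.zero_grad()
    loss.backward()                  # backpropagate
    optimizer.step()                 # update parameters
print(torch.isfinite(loss).item())   # → True
```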
RCNN:
Can the Mask R-CNN model you used only recognize objects that appear in the simulation or the dataset? For instance, could VIMA pick up and place an apple? If not, would I need to retrain the model or switch to a different object recognition model?
VIMA in the Real World:
Have you tried deploying VIMA on a physical robot arm? I'm venturing into this myself, and any advice or insights you could offer would be greatly appreciated.
Thank you for your time; I look forward to your response.
Best regards,
Qiao