[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
[CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models"
✨✨Official implementation of BridgeVLA
Official implementation of paper "AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning"
Release of code, datasets and model for our work TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials
Official repo for AGNOSTOS, a cross-task manipulation benchmark, and X-ICM, a cross-task in-context manipulation (VLA) method
AGI-Elo: How Far Are We From Mastering A Task?
PickAgent: OpenVLA-powered Pick and Place Agent | Gradio&Simulation | Vision Language Action Model
Track 2: Social Navigation
VLAGen: Automated Data Collection for Generalizing Robotic Policies